Optimal policies in Markov decision problems may be quite sensitive to the transition probabilities. In practice, some transition probabilities may be uncertain. The goals of the present study are to find the robust range for a certain optimal policy and to obtain value intervals of exact transition probabilities. Our research yields powerful contributions for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimation may be far from accurate, and the highest expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After giving the definition of a robust optimal policy with uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain the optimal policy. Finally, we define the value intervals of the exact transition probabilities and construct models to determine the lower and upper bounds. Numerical examples are given to show the practicability of our methods.
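For a finite MDP, the maximum-likelihood step described above reduces to normalizing observed transition counts, and the sensitivity of the optimal policy can then be probed by re-solving the MDP under perturbed estimates. A minimal sketch (function names, array shapes, and the value-iteration solver are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def ml_transition_estimate(counts):
    """Maximum-likelihood estimate of P(s'|s,a) from a counts array of
    shape (S, A, S), where counts[s, a, s'] = number of observed jumps."""
    totals = counts.sum(axis=2, keepdims=True)
    return np.divide(counts, totals, out=np.full(counts.shape, np.nan),
                     where=totals > 0)

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Optimal value and greedy policy for transition tensor P (S, A, S)
    and reward matrix R (S, A)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V          # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

A simple robustness probe, in the spirit of the abstract, is to re-run value_iteration on transition tensors perturbed within a plausible set around the estimate and check whether the greedy policy changes.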
A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
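In the classical special case, the finite-horizon dynamic programming referred to above is backward induction over the remaining stages. A minimal sketch for an ordinary (non-quantum) MDP, with illustrative names and no discounting:

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Finite-horizon optimal values and policies by backward induction.
    P: (S, A, S) transition tensor, R: (S, A) one-step rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)                        # terminal values V_H = 0
    policy = np.zeros((horizon, S), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V                      # stage-t action values
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy
```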
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for stochastic discrete-event systems driven by Markov chains. Much of the literature focusses on the risk-neutral criterion in which the expected rewards, either average or discounted, are maximized. There exists some literature on MDPs that takes risks into account. Much of this addresses the exponential utility (EU) function and mechanisms to penalize different forms of variance of the rewards. EU functions have some numerical deficiencies, while variance measures variability both above and below the mean rewards; the variability above mean rewards is usually beneficial and should not be penalized or avoided. As such, risk metrics that account for pre-specified targets (thresholds) for rewards have been considered in the literature, where the goal is to penalize the risk of revenues falling below those targets. Existing work on MDPs that takes targets into account seeks to minimize risks of this nature. Minimizing risks can lead to poor solutions in which the risk is zero or near zero but the average rewards are also rather low. In this paper, hence, we study a risk-averse criterion, in particular the so-called downside risk, which equals the probability of the revenues falling below a given target; in contrast to minimizing such risks, we only reduce this risk at the cost of slightly lowered average rewards. A solution where the risk is low and the average reward is quite high, although not at its maximum attainable value, is very attractive in practice. To be more specific, in our formulation the objective function is the expected value of the rewards minus a scalar times the downside risk. In this setting, we analyze the infinite-horizon MDP, the finite-horizon MDP, and the infinite-horizon semi-MDP (SMDP). We develop dynamic programming and reinforcement learning algorithms for the finite and infinite horizons. The algorithms are tested in numerical studies and show encouraging performance.
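The risk-adjusted objective described above, expected reward minus a scalar times the probability of falling below a target, can be estimated directly from simulated episode returns. A minimal Monte-Carlo sketch (the penalty weight, target, and simulated return distributions are illustrative assumptions):

```python
import numpy as np

def downside_risk_score(returns, target, risk_weight):
    """Objective = E[return] - risk_weight * P(return < target),
    estimated from an array of simulated episode returns."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - risk_weight * np.mean(returns < target)

# Comparing two candidate policies from simulated revenues (numbers illustrative):
rng = np.random.default_rng(0)
risky  = rng.normal(120, 30, 10_000)   # higher mean, more downside
steady = rng.normal(110, 10, 10_000)   # lower mean, little downside
print(downside_risk_score(risky, target=100.0, risk_weight=50.0),
      downside_risk_score(steady, target=100.0, risk_weight=50.0))
```

With a large enough penalty weight, the lower-mean but low-risk policy scores higher, which is exactly the kind of trade-off the abstract argues is attractive in practice.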
This paper studies the limit average variance criterion for continuous-time Markov decision processes in Polish spaces. Based on two approaches, this paper proves not only the existence of solutions to the variance minimization optimality equation and the existence of a variance-minimal policy that is canonical, but also the existence of solutions to the two variance minimization optimality inequalities and the existence of a variance-minimal policy which may not be canonical. An example is given to illustrate all of our conditions.
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Different from traditional taxi services, RoD services adopt dynamic pricing mechanisms to balance supply and demand on the road, and such mechanisms improve service capacity and quality. Route recommendation for passenger seeking has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of the supply and demand condition, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance of and need for doing so by analyzing real service data. We then design a Markov decision process (MDP) model based on passenger-order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with driver revenue before using the model, the maximum increase after using it reaches 28%.
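One way to read "taking dynamic prices into account in designing rewards" is to scale the expected fare of each seeking location by its current price multiplier. A toy sketch of such a reward table (zone counts, probabilities, fares, and multipliers are all illustrative assumptions, not the paper's data):

```python
import numpy as np

n_zones = 6
pickup_prob = np.array([0.2, 0.5, 0.3, 0.7, 0.4, 0.6])   # chance of an order per zone
base_fare   = np.array([20., 15., 25., 18., 22., 16.])    # expected fare per order
surge       = np.array([1.0, 1.8, 1.2, 2.1, 1.0, 1.5])    # current dynamic-price multiplier
seek_cost   = 2.0                                          # cost of cruising to a zone

# Expected immediate reward of seeking in each zone, with and without price awareness.
reward_price_aware = pickup_prob * base_fare * surge - seek_cost
reward_naive       = pickup_prob * base_fare - seek_cost
print(reward_price_aware.argmax(), reward_naive.argmax())  # recommended zones may differ
```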
Decision-making is the process of choosing between two or more options in order to take the most appropriate and successful course of action towards sustainable mangrove management. However, the distinctiveness of mangroves as an ecosystem, and the attendant socio-economic and governance ramifications, makes decision making in this setting relatively distinct from other decision-making processes. As a result, the purpose of this research was to evaluate the role that community engagement plays in the decision-making process as it relates to the establishment of governance norms for sustainable mangrove management in Lamu County. In this study, a correlational research design was applied, and the researchers employed a mixed-methods approach. The target population was 296 respondents. The research used questionnaires and interviews to collect data. A descriptive statistical technique was utilized to inspect and analyze the data gathered. The findings indicated that awareness of governance standards is beneficial during the decision-making process. In addition, the findings demonstrated that respondents had the impression that the decision-making process was not conducted properly. On the other hand, the participants pointed out the positive aspects of the decision-making process and agreed that the participation of both genders was essential for the sustainable management of mangroves. Based on these data, it appears that full community engagement in decision-making is necessary for the sustainable management of mangrove forests.
The design process of the built environment relies on the collaborative effort of all parties involved in the project. During the design phase, owners, end users, and their representatives are expected to make the most critical design and budgetary decisions, shaping the essential traits of the project; hence the need to create and integrate mechanisms that support the decision-making process. Design decisions should not be based on assumptions, past experiences, or imagination. An example of the numerous problems that result from uninformed design decisions is "change orders", known as deviations from the original scope of work, which increase the overall cost and change the construction schedule of the project. The long-term aim of this inquiry is to understand user behavior and establish evidence-based control measures, which are actions and processes that can be implemented in practice to decrease the volume and frequency of change orders. The current study developed a foundation for further examination by proposing potential control measures, such as integrating virtual reality (VR), and testing their efficiency. The specific aim was to examine the effect of different visualization methods (i.e., VR vs. construction drawings) on (1) how well the subjects understand the information presented about the future/planned environment; (2) the subjects' perceived confidence in what the future environment will look like; (3) the likelihood of changing the built environment; (4) design review time; and (5) accuracy in reviewing and understanding the design.
A network selection optimization algorithm based on the Markov decision process (MDP) is proposed so that mobile terminals can always connect to the best wireless network in a heterogeneous network environment. Considering the different types of service requirements, the MDP model and its reward function are constructed based on the quality of service (QoS) attribute parameters of the mobile users, and the network attribute weights are calculated using the analytic hierarchy process (AHP). The network handoff decision condition is designed according to the different types of user services and the time-varying characteristics of the network, and the MDP model is solved using genetic algorithm and simulated annealing (GA-SA), so that users can seamlessly switch to the network with the best long-term expected reward value. Simulation results show that the proposed algorithm has good convergence performance and can guarantee that users with different service types obtain satisfactory expected total reward values with low numbers of network handoffs.
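The AHP weighting step mentioned above derives attribute weights from a pairwise comparison matrix, commonly as its normalized principal eigenvector. A minimal sketch with an illustrative 3x3 comparison of QoS attributes (the judgment values are assumptions, not from the paper):

```python
import numpy as np

def ahp_weights(pairwise):
    """Attribute weights = normalized principal eigenvector of the
    (reciprocal) pairwise comparison matrix."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
    principal = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return principal / principal.sum()

# e.g. bandwidth vs. delay vs. cost (judgments are illustrative)
W = ahp_weights([[1,   3,   5],
                 [1/3, 1,   2],
                 [1/5, 1/2, 1]])
print(W)   # weights sum to 1, largest weight on bandwidth in this example
```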
Alunite is the most important non-bauxite resource for alumina. Various methods have been proposed and patented for processing alunite, but none has been performed at industrial scale, and no technical, operational, or economic data are available to evaluate these methods. In addition, selecting the right approach for alunite beneficiation requires introducing a wide range of criteria and careful analysis of alternatives. In this research, after studying the existing processes, 13 methods were considered and evaluated against 14 technical, economic, and environmental criteria. Owing to the multiplicity of processing methods and attributes, multi-attribute decision-making methods were employed in this paper to examine the appropriateness of the choices. The Delphi Analytical Hierarchy Process (DAHP) was used for weighting the selection criteria, and a fuzzy TOPSIS approach was used to determine the most profitable candidates. Among the 13 studied methods, the Spanish, Svoronos, and Hazan methods were recognized as the best choices.
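The TOPSIS ranking step scores each alternative by its closeness to the ideal and anti-ideal solutions; the sketch below shows the crisp form (the paper uses a fuzzy variant), with an illustrative decision matrix and weights that are assumptions, not the study's data:

```python
import numpy as np

def topsis(decision_matrix, weights, benefit_mask):
    """Closeness coefficients for alternatives (rows) over criteria (columns).
    benefit_mask[j] is True for benefit criteria, False for cost criteria."""
    X = np.asarray(decision_matrix, dtype=float)
    norm = X / np.linalg.norm(X, axis=0)           # vector-normalized matrix
    V = norm * np.asarray(weights, dtype=float)    # weighted normalized matrix
    ideal = np.where(benefit_mask, V.max(axis=0), V.min(axis=0))
    anti  = np.where(benefit_mask, V.min(axis=0), V.max(axis=0))
    d_plus  = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    return d_minus / (d_plus + d_minus)            # higher = better

# three candidate processes scored on cost (lower better) and recovery (higher better)
cc = topsis([[100, 0.8], [80, 0.6], [120, 0.9]],
            weights=[0.4, 0.6], benefit_mask=[False, True])
print(cc.argsort()[::-1])   # ranking of the alternatives
```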
Self-adaptive systems are able to adjust their behaviour in response to changes in environmental conditions and are widely deployed as Internetwares. Considered a promising way to handle the ever-growing complexity of software systems, they have seen an increasing level of interest and cover a variety of applications, e.g., autonomous car systems and adaptive network systems. Many approaches for the construction of self-adaptive systems have been developed, and probabilistic models, such as Markov decision processes (MDPs), are among the favoured. However, the majority of them do not deal with the problem of the underlying MDP becoming obsolete under new environments or failing to satisfy the given properties. As a result, the policies generated from such an MDP fail to guide the self-adaptive system to run correctly and meet its goals. In this article, we propose a systematic approach to updating an obsolete MDP by exploring new states and transitions and removing obsolete ones, and to repairing an unsatisfactory MDP by adjusting its structure in a meaningful way rather than arbitrarily changing the transition probabilities to values not in line with reality. Experimental results show that the MDPs updated and repaired by our approach are more competent in guiding the self-adaptive systems' correct running than the original ones.
Decisions in reality often have a hierarchical character because of the hierarchy of an organization's structure. In this paper, we propose a two-level hierarchic Markov decision model that considers the interactions of agents at different levels and the different time scales of the levels. A backward induction algorithm is given for the model to solve for the optimal policy of the finite-stage hierarchic decision problem. The proposed model and its algorithm are illustrated with an example of a two-level hierarchical decision problem in infrastructure maintenance. The optimal policy of the example is solved, and the impacts of interactions between levels on decision making are analyzed.
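One simplified reading of the two-level, two-time-scale structure is an upper level that commits to a mode for each coarse period, while the lower level acts every fine step under the dynamics that mode induces; backward induction then nests a lower-level recursion inside the upper-level one. A minimal sketch under that assumption (the mode/period structure is illustrative, not the paper's exact model):

```python
import numpy as np

def lower_level_value(P, R, steps, V_end):
    """Value of running the lower-level MDP for `steps` fine steps under one
    mode's dynamics (P: SxAxS, R: SxA), ending in continuation values V_end."""
    V = V_end.copy()
    for _ in range(steps):
        V = (R + P @ V).max(axis=1)
    return V

def hierarchic_backward_induction(P_modes, R_modes, periods, steps_per_period):
    """Upper level picks a mode each period; lower level acts every fine step."""
    S = P_modes[0].shape[0]
    V = np.zeros(S)
    upper_policy = np.zeros((periods, S), dtype=int)
    for t in reversed(range(periods)):
        # value of committing to each mode for the whole period
        mode_values = np.stack([lower_level_value(P_modes[m], R_modes[m],
                                                  steps_per_period, V)
                                for m in range(len(P_modes))], axis=1)  # (S, M)
        upper_policy[t] = mode_values.argmax(axis=1)
        V = mode_values.max(axis=1)
    return V, upper_policy
```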
An AI-aided simulation system embedded in a model-based, aspiration-led decision support system, NY-IEDSS, is reported. The NY-IEDSS is designed for mid-term development strategy studies of the Nanyang Region in Henan, China, and is moving beyond its prototype stage under the orientation of the decision maker (the end user). The integration of the simulation model system, decision analysis, and an expert system for decision support in the system implementation is reviewed. The intent of the paper is to provide insight into how system capability and acceptability can be enhanced by this integration. Moreover, emphasis is placed on problem orientation in applying the method.
A double-factored decision theory for Markov decision processes with multiple scenarios of the parameters is proposed in this article. We introduce the scenario belief to describe the probability distribution of scenarios in the system, and the scenario expectation to formulate the expected total discounted reward of a policy. We establish a new framework, named the double-factored Markov decision process (DFMDP), in which the physical state and the scenario belief are shown to be the double factors serving as sufficient statistics for the history of the decision process. Four classes of policies for finite-horizon DFMDPs are studied, and it is shown that there exists a double-factored Markovian deterministic policy which is optimal among all policies. We also formulate infinite-horizon DFMDPs and present their optimality equation. An exact solution method, named double-factored backward induction, is proposed for finite-horizon DFMDPs. It is used to find the optimal policies for numerical examples, which are then compared with policies derived from other methods in the related literature.
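A scenario belief of the kind introduced above is naturally updated by Bayes' rule after each observed transition, weighting each scenario's transition law by how well it explains the jump. A minimal sketch (the two-scenario example and its numbers are illustrative assumptions):

```python
import numpy as np

def update_scenario_belief(belief, P_scenarios, s, a, s_next):
    """belief[k] = probability of scenario k; P_scenarios[k] is the (S, A, S)
    transition tensor of scenario k. Returns the posterior belief."""
    likelihood = np.array([P[s, a, s_next] for P in P_scenarios])
    posterior = belief * likelihood
    total = posterior.sum()
    return posterior / total if total > 0 else belief  # uninformative if impossible

# two scenarios over a 2-state, 1-action chain (numbers are illustrative)
P0 = np.array([[[0.9, 0.1]], [[0.2, 0.8]]])
P1 = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])
b = update_scenario_belief(np.array([0.5, 0.5]), [P0, P1], s=0, a=0, s_next=1)
print(b)   # belief shifts toward the scenario that makes the observed jump likelier
```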
1 Introduction Constrained Reinforcement Learning (CRL), modeled as a Constrained Markov Decision Process (CMDP) [1,2], is commonly used to address applications with safety restrictions. Previous works [3] primarily focused on the single-constraint issue, overlooking the more common multi-constraint setting, which involves extensive computation and the combinatorial optimization of multiple Lagrange multipliers.
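The multi-constraint setting mentioned above is typically handled by forming a Lagrangian of the return and the constraint costs, with one multiplier per constraint updated by projected gradient ascent on the constraint violations. A minimal sketch of one dual-ascent step (the policy-optimization step itself is abstracted away; names and the learning rate are illustrative):

```python
import numpy as np

def lagrangian_step(reward_return, cost_returns, cost_limits, lambdas, lr=0.05):
    """One dual-ascent step for a multi-constraint CMDP.
    reward_return: estimate of J_r(pi); cost_returns[i]: estimate of J_ci(pi);
    cost_limits[i]: threshold d_i; lambdas[i]: current Lagrange multiplier."""
    cost_returns = np.asarray(cost_returns, dtype=float)
    cost_limits = np.asarray(cost_limits, dtype=float)
    # Lagrangian the policy should maximize at the current multipliers
    lagrangian = reward_return - np.dot(lambdas, cost_returns - cost_limits)
    # multipliers rise where constraints are violated, projected to stay >= 0
    new_lambdas = np.maximum(0.0, lambdas + lr * (cost_returns - cost_limits))
    return lagrangian, new_lambdas
```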
The Virtual Power Plant (VPP), as an innovative power management architecture, achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources. However, owing to significant differences in the operational costs and flexibility of various types of generation resources, as well as the volatility and uncertainty of renewable energy sources (such as wind and solar power) and the complex variability of load demand, scheduling optimization has become a critical issue for virtual power plants. To address this, this paper proposes an intelligent scheduling method for virtual power plants based on deep reinforcement learning (DRL), utilizing Deep Q-Networks (DQN) for real-time optimized scheduling of the dynamic peaking units (DPUs) and stable baseload units (SBUs) in the virtual power plant. By modeling the scheduling problem as a Markov decision process (MDP) and designing an optimization objective function that integrates both performance and cost, the scheduling efficiency and economic performance of the virtual power plant are significantly improved. Simulation results show that, compared with traditional scheduling methods and other deep reinforcement learning algorithms, the proposed method demonstrates significant advantages in key performance indicators: response time is shortened by up to 34%, the task success rate is increased by up to 46%, and costs are reduced by approximately 26%. Experimental results verify the efficiency and scalability of the method under complex load environments and renewable energy volatility, providing strong technical support for the intelligent scheduling of virtual power plants.
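As a toy stand-in for the DQN dispatcher described above, the sketch below runs tabular Q-learning on a discretized net-load state with a small DPU action set (the SBU is assumed fixed); the state/action encoding, the synthetic environment, and all coefficients are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 3            # discretized net-load levels; DPU: off / half / full
dpu_output = np.array([0.0, 0.5, 1.0])
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(state, action):
    """Toy environment: reward trades off unmet load against DPU generation cost."""
    net_load = state / (n_states - 1)                   # normalized residual load
    unmet = max(0.0, net_load - dpu_output[action])
    reward = -10.0 * unmet - 2.0 * dpu_output[action]   # shortage penalty + cost
    next_state = int(np.clip(state + rng.integers(-2, 3), 0, n_states - 1))
    return next_state, reward

state = rng.integers(n_states)
for _ in range(20000):
    action = rng.integers(n_actions) if rng.random() < eps else int(Q[state].argmax())
    next_state, reward = step(state, action)
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```

The paper's DQN replaces the table with a neural network and adds experience replay; the update rule and the dispatch trade-off are the same in spirit.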
This paper investigates a distributed heterogeneous hybrid blocking flow-shop scheduling problem (DHHBFSP) designed to minimize the total tardiness and total energy consumption simultaneously, and proposes an improved proximal policy optimization (IPPO) method to make real-time decisions for the DHHBFSP. A multi-objective Markov decision process is modeled for the DHHBFSP, where the reward function is represented by a vector with dynamic weights instead of the usual objective-related scalar value. A factory agent (FA) is formulated for each factory to select unscheduled jobs and is trained by the proposed IPPO to improve decision quality. Multiple FAs work asynchronously to allocate jobs that arrive randomly at the shop. A two-stage training strategy is introduced in the IPPO, which learns from both single- and dual-policy data for better data utilization. The proposed IPPO is tested on randomly generated instances and compared with variants of the basic proximal policy optimization (PPO), dispatch rules, multi-objective metaheuristics, and multi-agent reinforcement learning methods. Extensive experimental results suggest that the proposed strategies offer significant improvements over the basic PPO, and that the proposed IPPO outperforms state-of-the-art scheduling methods in both convergence and solution quality.
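A vector reward with dynamic weights, as described above, can be scalarized each step by weights that shift toward whichever objective is currently lagging. A minimal sketch of one such scheme (the weighting rule, objective encodings, and numbers are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def scalarize(reward_vector, objective_totals, ideal_totals):
    """reward_vector: per-step [tardiness_reward, energy_reward] (larger is better).
    Weights grow for the objective farthest from its ideal accumulated value."""
    gaps = np.maximum(1e-9, np.asarray(ideal_totals) - np.asarray(objective_totals))
    weights = gaps / gaps.sum()            # dynamic weights, recomputed every step
    return float(np.dot(weights, reward_vector)), weights

r, w = scalarize(reward_vector=[-3.0, -1.5],
                 objective_totals=[-120.0, -80.0],   # accumulated so far
                 ideal_totals=[0.0, 0.0])
print(r, w)   # more weight on tardiness, the objective currently farther from ideal
```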
An integrated CAD/CAPP/CAM system for tube manufacturing based on an integration framework is presented. In this system, two kinds of data conventions for describing tube shape are presented in the tube CAD subsystem; the object-oriented concept and a goal-driven inference mechanism are applied in the development of the knowledge-based CAPP subsystem; and simulation of tube processing in the tube bending simulation subsystem is performed based on the tube model's piecewise representation. A tube product case is considered to illustrate the application of the integrated system, and the advantages of the system for tube bending are shown.
Aim: To investigate a model-free, multi-step, average-reward reinforcement learning algorithm. Methods: By combining the R-learning algorithm with temporal difference learning (TD(λ)) for average-reward problems, a novel incremental algorithm, called R(λ) learning, is proposed. Results and Conclusion: The proposed algorithm is a natural extension of Q(λ) learning, the multi-step discounted-reward reinforcement learning algorithm, to the average-reward case. Simulation results show that R(λ) learning with intermediate λ values yields a significant performance improvement over simple R-learning.
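Average-reward R-learning maintains relative action values and an estimate of the average reward; adding TD(λ)-style eligibility traces gives the multi-step variant. A minimal sketch of one update, assuming Watkins-style trace cutting after exploratory actions (a plausible reading, not necessarily the paper's exact rule):

```python
import numpy as np

def r_lambda_update(Q, rho, e, s, a, r, s_next, greedy,
                    alpha=0.1, beta=0.01, lam=0.7):
    """One R(lambda)-style update: average-reward R-learning with
    accumulating eligibility traces.
    Q: relative action values (S, A); rho: average-reward estimate;
    e: eligibility traces (S, A); greedy: whether the taken action was greedy."""
    delta = r - rho + Q[s_next].max() - Q[s, a]
    e[s, a] += 1.0                        # accumulating trace for the visited pair
    Q += alpha * delta * e                # multi-step credit assignment via traces
    if greedy:
        rho += beta * (r - rho + Q[s_next].max() - Q[s].max())
        e *= lam                          # decay traces along greedy steps
    else:
        e[:] = 0.0                        # cut traces after exploratory actions
    return Q, rho, e
```

With λ = 0 this collapses to one-step R-learning; intermediate λ values propagate each temporal-difference error backwards over recently visited state-action pairs, which is the source of the performance gain the abstract reports.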