Funding: supported by the National Natural Science Foundation of China (61573285, 61305133)
Abstract: This paper studies the adaptive scheduling problem of multiple electronic support measures (multi-ESM) in a ground moving radar target tracking application. It is a sequential decision-making problem in an uncertain environment. For adaptive selection of appropriate ESMs, we generalize an approximate dynamic programming (ADP) framework to the dynamic case. We define the environment model and the agent model, respectively. To handle the partial observability challenge, we apply the unscented Kalman filter (UKF) algorithm for belief state estimation. To reduce the computational burden, a simulation-based rollout approach with a redesigned base policy is proposed to approximate the long-term cumulative reward. Meanwhile, Monte Carlo sampling is incorporated into the rollout to estimate the expectation of the rewards. The experiments indicate that our method outperforms other strategies, with the advantage growing in larger-scale problems.
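As a minimal illustration of the rollout scheme this abstract describes, the sketch below scores each candidate ESM action by simulating a base policy forward and averaging Monte Carlo samples of the cumulative reward. The environment interface (`step`, `base_policy`) and all parameters are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical interface: step(state, action) -> (next_state, reward) samples the
# stochastic environment; base_policy(state) -> action is the redesigned base policy.
def rollout_action(state, actions, step, base_policy, horizon=10, samples=20, gamma=0.95):
    """Pick the action with the best Monte Carlo estimate of long-term reward."""
    best_action, best_value = None, float("-inf")
    for a in actions:
        total = 0.0
        for _ in range(samples):              # Monte Carlo estimate of the expectation
            s, r = step(state, a)             # lookahead step with the candidate action
            value, discount = r, gamma
            for _ in range(horizon - 1):      # then follow the base policy
                s, r = step(s, base_policy(s))
                value += discount * r
                discount *= gamma
            total += value
        if total / samples > best_value:
            best_action, best_value = a, total / samples
    return best_action
```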
Abstract: A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests) and implement an overbooking business strategy to further increase business revenue. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, an inevitable issue for DP with both instant resource requests and future resource reservations, arises. An approximate dynamic programming (ADP) technique based on linear function approximation is applied to resolve these scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
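Since the pricing algorithm rests on Bellman's backward principle of optimality, a generic finite-horizon backward recursion may help fix ideas; the MDP interface (`reward`, `transition`) is a hypothetical placeholder, not the paper's reservation model.

```python
def backward_induction(T, states, actions, reward, transition):
    """Bellman backward recursion: V_t(s) = max_a r(s,a) + E[V_{t+1}(s')]."""
    V = {T: {s: 0.0 for s in states}}                 # terminal values
    policy = {}
    for t in range(T - 1, -1, -1):                    # sweep backward in time
        V[t], policy[t] = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # transition(s, a) -> iterable of (next_state, probability)
                q = reward(s, a) + sum(p * V[t + 1][s2] for s2, p in transition(s, a))
                if q > best_q:
                    best_a, best_q = a, q
            V[t][s], policy[t][s] = best_q, best_a
    return V, policy
```

The state loop runs once per stage, which makes the curse of dimensionality tangible: it is this cost that a linear-function-approximation ADP is designed to tame.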
Funding: Supported by the National Science Foundation (U.S.A.) under Grant ECS-0355364
Abstract: This paper introduces a self-learning control approach based on approximate dynamic programming. Dynamic programming was introduced by Bellman in the 1950s for solving optimal control problems of nonlinear dynamical systems. Due to its high computational complexity, applications of dynamic programming have been limited to simple and small problems. The key step in finding approximate solutions to dynamic programming is to estimate the performance index. The optimal control signal can then be determined by minimizing (or maximizing) this performance index. Artificial neural networks are very efficient tools for representing the performance index in dynamic programming. This paper uses neural networks to estimate the performance index and to generate optimal control signals, thereby achieving optimal control through self-learning.
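A minimal numpy sketch of this idea, assuming a one-hidden-layer critic: the network approximates the performance index J(x), the control greedily minimizes it over candidate inputs, and a temporal-difference step trains the weights. The architecture, plant interface, and step sizes are invented for illustration, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(16, 2)), np.zeros(16)   # hidden layer (2-dim state)
W2, b2 = rng.normal(scale=0.1, size=16), 0.0                 # linear output

def J(x):
    """Critic: estimated performance index (cost-to-go) of state x."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def control(x, candidates, f):
    """Choose the control whose successor state f(x, u) has the smallest estimated cost."""
    return min(candidates, key=lambda u: J(f(x, u)))

def td_update(x, cost, x_next, gamma=0.95, lr=1e-2):
    """One gradient step on the squared TD error of the critic."""
    global b2
    h = np.tanh(W1 @ x + b1)
    err = (W2 @ h + b2) - (cost + gamma * J(x_next))  # TD error vs. bootstrapped target
    grad_h = err * W2 * (1 - h ** 2)                  # backprop through tanh
    W2 -= lr * err * h
    b2 -= lr * err
    W1 -= lr * np.outer(grad_h, x)
    b1 -= lr * grad_h
```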
Funding: partially supported by the National Natural Science Foundation of China (Grants 71822105 and 91746210).
Abstract: In the short-term operation of a natural gas network, the impact of demand uncertainty is not negligible. To address this issue, we propose a two-stage robust model for the power cost minimization problem in gunbarrel natural gas networks. The demands between pipelines and compressor stations are uncertain, with a budget parameter, since it is unlikely that all the uncertain demands reach their maximal deviation simultaneously. Solving the two-stage robust model involves a bilevel problem that is challenging to solve. We formulate it as a multi-dimensional dynamic programming problem and propose approximate dynamic programming methods to accelerate the calculation. Numerical results based on a real network in China show an average speedup of seven times over the original dynamic programming algorithm without compromising optimality. The results also verify the advantage of the robust model over a deterministic model when facing uncertainties. These findings offer short-term operation methods for gunbarrel natural gas network management under uncertainty.
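The budget parameter mentioned here caps how many demands may deviate at once; a compact DP over (demand index, budget used) solves a toy version of the inner adversarial problem. The deviation costs below are made-up numbers, and the paper's actual recursion is multi-dimensional over the pipeline network.

```python
def worst_case_extra_cost(deviation_costs, budget):
    """Maximize total extra cost with at most `budget` demands at maximal deviation
    (the inner, adversarial problem of a budgeted two-stage robust model)."""
    n = len(deviation_costs)
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for k in range(budget + 1):
            dp[i][k] = dp[i - 1][k]                    # demand i stays nominal
            if k > 0:                                  # or demand i deviates
                dp[i][k] = max(dp[i][k], dp[i - 1][k - 1] + deviation_costs[i - 1])
    return dp[n][budget]

print(worst_case_extra_cost([4.0, 2.5, 7.0, 1.0], budget=2))  # -> 11.0 (picks 7.0 and 4.0)
```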
Abstract: Approximate dynamic programming (ADP) is a general and effective approach for solving optimal control and estimation problems by adapting to uncertain and nonconvex environments over time.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61034002, 61233001, 61273140, 61304086, and 61374105) and the Beijing Natural Science Foundation, China (Grant No. 4132078)
Abstract: A policy iteration algorithm of adaptive dynamic programming (ADP) is developed to solve the optimal tracking control problem for a class of discrete-time chaotic systems. By system transformations, the optimal tracking problem is transformed into an optimal regulation problem. The policy iteration algorithm for discrete-time chaotic systems is first described. Then, the convergence and admissibility properties of the developed policy iteration algorithm are presented, which show that the transformed chaotic system can be stabilized under an arbitrary iterative control law while the iterative performance index function simultaneously converges to the optimum. By implementing the policy iteration algorithm via neural networks, the developed optimal tracking control scheme for chaotic systems is verified through simulation.
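For readers unfamiliar with the underlying loop, here is a tabular policy iteration sketch: iterative policy evaluation alternates with greedy improvement until the policy is stable. The generic MDP interface is an assumption; the paper implements the iteration with neural networks rather than tables.

```python
def policy_iteration(states, actions, reward, transition, gamma=0.9, tol=1e-8):
    """Tabular policy iteration for a finite MDP.
    transition(s, a) -> iterable of (next_state, probability)."""
    policy = {s: actions[0] for s in states}
    V = {s: 0.0 for s in states}
    while True:
        while True:                                   # iterative policy evaluation
            delta = 0.0
            for s in states:
                v = reward(s, policy[s]) + gamma * sum(
                    p * V[s2] for s2, p in transition(s, policy[s]))
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        stable = True                                 # greedy policy improvement
        for s in states:
            best = max(actions, key=lambda a: reward(s, a) + gamma * sum(
                p * V[s2] for s2, p in transition(s, a)))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V
```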
Funding: supported in part by the National Natural Science Foundation of China (61533017, 61273140, 61304079, 61374105, 61379099, 61233001), the Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3), and the Open Research Project from SKLMCCS (20150104)
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA04Z183), the National Natural Science Foundation of China (60621001, 60534010, 60572070, 60774048, 60728307), and the Program for Changjiang Scholars and Innovative Research Groups of China (60728307, 4031002)
Abstract: An alpha-uniformized Markov chain is defined through the concept of an equivalent infinitesimal generator for a semi-Markov decision process (SMDP) with both average and discounted criteria. According to the relations between their performance measures and performance potentials, the optimization of an SMDP can be realized by simulating the chain. For the critic model of neuro-dynamic programming (NDP), a neuro-policy iteration (NPI) algorithm is presented, and the performance error bound is shown, given the approximation error and improvement error in each iteration step. The obtained results may be extended to Markov systems and are widely applicable. Finally, a numerical example is provided.
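Uniformization, the construction behind the alpha-uniformized chain, turns a continuous-time generator Q into a discrete-time transition matrix P = I + Q/Λ, with Λ no smaller than the largest exit rate. The 2-state generator below is a made-up example, not one from the paper.

```python
import numpy as np

def uniformize(Q, Lambda=None):
    """Return the transition matrix of the uniformized chain of generator Q."""
    Q = np.asarray(Q, dtype=float)
    if Lambda is None:
        Lambda = max(-Q.diagonal())        # uniformization rate >= every exit rate
    return np.eye(Q.shape[0]) + Q / Lambda

Q = [[-2.0,  2.0],
     [ 1.0, -1.0]]
print(uniformize(Q))                       # [[0. 1.], [0.5 0.5]]: rows sum to 1, a valid DTMC
```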
Funding: supported in part by the State Key Laboratory of HVDC (No. SKLHVDC-2021-KF-09) and in part by the National Natural Science Foundation of China (No. 51977081).
Abstract: This paper studies the rolling security-constrained unit commitment (RSCUC) problem with AC power flow and uncertainties. This NP-hard problem is modeled as a Markov decision process, which is then solved by a transfer-based approximate dynamic programming (TADP) algorithm proposed in this paper. Different from traditional approximate dynamic programming (ADP) algorithms, TADP can obtain the commitment states of most units in advance through a decision transfer technique, thus significantly reducing the action space. Moreover, compared with traditional ADP algorithms, which must determine the commitment state of each unit, TADP only needs to determine the unit with the smallest on-state probability among all on-state units, further reducing the action space. The proposed algorithm also avoids the iterative update of value functions and the reliance on rolling forecast information, which suits the rolling decision-making process of RSCUC. Finally, numerical simulations are carried out on a modified IEEE 39-bus system and a real 2778-bus system to demonstrate the effectiveness of the proposed algorithm.
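An illustrative reading of the decision transfer idea: units whose on-state probability is near 0 or 1 are committed in advance, and only a pivot unit still needs an explicit decision. The thresholds, probabilities, and helper below are invented for illustration and are not the paper's algorithm.

```python
def transfer_decisions(on_prob, low=0.05, high=0.95):
    """Split units into pre-committed ON/OFF sets and pick the remaining pivot unit."""
    fixed_on = [u for u, p in on_prob.items() if p >= high]     # committed on in advance
    fixed_off = [u for u, p in on_prob.items() if p <= low]     # committed off in advance
    undecided = [u for u, p in on_prob.items() if low < p < high]
    # Per the abstract, the unit with the smallest on-state probability is the
    # one that still needs an explicit decision.
    pivot = min(undecided, key=on_prob.get) if undecided else None
    return fixed_on, fixed_off, pivot

on_prob = {"G1": 0.99, "G2": 0.02, "G3": 0.60, "G4": 0.35}
print(transfer_decisions(on_prob))   # (['G1'], ['G2'], 'G4')
```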
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61304079, 61673054, and 61374105), the Fundamental Research Funds for the Central Universities, China (Grant No. FRF-TP-15-056A3), and the Open Research Project from SKLMCCS, China (Grant No. 20150104)
Abstract: We develop an optimal tracking control method for chaotic systems with unknown dynamics and disturbances. The method allows the optimal cost function and the corresponding tracking control to update synchronously. According to the tracking error and the reference dynamics, the augmented system is constructed, and the optimal tracking control problem is then defined. Policy iteration (PI) is introduced to solve the min-max optimization problem. The off-policy adaptive dynamic programming (ADP) algorithm is then proposed to find the solution of the tracking Hamilton-Jacobi-Isaacs (HJI) equation online, using only measured data and without any knowledge of the system dynamics. A critic neural network (CNN), an action neural network (ANN), and a disturbance neural network (DNN) are used to approximate the cost function, the control, and the disturbance, respectively. The weights of these networks compose the augmented weight matrix, which is proven to be uniformly ultimately bounded (UUB). The convergence of the tracking error system is also proven. Two examples are given to show the effectiveness of the proposed synchronous solution method for the chaotic system tracking problem.
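A small sketch of the augmented-system construction the method starts from, assuming a generic plant f and reference generator: stacking the tracking error with the reference turns tracking into regulating the error to zero.

```python
def augmented_step(e, xd, u, f, ref_step):
    """One step of the augmented state z = (e, xd); f and ref_step are
    hypothetical plant and reference dynamics, not the paper's systems."""
    x = e + xd                   # recover the plant state from error plus reference
    xd_next = ref_step(xd)       # reference dynamics
    e_next = f(x, u) - xd_next   # error dynamics driven by the control u
    return e_next, xd_next
```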
Funding: supported by the State Key Laboratory of HVDC under Grant SKLHVDC-2021-KF-09.
Abstract: The real-time risk-averse dispatch problem of an integrated electricity and natural gas system (IEGS) is studied in this paper. It is formulated as a real-time conditional value-at-risk (CVaR)-based risk-averse dispatch model in the Markov decision process framework. Because of its stochasticity, nonconvexity, and nonlinearity, the model is difficult to analyze with traditional algorithms in an acceptable time. To address this NP-hard problem, a CVaR-based lookup-table approximate dynamic programming (CVaR-ADP) algorithm is proposed, and the risk-averse dispatch problem is decoupled into a series of tractable subproblems. The line pack is used as the state variable to describe the impact of one period's decision on the future, which facilitates the reduction of load shedding and wind power curtailment. Through the proposed method, real-time decisions can be made according to the current information, while the value functions survey the whole optimization horizon to balance the current cost against future risk loss. Numerical simulations indicate that the proposed method can effectively measure and control the risk costs in extreme scenarios. Moreover, decisions can be made within 10 s, which meets the requirement of real-time dispatch of an IEGS. Index Terms: integrated electricity and natural gas system, approximate dynamic programming, real-time dispatch, risk-averse, conditional value-at-risk.
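To make the risk measure concrete, the snippet below estimates CVaR as the mean of the worst (1 − β) tail of sampled scenario costs; the cost distribution and confidence level are illustrative stand-ins for the dispatch model's scenarios.

```python
import numpy as np

def cvar(costs, beta=0.95):
    """Average of the worst (1 - beta) fraction of scenario costs."""
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, beta)            # value-at-risk threshold
    return costs[costs >= var].mean()         # expected cost in the tail

samples = np.random.default_rng(1).gamma(2.0, 50.0, size=10_000)   # made-up scenario costs
print(f"VaR-0.95 ~ {np.quantile(samples, 0.95):.1f}, CVaR-0.95 ~ {cvar(samples):.1f}")
```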