Abstract: Grasping is one of the most fundamental operations in modern robotics applications. While deep reinforcement learning (DRL) has demonstrated strong potential in robotics, existing work places excessive emphasis on maximizing the cumulative reward during task execution, and potential safety risks are often ignored. In this paper, an optimization method based on safe reinforcement learning (Safe RL) is proposed to address the robotic grasping problem under safety constraints. Specifically, accounting for the obstacle-avoidance constraints of the system, the grasping problem of the manipulator is modeled as a Constrained Markov Decision Process (CMDP). A Lagrange multiplier and a dynamic weighting mechanism are introduced into the Proximal Policy Optimization (PPO) framework, leading to the dynamic weighted Lagrange PPO (DWL-PPO) algorithm. The proposed method penalizes violations of safety constraints while optimizing the policy. In addition, orientation control of the end-effector is included in the reward function, and a compound reward function adapted to changes in pose is designed. Finally, the efficacy and advantages of the proposed method are demonstrated through extensive training and testing in the PyBullet simulator. The grasping experiments show that the proposed approach provides superior safety and efficiency compared with other advanced RL methods and achieves a good trade-off between model learning and risk aversion.
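To make the constrained-optimization idea concrete, below is a minimal sketch of a generic PPO-Lagrangian update of the kind the abstract builds on; it does not reproduce the paper's dynamic weighting mechanism, and all function names and hyperparameters are illustrative assumptions.

import torch

def ppo_lagrangian_loss(ratio, adv_r, adv_c, lam, clip_eps=0.2):
    # ratio: pi_new(a|s) / pi_old(a|s); adv_r / adv_c: reward / cost advantages.
    surr_r = torch.min(ratio * adv_r,
                       torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv_r)
    surr_c = ratio * adv_c  # surrogate for the expected safety cost
    # Subtracting lam * cost penalizes constraint violations during policy
    # optimization; dividing by (1 + lam) keeps the gradient scale bounded.
    return -(surr_r - lam * surr_c).mean() / (1.0 + lam)

def update_multiplier(lam, episode_cost, cost_limit, lr=0.01):
    # Dual ascent: the multiplier grows while the constraint is violated.
    return max(0.0, lam + lr * (episode_cost - cost_limit))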
Funding: funded in part by the CURENT Research Center and in part by the National Science Foundation (NSF) (No. ECCS-2033910).
Abstract: This study investigates a safe reinforcement learning algorithm for grid-forming (GFM) inverter-based frequency regulation. To guarantee the stability of the inverter-based resource (IBR) system under the learned control policy, a model-based reinforcement learning (MBRL) algorithm is combined with a Lyapunov approach, which determines the safe region of states and actions. To obtain a near-optimal control policy, control performance is safely improved by approximate dynamic programming (ADP) using data sampled from the region of attraction (ROA). Moreover, to enhance control robustness against parameter uncertainty in the inverter, the proposed algorithm adopts a Gaussian process (GP) model to effectively learn the system dynamics from measurements. Numerical simulations validate the effectiveness of the proposed algorithm.
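As an illustration of the GP-based dynamics learning step, the sketch below fits a Gaussian process to one-step measurements; the state layout and the placeholder data are assumptions, not the paper's model.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
# Hypothetical dataset: X = [state, action] samples, y = next-state residual
# relative to a nominal inverter model (placeholder dynamics used here).
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.01 * rng.standard_normal(200)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X, y)

# The posterior mean and standard deviation bound the unknown dynamics,
# which a Lyapunov analysis can use to certify a region of attraction.
mean, std = gp.predict(X[:5], return_std=True)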
Funding: part of the DATALESs project (project number 482.20.602), jointly financed by the Netherlands Organization for Scientific Research (NWO) and the National Natural Science Foundation of China (NSFC).
Abstract: The integration of distributed energy resources (DERs) has escalated the challenge of voltage magnitude regulation in distribution networks. Model-based approaches, which rely on complex sequential mathematical formulations, cannot meet real-time requirements. Deep reinforcement learning (DRL) offers an alternative: offline training with distribution network simulators followed by online execution with negligible computation. However, DRL algorithms fail to enforce voltage magnitude constraints during training and testing, potentially leading to serious operational violations. To tackle these challenges, we introduce a novel safety-guaranteed reinforcement learning algorithm, DistFlow safe reinforcement learning (DF-SRL), designed specifically for real-time voltage magnitude regulation in distribution networks. The DF-SRL algorithm incorporates a DistFlow linearization to construct an expert-knowledge-based safety layer. It then overlays this safety layer on top of the agent's policy, recalibrating unsafe actions to safe domains through a quadratic programming formulation. Simulation results show the DF-SRL algorithm consistently ensures voltage magnitude constraints during training and real-time operation (test) phases and achieves faster convergence and higher performance, which differentiates it from (safe) DRL benchmark algorithms.
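The quadratic-programming recalibration can be pictured as a Euclidean projection of the raw policy action onto a linearized safe set. The sketch below shows this generic projection; the constraint matrices that a DistFlow linearization would supply are assumed inputs here.

import cvxpy as cp
import numpy as np

def safety_project(a_raw, A, b):
    # Find the action closest to a_raw inside the safe set {a : A a <= b},
    # where A and b would encode linearized voltage-magnitude limits.
    a = cp.Variable(a_raw.shape[0])
    prob = cp.Problem(cp.Minimize(cp.sum_squares(a - a_raw)), [A @ a <= b])
    prob.solve()
    return a.value

# Toy example with one action and symmetric limits (values are illustrative).
a_safe = safety_project(np.array([0.8]),
                        A=np.array([[1.0], [-1.0]]),
                        b=np.array([0.5, 0.5]))  # projects 0.8 down to 0.5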
Funding: supported by the National Natural Science Foundation of China (No. 62002016), the Science and Technology Development Fund, Macao S.A.R. (No. 0137/2019/A3), the Beijing Natural Science Foundation (No. 9204028), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515111165).
Abstract: This paper develops deep reinforcement learning (DRL) algorithms for optimizing the operation of a home energy system consisting of photovoltaic (PV) panels, a battery energy storage system, and household appliances. Model-free DRL algorithms can efficiently handle the difficulty of energy system modeling and the uncertainty of PV generation. However, the discrete-continuous hybrid action space of the considered home energy system challenges existing DRL algorithms designed for either discrete or continuous actions. Thus, a mixed deep reinforcement learning (MDRL) algorithm is proposed, which integrates the deep Q-learning (DQL) algorithm and the deep deterministic policy gradient (DDPG) algorithm: DQL handles the discrete actions, while DDPG handles the continuous ones. The MDRL algorithm learns the optimal strategy through trial-and-error interactions with the environment. However, unsafe actions that violate system constraints can incur substantial cost. To handle this problem, a safe-MDRL algorithm is further proposed. Simulation studies demonstrate that the proposed MDRL algorithm efficiently handles the discrete-continuous hybrid action space for home energy management and reduces the operation cost while maintaining human thermal comfort, compared with benchmark algorithms on the test dataset. Moreover, the safe-MDRL algorithm greatly reduces the loss of thermal comfort that the MDRL algorithm incurs during the learning stage.
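A rough sketch of how a hybrid discrete-continuous action can be produced by pairing a Q-network head with a deterministic actor, in the spirit of the DQL/DDPG split described above; layer sizes, dimensions, and the state layout are assumptions.

import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, state_dim=8, n_discrete=4, cont_dim=1):
        super().__init__()
        # Q-network for discrete appliance actions (DQL-style head).
        self.q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_discrete))
        # Deterministic actor for a continuous battery setpoint (DDPG-style).
        self.actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                   nn.Linear(64, cont_dim), nn.Tanh())

    def act(self, state, eps=0.1):
        with torch.no_grad():
            q = self.q_net(state)
            # Epsilon-greedy exploration over the discrete actions.
            if torch.rand(1).item() < eps:
                discrete = int(torch.randint(q.shape[-1], (1,)))
            else:
                discrete = int(q.argmax())
            cont = self.actor(state)  # bounded to [-1, 1] by Tanh
        return discrete, cont

policy = HybridPolicy()
discrete_action, continuous_action = policy.act(torch.zeros(8))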
Funding: supported in part by the National Natural Science Foundation of China (No. 52077076) and in part by the National Key R&D Plan (No. 2021YFB2601502).
Abstract: In recent years, reinforcement learning (RL) has emerged as a solution for model-free dynamic programming problems that cannot be effectively solved by traditional optimization methods. Owing to its strong self-learning and self-optimizing capabilities, it has gradually been applied in fields such as economic dispatch of power systems. However, existing RL-based economic scheduling methods ignore the security risks that the agent may introduce during exploration, creating a risk of issuing instructions that threaten the safe operation of the power system. Therefore, we propose an improved proximal policy optimization algorithm for sequential security-constrained optimal power flow (SCOPF) based on expert knowledge and a safety layer, which determines the active power dispatch strategy, the voltage optimization scheme of the units, and the charging/discharging dispatch of energy storage systems. Expert experience is introduced to improve the ability to enforce constraints such as power balance during training, while guiding the agent to effectively improve the utilization rate of renewable energy. Additionally, to avoid line overload, we add a safety layer at the end of the policy network that introduces transmission constraints to avoid dangerous actions and tackle the sequential SCOPF problem. Simulation results on an improved IEEE 118-bus system verify the effectiveness of the proposed algorithm.
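As one way to picture the expert-knowledge constraint enforcement, the sketch below repairs raw dispatch setpoints so total generation matches demand within unit limits; this is an illustrative heuristic, since the abstract does not specify the paper's exact mechanism.

import numpy as np

def enforce_power_balance(p_raw, p_min, p_max, demand):
    # Clip raw setpoints to unit limits, then spread the remaining imbalance
    # across units in proportion to their available headroom.
    p = np.clip(p_raw, p_min, p_max)
    gap = demand - p.sum()
    for _ in range(5):  # a few passes, since clipping can reopen the gap
        headroom = (p_max - p) if gap > 0 else (p - p_min)
        total = headroom.sum()
        if total <= 1e-9 or abs(gap) <= 1e-6:
            break
        p += np.sign(gap) * headroom * min(abs(gap), total) / total
        gap = demand - p.sum()
    return p

# Toy 3-unit example: raw dispatch sums to 1.2, demand is 1.5.
p = enforce_power_balance(np.array([0.4, 0.6, 0.2]),
                          np.zeros(3), np.ones(3), demand=1.5)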