Funding: Supported by the National Natural Science Foundation of China (51977005).
Abstract: An integrated energy system (IES) comprises multiple subsystems for electricity, heating, and gas. Dynamic dispatch of multi-energy flows is difficult because detailed mathematical models are hard to establish and because of the impact of uncertain factors. Reinforcement learning can fit non-convex and non-linear problems. Unlike traditional methods, it does not need to simplify the detailed mathematical models or the complex, uncertain factors during the solution process, which offers a new approach to dynamic energy dispatch. However, existing research generally relies on soft constraints, such as reward function penalties, to obtain the dispatch strategy, and the resulting strategy may violate the system's operational safety. This paper ensures the security of dynamic energy dispatch strategies by adding two constraint mechanisms to reinforcement learning. First, boundary conditions of the action security domain are established through continuous interaction and feedback between the agent and the environment, which effectively restricts infeasible actions. Second, the strategy is constrained by the truncation function in the trust domain, so that each strategy update stays within a controllable range, improving the convergence and stability of the IES dynamic energy dispatch model. Finally, simulation results indicate that the proposed method effectively constrains the agent's action selection and strategy updates during dynamic energy dispatch, which is significant for the safe and stable operation of the system.
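The two mechanisms named in this abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption and not the paper's implementation: the action security domain is modeled as simple per-component bounds (`low`, `high`), and the truncation function is taken to be a PPO-style clipped surrogate with a hypothetical clip width `eps`.

```python
def project_to_safe_domain(action, low, high):
    """Clamp every action component into [low, high].

    Assumption: the security domain is a box; the paper's learned
    boundary conditions may have a different shape.
    """
    return [min(max(a, low), high) for a in action]


def clipped_surrogate(ratios, advantages, eps=0.2):
    """PPO-style clipped objective, averaged over samples.

    ratios     -- new-policy / old-policy probability ratios
    advantages -- advantage estimates for the same samples
    eps        -- hypothetical clip width bounding each update
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to move the
        # ratio outside [1 - eps, 1 + eps].
        total += min(r * adv, r_clipped * adv)
    return total / len(ratios)


# An out-of-range dispatch action is pulled back into the assumed bounds.
safe_action = project_to_safe_domain([1.4, -0.3, 0.5], low=0.0, high=1.0)

# Both samples fall outside the clip range, so both terms are truncated.
objective = clipped_surrogate(ratios=[1.5, 0.5], advantages=[1.0, -1.0])
```

In this sketch the projection acts as a hard constraint on what the agent may execute, while the clipping only limits how far a single policy update can move; the two are independent and can be combined in one training loop.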
Funding: Supported by the Fundamental Research Funds for the Central Universities (No. 2023JBZX011) and the Aeronautical Science Foundation of China (No. 202300010M5001).
Abstract: 1 Introduction. Constrained Reinforcement Learning (CRL), modeled as a Constrained Markov Decision Process (CMDP) [1,2], is commonly used to address applications with security restrictions. Previous works [3] primarily focused on the single-constraint case, overlooking the more common multi-constraint setting, which involves extensive computation and combinatorial optimization of multiple Lagrange multipliers.
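To make the role of multiple Lagrange multipliers concrete, the following is a minimal sketch of the standard Lagrangian relaxation of a CMDP with several cost constraints. It is a textbook illustration and not the method of the paper excerpted above; the function names, the learning rate `lr`, and the example numbers are all hypothetical.

```python
def lagrangian(reward_return, multipliers, costs, limits):
    """L = J_r - sum_i lambda_i * (J_ci - d_i) for one policy.

    reward_return -- expected return J_r
    costs         -- expected cost returns J_ci, one per constraint
    limits        -- cost thresholds d_i, one per constraint
    """
    penalty = sum(lam * (c - d)
                  for lam, c, d in zip(multipliers, costs, limits))
    return reward_return - penalty


def update_multipliers(multipliers, costs, limits, lr=0.1):
    """One projected dual-ascent step per constraint.

    A multiplier grows while its constraint is violated (J_ci > d_i),
    shrinks otherwise, and is kept non-negative.
    """
    return [max(0.0, lam + lr * (c - d))
            for lam, c, d in zip(multipliers, costs, limits)]


# Two constraints: the first is violated (1.5 > 1.0), the second is not.
multipliers = update_multipliers([0.0, 0.5],
                                 costs=[1.5, 0.2], limits=[1.0, 1.0])
value = lagrangian(10.0, [0.5, 0.0], costs=[1.5, 0.2], limits=[1.0, 1.0])
```

Each constraint carries its own multiplier that must be tuned jointly with the policy, which is why the cost of this approach grows with the number of constraints.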
Funding: Supported in part by the National Natural Science Foundation of China (Grant Number: U2013602), the National Key R&D Program of China (Grant Number: 2022YFB4601802), the Self-Planned Task of the State Key Laboratory of Robotics and System (Grant Number: 2023FRFK01001), and the National Independent Project of China (Grant Number: SKLR202301A12).
Abstract: Existing control methods for humanoid robots, such as Model Predictive Control (MPC) and Reinforcement Learning (RL), generally do not model or exploit rhythmic mechanisms. As a result, they struggle to balance stability, energy efficiency, and gait transition capability during typical rhythmic motions such as walking and running. To address this limitation, we propose Walk2Run, a unified control framework inspired by biological rhythmicity. The method introduces control priors based on the frequency modulation observed in human walk-run transitions. Specifically, we extract rhythmic parameters from motion capture data to construct a Rhythm Generator grounded in Central Pattern Generator (CPG) principles, which guides the policy to produce speed-adaptive periodic motion. This rhythmic guidance is further integrated with a constrained reinforcement learning framework using barrier function optimization, enhancing training stability and output feasibility. Experimental results demonstrate that our method outperforms traditional approaches across multiple metrics, achieving more natural rhythmic motion with improved energy efficiency in medium- to high-speed scenarios, while also enhancing gait stability and adaptability to the robotic platform.
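The idea of a speed-adaptive rhythm signal combined with a barrier term can be sketched as follows. This is an illustration under stated assumptions, not Walk2Run itself: the linear speed-to-frequency mapping, its coefficients `base_hz` and `gain_hz_per_mps`, the two-leg antiphase signal, and the choice of a logarithmic barrier are all hypothetical, since the abstract does not specify them.

```python
import math


def stride_frequency(speed, base_hz=1.5, gain_hz_per_mps=0.4):
    """Assumed linear mapping from forward speed (m/s) to stride frequency (Hz)."""
    return base_hz + gain_hz_per_mps * speed


def advance_phase(phase, speed, dt):
    """Advance a phase oscillator in [0, 1) by one control step of dt seconds."""
    return (phase + stride_frequency(speed) * dt) % 1.0


def rhythm_signal(phase):
    """Left and right leg reference signals, half a cycle apart."""
    left = math.sin(2.0 * math.pi * phase)
    right = math.sin(2.0 * math.pi * (phase + 0.5))
    return left, right


def log_barrier(value, limit):
    """Barrier penalty -log(limit - value); infinite once the limit is reached."""
    margin = limit - value
    return -math.log(margin) if margin > 0 else math.inf


# Faster commanded speed gives a faster rhythm under the assumed mapping.
phase = 0.0
for _ in range(10):
    phase = advance_phase(phase, speed=2.0, dt=0.01)
left, right = rhythm_signal(phase)

# The penalty grows without bound as a constrained quantity nears its limit.
penalty = log_barrier(value=0.9, limit=1.0)
```

A policy could receive `left` and `right` as extra observations, and the barrier term could be added to the training loss for each constrained quantity; how the original work wires these together is not stated in the abstract.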