Abstract: Energy efficiency stands as an essential factor when implementing deep reinforcement learning (DRL) policies for robotic control systems. Standard algorithms, including Deep Deterministic Policy Gradient (DDPG), primarily optimize task rewards, often at the cost of excessively high energy consumption, making them impractical for real-world robotic systems. To address this limitation, we propose Physics-Informed DDPG (PI-DDPG), which integrates physics-based energy penalties to develop energy-efficient yet high-performing control policies. The proposed method introduces adaptive physics-informed constraints through a dynamic weighting factor (λ), enabling policies that balance reward maximization with energy savings. Our motivation is to overcome the impracticality of reward-only optimization by designing controllers that achieve competitive performance while substantially reducing energy consumption. PI-DDPG was evaluated in nine MuJoCo continuous control environments, where it demonstrated significant improvements in energy efficiency without compromising stability or performance. Experimental results confirm that PI-DDPG substantially reduces energy consumption compared to standard DDPG while maintaining competitive task performance. For instance, energy costs decreased from 5542.98 to 3119.02 in HalfCheetah-v4 and from 1909.13 to 1586.75 in Ant-v4, with stable performance in Hopper-v4 (205.95 vs. 130.82) and InvertedPendulum-v4 (322.97 vs. 311.29). Although DDPG sometimes yields higher rewards, such as in HalfCheetah-v4 (5695.37 vs. 4894.59), it requires significantly greater energy expenditure. These results highlight PI-DDPG as a promising energy-conscious alternative for robotic control.
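To make the core idea concrete, the sketch below shows one plausible way an energy-penalized reward with a weighting factor λ could be computed per environment step. The abstract does not specify PI-DDPG's exact penalty term, so the approximation of energy as summed |torque × joint velocity|, the function name, and all numerical values here are illustrative assumptions rather than the paper's actual formulation.

```python
import numpy as np

def energy_penalized_reward(task_reward, torques, joint_velocities, lam):
    """Combine the environment's task reward with a physics-based energy penalty.

    The penalty approximates mechanical work as the sum of |torque * joint velocity|
    over all actuated joints; lam is the dynamic weighting factor (lambda).
    This formulation is an illustrative assumption, not the paper's exact penalty.
    """
    energy = np.sum(np.abs(np.asarray(torques) * np.asarray(joint_velocities)))
    return task_reward - lam * energy, energy

# Example: a single HalfCheetah-style step with hypothetical values.
reward, energy = energy_penalized_reward(
    task_reward=2.5,
    torques=[0.4, -0.8, 0.3, 0.1, -0.2, 0.6],           # applied actuator torques
    joint_velocities=[1.2, -0.5, 0.9, 0.4, -1.1, 0.7],  # corresponding joint velocities
    lam=0.05,                                            # dynamic weighting factor lambda
)
print(f"penalized reward = {reward:.3f}, energy cost = {energy:.3f}")
```

In such a setup, λ could be adapted over training (for example, annealed or tuned against a target energy budget) so that the policy first learns the task and then trades a small amount of reward for large energy savings, which is consistent with the reward/energy trade-off reported above.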