Fund: Supported by the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia (Project No. MoE-IF-UJ-R2-22-04220773-1).
Abstract: Domain randomization is a widely adopted technique in deep reinforcement learning (DRL) that improves agent generalization by exposing policies to diverse environmental conditions. This paper investigates the impact of three reset strategies (normal, non-randomized, and randomized) on agent performance, using the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed DDPG (TD3) algorithms in the CarRacing-v2 environment. Two experimental setups were conducted: an extended training regime running DDPG for 1000 steps per episode across 1000 episodes, and a fast-execution setup comparing DDPG and TD3 over 30 episodes of 50 steps each under constrained computational resources. A step-based reward scaling mechanism was applied under the randomized reset condition to promote broader state exploration. Experimental results show that randomized resets significantly enhance learning efficiency and generalization, with DDPG demonstrating superior performance across all reset strategies. In particular, DDPG combined with randomized resets achieves the highest smoothed rewards (approximately 15), the best stability, and the fastest convergence. These differences are statistically significant, as confirmed by t-tests: DDPG outperforms TD3 under the randomized (t = −101.91, p < 0.0001), normal (t = −21.59, p < 0.0001), and non-randomized (t = −62.46, p < 0.0001) reset conditions. The findings underscore the critical role of reset strategy and reward shaping in improving the robustness and adaptability of DRL agents in continuous control tasks, particularly where computational efficiency and training stability are crucial.
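The abstract names three reset strategies and a step-based reward scaling mechanism but does not give their exact form. The sketch below is a minimal illustration (not the authors' code) of how such a comparison could be set up with Gymnasium's CarRacing-v2 in the fast-execution configuration (30 episodes, 50 steps each); the linear scaling factor `1 + step / max_steps` and the fixed seed for the non-randomized strategy are assumptions made for illustration.

```python
# Hedged sketch of the three reset strategies and step-based reward scaling.
# Requires: pip install "gymnasium[box2d]"
import gymnasium as gym
import numpy as np

env = gym.make("CarRacing-v2", continuous=True)
MAX_STEPS = 50      # fast-execution setup: 50 steps per episode
FIXED_SEED = 0      # assumed seed for the non-randomized strategy

def reset_env(strategy):
    if strategy == "normal":            # library default: no explicit seed
        obs, info = env.reset()
    elif strategy == "non_randomized":  # same track layout every episode
        obs, info = env.reset(seed=FIXED_SEED)
    else:                               # "randomized": fresh seed per episode
        obs, info = env.reset(seed=int(np.random.randint(1_000_000)))
    return obs

for episode in range(30):               # 30 episodes, as in the fast setup
    obs = reset_env("randomized")
    episode_return = 0.0
    for step in range(MAX_STEPS):
        action = env.action_space.sample()  # stand-in for the DDPG/TD3 policy
        obs, reward, terminated, truncated, info = env.step(action)
        # Step-based reward scaling (assumed linear form) applied only
        # under randomized resets, weighting later steps more heavily.
        episode_return += reward * (1.0 + step / MAX_STEPS)
        if terminated or truncated:
            break
env.close()
```

In a full experiment, the random policy above would be replaced by DDPG or TD3 actors, and the smoothed episode returns under each strategy would be compared with t-tests as reported in the abstract.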
Fund: Funded by Zhiyuan Laboratory (Grant No. ZYL2024017a).
Abstract: The study of exoskeletons has been a popular topic worldwide. However, there is still a long way to go before exoskeletons can be widely used. One of the major challenges is control, and no dominant approach to exoskeleton control has yet emerged. In this paper, we propose a novel exoskeleton control strategy that combines Active Disturbance Rejection Control (ADRC) with Deep Reinforcement Learning (DRL). A dynamic model of the exoskeleton is constructed, followed by the design of the ADRC. To automatically tune the control parameters of the ADRC, the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm is utilized. A reward function is then defined in terms of the joint angle, the angular velocity, and their errors relative to the desired values, so as to maximize joint-angle tracking accuracy. In simulations and experiments, a conventional ADRC and ADRC tuned by a Genetic Algorithm (GA) and by Particle Swarm Optimization (PSO) were evaluated for comparison with the proposed method. The results show that TD3-ADRC achieves a rapid response, small overshoot, and low Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) relative to the desired trajectory, demonstrating the superiority of the proposed method for self-learning exoskeleton control.
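The abstract states that the TD3 reward is built from the joint angle, the angular velocity, and their errors to the desired values, but does not give the formula. The snippet below is a hedged sketch of one plausible form; the negative weighted-squared-error structure and the weights `w_angle` and `w_vel` are illustrative assumptions, not the authors' exact reward.

```python
# Hedged sketch of a per-step reward for TD3-based tuning of ADRC gains.
import numpy as np

def adrc_tuning_reward(q, qd, q_des, qd_des, w_angle=1.0, w_vel=0.1):
    """Reward for one control step.

    q, qd         : measured joint angle and angular velocity
    q_des, qd_des : desired joint angle and angular velocity
    w_angle, w_vel: assumed weights trading off angle vs. velocity tracking
    """
    angle_err = q - q_des
    vel_err = qd - qd_des
    # Negative weighted squared tracking error: the closer the joint
    # follows the desired trajectory, the higher the reward TD3 receives,
    # so maximizing the return maximizes joint-angle tracking accuracy.
    return -(w_angle * angle_err**2 + w_vel * vel_err**2)

# Example: a small tracking error yields a reward near zero (the maximum).
print(adrc_tuning_reward(q=0.51, qd=0.02, q_des=0.50, qd_des=0.0))
```

Under this kind of reward, TD3 proposes ADRC parameter settings, observes the resulting tracking errors, and is driven toward gains that minimize both angle and velocity error, which is consistent with the low MAE and RMSE reported for TD3-ADRC.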