Abstract
The deep deterministic policy gradient (DDPG) algorithm adopts an actor-critic framework to ensure continuous, smooth motion of mobile robots. However, when computing the value function (Q-value), the critic network does not adequately distinguish between different states and actions, leading to inaccurate Q-value estimates. In addition, the sparse reward function in DDPG slows convergence during model training, and the random uniform sampling approach fails to use the sample data efficiently. To address these problems, this paper builds on DDPG by introducing a dueling network to improve the accuracy of Q-value estimation, redesigning the reward function to guide the mobile robot toward more efficient and reasonable movement, and separating the single experience replay buffer into two pools with a dynamic adaptive sampling mechanism to improve experience-replay efficiency. Finally, the proposed algorithm is evaluated in a simulation environment built with the Robot Operating System (ROS) and the Gazebo platform. Experimental results show that, compared with the standard DDPG algorithm, the proposed approach shortens training time by 17.8%, improves convergence speed by 57.46%, and raises the success rate by 3%. Compared with other algorithms, the proposed method improves stability during model training and significantly increases the efficiency and success rate of mobile robot path planning.
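The abstract's two main algorithmic ideas, the dueling decomposition of the Q-value and dynamic adaptive sampling from separated experience pools, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the success/failure pool split, and the fixed `success_ratio` parameter are all hypothetical stand-ins for the adaptive mechanism the paper describes.

```python
from statistics import mean
import random

def dueling_q(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a)).

    Subtracting the mean advantage makes the V and A streams identifiable,
    which is the mechanism a dueling network uses to sharpen Q estimates.
    """
    m = mean(advantages)
    return [state_value + (a - m) for a in advantages]

def adaptive_batch(success_pool, failure_pool, batch_size, success_ratio, rng):
    """Draw one training batch from two separated experience pools.

    `success_ratio` stands in for the dynamic adaptive mechanism: it would
    be adjusted as training progresses to re-weight the two pools.
    """
    n_success = round(batch_size * success_ratio)
    batch = rng.choices(success_pool, k=n_success)
    batch += rng.choices(failure_pool, k=batch_size - n_success)
    return batch

# The mean advantage below is 2.0, so Q recovers [1.0, 3.0, 2.0].
print(dueling_q(2.0, [1.0, 3.0, 2.0]))
```

The mean-subtraction in `dueling_q` is the standard identifiable form of the dueling architecture; without it, a constant could shift freely between the value and advantage streams.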
Authors
ZHANG Qingling (张庆玲), NI Cui (倪翠), WANG Peng (王朋), GONG Hui (巩慧)
School of Information Science and Electric Engineering, Shandong Jiaotong University, Jinan 250357, Shandong, China; Institute of Automation, Shandong Academy of Sciences, Jinan 250013, Shandong, China
Source
Journal of Applied Sciences (《应用科学学报》), a Peking University Core journal
2025, Issue 3, pp. 415-436 (22 pages)
Funding
China Postdoctoral Science Foundation (No. 2021M702030)
Science and Technology Program of the Shandong Provincial Department of Transport (No. 2021B120)
Keywords
path planning
deep deterministic policy gradient(DDPG)
dueling network
experience pool separation
dynamic adaptive sampling