Abstract
The rapid advancement of automation technology and robotics places ever higher demands on the precision of mobile robot path planning. To address the poor convergence stability, low sample efficiency, and insufficient environmental adaptability of deep reinforcement learning for path planning in complex environments, this study proposes an improved path planning algorithm based on the dueling double deep Q-network (R-D3QN). A dual-network architecture decouples action selection from value estimation, which effectively alleviates the Q-value overestimation problem and improves convergence stability. A temporal-prioritized experience replay mechanism, combined with the spatiotemporal feature extraction capability of long short-term memory (LSTM) networks, improves sample utilization efficiency. A multi-stage exploration strategy based on simulated annealing balances exploration and exploitation, enhancing environmental adaptability. Experimental results show that, compared with the traditional DQN algorithm, R-D3QN achieves a 9.25% increase in average reward, a 24.39% reduction in episodes to convergence, and a 41.20% reduction in collisions in simple environments; in complex environments, it achieves a 12.98% increase in average reward, an 11.86% reduction in episodes to convergence, and a 42.14% reduction in collisions. R-D3QN also shows clear advantages over other improved DQN algorithms, validating the effectiveness of the proposed method.
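Two of the mechanisms named in the abstract, the double-network target that decouples action selection from value estimation, and an exploration rate cooled in the spirit of simulated annealing, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names, the toy Q-value arrays, and the exponential cooling constant `tau` are assumptions for demonstration only.

```python
import numpy as np

def double_dqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double-DQN target: the online network selects the greedy next action,
    while a separate target network evaluates it, mitigating overestimation."""
    a_star = np.argmax(next_q_online, axis=1)                # action selection (online net)
    q_eval = next_q_target[np.arange(len(a_star)), a_star]   # value estimation (target net)
    return rewards + gamma * (1.0 - dones) * q_eval

def annealed_epsilon(step, eps_max=1.0, eps_min=0.05, tau=500.0):
    """Exploration rate decayed over training steps, annealing-style:
    broad exploration early, shifting toward exploitation later."""
    return eps_min + (eps_max - eps_min) * np.exp(-step / tau)
```

For a single transition with reward 1.0, online next-state values `[1.0, 2.0]` (so action 1 is selected), target values `[5.0, 3.0]` (so 3.0 is used), and `gamma=0.9`, the target is `1.0 + 0.9 * 3.0 = 3.7`; a vanilla DQN using `max` of the target values would instead bootstrap from 5.0, illustrating the overestimation being avoided.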
Authors
XIE Tian; ZHOU Yi; QIU Yu-Feng (School of Artificial Intelligence and Automation, Wuhan University of Science and Technology, Wuhan 430081, China; Baosight Software (Wuhan) Co. Ltd., Wuhan 430080, China)
Source
Computer Systems & Applications (《计算机系统应用》)
2025, No. 7, pp. 37-47 (11 pages)
Funding
National Natural Science Foundation of China (62372343).
Keywords
mobile robot
path planning
deep Q-network(DQN)
reinforcement learning