
Robot path planning based on a soft AC algorithm with a multilayer attention mechanism
(Original title: 基于多层注意力机制—柔性AC算法的机器人路径规划)
Cited by: 6
Abstract: The actor-critic (AC) algorithm suffers from high-dimensional experience samples and low robustness in its policy-gradient model. Drawing on the information-sharing advantages of multi-agent systems, this paper builds attention-mechanism networks that act as agents, and improves the AC algorithm with a multilayer parallel attention network model and a soft (entropy-regularized) objective, yielding a soft actor-critic algorithm based on a multilayer parallel attention mechanism. Applied to robot path planning in dynamic, unknown environments, the method strengthens the robustness of the actor's policy gradient, reduces the regression error of the critic, and converges quickly to an optimal path plan. Experimental results show that the algorithm effectively escapes local optima in robot path planning and offers fast computation and stable convergence.
Authors: 韩金亮 (Han Jinliang), 任海菁 (Ren Haijing), 吴淞玮 (Wu Songwei), 蒋欣欣 (Jiang Xinxin), 刘凤凯 (Liu Fengkai) (School of Mathematics; School of Environment & Spatial Informatics; School of Safety Engineering; School of Information & Control Engineering, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China)
Source: Application Research of Computers (《计算机应用研究》, CSCD, Peking University core journal), 2020, No. 12, pp. 3650-3655 (6 pages)
Funding: National Natural Science Foundation of China (61501465); National Undergraduate Innovation Training Program (201910290053Z)
Keywords: actor-critic algorithm; attention mechanism; deep reinforcement learning; robot path planning
