Abstract
To address the high dimensionality of experience-learning samples and the low robustness of the policy-gradient model in the actor-critic (AC) algorithm, this paper draws on the information-cooperation advantages of multi-agent systems. It constructs attention mechanism networks that act as agents and introduces a multi-layer parallel attention mechanism network model, together with a soft function, to improve the AC algorithm, yielding a soft AC algorithm based on a multi-layer parallel attention mechanism. Applied to robot path planning in dynamic, unknown environments, the algorithm strengthens the robustness of the actor's policy gradient and reduces the critic's regression error, so that the optimal path-planning solution converges quickly. Experimental results show that the algorithm effectively overcomes local optima in robot path planning and offers fast computation and stable convergence.
Authors
韩金亮
任海菁
吴淞玮
蒋欣欣
刘凤凯
Han Jinliang; Ren Haijing; Wu Songwei; Jiang Xinxin; Liu Fengkai (School of Mathematics, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China; School of Environment & Spatial Informatics, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China; School of Safety Engineering, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China; School of Information & Control Engineering, China University of Mining & Technology, Xuzhou, Jiangsu 221116, China)
Source
《计算机应用研究》
CSCD
Peking University Core Journals
2020, No. 12, pp. 3650-3655 (6 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (61501465)
National Undergraduate Innovation Training Program (201910290053Z)