Abstract
In military or civilian warehouse cargo handling and warehousing logistics, addressing the path planning problem for Autonomous Mobile Robots (AMRs) is a key focus. To obtain the optimal path more rapidly, a Kullback-Leibler transfer reinforcement learning (KL-TRL) algorithm is proposed. The KL-TRL algorithm computes the Kullback-Leibler (KL) divergence between the source task and the target task, and integrates this divergence, together with a decay factor, into the Q-value update of the target task to guide its learning. This approach makes fuller use of the experience gained on previous tasks, accelerates learning on the target task, and thereby yields the optimal path more quickly. The effectiveness of the KL-TRL algorithm is demonstrated in simulations of AMR collaborative handling tasks. Compared with other traditional transfer reinforcement learning algorithms, it shows faster jumpstart and convergence. By finding the optimal path rapidly, AMRs can deploy equipment quickly in military settings and improve logistics efficiency in civilian settings.
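The abstract describes shaping the target task's Q-value update with a decayed KL-divergence term between source and target tasks. Since the paper's exact update rule is not given in the abstract, the following is only a minimal tabular sketch under assumptions: the KL divergence is taken between softmax policies derived from the source and target Q-tables at the current state, and the transfer term is weighted by a factor that decays per episode. The function name `kl_trl_update` and all parameter choices are hypothetical.

```python
import numpy as np

def softmax(q, tau=1.0):
    """Boltzmann policy from a row of Q-values (numerically stable)."""
    z = (q - q.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with a small epsilon for stability."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def kl_trl_update(Q_tgt, Q_src, s, a, r, s_next, done,
                  alpha=0.1, gamma=0.95, beta=1.0, decay=0.99, episode=0):
    """One Q-learning step on the target task, shaped by a decayed KL term
    that measures the gap between the source-task policy and the current
    target-task policy at state s.  This form is an assumption for
    illustration; the paper's exact rule may differ."""
    p_src = softmax(Q_src[s])
    p_tgt = softmax(Q_tgt[s])
    # Transfer shaping: large early in training, fades as decay**episode -> 0,
    # so the target task gradually relies on its own experience.
    shaping = beta * (decay ** episode) * kl_div(p_src, p_tgt)
    td_target = r if done else r + gamma * Q_tgt[s_next].max()
    Q_tgt[s, a] += alpha * (td_target - Q_tgt[s, a] + shaping)
    return Q_tgt
```

As the decay factor drives the shaping term toward zero, the update reduces to standard Q-learning, which is consistent with the abstract's claim that transfer mainly accelerates the jumpstart and early convergence.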
Authors
LI Cong; ZHANG Zhen; LIU Pengchang (School of Automation, Qingdao University, Qingdao 266000, China; Shandong Key Laboratory of Industrial Control Technology, Qingdao 266000, China; Third Operation Center, Qingdao Metro Operation Co., Ltd., Qingdao 266000, China)
Source
Electronics Optics & Control (《电光与控制》), Peking University Core Journal, 2026, No. 1, pp. 78-83, 90 (7 pages)
Funding
National Natural Science Foundation of China (61903209).
Keywords
autonomous mobile robot
robot control
transfer learning
reinforcement learning
KL divergence