Abstract
Deep reinforcement learning algorithms are increasingly used in robot control, but algorithms for continuous action spaces, such as DDPG (Deep Deterministic Policy Gradient), suffer from value overestimation, and their application to robot control is not yet mature. To improve the accuracy of target value estimation and obtain a deep reinforcement learning algorithm better suited to robot control, this paper proposes a deep double deterministic policy gradient algorithm based on a three-value estimation method. The algorithm uses the three-value estimation method to estimate the output of the target critic network and computes from it the target value against which the current network is evaluated; it uses a double deterministic policy network to generate the optimal policy at the current time step; and it adds Ornstein-Uhlenbeck (OU) noise, which is better suited to deep reinforcement learning control of robotic arms, to the action policy. Experiments show that the algorithm performs better in complex models and environments.
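For readers unfamiliar with the OU noise mentioned in the abstract, the following is a minimal sketch of a discretized Ornstein-Uhlenbeck process, dx = theta * (mu - x) * dt + sigma * dW, added to a deterministic action. It is not the paper's implementation: the class name `OUNoise`, the helper `noisy_action`, the parameter values (`theta`, `sigma`, `dt`), and the action bounds are all illustrative assumptions.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process (temporally correlated noise).

    All default parameter values are illustrative assumptions,
    not settings taken from the paper.
    """

    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.action_dim = action_dim
        self.mu = mu          # long-run mean the process reverts to
        self.theta = theta    # mean-reversion rate
        self.sigma = sigma    # noise scale
        self.dt = dt          # time step of the discretization
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its mean, e.g. at the start of each episode.
        self.state = [self.mu] * self.action_dim

    def sample(self):
        # Euler-Maruyama step: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)


def noisy_action(action, noise, low=-1.0, high=1.0):
    # Add exploration noise to a deterministic action, then clip to assumed bounds.
    return [min(high, max(low, a + n)) for a, n in zip(action, noise.sample())]
```

Because each sample depends on the previous state, consecutive perturbations are correlated, which tends to produce smoother exploratory motion than independent Gaussian noise; this is the usual motivation for OU noise in physical control tasks.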
Authors
王文龙
张帆
唐超
李徐
郝正阳
张帆扬
WANG Wenlong; ZHANG Fan; TANG Chao; LI Xu; HAO Zhengyang; ZHANG Fanyang (School of Mechanical and Automotive Engineering, Shanghai University of Engineering Science, Shanghai 201620, China)
Source
《智能计算机与应用》
2024, No. 5, pp. 75-82 (8 pages)
Intelligent Computer and Applications
Funding
Science and Technology Support Program in the Biomedical Field, Shanghai Science and Technology Commission (17441901200).
Keywords
deep reinforcement learning
three-value estimation method
double deterministic policy
robot control