
A Method for Improving the MADDPG Multi-Agent Algorithm
Abstract: To address the instability that multi-agent environments impose on policy training, this paper improves the multi-agent deep deterministic policy gradient (MADDPG) algorithm, which mitigates non-stationarity by sharing observation information and historical experience among agents, and proposes the IMMADDPG algorithm. The improved network structure more effectively reduces the influence of environment instability and value-function overestimation on policy-network training. Experiments were conducted in two environments. In the Cooperative Navigation environment, agents trained with IMMADDPG reach their targets with a probability 3.7% higher than with MADDPG. In the Predator-Prey environment, which involves both cooperation and competition, policies trained with IMMADDPG yield an average of 5.79 captures of prey agents by predator agents and an average of 2.23 arrivals of prey agents at their target landmarks, compared with 4.82 captures and 1.76 arrivals for MADDPG. IMMADDPG therefore performs better than MADDPG in multi-agent deep reinforcement learning environments.
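The abstract attributes part of the improvement to reducing value-function overestimation during critic training. A standard remedy for overestimation (used, for example, in TD3-style clipped double Q-learning) is to take the minimum of two independent critic estimates when forming TD targets. The sketch below illustrates that idea in plain NumPy; the function name, toy data, and discount factor are illustrative assumptions, not the paper's actual IMMADDPG implementation.

```python
import numpy as np

def clipped_double_q_target(rewards, q1_next, q2_next, dones, gamma=0.95):
    """TD targets built from the minimum of two critic estimates.

    Taking min(Q1, Q2) biases the target downward, counteracting the
    upward bias that a single maximizing critic accumulates. This is a
    generic overestimation remedy, not necessarily the exact mechanism
    used in IMMADDPG.
    """
    q_min = np.minimum(q1_next, q2_next)          # pessimistic next-state value
    return rewards + gamma * (1.0 - dones) * q_min  # zero bootstrap on terminal steps

# Toy batch of 3 transitions (values chosen only for illustration).
rewards = np.array([1.0, 0.0, -1.0])
q1_next = np.array([2.0, 3.0, 1.0])
q2_next = np.array([1.5, 3.5, 0.5])
dones   = np.array([0.0, 0.0, 1.0])
targets = clipped_double_q_target(rewards, q1_next, q2_next, dones)
print(targets)  # element i is rewards[i] + 0.95 * (1 - dones[i]) * min(q1, q2)[i]
```

Each agent's centralized critic would regress toward such targets; because the minimum of the two estimates is used, neither critic's optimistic errors propagate into the target.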
Authors: Ning Shan; Zhao Qiuduo; Ding Yulong; Guo Jiacheng (Innovation & Entrepreneurship, Heilongjiang University of Science & Technology, Harbin 150022, China; School of Electronics & Information Engineering, Heilongjiang University of Science & Technology, Harbin 150022, China)
Source: Journal of Heilongjiang University of Science and Technology, 2025, No. 1, pp. 160-165, 172 (7 pages)
Keywords: deep reinforcement learning; multi-agent collaboration; multi-agent competition; centralized training; decentralized execution