Abstract
This paper studies role assignment through reinforcement learning in robot soccer. Simulation experiments and theoretical analysis show that the Q-learning algorithm based on the infinite-horizon discounted model, as used in [1], is not suited to this task. The algorithm is improved by adopting the average-reward model instead. Experiments show that both the convergence speed of learning and the system performance nearly double after the improvement.
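The contrast between the two optimality models can be sketched in a few lines. The abstract does not say which average-reward algorithm the paper uses, so the second function below assumes an R-learning-style update purely for illustration. The table layout, function names, and step sizes (`alpha`, `beta`, `gamma`) are likewise hypothetical and not taken from the paper.

```python
def discounted_q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step under the infinite-horizon discounted model.

    q is a list of per-state lists of action values (an assumed layout).
    """
    # Future value is shrunk by the discount factor gamma.
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def average_reward_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """One R-learning-style step under the average-reward model (assumed variant).

    rho is the running estimate of the average reward per step;
    the updated estimate is returned.
    """
    # Check greediness before the table entry changes.
    was_greedy = q[s][a] == max(q[s])
    best_next = max(q[s_next])
    # No discounting: the reward is measured relative to the average rho.
    q[s][a] += alpha * (r - rho + best_next - q[s][a])
    if was_greedy:
        # Only greedy actions are used to refine the average-reward estimate.
        rho += beta * (r + best_next - max(q[s]) - rho)
    return rho


# Usage: two states, two actions, all values initially zero.
q_disc = [[0.0, 0.0], [0.0, 0.0]]
new_value = discounted_q_update(q_disc, s=0, a=1, r=1.0, s_next=1)

q_avg = [[0.0, 0.0], [0.0, 0.0]]
rho = average_reward_update(q_avg, 0.0, s=0, a=1, r=1.0, s_next=1)
```

The discounted update weights future value by `gamma`, whereas the average-reward update drops the discount and scores each reward against the running average `rho`. This is one common way to realise the average-reward model, not necessarily the one used in the paper.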
Source
Robot (《机器人》)
EI
CSCD
Peking University Core Journals (北大核心)
2000, No. 6, pp. 482-489 (8 pages)
Funding
863 Program (863-512-9805-18)
National Natural Science Foundation of China (69889501)
Keywords
robot soccer
reinforcement learning
Q algorithm
task assignment
infinite horizon discounted model
average reward model