
Improved Reinforcement Learning Algorithm and Its Application in RoboCup (cited by: 2)

Abstract: An improved dynamic reinforcement learning method based on CMAC (cerebellar model articulation controller), named DCMAC-AL (dynamic cerebellar model articulation controller advantage learning), is proposed. The method uses advantage(λ) learning to compute the state-action value function, widening the gap between the values of competing actions so as to avoid action oscillation. On top of CMAC function approximation, it adds feature values dynamically according to the Bellman error, improving the adaptivity of the approximator. The multi-agent defensive task takeaway is modeled on the RoboCup soccer simulation platform and used as the learning testbed. Experimental results show that DCMAC-AL learns better than advantage(λ) learning with a fixed CMAC.
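
The sketch below is illustrative only, not the authors' implementation. It shows, in Python, one advantage-learning backup over a CMAC-style tile coding, plus a crude Bellman-error trigger for adding features, as the abstract describes. Advantage learning uses the update A(s,a) ← A(s,a) + α[ V(s) + (r + γ·V(s') − V(s))/κ − A(s,a) ], with V(s) = max_a A(s,a); the scaling factor κ ∈ (0,1] widens the value gap between the greedy action and the rest, which is what suppresses action oscillation. All names here (DynamicCMAC, kappa, grow_threshold) are assumptions, the eligibility traces of advantage(λ) learning are omitted for brevity, and the growth trigger is a simplification of the paper's Bellman-error rule.

    from collections import defaultdict

    class DynamicCMAC:
        """Sparse tile-coded linear approximator (illustrative, not the paper's code)."""

        def __init__(self, n_tilings=8, tile_width=0.5):
            self.n_tilings = n_tilings
            self.tile_width = tile_width
            self.weights = defaultdict(float)  # weights created lazily = dynamic features

        def features(self, state, action):
            # One active tile per tiling; each tiling has a fixed offset derived from
            # its index, so growing n_tilings later leaves earlier tilings unchanged.
            feats = []
            for t in range(self.n_tilings):
                offset = ((t * 0.618) % 1.0) * self.tile_width
                idx = tuple(int((s + offset) // self.tile_width) for s in state)
                feats.append((t, idx, action))
            return feats

        def value(self, state, action):
            return sum(self.weights[f] for f in self.features(state, action))

    def advantage_backup(cmac, s, a, r, s2, actions,
                         alpha=0.1, gamma=0.95, kappa=0.3, grow_threshold=1.0):
        """One advantage-learning update; grows a tiling when |Bellman error| is large."""
        v_s = max(cmac.value(s, b) for b in actions)
        v_s2 = max(cmac.value(s2, b) for b in actions)
        target = v_s + (r + gamma * v_s2 - v_s) / kappa   # advantage-learning target
        delta = target - cmac.value(s, a)                 # Bellman error
        for f in cmac.features(s, a):
            cmac.weights[f] += alpha * delta / cmac.n_tilings
        if abs(delta) > grow_threshold:                   # crude dynamic-feature trigger
            cmac.n_tilings += 1                           # add one finer-offset tiling
        return delta

    if __name__ == "__main__":
        cmac = DynamicCMAC()
        # One backup for a 2-D state and 3 discrete actions (values are arbitrary).
        print(advantage_backup(cmac, (1.2, 0.4), 0, 1.0, (1.3, 0.5), [0, 1, 2]))

Because each tiling's offset depends only on its own index, adding a tiling only refines the resolution going forward; previously learned weights keep their meaning.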
Authors: 程显毅, 朱倩
Source: Journal of Guangxi Normal University (Natural Science Edition) [CAS; PKU Core Journal], 2010, No. 3: 99-103 (5 pages)
Funding: National Natural Science Foundation of China (60702056); Jiangsu Provincial Graduate Innovation Project (ZX09B2042)
Keywords: reinforcement learning; agent; RoboCup; CMAC

References (7)

  • 1. SUTTON R S, BARTO A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: MIT Press, 1998: 24-26.
  • 2. BAKKER B. Reinforcement learning with long short-term memory [C]// Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2002: 987-990.
  • 3. KELLER P W, MANNOR S, PRECUP D. Automatic basis function construction for approximate dynamic programming and reinforcement learning [C]// Proceedings of the 23rd International Conference on Machine Learning. Cambridge: MIT Press, 2006: 1103-1115.
  • 4. GAO Yang, HU Jingkai, WANG Bennian, WANG Dongli. Elevator group control scheduling based on CMAC network reinforcement learning [J]. Acta Electronica Sinica, 2007, 35(2): 362-365. (cited by 13)
  • 5. LI Ming'ai, JIAO Lifang, HAO Dongmei, QIAO Junfei. Reinforcement learning method based on multiple parallel CMAC neural networks [J]. Journal of System Simulation, 2008, 20(24): 6683-6685. (cited by 2)
  • 6. STONE P, SUTTON R S, KUHLMANN G. Reinforcement learning for RoboCup soccer keepaway [J]. Adaptive Behavior, 2005, 13(3): 165-188.
  • 7. ISCEN A, EROGUL U. A new perspective to the keepaway soccer: the takers (short paper) [C]// Proc of the 7th Int Conf on Autonomous Agents and Multiagent Systems (AAMAS 2008). Estoril, Portugal, 2008: 566-569.

Secondary references (18; first 10 shown)

  • 1. Barto A G, Sutton R S, Anderson C W. Neuronlike adaptive elements that can solve difficult learning control problems [J]. IEEE Trans on Systems, Man, and Cybernetics, 1983, 13(5): 834-846.
  • 2. Anderson C W. Learning to control an inverted pendulum using neural networks [J]. IEEE Control Systems Magazine, 1989, 9(4): 31-35.
  • 3. Watkins C J C H. Learning from Delayed Rewards [D]. Cambridge, England: University of Cambridge, 1989.
  • 4. Si J, Wang Y T. On-line learning control by association and reinforcement [J]. IEEE Transactions on Neural Networks, 2001, 12(2): 264-276.
  • 5. Lin C T, Lee C S G. Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems [J]. IEEE Transactions on Fuzzy Systems, 1994, 2(1): 46-63.
  • 6. Sakai Y, Kurosawa K. Development of elevator supervisory group control system with artificial intelligence [J]. Hitachi Review, 1984, 33: 25-30.
  • 7. Siikonen M L. Elevator traffic simulation [J]. Simulation, 1993, 61: 257-267.
  • 8. Ujihara H, Tsuji S. The revolutionary AI-2100 elevator-group control system and the new intelligent option series [J]. Mitsubishi Electric Advance, 1988, 45: 5-8.
  • 9. Ujihara H, Amano M. The latest elevator group-control system [J]. Mitsubishi Electric Advance, 1994, 67: 10-12.
  • 10. Crites R H, Barto A G. Elevator group control using multiple reinforcement learning agents [J]. Machine Learning, 1998, 33(2): 235-262.


Co-cited references (32; first 10 shown)

  • 1. LI Jin, LIU Quan, YANG Xudong, YANG Kai, WENG Dongliang. An improved average-reward reinforcement learning method applied to RoboCup training [J]. Journal of Soochow University (Natural Science Edition), 2012, 28(2): 21-26. (cited by 2)
  • 2. Chen M, Dorer K, Foroughi E, et al. Users Manual: RoboCup Soccer Server, for Soccer Server Version 7.07 and Later [EB/OL]. http://sourceforge.net/projects/sserver/files.
  • 3. Stone P, Kuhlmann G, Taylor M E, et al. Keepaway Soccer: from Machine Learning Testbed to Benchmark [M]// RoboCup 2005: Robot Soccer World Cup IX. Berlin: Springer-Verlag, 2006: 93-105.
  • 4. Stone P, Sutton R S, Kuhlmann G. Reinforcement Learning for RoboCup Soccer Keepaway [J]. Adaptive Behavior, 2005, 13(3): 165-188.
  • 5. Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge, MA: The MIT Press, 2012.
  • 6. Taylor M, Stone P, Liu Y. Transfer Learning via Inter-task Mappings for Temporal Difference Learning [J]. Journal of Machine Learning Research, 2007, 8(1): 2125-2167.
  • 7. Fernández F, García J, Veloso M. Probabilistic Policy Reuse for Inter-task Transfer Learning [J]. Robotics and Autonomous Systems, 2010, 58(7): 866-871.
  • 8. Fernández F, Veloso M. Probabilistic Policy Reuse in a Reinforcement Learning Agent [C]// Nakashima H, Wellman M, eds. AAMAS '06: Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multiagent Systems. New York: ACM Press, 2006: 720-727.
  • 9. Rummery G A, Niranjan M. On-line Q-learning Using Connectionist Systems [R]. Cambridge, England: Cambridge University Engineering Department, 1994.
  • 10. Walsh T J, Li L, Littman M L. Transferring State Abstractions between MDPs [C]// Proceedings of the ICML '06 Workshop on Structural Knowledge Transfer for Machine Learning, 2006.
