
Natural Gradient Reinforcement Learning Algorithm with TD(λ)  (Cited by: 2)
Abstract: In recent years, policy gradient methods have attracted wide attention in reinforcement learning for their good convergence properties. This paper investigates natural gradient algorithms under the average-reward model. To address the low efficiency of gradient estimation in existing algorithms, the TD(λ) method is used to approximate the value function during gradient estimation. The eligibility traces in TD(λ) make the propagation of learning experience more efficient, which reduces the variance of the gradient estimate and improves the convergence speed. Simulation experiments on the cart-pole balancing system demonstrate the effectiveness of the proposed algorithm.
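To illustrate the mechanism the abstract describes, the following is a minimal sketch of one TD(λ) update with linear value-function approximation and an accumulating eligibility trace. This is an illustrative reconstruction of the standard textbook update, not the paper's full natural-gradient algorithm; the function name, feature vectors, and step-size values are assumptions for the example.

```python
import numpy as np

def td_lambda_update(w, z, phi_s, phi_next, reward,
                     alpha=0.05, gamma=0.99, lam=0.8):
    """One TD(lambda) step with linear approximation V(s) = w . phi(s).

    w        : weight vector of the value-function approximator
    z        : eligibility-trace vector (same shape as w)
    phi_s    : feature vector of the current state
    phi_next : feature vector of the next state
    """
    # TD error: mismatch between the one-step return and the current estimate
    delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi_s)
    # Accumulating trace: recently visited features remain "eligible" for credit
    z = gamma * lam * z + phi_s
    # The trace spreads the TD error backwards over recent states, which is
    # what lowers the variance of the value estimates used in the gradient
    w = w + alpha * delta * z
    return w, z
```

With λ = 0 this degenerates to one-step TD; larger λ propagates each TD error further back along the trajectory, which is the variance-reduction effect the abstract attributes to the eligibility traces.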
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2010, No. 12, pp. 186-189 (4 pages).
Funding: National Natural Science Foundation of China (70971067, 60905002); Jiangsu Province University Natural Science Major Basic Research Project (08KJA520001); Jiangsu Province "Six Talent Peaks" Project (2007148).
Keywords: Policy gradient; Natural gradient; TD(λ); Eligibility trace

References (5)

Secondary References (80)

  • 1 阎平凡. Reinforcement learning: principles, algorithms, and applications in intelligent control [J]. Information and Control, 1996, 25(1): 28-34. (Cited by: 30)
  • 2 Sutton R S, Barto A G. Reinforcement Learning: An Introduction [M]. Cambridge: The MIT Press, 1998.
  • 3 Driessens K. Relational Reinforcement Learning [D]. Leuven, Belgium: Department of Computer Science, K. U. Leuven, May 2004.
  • 4 Rasmussen C E, Kuss M. Gaussian processes in reinforcement learning [C]// Advances in Neural Information Processing Systems, vol 16. MIT Press.
  • 5 Driessens K, Ramon J, Gartner T. Graph kernels and Gaussian processes for relational reinforcement learning [J]. Machine Learning, 2006, 64: 91-119.
  • 6 Dzeroski S, Raedt L D, Blockeel H. Relational Reinforcement Learning [C]// Shavlik J, ed. Proceedings of ICML'98. Berlin: Morgan Kaufmann, 2003: 136-143.
  • 7 Kaelbling L P, Littman M L, Moore A W. Reinforcement Learning: A Survey [J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 8 Gartner T, Driessens K, Ramon J. Graph kernels and Gaussian processes for relational reinforcement learning [C]// Proceedings of the International Conference on Inductive Logic Programming (ILP'03). 2003.
  • 9 Mackay D. Introduction to Gaussian processes [OL]. http://wol.ra.phy.cam.ac.uk/mackay.
  • 10 Chu Wei, Ghahramani Z. Gaussian Processes for Ordinal Regression [J]. Journal of Machine Learning Research, 2005, 6: 1019-1041.

Co-cited Literature (34)

Literature Also Cited by Citing Articles (24)

  • 1 王亚杰, 王晓岩, 邱虹坤, 李飞. Building a standard for game records and a sustainable ecosystem for computer game competitions [J]. Experimental Technology and Management, 2020, 37(2): 19-23. (Cited by: 4)
  • 2 童亮, 陆际联, 龚建伟. A fast reinforcement learning method [J]. Transactions of Beijing Institute of Technology, 2005, 25(4): 328-331. (Cited by: 4)
  • 3 张汝波, 施洋. A multi-robot system based on fuzzy Q-learning [J]. Journal of Harbin Engineering University, 2005, 26(4): 477-481. (Cited by: 4)
  • 4 郭锐, 吴敏, 彭军, 彭姣, 曹卫华. A new multi-agent Q-learning algorithm [J]. Acta Automatica Sinica, 2007, 33(4): 367-372. (Cited by: 13)
  • 5 Desouky S F, Schwartz H M. Q(λ)-learning fuzzy logic controller for a multi-robot system [C]// IEEE International Conference on Systems, Man and Cybernetics. Istanbul, Turkey, 2010: 4075-4080.
  • 6 Hu Zhaohui, Zhao Dongbiao. Reinforcement learning for multi-agent patrol policy [C]// The 9th IEEE International Conference on Cognitive Informatics. Beijing, China, 2010: 530-535.
  • 7 Martin J A H, de Lope J, Maravall D. Robust high performance reinforcement learning through weighted k-nearest neighbors [J]. Neurocomputing, 2011, 74(8): 1251-1259.
  • 8 A k-NN based perception scheme for reinforcement learning [J]. Lecture Notes in Computer Science, 2007, 4739: 138-145.
  • 9 Martin J A H, de Lope J. Ex<a>: an effective algorithm for continuous actions reinforcement learning problems [C]// The 35th IEEE Annual Conference of the Industrial Electronics Society. Oporto, Portugal, 2009: 2063-2068.
  • 10 Martin J A H, de Lope J, Maravall D. The kNN-TD reinforcement learning algorithm [J]. Lecture Notes in Computer Science, 2009, 5901: 305-314.

Citing Articles (2)

Secondary Citing Articles (3)
