
A Survey of Hierarchical Reinforcement Learning (cited by: 7)
Abstract: Reinforcement learning lets an agent improve its policy through trial-and-error interaction with its environment, and its capacity for self-learning and online learning has made it an important branch of machine learning research. It has long been hampered, however, by the curse of dimensionality. In recent years, hierarchical reinforcement learning has made significant progress against the curse of dimensionality by introducing abstraction mechanisms. As a theoretical foundation, this paper first introduces the basic principles of reinforcement learning and the Q-learning algorithm based on the semi-Markov decision process (SMDP). It then reviews three typical single-agent hierarchical reinforcement learning approaches (Option, HAM, and MAXQ), covering their basic ideas and Q-learning update formulas, summarizing the essential characteristics of each, and comparing and evaluating the three. Finally, it identifies the problems that must be solved when extending single-agent hierarchical reinforcement learning to the multi-agent setting.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2005, No. 5, pp. 574-581 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
Keywords: Hierarchical Reinforcement Learning; Semi-Markov Decision Process; Q-Learning; Multi-Agent System
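
The abstract refers to Q-learning over a semi-Markov decision process, where one decision may run for several time steps. The sketch below illustrates the standard SMDP Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ^τ max Q(s',a') − Q(s,a)], with r the discounted reward accumulated over the τ steps the action ran. It is a minimal illustration, not the paper's own code: the function name `smdp_q_update`, the table layout, and the toy states and actions are all assumptions made for this example.

```python
from collections import defaultdict


def smdp_q_update(q, state, action, rewards, next_state, next_actions,
                  alpha=0.1, gamma=0.9):
    """Apply one SMDP Q-learning update after a temporally extended action.

    `rewards` holds the per-step rewards collected while the action ran,
    so its length is the duration tau. All names here are illustrative.
    """
    tau = len(rewards)
    # Discounted reward accumulated while the extended action was running.
    cumulative = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Greedy value of the state where the action terminated
    # (0.0 when no further actions are available).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    # The bootstrap term is discounted by gamma**tau rather than gamma.
    target = cumulative + (gamma ** tau) * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Toy usage: a three-step "go_to_door" action that ends in state "hall".
q = defaultdict(float)
q[("hall", "step")] = 2.0
new_value = smdp_q_update(q, "room", "go_to_door", [0.0, 0.0, 1.0],
                          "hall", ["step"], alpha=0.5, gamma=0.5)
```

When every action lasts exactly one step (τ = 1), the update reduces to ordinary one-step Q-learning, which is why the SMDP form can serve as a common basis for hierarchical methods.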

References (45)

  • 1. Gao Yang, Chen Shifu, Lu Xin. A Survey of Reinforcement Learning Research [J] (in Chinese). Acta Automatica Sinica, 2004, 30(1): 86-100. Cited by: 295
  • 2. Singh S P, Jaakkola T, Jordan M I. Reinforcement Learning with Soft State Aggregation. In: Tesauro G, Touretzky D S, Leen T K, eds. Advances in Neural Information Processing Systems 7. Cambridge, USA: MIT Press, 1995, 361-368.
  • 3. Moriarty D, Schultz A, Grefenstette J. Evolutionary Algorithms for Reinforcement Learning. Journal of Artificial Intelligence Research, 1999, 11: 241-276.
  • 4. Bertsekas D P, Tsitsiklis J N. Neuro-Dynamic Programming. Belmont, USA: Athena Scientific, 1996.
  • 5. Barto A G, Mahadevan S. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(4): 41-77.
  • 6. Sutton R S, Precup D, Singh S P. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 1999, 112(1-2): 181-211.
  • 7. Parr R. Hierarchical Control and Learning for Markov Decision Processes. Ph.D. Dissertation. University of California, Berkeley, USA, 1998.
  • 8. Dietterich T G. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Journal of Artificial Intelligence Research, 2000, 13: 227-303.
  • 9. Minsky M L. Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem. Ph.D. Dissertation. Princeton University, Princeton, USA, 1954.
  • 10. Bellman R E, Dreyfus S E. Applied Dynamic Programming. Princeton, USA: Princeton University Press, 1962.

Secondary references (52)

  • 1. Hewitt C. Viewing Control Structures as Patterns of Passing Messages. Artificial Intelligence, 1977, 8(3): 323-364.
  • 2. Wooldridge M, Jennings N R. Agent Theories, Architectures, and Languages: A Survey. In: Wooldridge, Jennings, eds. Intelligent Agents. Berlin: Springer-Verlag, 1995. 1-22.
  • 3. Weiß G. Learning to Coordinate Actions in Multi-Agent Systems. In: Proceedings of IJCAI'93, 1993.
  • 4. Dworman G, Kimbrough S, Laing J. Bargaining by Artificial Agents in Two Coalition Games: A Study in Genetic Programming for Electronic Commerce. In: Proc. of the AAAI Genetic Programming Conf. Stanford, CA, Aug. 1996.
  • 5. Kaelbling L P. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
  • 6. Singh S. Agents and Reinforcement Learning. Miller freeman publish Inc, San Mateo, CA, USA, 1997.
  • 7. Bellman R. Dynamic Programming. Prentice-Hall, Englewood Cliffs, NJ, 1957.
  • 8. Sutton R S. Learning to Predict by the Methods of Temporal Differences. Machine Learning, 1988, 3: 9-44.
  • 9. Sutton R S. Convergence Theory for a New Kind of Prediction Learning. In: Proc. of the 1988 Workshop on Computational Learning Theory, 1988. 421-442.
  • 10. Watkins C J C H, Dayan P. Q-Learning. Machine Learning, 1992, 8(3): 279-292.

Co-citing documents: 300

Co-cited documents: 87

Citing documents: 7

Secondary citing documents: 36
