Speeding up reinforcement learning convergence with acyclic state trajectory
Abstract: In reinforcement learning, an agent that visits one state-action transition can update only a single value-function entry, which makes convergence very slow. This paper proposes a method that uses acyclic state trajectories to speed up convergence. By extracting, from training episodes, the acyclic trajectory from each state to the goal state, the agent can propagate its current value-function update backwards along the shortest acyclic trajectory, so that a single visit to a state-action transition refines a whole batch of value-function entries. Experimental comparisons show that the method significantly accelerates convergence and shortens learning time.
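The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it is an assumed tabular Q-learning setup on a 10-state chain, where after each episode the cycle-free part of the visited path is replayed backwards, so one episode refines a batch of value-function entries instead of one. All names and parameters are illustrative.

```python
import random
from collections import defaultdict

N_STATES, GOAL = 10, 9
ACTIONS = (-1, +1)
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    """Deterministic chain: move left/right, reward 1 only at the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def choose(Q, state, epsilon=0.2):
    """Epsilon-greedy action selection with random tie-breaking."""
    if random.random() < epsilon or Q[(state, -1)] == Q[(state, +1)]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def strip_cycles(states):
    """Cut loops out of a visited-state sequence, leaving an acyclic path."""
    path = []
    for s in states:
        if s in path:
            path = path[:path.index(s) + 1]  # revisit: drop the cycle
        else:
            path.append(s)
    return path

def run_episode(Q):
    state, visited, transitions = 0, [0], []
    while state != GOAL:
        action = choose(Q, state)
        nxt, reward = step(state, action)
        transitions.append((state, action, reward, nxt))
        visited.append(nxt)
        state = nxt
    # Backward sweep: replay only transitions lying on the acyclic path,
    # in reverse order, so the goal's value propagates along the whole path
    # within a single episode.
    path = strip_cycles(visited)
    edges = set(zip(path, path[1:]))
    for s, a, r, s2 in reversed(transitions):
        if (s, s2) in edges:
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])

random.seed(1)
Q = defaultdict(float)
for _ in range(20):
    run_episode(Q)
# After the very first successful episode, the start state already has a
# nonzero value, because the update was swept backwards along the path.
print(Q[(0, +1)] > 0.0)
```

With ordinary one-step Q-learning the goal's reward would need on the order of one episode per state to reach the start of the chain; the backward sweep over the acyclic trajectory achieves this in a single episode, which is the speed-up the abstract describes.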
Author: Song Jiong (宋炯)
Source: 《微计算机信息》 (Control & Automation), 2011, Issue 12, pp. 151-154.
Keywords: reinforcement learning; value function; speeding up convergence; training episode; acyclic state trajectory
