
Model-Based Factored Bayesian Online Reinforcement Learning (cited by: 2)
Abstract: Model-based Bayesian reinforcement learning suffers from an enormous number of parameters and slow convergence, which are the major obstacles to online learning. To address this, the paper presents a model-based factored Bayesian reinforcement learning approach. First, the learned parameters are given a factored representation, so that the dynamics are described with fewer parameters. Then, Bayesian learning from prior knowledge and observed data is used to optimize the exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning method is adopted to speed up convergence and thereby make online learning possible. Simulation results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well and meets real-time performance requirements.
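The first two steps of the abstract (a factored parameterization and Bayesian updating from prior knowledge plus observed data) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes discrete state variables, a dynamic-Bayesian-network structure that is known in advance, and independent Dirichlet priors stored as pseudo-counts. All names (`flat_param_count`, `factored_param_count`, `FactoredDirichletModel`) are hypothetical, and the point-based planning step is not covered.

```python
from collections import defaultdict


def flat_param_count(n_vars, n_values, n_actions):
    """Free parameters of a flat transition table P(s' | s, a)."""
    n_states = n_values ** n_vars
    return n_states * n_actions * (n_states - 1)


def factored_param_count(n_vars, n_values, n_actions, n_parents):
    """Free parameters when each next-state variable has n_parents parents."""
    return n_vars * (n_values ** n_parents) * n_actions * (n_values - 1)


class FactoredDirichletModel:
    """One Dirichlet posterior per (variable, parent values, action).

    Assumption for illustration: parents[i] lists the indices of the
    current-state variables that next-state variable i depends on.
    """

    def __init__(self, parents, n_values, prior=1.0):
        self.parents = parents
        # Pseudo-counts start at the prior and grow with observations.
        self.counts = defaultdict(lambda: [prior] * n_values)

    def _key(self, i, state, action):
        return (i, tuple(state[j] for j in self.parents[i]), action)

    def update(self, state, action, next_state):
        """Bayesian update: add one count per factor for the observed transition."""
        for i in range(len(self.parents)):
            self.counts[self._key(i, state, action)][next_state[i]] += 1.0

    def predict(self, state, action, next_state):
        """Posterior-mean probability of next_state as a product over factors."""
        p = 1.0
        for i in range(len(self.parents)):
            c = self.counts[self._key(i, state, action)]
            p *= c[next_state[i]] / sum(c)
        return p


model = FactoredDirichletModel(parents=[(0,), (0, 1)], n_values=2)
before = model.predict((0, 0), 0, (1, 1))
model.update((0, 0), 0, (1, 1))
after = model.predict((0, 0), 0, (1, 1))
```

With four binary state variables, two actions, and two parents per variable, the flat table needs 480 free parameters while the factored one needs 32, which is the kind of reduction the abstract refers to. The single observed transition raises its posterior-mean probability from 0.25 to 4/9 under the uniform prior assumed here.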
Source: Acta Electronica Sinica (《电子学报》), indexed in EI, CAS, CSCD, and the Peking University Core list; 2014, No. 7, pp. 1429-1434 (6 pages)
Funding: National Natural Science Foundation of China (No. 61074058, No. 60874042); Natural Science Foundation of Shenzhen (No. JCYJ20120617134831736)
Keywords: Markov decision processes; Bayesian reinforcement learning; dynamic Bayesian networks

References (18)

  • 1. Xu Xin, Shen Dong, Gao Yanqing, Wang Kai. Learning control of dynamical systems based on Markov decision processes: research frontiers and outlooks [J]. Acta Automatica Sinica (自动化学报), 2012, 38(5): 673-687. Cited by: 21
  • 2. Liu Haitao, Hong Bingrong, Piao Songhao, Wang Xuemei. Reinforcement learning based on evolutionary algorithms in uncertain environments [J]. Acta Electronica Sinica (电子学报), 2006, 34(7): 1356-1360. Cited by: 12
  • 3. Liu Quan, Li Jin, Fu Qiming, Cui Zhiming, Fu Yuchen. A multi-objective Sarsa(λ) algorithm with maximum set expected loss [J]. Acta Electronica Sinica (电子学报), 2013, 41(8): 1469-1473. Cited by: 3
  • 4. Ross S, Pineau J, Chaib-draa B, et al. A Bayesian approach for learning and planning in partially observable Markov decision processes [J]. Journal of Machine Learning Research, 2011, 12(1): 1729-1770.
  • 5. Gao Yang, Hu Jingkai, Wang Bennian, Wang Dongli. Elevator group control scheduling based on CMAC network reinforcement learning [J]. Acta Electronica Sinica (电子学报), 2007, 35(2): 362-365. Cited by: 13
  • 6. Doshi-Velez F, Pineau J, Roy N. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs [J]. Artificial Intelligence, 2012, 187-188(1): 115-132.
  • 7. Poupart P, Vlassis N. Model-based Bayesian reinforcement learning in partially observable domains [A]. Proceedings of the International Joint Conference on Autonomous Agents and Multi Agent Systems [C]. New York: ACM Press, 2008. 1025-1032.
  • 8. Ross S, Pineau J. Model-based Bayesian reinforcement learning in large structured domains [A]. Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence [C]. Cambridge, MA: AUAI Press, 2008. 476-483.
  • 9. Poupart P, Vlassis N, Hoey J, et al. An analytic solution to discrete Bayesian reinforcement learning [A]. Proceedings of the 23rd International Conference on Machine Learning [C]. New York: ACM Press, 2006. 697-704.
  • 10. Duff M. Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes [D]. USA: University of Massachusetts Amherst, 2002.

