Abstract
Model-based Bayesian reinforcement learning suffers from an enormous number of parameters and slow convergence, which are the major obstacles to online learning. This paper presents a model-based factored Bayesian reinforcement learning approach. First, a factored representation of the learning parameters is used to describe the dynamics with far fewer parameters. Then, the model is learned in a Bayesian manner from prior knowledge and observed data, which provides an elegant solution to the optimal exploration-exploitation tradeoff. Finally, a point-based Bayesian reinforcement learning method is employed to speed up convergence and achieve online learning. Simulation results show that the proposed approach approximates the underlying Bayesian reinforcement learning task well while meeting real-time performance requirements.
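The abstract describes two learning ingredients that can be illustrated concretely: a factored (DBN-style) transition model in which each state variable has its own conditional distribution, and Bayesian updating of those distributions from prior knowledge and observed transitions. The sketch below is a rough, hypothetical illustration of these two ideas only, not the paper's algorithm; the class and parameter names (FactoredDirichletModel, parent_sizes, var_sizes, prior) are invented for this example. It maintains per-variable Dirichlet counts and returns posterior-mean conditional probabilities, so the number of learned parameters grows with the sizes of the parent sets rather than with the full joint state space.

import numpy as np

class FactoredDirichletModel:
    """Illustrative factored transition model with Dirichlet priors (assumed API)."""

    def __init__(self, parent_sizes, var_sizes, n_actions, prior=1.0):
        # parent_sizes[i]: number of joint parent configurations of state variable i
        # var_sizes[i]:    number of values state variable i can take
        # prior:           symmetric Dirichlet pseudo-count encoding prior knowledge
        self.counts = [
            np.full((n_actions, parent_sizes[i], var_sizes[i]), prior)
            for i in range(len(var_sizes))
        ]

    def update(self, a, parent_cfgs, next_vals):
        # Bayesian update: add one observed transition to the Dirichlet counts.
        for i, (cfg, val) in enumerate(zip(parent_cfgs, next_vals)):
            self.counts[i][a, cfg, val] += 1.0

    def mean_transition_probs(self, a, parent_cfgs):
        # Posterior-mean CPD for each state variable; under the DBN factorization,
        # the full transition probability is the product of these factors.
        return [
            c[a, cfg] / c[a, cfg].sum()
            for c, cfg in zip(self.counts, parent_cfgs)
        ]

# Example usage: 3 binary state variables, 4 parent configurations each, 2 actions.
model = FactoredDirichletModel(parent_sizes=[4, 4, 4], var_sizes=[2, 2, 2], n_actions=2)
model.update(a=0, parent_cfgs=[1, 3, 0], next_vals=[1, 0, 1])
probs = model.mean_transition_probs(a=0, parent_cfgs=[1, 3, 0])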
Source
《电子学报》 (Acta Electronica Sinica), 2014, No. 7, pp. 1429-1434 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China (No.61074058, No.60874042)
Shenzhen Natural Science Foundation (No.JCYJ20120617134831736)
Keywords
Markov decision processes
Bayesian reinforcement learning
dynamic Bayesian networks