Abstract
This paper briefly reviews direct policy search algorithms in reinforcement learning and establishes a theoretical framework for policy gradient methods. Guided by this framework, several existing policy gradient algorithms are generalized. Recent methods for accelerating the convergence of policy gradient algorithms are discussed, and recent progress in policy search algorithms that do not rely on policy gradients is described. Finally, directions for future research are outlined.
Source
CAAI Transactions on Intelligent Systems (《智能系统学报》)
2007, No. 1, pp. 16-24 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 60234030 and 60303012).
Keywords
reinforcement learning
policy search
policy gradient