Abstract
A major problem with direct policy search reinforcement learning (RL) algorithms is that they perform only local search. As a result, they tend to converge to local suboptimal solutions and cannot guarantee convergence to the global optimum. This paper presents RLPF, a global search algorithm for direct policy search RL that does not become trapped in local optima. Experimental results show that RLPF explores the policy space effectively and can perform global search directly in the policy space.
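The abstract does not say how RLPF works internally, so the following is only a rough illustration of the local-versus-global distinction it draws. It is not the authors' algorithm. Based on the "particle filter" keyword, it assumes that a global search can weight a population of candidate policy parameters by their return, resample them, and perturb them, in the style of a particle filter. The toy one-dimensional return landscape and all names (`expected_return`, `local_gradient_search`, `particle_policy_search`) and parameter values are hypothetical. The landscape has a local optimum near -2 and a global optimum near 3. Gradient ascent started at -3 stays at the local peak, while the population-based search locates the global one.

```python
import math
import random


def expected_return(theta: float) -> float:
    """Hypothetical toy return J(theta): local peak near -2 (J~1), global peak near 3 (J~2)."""
    return math.exp(-(theta + 2) ** 2) + 2.0 * math.exp(-(theta - 3) ** 2)


def local_gradient_search(theta: float, lr: float = 0.1, iters: int = 200,
                          eps: float = 1e-4) -> float:
    """Finite-difference gradient ascent: a purely local policy search."""
    for _ in range(iters):
        grad = (expected_return(theta + eps) - expected_return(theta - eps)) / (2 * eps)
        theta += lr * grad
    return theta


def particle_policy_search(n_particles: int = 200, iters: int = 30,
                           low: float = -5.0, high: float = 5.0,
                           temperature: float = 0.1, noise: float = 0.5,
                           seed: int = 0) -> float:
    """Particle-filter-style global search over a scalar policy parameter (illustrative only)."""
    rng = random.Random(seed)
    # Spread particles over the whole parameter range instead of starting from one point.
    particles = [rng.uniform(low, high) for _ in range(n_particles)]
    best = max(particles, key=expected_return)
    for _ in range(iters):
        # Importance weights proportional to exp(J / temperature), shifted by max for stability.
        returns = [expected_return(p) for p in particles]
        top = max(returns)
        weights = [math.exp((r - top) / temperature) for r in returns]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Systematic resampling: duplicate high-return particles, drop low-return ones.
        step = 1.0 / n_particles
        u = rng.uniform(0.0, step)
        resampled, cum, i = [], weights[0], 0
        for k in range(n_particles):
            target = u + k * step
            while target > cum and i < n_particles - 1:
                i += 1
                cum += weights[i]
            resampled.append(particles[i])
        # Diffusion step: perturb particles so the search keeps exploring; shrink noise over time.
        particles = [p + rng.gauss(0.0, noise) for p in resampled]
        noise *= 0.9
        candidate = max(particles, key=expected_return)
        if expected_return(candidate) > expected_return(best):
            best = candidate
    return best


if __name__ == "__main__":
    print(f"local search from -3.0: {local_gradient_search(-3.0):.3f}")
    print(f"particle search:        {particle_policy_search():.3f}")
```

The population is initialized over the whole parameter range, so the search does not depend on a single starting point. This is what lets it escape the basin that traps gradient ascent in this toy setting.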
Authors
Dong Chunli, Wang Li
(Electronic Information Engineering College, Nanjing Communications Institute of Technology, Nanjing 211188, China)
Source
Jiangsu Science and Technology Information, 2017, No. 7, pp. 71-73 (3 pages)
Funding
High-Level Talent Research Fund of Nanjing Communications Institute of Technology
Project No. 440105001
Keywords
reinforcement learning
particle filter
local search
global search