Abstract
In reinforcement learning, the agent can refine only one value-function entry each time it visits a state-action transition, which makes learning converge very slowly. This paper proposes a method that uses acyclic state trajectories to speed up the convergence of reinforcement learning. By extracting, from training episodes, the acyclic state trajectory from each state to the goal state, the agent can propagate its current value-function update backward along the shortest acyclic trajectory, so that a single visit to a state-action transition refines a batch of value-function entries and learning converges faster. Experimental comparisons show that the method significantly accelerates convergence and shortens learning time.
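The mechanism described above can be illustrated with a small sketch. The following is not the paper's exact algorithm, only an illustrative assumption: Q-learning on a short chain environment where the agent records the acyclic state path of the current episode (removing loops as they form) and, after each real transition, sweeps backward along that path so that one visited transition triggers a batch of value-function refinements. All names, the environment, and the parameters are hypothetical.

```python
# Illustrative sketch (not the paper's exact method): Q-learning plus
# backward value propagation along the episode's acyclic state path.
import random

N = 6                      # chain of states 0..N-1; state N-1 is the goal
ALPHA, GAMMA = 0.5, 0.9    # assumed learning rate and discount factor
ACTIONS = (-1, +1)         # move left / move right

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """One environment transition; reward 1 on entering the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0)

def best(s):
    return max(Q[(s, a)] for a in ACTIONS)

def update(s, a, r, s2):
    Q[(s, a)] += ALPHA * (r + GAMMA * best(s2) - Q[(s, a)])

random.seed(0)
for episode in range(30):
    s, visited = 0, [0]            # visited = acyclic state path so far
    while s != N - 1:
        # epsilon-greedy action selection (epsilon = 0.2, assumed)
        if random.random() < 0.2:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        update(s, a, r, s2)        # ordinary one-entry Q-learning update
        # keep `visited` acyclic: revisiting a state cuts the loop out
        if s2 in visited:
            visited = visited[:visited.index(s2) + 1]
        else:
            visited.append(s2)
        # propagate the fresh value backward along the acyclic path:
        # one real transition refines a whole batch of Q entries
        for i in range(len(visited) - 2, -1, -1):
            u, v = visited[i], visited[i + 1]
            a_uv = v - u           # consecutive path states are adjacent
            r_uv = 1.0 if v == N - 1 else 0.0
            update(u, a_uv, r_uv, v)
        s = s2
```

After the first successful episode, the backward sweep makes every state on the path to the goal carry a nonzero value immediately, instead of waiting for the value to trickle back one visit at a time over many episodes.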
Source
《微计算机信息》
2011, Issue 12, pp. 151-154 (4 pages)
Control & Automation
Keywords
reinforcement learning
value function
speeding up convergence
episode
acyclic state trajectory