
Evolutionary reinforcement learning and its application in robot path tracking (cited by: 6)
Abstract: A path-tracking control method for mobile robots based on adaptive heuristic critic (AHC) reinforcement learning is studied. The adaptive critic element (ACE) of the AHC is implemented as a multilayer feedforward neural network, whose weights are updated by combining the TD(λ) algorithm with gradient descent. The action selection element (ASE) of the AHC is a fuzzy inference system (FIS) optimized by a genetic algorithm. The output of the ACE network forms a secondary reinforcement signal that guides the learning of the ASE. Finally, the proposed algorithm is applied to behavior learning for a mobile robot, where it handles complex path-tracking problems well.
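The abstract gives no network sizes, learning rates, or reward definition, so the following is only a minimal sketch of how a critic of this kind might combine TD(λ) with gradient descent. It assumes one tanh hidden layer, a linear output, and accumulating eligibility traces; the class name `TinyCritic` and all hyperparameter values are hypothetical and not taken from the paper. The returned TD error plays the role that the abstract assigns to the secondary reinforcement signal.

```python
import math
import random


class TinyCritic:
    """Hypothetical one-hidden-layer value network V(s) trained with TD(lambda)."""

    def __init__(self, n_in, n_hidden, alpha=0.05, gamma=0.95, lam=0.8, seed=0):
        rng = random.Random(seed)
        # alpha, gamma, lam are illustrative assumptions, not values from the paper.
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        # w1[j][i]: input i -> hidden unit j (last column is the bias weight).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        # w2[j]: hidden unit j -> output (last entry is the bias weight).
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.reset_traces()

    def reset_traces(self):
        """Zero the eligibility traces, e.g. at the start of an episode."""
        self.e1 = [[0.0] * len(row) for row in self.w1]
        self.e2 = [0.0] * len(self.w2)

    def _forward(self, s):
        x = list(s) + [1.0]  # append the bias input
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        v = sum(w * hi for w, hi in zip(self.w2, h + [1.0]))
        return x, h, v

    def value(self, s):
        return self._forward(s)[2]

    def update(self, s, r, s_next, terminal=False):
        """One TD(lambda) step; returns the TD error (internal reinforcement)."""
        x, h, v = self._forward(s)
        v_next = 0.0 if terminal else self.value(s_next)
        delta = r + self.gamma * v_next - v
        decay = self.gamma * self.lam
        hb = h + [1.0]
        # Traces accumulate the gradient of V(s) with respect to each weight.
        for j in range(len(self.w2)):
            self.e2[j] = decay * self.e2[j] + hb[j]
        for j, row in enumerate(self.e1):
            g = self.w2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
            for i in range(len(row)):
                row[i] = decay * row[i] + g * x[i]
        # Gradient-descent style update scaled by the TD error.
        for j in range(len(self.w2)):
            self.w2[j] += self.alpha * delta * self.e2[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += self.alpha * delta * self.e1[j][i]
        return delta


critic = TinyCritic(n_in=1, n_hidden=4)
td_error = critic.update([0.5], r=1.0, s_next=[0.0], terminal=True)
```

In a full AHC loop, `td_error` would be passed to the action selection element after every step; how the paper's genetic algorithm consumes that signal is not described in the abstract, so it is left out here.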
Source: Control and Decision (《控制与决策》), indexed in EI, CSCD, and the Peking University Core list; 2009, No. 4, pp. 532-536, 541 (6 pages)
Funding: National Natural Science Foundation of China (60475036)
Keywords: reinforcement learning; adaptive heuristic critic (AHC); genetic algorithm; path tracking