
A Reinforcement Learning Algorithm for On-line Adaptive Control of Markov Chains (cited by: 3)

Published English title: An on-line Adaptive Control Markov Chains by Using Reinforcement Learning
Abstract: An average-reward reinforcement learning algorithm for controlled Markov chains is presented. The objective is to find an optimal control policy that maximizes the long-run expected average reward per time step. Because the state transition matrices and reward vectors are not known a priori, adaptive control methods are necessary. The learning agent is given a neural network structure consisting of an actor and a critic, so that through continued learning it can eventually discover the optimal policy. The actor's parameters are updated at each time step by a simple learning scheme, and at every time step their values determine a stochastic control strategy. The adaptive critic is used in estimating these parameters in the search for the optimal policy.
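The abstract does not state the update rules, so the following is only a minimal sketch of a generic tabular average-reward actor-critic scheme, not the authors' neural network algorithm. The two-state chain, the step sizes, and every name used (`P_NEXT1`, `reward`, `softmax`, `run`) are assumptions made for illustration. The learner only samples transitions and rewards; it never reads the transition table directly.

```python
import math
import random

# Hypothetical 2-state, 2-action chain (an assumption, not from the paper).
# P_NEXT1[s][a] = probability that the next state is 1.
P_NEXT1 = [[0.1, 0.9],
           [0.2, 0.8]]


def reward(state):
    # Assumed reward: 1 whenever the chain is in state 1, else 0.
    return 1.0 if state == 1 else 0.0


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run(steps=20000, seed=0, a_actor=0.05, a_critic=0.1, a_rho=0.01):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: action preferences per state
    v = [0.0, 0.0]                    # critic: differential state values
    rho = 0.0                         # running estimate of the average reward
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = 1 if rng.random() < pi[1] else 0       # sample stochastic policy
        r = reward(s)
        s2 = 1 if rng.random() < P_NEXT1[s][a] else 0
        # Average-reward temporal-difference error.
        delta = r - rho + v[s2] - v[s]
        rho += a_rho * (r - rho)
        v[s] += a_critic * delta
        # Actor step along the softmax log-likelihood gradient, scaled by delta.
        for b in range(2):
            grad = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += a_actor * delta * grad
        s = s2
    return [softmax(t) for t in theta], rho


if __name__ == "__main__":
    policy, rho = run()
    print("policy per state:", policy)
    print("estimated average reward:", rho)
```

In this toy chain action 1 drives the process toward the rewarding state, so the learned policy should come to favor action 1 in both states. The actor's preferences define a stochastic policy at every step, which mirrors the role the abstract assigns to the actor's parameters.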
Source: Journal of Yunnan University (Natural Sciences Edition), indexed in CAS and CSCD, 2000, No. 1, pp. 9-12 (4 pages)
Keywords: reinforcement learning; adaptive critic; Markov chains; control problems; Markov decision processes; average reward; R learning

References (1)

  • 1. Watkins, C. J. C. H., Dayan, P. Technical Note: Q-Learning [J]. Machine Learning, 1992, (3-4): 279-292.

