[2] MURRAY R M,ASTROM K J,BOYD S P,et al.Future directions in control in an information-rich world[J].IEEE Control Systems Magazine,2003,23(2):20-33.
[3] WIERING M,OTTERLO M V.Reinforcement learning: state-of-the-art[M].Berlin:Springer-Verlag,2012:3-42.
[4] SUTTON R S.Learning to predict by the methods of temporal differences[J].Machine Learning,1988,3(1):9-44.
[5] CHEN Xingguo,GAO Yang,WANG Ruili.Online selective kernel-based temporal difference learning[J].IEEE Transactions on Neural Networks and Learning Systems,2013,24(12):1944-1956.
[6] ZOU Bin,ZHANG Hai,XU Zongben.Learning from uniformly ergodic Markov chains[J].Journal of Complexity,2009,25(2):188-200.
[7] YU Huizhen,BERTSEKAS D P.Convergence results for some temporal difference methods based on least squares[J].IEEE Transactions on Automatic Control,2009,54(7):1515-1531.
[9] CHEN Chunlin,DONG Daoyi,LI Hanxiong.Fidelity-based probabilistic Q-learning for control of quantum systems[J].IEEE Transactions on Neural Networks and Learning Systems,2014,25(5):920-933.
[10] RUMMERY G,NIRANJAN M.On-line Q-learning using connectionist systems[D].Cambridge:University of Cambridge,1994.