A Multiagent Reinforcement Learning Method Based on Partition and Integration (cited by: 1)
Abstract: Q-learning converges very slowly when the state space is very large. To address this problem, the paper exploits the fact that different agents achieve different classification performance on different samples, and proposes partitioning the sample space according to the learning error on each sample, so that the matching relationship between samples (sub-spaces of the state space) and agents is fully exploited. Simulations in a grid world with obstacles show that the algorithm improves online learning performance.
Authors: Wang Yun (王云), Han Wei (韩伟)
Source: Journal of Nanjing Normal University (Engineering and Technology Edition) (《南京师范大学学报(工程技术版)》), CAS, 2008, No. 4, pp. 59-62 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (70802025)
Keywords: multiagent system; reinforcement learning; state-space partition
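
The abstract does not say how the learning error is measured or how the agents' results are integrated, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes tabular Q-learning agents, uses the absolute temporal-difference (TD) error as the learning error, and assigns each state to the agent with the smallest running error on it. The names `q_update`, `partition_states`, and `integrated_action` are hypothetical.

```python
def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on table q; returns the absolute TD error."""
    td = r + gamma * max(q[s_next]) - q[s][a]
    q[s][a] += alpha * td
    return abs(td)


def partition_states(errors):
    """Assign each state to the agent with the smallest error on it.

    errors[i][s] is assumed to hold agent i's running mean absolute
    TD error on state s. Ties go to the lowest agent index.
    """
    n_agents, n_states = len(errors), len(errors[0])
    return [min(range(n_agents), key=lambda i: errors[i][s])
            for s in range(n_states)]


def integrated_action(q_tables, owner, s):
    """Act greedily using the Q-table of the agent that owns state s."""
    row = q_tables[owner[s]][s]
    return max(range(len(row)), key=lambda a: row[a])
```

In such a scheme, each agent would keep learning with `q_update`, the partition would be recomputed periodically from the accumulated errors, and `integrated_action` would combine the agents by routing each state to its best-matching learner.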