Fuzzy Reward and Policy Learning in Multi-Agent Systems
Abstract This paper studies multi-agent decision making based on fuzzy knowledge. By modeling each agent's decision goals as fuzzy knowledge, we propose a multi-agent decision model based on fuzzy rewards, and we study a gradient-based algorithm for learning each agent's action policy under those fuzzy rewards.
Source Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2005, No. 8, pp. 128-130 (3 pages)