Fuzzy Reward and Policy Learning in Multi-Agent Systems (多代理模糊收益及策略学习)

Abstract: This paper studies multi-agent decision-making based on fuzzy knowledge. By modeling each agent's decision goals as fuzzy knowledge, the authors propose a multi-agent decision model based on fuzzy rewards, and describe a gradient-based algorithm for learning an agent's action policy under those fuzzy rewards.

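Authors: Zhang Huaxiang, Huang Shangteng

Affiliations: School of Information Management, Shandong Normal University; Department of Computer Science and Engineering, Shanghai Jiao Tong University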

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journal list, 2005, No. 8, pp. 128-130 (3 pages)

Keywords: fuzzy set; game; gradient learning; multi-agent; reward; fuzzy knowledge; decision problem; decision goal; decision model; learning algorithm

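Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]; TH166 [Mechanical Engineering: Mechanical Manufacturing and Automation]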
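
The abstract mentions gradient-based policy learning under fuzzy rewards but this record does not give the model itself. The following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a two-agent, two-action matrix game, a piecewise-linear membership function that maps each raw payoff to a degree of satisfaction in [0, 1], and plain simultaneous gradient ascent on each agent's mixed strategy. The function names, the payoff values, the bounds `low` and `high`, and the step size `eta` are all hypothetical.

```python
def membership(payoff, low, high):
    """Degree (0..1) to which `payoff` satisfies an assumed fuzzy goal.

    Assumption: a piecewise-linear membership function that is 0 at or
    below `low`, 1 at or above `high`, and linear in between.
    """
    return min(1.0, max(0.0, (payoff - low) / (high - low)))


def fuzzy_matrix(payoffs, low, high):
    """Map a raw payoff matrix to a matrix of fuzzy rewards."""
    return [[membership(r, low, high) for r in row] for row in payoffs]


def gradient_ascent(f1, f2, p, q, eta=0.1, steps=200):
    """Simultaneous gradient ascent in a 2x2 game on fuzzy rewards.

    p: probability that agent 1 (row player) plays action 0.
    q: probability that agent 2 (column player) plays action 0.
    f1, f2: fuzzy reward matrices of agent 1 and agent 2.
    """
    for _ in range(steps):
        # Partial derivative of agent 1's expected fuzzy reward w.r.t. p.
        grad_p = q * (f1[0][0] - f1[1][0]) + (1 - q) * (f1[0][1] - f1[1][1])
        # Partial derivative of agent 2's expected fuzzy reward w.r.t. q.
        grad_q = p * (f2[0][0] - f2[0][1]) + (1 - p) * (f2[1][0] - f2[1][1])
        # Take a gradient step and clip back to a valid probability.
        p = min(1.0, max(0.0, p + eta * grad_p))
        q = min(1.0, max(0.0, q + eta * grad_q))
    return p, q


if __name__ == "__main__":
    # Hypothetical coordination game: both agents receive the same payoff.
    raw = [[4, 0], [0, 2]]
    fuzzy = fuzzy_matrix(raw, low=0.0, high=4.0)
    print(gradient_ascent(fuzzy, fuzzy, p=0.6, q=0.6))
```

With these illustrative values, both agents start slightly biased toward action 0 and the updates drive both probabilities to 1, i.e. the joint action with the highest membership degree. Other starting points or payoff matrices can behave differently, since plain gradient ascent is not guaranteed to converge in general-sum games.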
