Fuzzy Reward and Policy Learning in Multi-Agent Systems
Abstract
This paper studies multi-agent decision making based on fuzzy knowledge. By modeling each agent's decision goals as fuzzy knowledge, it proposes a multi-agent decision model built on fuzzy rewards and studies a gradient-based algorithm for learning an agent's action policy under those rewards.
Source
Computer Science (《计算机科学》)
CSCD
PKU Core (北大核心)
2005, No. 8, pp. 128-130 (3 pages)