An Improved Algorithm for Interactive Dynamic Influence Diagrams


Abstract: Interactive dynamic influence diagrams (I-DIDs) are graphical models, based on probabilistic graphical theory, for sequential decision making over multiple time steps in the presence of other interacting agents. Algorithms for solving I-DIDs face a space of candidate models ascribed to other agents that grows exponentially with the number of time slices. To compress this space, the paper builds on behavioral equivalence and proposes a method for constructing ε-behavioral equivalence classes: a belief-behavior graph (BBG), which is a directed acyclic graph, represents the possible beliefs and behaviors of the other agents; models whose beliefs are spatially close are clustered together, one representative is selected from each cluster, and behaviorally equivalent models are merged top-down. The resulting approximate solution avoids solving every candidate model, saving storage space and computation time. Simulation results on model instances show the effectiveness of the algorithm.
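The clustering idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes each candidate model is summarized by a belief given as a probability vector, uses Euclidean distance as the measure of spatial closeness, and keeps the first belief of each cluster as its representative. The names `cluster_beliefs` and `epsilon` are hypothetical, and the belief-behavior graph and top-down merging are not modeled here.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def cluster_beliefs(beliefs, epsilon):
    """Greedily group belief vectors that lie within epsilon of a representative.

    Assumption for illustration only: the first belief placed in a cluster
    acts as that cluster's representative.
    """
    representatives = []  # one belief kept per cluster
    clusters = []         # members of each cluster, parallel to representatives
    for belief in beliefs:
        for i, rep in enumerate(representatives):
            if dist(belief, rep) <= epsilon:
                clusters[i].append(belief)  # close enough: join this cluster
                break
        else:
            # No existing representative is within epsilon: start a new cluster.
            representatives.append(belief)
            clusters.append([belief])
    return representatives, clusters


# Four hypothetical beliefs over two states, forming two tight groups.
example_beliefs = [(0.5, 0.5), (0.52, 0.48), (0.9, 0.1), (0.88, 0.12)]
reps, groups = cluster_beliefs(example_beliefs, epsilon=0.05)
```

Only the representatives would then need to be solved, which is the source of the savings the abstract describes; a larger `epsilon` gives fewer clusters at the cost of a coarser approximation.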

Authors: Li Bo (李波), Luo Jian (罗键), Yin Huayi (尹华一), Tian Le (田乐)

Affiliation: School of Information Science and Technology, Xiamen University

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2011, No. 4, pp. 506-513 (8 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (Grant No. 60975052)

Keywords: Agent Modeling, Interactive Dynamic Influence Diagrams (I-DIDs), Dynamic Decision Making, ε-Behavioral Equivalence, Belief-Behavior Graph (BBG)

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]


