
An Improved DouDiZhu Game Model Combining PPO with Monte Carlo Tree Search
Abstract DouDiZhu is a typical imperfect-information game. Because its decision-making involves multiple players, a huge action space, and the coexistence of cooperation and competition, Monte Carlo Tree Search (MCTS) used on its own is inefficient. To improve both the strategy quality and the search efficiency of MCTS, a DouDiZhu game model is proposed that combines the Proximal Policy Optimization (PPO) algorithm with MCTS. First, PPO is used to learn the game-state and strategy information of DouDiZhu and to train a policy model that outputs action probabilities for the current situation, which guides the selection and simulation stages of MCTS. In the selection stage, the action probabilities output by the PPO policy model are used to adjust the selection formula, guiding the search toward high-quality action nodes. In the simulation stage, PPO replaces random simulation, so that simulations are more consistent with the learned strategy and fewer inefficient paths are explored. Experimental results show that MCTS optimized with PPO not only makes decisions more efficiently but also achieves a higher win rate, demonstrating a strong advantage in DouDiZhu decision-making.
Authors WANG Shipeng (王世鹏); WANG Yajie (王亚杰); WU Yanyan (吴燕燕); GUO Qilong (郭其龙); ZHAO Tianyu (赵甜宇) (School of Computer, Shenyang Aerospace University, Shenyang 110136, China; Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China)
Source Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), PKU Core Journal, 2025, No. 8, pp. 126-133 (8 pages)
Funding China Association for Science and Technology, Science Popularization Capability Enhancement Project (KXYJS2022092).
Keywords PPO algorithm; Monte Carlo Tree Search (MCTS); DouDiZhu; imperfect-information game
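
The abstract describes two points at which a policy model guides MCTS: a selection rule weighted by action probabilities, and rollouts sampled from the policy instead of uniformly at random. The abstract does not give the paper's exact selection formula, so the sketch below assumes a PUCT-style rule of the kind used in AlphaZero-like systems. All function names, the `c_puct` constant, and the action labels are hypothetical, and a plain dictionary of probabilities stands in for the trained PPO policy model.

```python
import math
import random


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """Assumed PUCT-style score: mean value plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_action(children, c_puct=1.5):
    """Selection stage: return the action whose child has the highest score.

    `children` maps action -> dict with keys 'prior' (policy probability),
    'visits' (visit count), and 'value_sum' (sum of backed-up returns).
    """
    # Add 1 so the exploration bonus is non-zero before any child is visited.
    parent_visits = 1 + sum(ch["visits"] for ch in children.values())

    def score(item):
        _, ch = item
        q = ch["value_sum"] / ch["visits"] if ch["visits"] else 0.0
        return puct_score(q, ch["prior"], parent_visits, ch["visits"], c_puct)

    return max(children.items(), key=score)[0]


def sample_rollout_action(policy_probs, rng=random):
    """Simulation stage: sample a move from policy probabilities, not uniformly."""
    actions = list(policy_probs)
    weights = [policy_probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical node: three legal moves with priors from the stand-in policy.
children = {
    "pass": {"prior": 0.1, "visits": 0, "value_sum": 0.0},
    "pair_3": {"prior": 0.7, "visits": 0, "value_sum": 0.0},
    "bomb": {"prior": 0.2, "visits": 0, "value_sum": 0.0},
}
first_choice = select_action(children)
```

With no visits recorded, the rule simply follows the highest prior. Once a child accumulates visits with poor returns, its score falls and the search shifts to other moves, which is the balance between policy guidance and exploration that a probability-weighted selection rule is meant to provide.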
