Abstract
Texas Hold'em is an imperfect-information game with an enormous state space, and traditional decision models rely heavily on human prior knowledge. To address this, a Texas Hold'em AI is constructed on a deep reinforcement learning framework with zero prior knowledge: the DQN algorithm plays against other agents and continually updates its neural-network parameters to improve decision accuracy, largely overcoming the poor generalization and slow convergence of the NFSP (neural fictitious self-play) algorithm. To further accelerate convergence and strengthen play, an attention mechanism is introduced that assigns weights to the opponent's historical actions such as betting and folding, helping the AI analyze the opponent's playing style. Experimental results show that over 5 000 games the proposed AI defeats a knowledge-based AI built on traditional empirical algorithms, an AI using the CFR algorithm, and an AI using the NFSP algorithm, demonstrating the effectiveness and competitiveness of the proposed approach.
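The attention idea described in the abstract, weighting an opponent's past betting and folding actions by their relevance to the current decision, can be sketched roughly as scaled dot-product attention. This is a hypothetical minimal illustration, not the paper's actual architecture; the encodings, dimensions, and function names are all assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_context(query, history):
    """Weight past opponent actions by relevance to the current state.

    query:   d-dim embedding of the current game state (assumed encoding)
    history: list of d-dim vectors, one per past opponent action
             (e.g. one-hot over fold/call/raise/all-in)
    Returns a d-dim context vector (attention-weighted sum of history)
    and the attention weights themselves.
    """
    d = len(query)
    # score each past action against the query, scaled by sqrt(d)
    scores = [dot(h, query) / math.sqrt(d) for h in history]
    weights = softmax(scores)
    # context = weighted sum of the history vectors
    context = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(d)]
    return context, weights

# Toy usage: three past actions one-hot encoded as [fold, call, raise, all-in].
history = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
query = [0.2, 0.9, 0.1, 0.0]
context, weights = attention_context(query, history)
```

In a DQN-style model of the kind the abstract describes, such a context vector would plausibly be concatenated with the state features before the Q-network, so the value estimate conditions on the opponent's observed style.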
Authors
ZHANG Xiaochuan; LIANG Yuzhuo; PENG Lirong; QIAN Yi; LIU Lili
(School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; School of Artificial Intelligence and Big Data, Chongqing Industry Polytechnic College, Chongqing 401120, China; Institute of Artificial Intelligence System, Chongqing University of Technology, Chongqing 400054, China)
Source
Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), indexed in the Peking University Core Journals list
2025, No. 8, pp. 85-89 (5 pages)
Funding
National Natural Science Foundation of China (60443004)
Chongqing Special Project for Technological Innovation and Application Development (cstc2021jscx-dxwtBX0019)
Keywords
imperfect-information games
Texas Hold'em poker
deep reinforcement learning
attention mechanism