Abstract
Texas Hold'em is an imperfect-information game with an enormous state space, and traditional decision models rely heavily on human prior knowledge. To address this, a Texas Hold'em AI is constructed on a deep reinforcement learning framework with zero prior knowledge: the DQN algorithm plays against other agents and continually updates its neural-network parameters to improve decision accuracy, largely overcoming the poor generalization and slow convergence of the NFSP (neural fictitious self-play) algorithm. To further accelerate convergence and strengthen play, an attention mechanism is introduced that assigns weights to the opponent's historical actions such as betting and folding, helping the AI analyze the opponent's playing style. Experimental results show that over 5 000 games the proposed AI defeats a knowledge-based AI built on traditional empirical algorithms, an AI using the CFR algorithm, and an AI using the NFSP algorithm, demonstrating the effectiveness and competitiveness of the proposed approach.
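The attention idea described in the abstract, weighting an opponent's past betting and folding actions by their relevance to the current decision, can be sketched roughly as scaled dot-product attention. This is a hypothetical minimal illustration, not the paper's actual architecture; the encodings, dimensions, and function names are all assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_context(query, history):
    """Weight past opponent actions by relevance to the current state.

    query:   d-dim embedding of the current game state (assumed encoding)
    history: list of d-dim vectors, one per past opponent action
             (e.g. one-hot over fold/call/raise/all-in)
    Returns a d-dim context vector (attention-weighted sum of history)
    and the attention weights themselves.
    """
    d = len(query)
    # score each past action against the query, scaled by sqrt(d)
    scores = [dot(h, query) / math.sqrt(d) for h in history]
    weights = softmax(scores)
    # context = weighted sum of the history vectors
    context = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(d)]
    return context, weights

# Toy usage: three past actions one-hot encoded as [fold, call, raise, all-in].
history = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
query = [0.2, 0.9, 0.1, 0.0]
context, weights = attention_context(query, history)
```

In a DQN-style model of the kind the abstract describes, such a context vector would plausibly be concatenated with the state features before the Q-network, so the value estimate conditions on the opponent's observed style.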
Authors
ZHANG Xiaochuan; LIANG Yuzhuo; PENG Lirong; QIAN Yi; LIU Lili
(School of Artificial Intelligence, Chongqing University of Technology, Chongqing 401135, China; School of Artificial Intelligence and Big Data, Chongqing Industry Polytechnic College, Chongqing 401120, China; Institute of Artificial Intelligence System, Chongqing University of Technology, Chongqing 400054, China)
Source
Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), indexed in the Peking University Core Journals list
2025, No. 8, pp. 85-89 (5 pages)
Funding
National Natural Science Foundation of China (60443004)
Chongqing Special Project for Technological Innovation and Application Development (cstc2021jscx-dxwtBX0019)
Keywords
imperfect-information games
Texas Hold'em poker
deep reinforcement learning
attention mechanism