Abstract
DouDiZhu is one of the most popular card games in China and has the characteristics of both incomplete-information games and stochastic games. Because play involves both confrontation between the attacking and defending sides and cooperation between partners, it is among the most complex types of games at present. This paper analyzes the game process of DouDiZhu and describes its game model from six aspects: the participants, the set of histories, the participant function, the information space, the probability distribution function of nature, and the participants' preferences. The model provides a theoretical basis and reference for research on the theory and program algorithms of DouDiZhu computer games. The study applies the reinforcement learning algorithm DDQN (double deep Q-network) to the bidding and card-playing strategy and, to address the stability defects caused by the dynamic teammate matching mechanism in actual play, introduces decision tree strategy optimization from supervised learning. Experiments show that combining reinforcement learning with supervised learning significantly improves the system's performance in actual play.
Authors
梅险
姜彦新
赵一峰
王建东
于逸潇
郑子龙
MEI Xian; JIANG Yanxin; ZHAO Yifeng; WANG Jiandong; YU Yixiao; ZHENG Zilong (School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; School of Civil Engineering, Heilongjiang University, Harbin 150080, China)
Source
《重庆理工大学学报(自然科学)》
Peking University Core Journal (北大核心)
2025, Issue 8, pp. 134-139 (6 pages)
Journal of Chongqing University of Technology:Natural Science
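Funding
Key Project of the Heilongjiang Provincial Planning Office (GJB1422071)
Innovation and Entrepreneurship Special Project of the Heilongjiang Provincial Department of Education (SJGY20210718)
Heilongjiang Provincial College Students' Innovation and Entrepreneurship Training Program (202410214015X)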
Keywords
DouDiZhu
game model
cooperative game
sacrifice strategy
heap valuation strategy