Abstract
Tibetan Jiu Chess is a distinctive board game with a huge state space and a complex action space. A game is divided into two sequential stages: layout (preparation) and battle. The placement of pieces during the layout stage strongly influences the outcome of the game. To further improve layout quality and thereby raise the playing strength of Tibetan Jiu Chess game agents, this paper proposes a two-stage computer game algorithm. For the layout stage, a self-play algorithm based on a Convolutional Neural Network (CNN) and Monte Carlo Tree Search (MCTS) is designed: the CNN guides the MCTS, and self-play training yields the best model, which generates higher-quality moves. For the battle stage, an Alpha-Beta pruning algorithm based on domain knowledge is designed. The staged design combines deep reinforcement learning with domain knowledge in an attempt to address two problems in Tibetan Jiu Chess research: the scarcity of game-record data and the low playing strength of game agents. Experimental results show that the agent based on the two-stage algorithm achieved win rates of 65% and 60%, respectively, against a program that uses Alpha-Beta pruning throughout the game and against a first-dan human player. The two-stage agent thus shows, to a certain extent, the ability to “learn” and “think”, and its playing strength is improved.
Authors
李霞丽
陈彦东
杨子熠
张焱垠
吴立成
LI Xiali; CHEN Yandong; YANG Ziyi; ZHANG Yanyin; WU Licheng (School of Information Engineering, Minzu University of China, Beijing 100081, China)
Source
《重庆理工大学学报(自然科学)》
CAS
PKU Core (Peking University Core Journals)
2022, No. 12, pp. 110-120 (11 pages)
Journal of Chongqing University of Technology: Natural Science
Funding
National Natural Science Foundation of China (Grant Nos. 61873291, 61773416, 62276285).