
Decision Technology Based on Large Language Model for Wargame
Abstract: Wargaming simulates real confrontation scenarios by controlling the behavior of game pieces, and it is of great research significance in the field of intelligent decision-making. Most existing research has focused on knowledge-driven rule-based agents or data-driven learning agents. Although these methods have made some progress in small-scale wargames, the high acquisition cost and weak generalization of knowledge rules, together with the low stability of learning algorithms and the high computational demands of the learning process, make them difficult to apply flexibly in large-scale wargame environments that are closer to real scenarios. To alleviate these problems, a large-scale multi-agent hierarchical task planning framework based on large language models is proposed. The framework uses large language models to perform coarse-grained task planning at the team level and fine-grained task decomposition at the individual level, and it generates strategies around "planning, communication, memory, and reflection". Compared with previous work, the method effectively alleviates the generalization problem while retaining a degree of self-improvement ability and avoiding costly training of agent parameters. Experiments show that the model defeats high-level AI with a high win rate and exhibits self-improvement, generalization, and interpretability, giving it significant advantages in large-scale adversarial environments.
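The two-level structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no prompts, interfaces, or data structures, so every name here (`LLM`, `Memory`, `plan_team_tasks`, `decompose_for_units`, `reflect`, `run_round`, `echo_llm`) is hypothetical, and the language model is replaced by a deterministic stub. The sketch only shows one plausible way that team-level planning, unit-level decomposition, teammate messages, stored lessons, and a reflection step could fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a large language model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class Memory:
    """Keeps lessons from earlier rounds so later prompts can reuse them."""
    notes: List[str] = field(default_factory=list)

    def recall(self, k: int = 3) -> str:
        # Return the k most recent lessons, one per line.
        return "\n".join(self.notes[-k:])


def plan_team_tasks(llm: LLM, situation: str, teams: List[str],
                    memory: Memory) -> Dict[str, str]:
    """Team level: ask for one coarse-grained task per team."""
    tasks = {}
    for team in teams:
        prompt = (f"Situation: {situation}\n"
                  f"Lessons: {memory.recall()}\n"
                  f"Assign a coarse task to team {team}.")
        tasks[team] = llm(prompt)
    return tasks


def decompose_for_units(llm: LLM, team_task: str, units: List[str],
                        messages: List[str]) -> Dict[str, str]:
    """Individual level: ask for one fine-grained action per unit.

    Each unit's answer is appended to `messages`, so later units in the
    same team see what their teammates decided (a simple form of
    communication, assumed for this sketch).
    """
    actions = {}
    for unit in units:
        prompt = (f"Team task: {team_task}\n"
                  f"Teammate messages: {'; '.join(messages)}\n"
                  f"Give one action for unit {unit}.")
        actions[unit] = llm(prompt)
        messages.append(f"{unit} -> {actions[unit]}")
    return actions


def reflect(llm: LLM, outcome: str, memory: Memory) -> None:
    """Reflection: turn the round's outcome into a lesson and store it."""
    memory.notes.append(
        llm(f"Outcome: {outcome}\nState one lesson for the next round."))


def run_round(llm: LLM, situation: str, roster: Dict[str, List[str]],
              memory: Memory, outcome: str) -> Dict[str, Dict[str, str]]:
    """One planning round; `outcome` would come from the game in practice."""
    team_tasks = plan_team_tasks(llm, situation, list(roster), memory)
    plan = {}
    for team, units in roster.items():
        plan[team] = decompose_for_units(llm, team_tasks[team], units, [])
    reflect(llm, outcome, memory)
    return plan


def echo_llm(prompt: str) -> str:
    """Deterministic stub used only for demonstration and testing."""
    return "reply to: " + prompt.splitlines()[-1]
```

Calling `run_round(echo_llm, "enemy near hill", {"alpha": ["tank1", "tank2"], "bravo": ["scout1"]}, Memory(), "lost one tank")` returns a nested plan keyed by team and unit and leaves one lesson in memory. No model parameters are updated anywhere in the sketch; any improvement across rounds would come only from the stored lessons, which is one reading of the abstract's claim that self-improvement is kept without costly parameter training.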
Authors: WANG Tong; ZHAO Mei-Jing; XU Pei; YIN Qi-Yue; JIAO Jian-Bin; HUANG Kai-Qi (University of Chinese Academy of Sciences, Beijing 100049; Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031)
Source: Acta Automatica Sinica (《自动化学报》; Peking University Core Journal), 2025, No. 6, pp. 1205-1217 (13 pages)
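Funding: Supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27010103), the National Postdoctoral Researcher Funding Program (GZC20232995), and the China Postdoctoral Science Foundation (2024M763533).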
Keywords: wargame; policy generation; large language model; hierarchical task planning
