A sample selection mechanism for multi-UCAV air combat policy training using multi-agent reinforcement learning
Authors: Zihui YAN, Xiaolong LIANG, Yueqi HOU, Aiwu YANG, Jiaqiang ZHANG, Ning WANG. Chinese Journal of Aeronautics, 2025, Issue 6, pp. 501-516 (16 pages).
Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning (MARL) in multiple Unmanned Combat Aerial Vehicle (UCAV) air combat scenarios. In view of this, this paper proposes a novel Dominant and Non-dominant strategy sample selection (DoNot) mechanism and a Local Observation Enhanced Multi-Agent Proximal Policy Optimization (LOE-MAPPO) algorithm to train the multi-UCAV air combat policy and improve its generalization. Specifically, the LOE-MAPPO algorithm adopts a mixed state that concatenates the global state and an individual agent's local observation to enable efficient value function learning in multi-UCAV air combat. The DoNot mechanism classifies opponents into dominant or non-dominant strategy opponents, and samples from easier to more challenging opponents to form an adaptive training curriculum. Empirical results demonstrate that the proposed LOE-MAPPO algorithm outperforms baseline MARL algorithms in multi-UCAV air combat scenarios, and that the DoNot mechanism leads to stronger policy generalization when facing diverse opponents. The results pave the way for the fast generation of cooperative strategies for air combat agents with MARL algorithms.
Keywords: Unmanned combat aerial vehicle; Air combat; Sample selection; Multi-agent reinforcement learning; Proximal policy optimization
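The abstract describes two ideas at a high level: a mixed critic input that concatenates the global state with an agent's local observation, and an opponent-sampling curriculum that shifts from non-dominant to dominant strategy opponents. The Python sketch below illustrates both under assumed interfaces; it is not the authors' implementation, and names such as `mixed_state`, `OpponentPool`, and `win_rate` are hypothetical.

```python
# A minimal sketch (not the authors' code) of two ideas described in the
# abstract: (1) a mixed critic input concatenating the global state with an
# agent's local observation, and (2) an easy-to-hard opponent sampling
# curriculum over non-dominant and dominant strategy opponents.
# All names (mixed_state, OpponentPool, win_rate) are hypothetical.

import numpy as np


def mixed_state(global_state: np.ndarray, local_obs: np.ndarray) -> np.ndarray:
    """Concatenate the shared global state with one agent's local observation,
    forming that agent's input to the centralized value function."""
    return np.concatenate([global_state, local_obs], axis=-1)


class OpponentPool:
    """Draws opponents from easier (non-dominant) to harder (dominant)
    strategies as the learner's estimated win rate improves."""

    def __init__(self, non_dominant, dominant, seed=None):
        self.non_dominant = list(non_dominant)  # easier opponents
        self.dominant = list(dominant)          # more challenging opponents
        self.win_rate = 0.0                     # running estimate of recent wins
        self.rng = np.random.default_rng(seed)

    def update(self, won: bool, momentum: float = 0.95) -> None:
        # Exponential moving average over recent episode outcomes.
        self.win_rate = momentum * self.win_rate + (1.0 - momentum) * float(won)

    def sample(self):
        # The chance of drawing a dominant-strategy opponent grows with skill,
        # so training moves from easier to more challenging opponents.
        pool = self.dominant if self.rng.random() < self.win_rate else self.non_dominant
        return pool[self.rng.integers(len(pool))]
```

In this sketch the curriculum is driven by a single scalar win-rate estimate; the paper's DoNot mechanism may use a different criterion for classifying and scheduling opponents.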