Abstract
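Unmanned aerial vehicle (UAV) swarms can, through cooperative operation, effectively carry out reconnaissance, jamming, strike, and other missions in electronic warfare. Reinforcement learning algorithms currently applied to UAV swarm decision-making remain weak at the neural-network level in two respects: coordinating multi-agent actions and exploiting historical information. To address this, a reinforcement learning algorithm that couples temporal and spatial information, SAT (sequence-aware transformer), is proposed on the basis of multi-agent proximal policy optimization (MAPPO). The algorithm feeds the "hidden state" to the Transformer as its query input, and a data structure adapted to the algorithm is also proposed that keeps the stored trajectory data consistent and complete during agent training. Together these bring the coupling information between adjacent states into the network, compensating for the shortcomings of conventional neural networks in multi-agent action coordination and in the use of historical information. Experimental results show that, in two typical tasks, SAT completes 21.7% and 33% more tasks than the R2D2 algorithm, respectively.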
In the field of electronic warfare, unmanned aerial vehicle (UAV) swarms can effectively execute diverse missions such as reconnaissance, jamming, and strikes through collaborative operations. However, existing reinforcement learning (RL) algorithms for UAV swarm decision-making still face challenges in coordinating multi-agent actions and leveraging historical information at the neural network level. To address these limitations, the Sequence-Aware Transformer (SAT) algorithm is proposed, based on an enhanced variant of the Multi-Agent Proximal Policy Optimization (MAPPO) framework. The SAT algorithm introduces a "hidden state" mechanism into the Transformer architecture, where the hidden state dynamically encodes temporal dependencies between consecutive states and serves as the query input for the attention mechanism. Furthermore, a tailored trajectory storage structure is designed to ensure data consistency and integrity during training, enabling effective integration of spatiotemporal coupling information into the network. Experimental results demonstrate that SAT significantly outperforms baseline algorithms (R2D2 and QMIX) in two typical electronic warfare scenarios, achieving 21.7% and 33% higher task completion rates, respectively.
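The idea of using a hidden state as the attention query can be illustrated with a small sketch. The abstract does not give the SAT architecture, so everything below is an assumption made for illustration: the single-head attention, the random weight matrices, the random stand-in "agent tokens", and the tanh hidden-state update are hypothetical and are not taken from the paper. The sketch only shows the general pattern in which a hidden state carried across time steps queries the per-agent features of the current step.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hidden_query_attention(hidden, tokens, w_q, w_k, w_v):
    """Scaled dot-product attention whose query comes from the hidden state.

    hidden: list of d floats (carries information from earlier steps)
    tokens: list of n per-agent feature vectors, each of length d
    Returns (context vector of length d, attention weights of length n).
    """
    query = matvec(w_q, hidden)
    keys = [matvec(w_k, tok) for tok in tokens]
    values = [matvec(w_v, tok) for tok in tokens]
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * val[j] for w, val in zip(weights, values))
        for j in range(len(query))
    ]
    return context, weights


def random_matrix(rng, d):
    """Hypothetical d x d weight matrix; a trained model would learn these."""
    return [[rng.gauss(0.0, d ** -0.5) for _ in range(d)] for _ in range(d)]


rng = random.Random(0)
d, n_agents, steps = 4, 3, 5
w_q, w_k, w_v = (random_matrix(rng, d) for _ in range(3))

hidden = [0.0] * d  # initial hidden state at the start of an episode
history = []        # attention weights per step, kept for inspection
for _ in range(steps):
    # Stand-in for the per-agent observation embeddings at this time step.
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_agents)]
    context, weights = hidden_query_attention(hidden, tokens, w_q, w_k, w_v)
    history.append(weights)
    # Hypothetical recurrent update (assumption, not the paper's rule).
    hidden = [math.tanh(h + c) for h, c in zip(hidden, context)]
```

Because the query is computed from the hidden state, the attention over agents at each step depends on what was seen in earlier steps, which is one simple way temporal and spatial information can be coupled. With an all-zero initial hidden state the first step attends to all agents equally.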
Authors
赵华栋
李姜
张展赫
高远
王烨
ZHAO Huadong; LI Jiang; ZHANG Zhanhe; GAO Yuan; WANG Ye (Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; University of Chinese Academy of Sciences, Beijing 100049, China)
Source
《兵器装备工程学报》
Peking University Core Journal (北大核心)
2025, No. 8, pp. 36-44 (9 pages)
Journal of Ordnance Equipment Engineering