In this paper,a deep deterministic policy gradient algorithm based on Partially Observable Weighted Mean Field Reinforcement Learning(PO-WMFRL)framework is designed to solve the problem of path planning in large-scale...In this paper,a deep deterministic policy gradient algorithm based on Partially Observable Weighted Mean Field Reinforcement Learning(PO-WMFRL)framework is designed to solve the problem of path planning in large-scale Unmanned Aerial Vehicle(UAV)swarm operations.We establish a motion control and detection communication model of UAVs.A simulation environment is carried out with No-Fly Zone(NFZ),the task assembly point is established,and the long-term reward and immediate reward functions are designed for large-scale UAV swarm path planning problem.Considering the combat characteristics of large-scale UAV swarm,we improve the traditional Deep Deterministic Policy Gradient(DDPG)algorithm and propose a Partially Observable Weighted Mean Field Deep Deterministic Policy Gradient(PO-WMFDDPG)algorithm.The effectiveness of the PO-WMFDDPG algorithm is verified through simulation,and through the comparative analysis with the DDPG and MFDDPG algorithms,it is verified that the PO-WMFDDPG algorithm has a higher task success rate and convergence speed.展开更多
基金supported by the Aeronautical Science Foundation of China(20220013053005)。
文摘In this paper,a deep deterministic policy gradient algorithm based on Partially Observable Weighted Mean Field Reinforcement Learning(PO-WMFRL)framework is designed to solve the problem of path planning in large-scale Unmanned Aerial Vehicle(UAV)swarm operations.We establish a motion control and detection communication model of UAVs.A simulation environment is carried out with No-Fly Zone(NFZ),the task assembly point is established,and the long-term reward and immediate reward functions are designed for large-scale UAV swarm path planning problem.Considering the combat characteristics of large-scale UAV swarm,we improve the traditional Deep Deterministic Policy Gradient(DDPG)algorithm and propose a Partially Observable Weighted Mean Field Deep Deterministic Policy Gradient(PO-WMFDDPG)algorithm.The effectiveness of the PO-WMFDDPG algorithm is verified through simulation,and through the comparative analysis with the DDPG and MFDDPG algorithms,it is verified that the PO-WMFDDPG algorithm has a higher task success rate and convergence speed.