This paper is concerned with cooperative target stalking for a multi-unmanned surface vehicle (multi-USV) system. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, a multi-USV target stalking (MUTS) algorithm is proposed. Firstly, a V-type probabilistic data extraction method is proposed for the first time to overcome shortcomings of the MADDPG algorithm. The advantages of the proposed method are twofold: 1) it can reduce the amount of data and shorten training time; 2) it can select the more important data in the experience buffer for training. Secondly, to avoid collisions between USVs during the stalking process, an action constraint method called Safe DDPG is introduced. Finally, the MUTS algorithm and some existing algorithms are compared in cooperative target stalking scenarios. To demonstrate the effectiveness of the proposed MUTS algorithm in stalking tasks, mission operating scenarios and reward functions are carefully designed in this paper. The proposed MUTS algorithm can help the multi-USV system avoid internal collisions during mission execution. Moreover, compared with some existing algorithms, the newly proposed one can provide a higher convergence speed and a narrower convergence domain.
Funding: Supported in part by the National Natural Science Foundation of China (61873335, 61833011, 62173164), the Project of Science and Technology Commission of Shanghai Municipality, China (20ZR1420200, 21SQBS01600, 22JC1401400, 19510750300, 21190780300), and the Natural Science Foundation of Jiangsu Province of China (BK20201451).