Abstract: This paper proposes a group-optimal penetration strategy for complex attack-defense confrontation, addressing the coordinated decision-making problem that arises when multiple endo-atmospheric gliding missiles penetrate interceptors simultaneously. First, to handle the large search space of multi-missile coordinated penetration maneuvers, a flight corridor is designed that constrains the search space of the multi-agent coordinated strategy while accounting for the path constraints and anti-collision constraints of gliding multi-missile flight. Then, a hierarchical multi-missile coordinated decision-making mechanism based on the confrontation situation is proposed, and the swarm penetration strategy is optimized to maximize swarm penetration effectiveness. The upper layer plans the swarm penetration formation according to the confrontation situation and generates the coordinated swarm penetration trajectory using the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method. The lower layer interpolates and smooths the penetration trajectory and generates the penetration guidance command using the Soft Actor-Critic and Extended Proportional Guidance (SAC-EPG) method. Simulation results show that the proposed multi-missile cooperative penetration method based on hierarchical reinforcement learning converges faster than the penetration method based on MADDPG and quickly learns multi-missile cooperative penetration skills. In addition, multi-missile coordination makes full use of the group's detection and maneuvering capabilities and secures favorable penetration timing and positions through coordinated ballistic maneuvers, thereby improving the success rate of group penetration.
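The abstract says the lower layer "interpolates and smooths" the trajectory handed down by the upper layer, but does not say how. As a minimal sketch of that single step, assuming linear interpolation between 2D waypoints followed by a moving-average filter (both choices, and the names `interpolate` and `smooth`, are illustrative assumptions and not taken from the paper):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate(waypoints: List[Point], samples_per_segment: int) -> List[Point]:
    """Linearly interpolate between consecutive waypoints (assumed scheme)."""
    if len(waypoints) < 2 or samples_per_segment < 1:
        raise ValueError("need at least two waypoints and one sample per segment")
    dense: List[Point] = []
    for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:]):
        for k in range(samples_per_segment):
            t = k / samples_per_segment
            dense.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    dense.append(waypoints[-1])  # close the path on the final waypoint
    return dense


def smooth(points: List[Point], half_window: int) -> List[Point]:
    """Moving-average smoothing that leaves both endpoints unchanged."""
    n = len(points)
    out: List[Point] = []
    for i in range(n):
        # Shrink the window near the ends so it stays symmetric and in range.
        w = min(half_window, i, n - 1 - i)
        window = points[i - w:i + w + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out


if __name__ == "__main__":
    coarse = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]  # hypothetical waypoints
    dense = interpolate(coarse, samples_per_segment=5)
    for x, y in smooth(dense, half_window=2):
        print(f"{x:6.2f} {y:6.2f}")
```

The symmetric, shrinking window keeps the start and end points fixed, so the smoothed path still begins and ends where the coarse plan does; only the sharp corner at the middle waypoint is rounded off.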
Abstract: This manuscript proposes an intelligent endo-atmospheric penetration strategy based on generative adversarial reinforcement learning. First, attack and defense adversarial models are established, and the missile maneuver penetration problem is transformed into an optimal control problem subject to penetration, handover position, and mid-terminal guidance velocity constraints. Then, the Radau pseudospectral method is used to generate data samples under random perturbations. Next, a method combining Generative Adversarial Imitation Learning with Deep Deterministic Policy Gradient (GAIL-DDPG) is designed, with internal process reward signals constructed to address the long-term sparse reward in the missile maneuver penetration problem. Finally, the penetration strategy is trained and verified. Simulation shows that, by using the sample library to learn expert experience in the early stage of training, the proposed method converges quickly, and that its performance is further optimized by the reinforcement learning exploration strategy in the later stage of training. Simulation also shows that the proposed method has better engineering applicability than traditional reinforcement learning methods.
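The abstract describes training that leans on expert samples early and on reinforcement learning exploration later, with a dense internal reward to offset the sparse task reward. One way such a scheme could look is sketched below. It assumes a discriminator output `d_prob` read as the probability that a state-action pair came from the expert, the common GAIL surrogate reward -log(1 - D), and a linear annealing schedule. None of these details are stated in the abstract, and all function names are hypothetical.

```python
import math


def imitation_reward(d_prob: float, eps: float = 1e-8) -> float:
    """GAIL-style surrogate reward -log(1 - D(s, a)).

    d_prob is assumed to be the discriminator's probability that the
    state-action pair came from the expert sample library.
    """
    d = min(max(d_prob, 0.0), 1.0)
    return -math.log(max(1.0 - d, eps))  # eps avoids log(0) when d == 1


def imitation_weight(step: int, anneal_steps: int) -> float:
    """Weight on the imitation reward: decays linearly from 1 to 0 (assumed)."""
    if anneal_steps <= 0:
        raise ValueError("anneal_steps must be positive")
    return max(0.0, 1.0 - step / anneal_steps)


def mixed_reward(env_reward: float, d_prob: float,
                 step: int, anneal_steps: int) -> float:
    """Blend the dense imitation reward with the environment reward."""
    w = imitation_weight(step, anneal_steps)
    return w * imitation_reward(d_prob) + (1.0 - w) * env_reward


if __name__ == "__main__":
    # Hypothetical values: sparse environment reward of 0, discriminator at 0.5.
    for step in (0, 50, 100):
        print(step, round(mixed_reward(0.0, 0.5, step, anneal_steps=100), 4))
```

With this blend the agent receives a dense learning signal from the discriminator while the environment reward is still mostly zero, and the imitation term fades out so that later updates are driven by the task reward itself.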