An intelligent endo-atmospheric penetration strategy based on generative adversarial reinforcement learning is proposed in this manuscript. First, attack and defense adversarial models are established, and the missile maneuver penetration problem is transformed into an optimal control problem that considers penetration, handover position, and mid-terminal guidance velocity constraints. Then, the Radau pseudospectral method is adopted to generate data samples under random perturbations. Furthermore, a method combining Generative Adversarial Imitation Learning with Deep Deterministic Policy Gradient (GAIL-DDPG) is designed, with internal process reward signals constructed to tackle the long-term sparse reward in the missile maneuver penetration problem. Finally, the penetration strategy is trained and verified. Simulation shows that, by using generative adversarial reinforcement learning with a sample library to learn expert experience in the early stage of training, the proposed method converges quickly. Performance is then further optimized by the reinforcement learning exploration strategy in the later stage of training. Simulation also shows that the proposed method has better engineering applicability than the traditional reinforcement learning method.
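The abstract describes a reward that leans on expert samples early in training and on reinforcement learning exploration later, with internal process rewards added to offset the sparse terminal reward. The following is a minimal sketch of one way such a blend could look, not the authors' implementation. The function names, the linear decay schedule, the `decay_fraction` parameter, and the convention that the discriminator outputs the probability that a state-action pair came from the expert library are all assumptions made for illustration.

```python
import numpy as np


def gail_reward(d_prob, eps=1e-8):
    """Imitation reward from a discriminator output.

    Assumption: d_prob is the discriminator's probability that the
    state-action pair came from the expert sample library, so the
    reward -log(1 - D) grows as the pair looks more expert-like.
    """
    d = np.clip(d_prob, eps, 1.0 - eps)  # avoid log(0)
    return -np.log(1.0 - d)


def imitation_weight(step, total_steps, decay_fraction=0.5):
    """Hypothetical schedule: weight decays linearly from 1 to 0
    over the first decay_fraction of training, then stays at 0."""
    horizon = decay_fraction * total_steps
    return max(0.0, 1.0 - step / horizon)


def blended_reward(d_prob, process_reward, terminal_reward, step, total_steps):
    """Blend the imitation reward with the task reward.

    The task reward is the sum of a dense internal process reward and
    the sparse terminal reward (zero on all non-terminal steps).
    """
    w = imitation_weight(step, total_steps)
    task_reward = process_reward + terminal_reward
    return w * gail_reward(d_prob) + (1.0 - w) * task_reward
```

In a DDPG loop, a value like `blended_reward(...)` would be stored in the replay buffer in place of the raw environment reward. How the process reward itself is constructed is not specified in the abstract and is left as an input here.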