Abstract: This study addresses the maneuver evasion problem for medium-to-long-range air-to-air missiles by proposing a KAN-λ-PPO-based evasion algorithm. The algorithm introduces Kolmogorov-Arnold Networks (KAN) to mitigate the catastrophic forgetting issue of Multilayer Perceptrons (MLP) in continual learning, while incorporating the λ-return to resolve sparse-reward challenges in evasion scenarios. First, we model the evasion problem with the λ-return and present the KAN-λ-PPO algorithm. Subsequently, we establish game environments based on the segmented ballistic characteristics of medium-to-long-range missiles. During training, a joint reward function combining miss distance and positional advantage is designed to train the agent. Experiments evaluate four dimensions: (1) performance comparison between KAN and MLP in value function approximation; (2) mitigation of catastrophic forgetting by KAN-λ-PPO in dual-task scenarios; (3) continual learning capability across multiple evasion scenarios; and (4) quantitative analysis of agent strategy evolution and positional advantage. Empirical results demonstrate that KAN improves value function approximation accuracy by an order of magnitude compared with traditional MLP architectures. In continual learning tasks, the KAN-λ-PPO scheme exhibits significant knowledge retention, achieving performance improvements of 32.7% and 8.6% over MLP baselines in the Task 1→2 and Task 2→3 transitions, respectively. Furthermore, the learned maneuver strategies outperform High-G Barrel Roll (HGB) and S-maneuver tactics in securing positional advantage while accomplishing evasion.
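The abstract credits the λ-return with handling the sparse terminal reward of the evasion task (the miss-distance signal arrives only at episode end). As a minimal illustration of the mechanism, not the paper's implementation, the standard backward recursion for the λ-return can be sketched as follows; the function name, the toy rollout, and the value estimates are all hypothetical:

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Backward recursion for the lambda-return:
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    `values` holds V(s_0), ..., V(s_T), one more entry than `rewards`.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = values[T]  # bootstrap from the final state's value estimate
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns

# Toy 4-step rollout with a sparse terminal reward, mimicking an
# evasion episode scored only at closest approach (values are made up).
rewards = np.array([0.0, 0.0, 0.0, 1.0])
values = np.array([0.5, 0.5, 0.5, 0.5, 0.0])
print(lambda_returns(rewards, values))
```

Because λ is close to 1, the terminal reward propagates to every earlier step of the rollout, giving the critic a dense learning signal even though only the last transition is rewarded.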