Abstract: This study addresses the maneuver evasion problem for medium-to-long-range air-to-air missiles by proposing a KAN-λ-PPO-based evasion algorithm. The algorithm introduces Kolmogorov-Arnold Networks (KAN) to mitigate the catastrophic forgetting issue of Multilayer Perceptrons (MLP) in continual learning, while incorporating λ-return to resolve sparse-reward challenges in evasion scenarios. First, we model the evasion problem with λ-return and present the KAN-λ-PPO algorithm. Subsequently, we establish game environments based on the segmented ballistic characteristics of medium- and long-range missiles. During training, a joint reward function combining the miss distance and positional advantage is designed to train the agent. Experiments evaluate four dimensions: (1) performance comparison between KAN and MLP in value function approximation; (2) catastrophic forgetting mitigation of KAN-λ-PPO in dual-task scenarios; (3) continual learning capability across multiple evasion scenarios; (4) quantitative analysis of agent strategy evolution and positional advantage. Empirical results demonstrate that KAN improves value function approximation accuracy by an order of magnitude compared with traditional MLP architectures. In continual learning tasks, the KAN-λ-PPO scheme exhibits significant knowledge retention, achieving performance improvements of 32.7% and 8.6% over MLP baselines in the Task 1→2 and Task 2→3 transitions, respectively. Furthermore, the learned maneuver strategies outperform High-G Barrel Roll (HGB) and S-maneuver tactics in securing positional advantage while accomplishing evasion.
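The λ-return mentioned above blends n-step returns geometrically and is commonly computed with a backward recursion, G_t = r_t + γ[(1−λ)V(s_{t+1}) + λG_{t+1}]. As a minimal illustrative sketch (not the paper's implementation; function and variable names are assumptions), it can be written as:

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Backward recursion for the lambda-return:
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    `values` holds V(s_0)..V(s_T), one more entry than `rewards`,
    with values[-1] bootstrapping beyond the final transition.
    """
    T = len(rewards)
    returns = np.zeros(T)
    g = values[-1]  # bootstrap from the last state value
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns
```

With λ = 1 this reduces to the Monte Carlo return (bootstrapped at the horizon), and with λ = 0 to the one-step TD target, which is how λ trades off bias against variance in the value targets.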
Funding: funded by Hung Yen University of Technology and Education under grant number UTEHY.L.2026.05.
Abstract: Nowadays, Unmanned Aerial Vehicles (UAVs) are making increasingly important contributions to numerous applications that enhance human quality of life, such as sensing and data collection, computing, and communication. However, communication between UAVs still faces challenges due to highly dynamic topology, volatile wireless links, and strict energy budgets. In this work, we introduce an improved communication scheme based on Proximal Policy Optimization (PPO). Our solution casts hop-by-hop relay selection as a Markov decision process and develops a decentralized PPO framework in an actor-critic form. A key novelty is the design of the reward function, which jointly considers the delivery ratio, end-to-end delay, and energy efficiency, enabling flexible prioritization in dynamic environments. Simulation results across swarms of 20-70 UAVs show that the proposed framework improves the delivery ratio by up to 5% over a Deep Q-Network baseline (reaching ≈80% at 70 nodes), reduces latency by about 2-3 ms in medium-to-dense settings (from ∼43 to 35-36 ms), and attains comparable or slightly lower total energy consumption (typically 0.5%-2% lower). These results indicate that the proposed communication scheme, being adaptive and scalable across learning-based UAV scenarios, paves the way for real-world UAV deployments.
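A reward that jointly weighs delivery, delay, and energy, as described above, is typically a weighted scalarization. The sketch below is purely illustrative of that idea; the weight names, reference scales, and exact form are assumptions, not the paper's formulation:

```python
def relay_reward(delivered, delay_ms, energy_j,
                 w_delivery=1.0, w_delay=0.1, w_energy=0.1,
                 delay_ref=50.0, energy_ref=1.0):
    """Hypothetical per-hop reward: a bonus (or penalty) for delivery
    outcome, minus penalties for delay and energy normalized by
    reference scales. Adjusting the weights shifts priority between
    objectives in dynamic environments."""
    r = w_delivery * (1.0 if delivered else -1.0)
    r -= w_delay * (delay_ms / delay_ref)   # normalized delay penalty
    r -= w_energy * (energy_j / energy_ref)  # normalized energy penalty
    return r
```

In a decentralized actor-critic setup, each UAV would evaluate candidate next-hop relays with its policy and receive this scalar feedback per forwarding decision.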
Abstract: [Objective/Significance] To address the difficulties of capturing personalized requirements and the lack of flexibility in decision-making in current crop management, this study proposes a personalized intelligent decision-making method for crop production based on large language models. [Methods] Users' personalized requirements in vegetable crop management are collected through natural-language dialogue, covering aspects such as yield, labor consumption, and water and fertilizer consumption. The crop management process is then modeled as a multi-objective optimization problem that jointly considers users' personalized preferences and crop yield, and a reinforcement learning algorithm is employed to learn crop management policies. The water and fertilizer management policy is continually updated through interaction with the environment, learning which actions to take under different conditions to reach optimal decisions and thereby achieve personalized crop management. [Results and Discussion] Experiments on the gym-DSSAT (Gym-Decision Support System for Agrotechnology Transfer) simulation platform show that the proposed method can effectively adjust crop management policies according to users' personalized preferences. [Conclusions] By accurately capturing users' personalized requirements, the method optimizes labor and water-fertilizer resource consumption while maintaining crop yield.
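The multi-objective formulation above (yield versus labor, water, and fertilizer use, weighted by user preference) can be sketched as a preference-weighted scalar reward. This is a minimal illustration under assumed names and units, not the study's actual reward design:

```python
def crop_reward(yield_kg, labor_h, water_l, fert_kg, prefs):
    """Hypothetical scalarization of the multi-objective trade-off.
    `prefs` holds user preference weights (e.g., elicited via
    natural-language dialogue); larger resource weights make the
    policy more frugal with that resource at some cost to yield."""
    return (prefs["yield"] * yield_kg
            - prefs["labor"] * labor_h
            - prefs["water"] * water_l
            - prefs["fertilizer"] * fert_kg)
```

A reinforcement learning agent interacting with a crop simulator such as gym-DSSAT could receive this scalar at each management step, so that changing the preference weights steers the learned water-fertilizer policy toward the user's priorities.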