Funding: This work was funded by Hung Yen University of Technology and Education under grant number UTEHY.L.2026.05.
Abstract: Nowadays, Unmanned Aerial Vehicles (UAVs) are making increasingly important contributions to numerous applications that enhance human quality of life, such as sensing and data collection, computing, and communication. However, communication between UAVs still faces challenges due to highly dynamic topologies, volatile wireless links, and strict energy budgets. In this work, we introduce an improved communication scheme based on Proximal Policy Optimization (PPO). Our solution casts hop-by-hop relay selection as a Markov decision process and develops a decentralized PPO framework in an actor-critic form. A key novelty is the design of the reward function, which jointly considers the delivery ratio, end-to-end delay, and energy efficiency, enabling flexible prioritization in dynamic environments. Simulation results across swarms of 20–70 UAVs show that the proposed framework improves the delivery ratio by up to 5% over a Deep Q-Network baseline (reaching ≈80% at 70 nodes), reduces latency by about 2–3 ms in medium-to-dense settings (from ∼43 ms to 35–36 ms), and attains comparable or slightly lower total energy consumption (typically 0.5%–2% lower). These results indicate that the proposed communication scheme is adaptive and scalable in learning-based UAV scenarios, paving the way for real-world UAV deployments.
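The abstract does not give the exact form of the multi-objective reward; the sketch below is a minimal, assumed illustration of how delivery, delay, and energy terms could be combined into a single per-hop reward with tunable weights. The function name relay_reward, the weight values, and the normalization constants are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a multi-objective per-hop reward for relay selection.
# The weights and normalization constants below are assumptions for illustration,
# not the paper's exact design.

def relay_reward(delivered: bool,
                 hop_delay_ms: float,
                 energy_mj: float,
                 w_delivery: float = 1.0,   # weight on successful delivery
                 w_delay: float = 0.3,      # weight on end-to-end delay penalty
                 w_energy: float = 0.2,     # weight on energy-consumption penalty
                 max_delay_ms: float = 50.0,
                 max_energy_mj: float = 10.0) -> float:
    """Combine delivery, delay, and energy terms into one scalar reward.

    Delivery contributes positively; delay and energy are normalized to [0, 1]
    and subtracted, so the agent is pushed toward relays that forward packets
    quickly and cheaply. Re-weighting the three terms lets the operator
    re-prioritize objectives in dynamic environments.
    """
    r_delivery = 1.0 if delivered else 0.0
    r_delay = min(hop_delay_ms / max_delay_ms, 1.0)
    r_energy = min(energy_mj / max_energy_mj, 1.0)
    return w_delivery * r_delivery - w_delay * r_delay - w_energy * r_energy


# Example: a delivered packet with 12 ms hop delay and 2 mJ transmission cost.
print(relay_reward(delivered=True, hop_delay_ms=12.0, energy_mj=2.0))
```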