Funding: Supported in part by the National Natural Science Foundation of China under grants 61901078, 61771082, 61871062, and U20A20157; in part by the Science and Technology Research Program of Chongqing Municipal Education Commission under grant KJQN201900609; in part by the Natural Science Foundation of Chongqing under grant cstc2020jcyj-zdxmX0024; in part by the University Innovation Research Group of Chongqing under grant CXQT20017; and in part by the China University Industry-University-Research Collaborative Innovation Fund (Future Network Innovation Research and Application Project) under grant 2021FNA04008.
Abstract: To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates reducing the delay of edge information sharing over V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and to ensure fairness for individual users, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, in which a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework (sketched below). Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy from the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing its results against those of baseline approaches through extensive simulation experiments.
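The abstract does not spell out the unified reward itself; the following is a minimal sketch, assuming the mean-delay and maximum-individual-delay objectives are blended by a weight `alpha` and that V2I requirement violations incur a penalty (all names, weights, and the penalty form are hypothetical, not the paper's formula):

```python
import numpy as np

def unified_reward(v2v_delays, v2i_delays, v2i_delay_limit,
                   alpha=0.5, penalty=10.0):
    """Hypothetical unified reward for the two delay objectives.

    Combines the mean V2V delay (global network performance) with the
    maximum individual V2V delay (per-user fairness), and penalizes any
    V2I link that misses its delay requirement.
    """
    v2v_delays = np.asarray(v2v_delays, dtype=float)
    mean_term = v2v_delays.mean()   # mean delay objective
    max_term = v2v_delays.max()     # maximum individual delay objective
    # Total amount by which V2I links exceed their delay requirement.
    violation = np.sum(np.maximum(np.asarray(v2i_delays, dtype=float)
                                  - v2i_delay_limit, 0.0))
    return -(alpha * mean_term + (1.0 - alpha) * max_term) - penalty * violation
```

With `alpha = 1` the reward reduces to the mean-delay objective, and with `alpha = 0` it targets the worst-case user, so a single scalar reward shared by all V2V agents can serve both formulations.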
Funding: Supported by the National Natural Science Foundation of China (Grant No. 61601491), the Natural Science Foundation of Hubei Province, China (Grant No. 2018CFC865), and the Military Research Project of China (Grant No. YJ2020B117).
Abstract: To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To handle the dynamically changing dimension of the observational features produced by a partially observable system, a feature embedding block is proposed: combining two feature compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), yields a fixed-size observational feature encoding (see the sketch below). Finally, the centralized training and decentralized execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transferability across task scenarios, and self-organization capability after damage, verifying the potential for deployment and application of DPOMH-PPO in real environments.
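The feature embedding block is described only at a high level; the following minimal sketch, assuming each USV's observation arrives as a matrix with one row per currently observed target, shows how CMP and CAP can be combined into a fixed-size encoding (the function name, layout, and dimensions are illustrative assumptions):

```python
import numpy as np

def embed_observation(obs_matrix):
    """Hypothetical CMP/CAP feature embedding.

    obs_matrix: (n_observed, d) array with one row per currently
    observed target, so n_observed varies over time while d is fixed.
    Returns a fixed-size (2 * d,) encoding regardless of n_observed.
    """
    obs_matrix = np.atleast_2d(np.asarray(obs_matrix, dtype=float))
    cmp_features = obs_matrix.max(axis=0)    # column-wise max pooling (CMP)
    cap_features = obs_matrix.mean(axis=0)   # column-wise average pooling (CAP)
    return np.concatenate([cmp_features, cap_features])

# Three observed targets with 4 features each -> 8-dim encoding;
# a single observed target -> still an 8-dim encoding.
print(embed_observation(np.random.rand(3, 4)).shape)  # (8,)
print(embed_observation(np.random.rand(1, 4)).shape)  # (8,)
```

Because both poolings collapse the row dimension, the encoding length stays at 2·d no matter how many targets are currently visible, which is exactly the property a partially observable input with varying feature dimension requires.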
Abstract: Unmanned aerial vehicle (UAV) formations are being applied ever more widely in both civil and military domains; however, for the comparatively complex six-degree-of-freedom fixed-wing UAV model, existing formation control research remains limited. To address this challenge, a Hierarchical Guidance and Control Proximal Policy Optimization (HGC-PPO) algorithm is proposed. Built on a hierarchical architecture, HGC-PPO divides formation control into a guidance layer and a control layer. The UAV control layer adopts a reinforcement-learning-based multi-input multi-output architecture, while the formation guidance layer constructs its reward function based on a curriculum learning strategy, smoothing the difficulty curve of the learning process. This approach ensures effective training convergence while improving sample utilization and training efficiency. Experimental results show that the HGC-PPO algorithm achieves stable and accurate fixed-wing UAV formation control while remaining efficient. In summary, the HGC-PPO algorithm provides an effective solution for six-degree-of-freedom fixed-wing UAV formation control.
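The abstract does not give the curriculum schedule itself; purely as an assumed illustration of how a curriculum-based reward can smooth the difficulty curve, the guidance-layer tolerance on formation error might be tightened as training progresses (all names, units, and the linear schedule are hypothetical):

```python
import numpy as np

def curriculum_tolerance(step, total_steps, start_tol=50.0, end_tol=5.0):
    """Hypothetical curriculum schedule: linearly tighten the formation
    position-error tolerance (in meters) over the course of training."""
    frac = min(step / total_steps, 1.0)
    return start_tol + frac * (end_tol - start_tol)

def guidance_reward(formation_error, step, total_steps):
    """Assumed guidance-layer reward: full reward inside the current
    tolerance, decaying smoothly as the error exceeds it."""
    tol = curriculum_tolerance(step, total_steps)
    return float(np.exp(-max(formation_error - tol, 0.0) / tol))
```

Early in training almost any loose formation earns reward, letting the control layer converge; as the tolerance shrinks, the same reward function demands increasingly precise station keeping.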