Abstract
Unmanned aerial vehicles (UAVs) have developed rapidly in recent years, but autonomous UAV navigation remains a major challenge because observation, localization, decision-making, and action are difficult to perform simultaneously. For autonomous navigation decision-making in three-dimensional environments, an efficient state-decomposition deep deterministic policy gradient algorithm is proposed. Building on the Deep Deterministic Policy Gradient (DDPG) algorithm, a new state decomposition method based on the UAV's own state is introduced: perception-related states and self-related states are processed by two separate sub-networks to build a more suitable actor network. Combined with a partitioned experience pool, this yields the PM (Prioritized Memory) DDPG method. A three-dimensional environment is built, and training is carried out, on the AirSim platform in Unreal Engine. Experiments show that the proposed PM DDPG algorithm effectively improves UAV navigation performance in complex three-dimensional environments and outperforms the conventional DDPG and TD3 algorithms in both convergence speed and the efficiency with which the trained UAV reaches the target point.
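The state decomposition described in the abstract can be illustrated with a minimal sketch: an actor whose perception-related and self-related inputs pass through separate sub-networks before a shared output layer. Everything specific here is an assumption made for illustration and is not taken from the paper: the class name `DecomposedActor`, the state and action dimensions, the hidden size, fusing the two branches by concatenation, the `tanh` output bound, and the random untrained weights.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, x, activation):
    """Apply one dense layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


class DecomposedActor:
    """Hypothetical actor with one branch per state group (illustrative only)."""

    def __init__(self, perception_dim, self_dim, action_dim, hidden=16, seed=0):
        rng = random.Random(seed)
        # Separate sub-networks for the two state groups (sizes are assumed).
        self.perception_branch = make_layer(perception_dim, hidden, rng)
        self.self_branch = make_layer(self_dim, hidden, rng)
        # Shared output layer over the concatenated branch features (assumed).
        self.head = make_layer(2 * hidden, action_dim, rng)

    def act(self, perception_state, self_state):
        p = dense(self.perception_branch, perception_state, relu)
        s = dense(self.self_branch, self_state, relu)
        # List concatenation joins the features; tanh keeps each action in [-1, 1].
        return dense(self.head, p + s, math.tanh)


# Example dimensions are placeholders, not the paper's configuration.
actor = DecomposedActor(perception_dim=8, self_dim=6, action_dim=3)
action = actor.act([0.5] * 8, [0.1] * 6)
```

The sketch covers only the forward pass of the actor. The critic, the DDPG updates, and the partitioned experience pool are omitted because the abstract does not specify them.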
Authors
程擎
曾嘉诚
CHEN Qing; ZENG Jia-cheng (Civil Aviation Flight University of China, Guanghan 618000, China)
Source
Aeronautical Computing Technique (《航空计算技术》)
2025, Issue 1, pp. 1-6 (6 pages)
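Funding
Supported by the First-Class Discipline Construction Project in Transportation Engineering (CZYL2024002)
Supported by the project "Exploration and Practice of Building a Practice and Innovation Platform for Emerging Engineering Talent Training in Navigation Engineering" (MHJT2023043).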
Keywords
deep reinforcement learning
DDPG
UAVs
obstacle avoidance