Abstract
During the golden rescue period following a disaster, unmanned aerial vehicles (UAVs) can be dispatched first to damaged high-rise buildings to perform spiral-ascent, full-coverage scanning, assess the damage, and search for survivors. However, because disaster sites are complex and dynamic, UAVs performing close-range three-dimensional (3D) scanning are prone to low trajectory-tracking accuracy and high collision risk. To address these issues, this paper proposes a Soft Actor-Critic control model with Prioritized Experience Replay (PER-SAC) and builds a high-fidelity simulation platform based on a six-degree-of-freedom (6-DOF) nonlinear dynamics model. By prioritizing the learning of key experiences with high temporal-difference error (TD-error), the model improves learning efficiency and the robustness of the control policy in complex tasks. Comparative simulation experiments show that the proposed PER-SAC strategy outperforms both Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) in convergence speed and final performance. In the static trajectory-tracking task, PER-SAC achieves a task success rate of 99.0%, and its average trajectory error is 66.3% lower than that of SAC. In the dynamic obstacle-avoidance task, its success rate reaches 97.0% with smoother and more efficient evasive maneuvers, which validates the robustness of the control model. Incorporating prioritized experience replay significantly improves the autonomous flight performance of UAVs in unknown dynamic environments. The PER-SAC model balances flight control precision, flight quality, and safety, and it can be applied directly to autonomous spiral scanning of damaged high-rise buildings after a disaster. Its stable flight attitude allows high-definition imagery to be captured, helping rescue teams quickly assess the situation and locate trapped individuals, and thereby improving the efficiency of emergency search and rescue.
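The abstract states that the model prioritizes experiences with high TD-error but gives no implementation details. The following is a minimal sketch of proportional prioritized experience replay as it is commonly formulated, not the authors' implementation. The class name `PrioritizedReplayBuffer`, the hyperparameters `alpha`, `beta`, and `eps`, and their default values are assumptions chosen for illustration.

```python
import numpy as np


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the paper's code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha  # assumed: how strongly priorities skew sampling
        self.eps = eps      # assumed: keeps every priority strictly positive
        self.data = []      # stored transitions
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0        # next write index (ring buffer)
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New transitions receive the current maximum priority,
        # so they are likely to be sampled soon after insertion.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(n, size=batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (n * probs[idx]) ** (-beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is the absolute TD-error plus a small constant.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In an off-policy learner such as SAC, the critic loss for each sampled transition would typically be multiplied by its importance-sampling weight, and `update_priorities` would be called with the new TD-errors after each gradient step.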
Authors
陈德启
张自设
张文会
闫学东
蒋贤才
CHEN Deqi; ZHANG Zishe; ZHANG Wenhui; YAN Xuedong; JIANG Xiancai (School of Civil Engineering and Transportation, Northeast Forestry University, Harbin 150040, China; School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, China)
Source
《交通运输系统工程与信息》
Peking University Core Journal
2025, Issue 6, pp. 87-100 (14 pages)
Journal of Transportation Systems Engineering and Information Technology
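Funding
Heilongjiang Province Philosophy and Social Science Research Planning Project (23GLCo22)
National Natural Science Foundation of China (52272311).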
Keywords
air transportation
dynamic obstacle avoidance
deep reinforcement learning
unmanned aerial vehicle (UAV)
emergency rescue