Formation Control of Unmanned Vehicles Based on the Stackelberg Game Model and Hierarchical Reinforcement Learning Method

Original title: 基于Stackelberg模型和分层强化学习的无人机构形控制

Abstract: This paper addresses the formation control problem of multi-UAV systems and proposes a control method based on the Stackelberg game model and hierarchical reinforcement learning. First, the system dynamics are modeled within a leader-follower framework, and the Stackelberg model is used to construct a game system with N+1 participants, which turns the task into an optimal control and differential game problem. Optimal control and dynamic programming are then used to derive the optimal strategy of each participant, together with the corresponding coupled Hamilton-Jacobi-Bellman (HJB) equations. Building on this, a two-stage hierarchical value-iteration reinforcement learning algorithm is proposed to compute the optimal solution iteratively. In the value function approximation, neural network activation functions are introduced to fit the value function, which improves the computational efficiency and accuracy of the algorithm. Compared with traditional methods, the proposed algorithm can effectively solve for the Stackelberg-Nash equilibrium between the leader and multiple followers. Finally, numerical simulation experiments verify the feasibility and effectiveness of the method in practical tasks.
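The abstract does not give the algorithm itself, so the following is only a minimal sketch of the value-iteration idea it builds on, not the paper's two-stage hierarchical method. It assumes a single agent with hypothetical scalar dynamics x[k+1] = a*x[k] + b*u[k] and stage cost q*x^2 + r*u^2, so the value function has the form V(x) = p*x^2 and the Bellman update reduces to a scalar recursion. The function name and all parameter values are illustrative assumptions.

```python
def value_iteration_lqr(a, b, q, r, iters=500, tol=1e-12):
    """Value iteration for a scalar discrete-time linear-quadratic problem.

    Assumed (hypothetical) setup: x[k+1] = a*x[k] + b*u[k],
    stage cost q*x^2 + r*u^2, value function V(x) = p*x^2.
    Returns the converged weight p and the feedback gain k (u = -k*x).
    """
    p = 0.0  # start from V_0(x) = 0
    for _ in range(iters):
        # Greedy policy for the current value estimate: u = -k*x
        k = a * b * p / (r + b * b * p)
        # Bellman update: V_{j+1}(x) = min_u [q*x^2 + r*u^2 + V_j(a*x + b*u)]
        p_next = q + a * a * p - a * b * p * k
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    # Gain associated with the final value estimate
    k = a * b * p / (r + b * b * p)
    return p, k


if __name__ == "__main__":
    p, k = value_iteration_lqr(a=1.1, b=1.0, q=1.0, r=1.0)
    print("value weight p:", p)
    print("feedback gain k:", k)
    print("closed-loop pole a - b*k:", 1.1 - 1.0 * k)
```

In the multi-player setting described in the abstract, each participant would have its own value function and the updates would be coupled through the HJB equations; this single-agent recursion only shows the basic iterate-until-fixed-point pattern.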

Authors: ZHU Junyan; YANG Ya; WANG Wei; ZHU Hongxu; SUN Ran (朱俊彦, 杨亚, 王伟, 朱鸿绪, 孙然)

Affiliations: School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093; Shanghai Electromechanical Engineering Institute, Shanghai 201109; School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240

Source: Flight Control & Detection (《飞控与探测》), 2025, No. 6, pp. 95-106 (12 pages)

Funding: General Program of the National Natural Science Foundation of China (52272408).

Keywords: UAV formation; formation control; Stackelberg game; reinforcement learning; adaptive dynamic programming

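Classification: V249.12 [Aerospace Science and Technology: Flight Vehicle Design]; V19 [Aerospace Science and Technology: Man-Machine and Environmental Engineering]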
