Abstract
This paper addresses the configuration control problem of multi-UAV systems and proposes a control method based on a Stackelberg game model and hierarchical reinforcement learning. First, the system dynamics are modeled within a leader-follower framework, and a Stackelberg game with N+1 participants (one leader and N followers) is constructed, converting the task into an optimal control and differential game problem. Next, optimal control and dynamic programming are used to derive each participant's optimal strategy together with the corresponding coupled Hamilton-Jacobi-Bellman (HJB) equations. On this basis, a two-stage hierarchical value iteration reinforcement learning algorithm is proposed to compute the optimal solution iteratively. Neural network activation functions are used to approximate the value functions, which improves the computational efficiency and accuracy of the algorithm. Compared with traditional methods, the proposed algorithm effectively solves for the Stackelberg-Nash equilibrium between the leader and multiple followers. Finally, numerical simulations verify the feasibility and effectiveness of the method for practical tasks.
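The leader-follower structure summarized above (followers settle on a Nash response to the leader, while the leader optimizes knowing that response) can be illustrated with a deliberately simplified sketch. This is not the authors' algorithm. It assumes a hypothetical discrete-time scalar linear-quadratic game x' = a*x + b0*u0 + sum_i b_i*u_i, with stage cost q*x^2 + r*u^2 for each player (the leader's cost ignores the followers' inputs), and it uses exact quadratic value functions V(x) = P*x^2 in place of the paper's continuous-time coupled HJB equations and neural-network approximation. Reading the "two stages" as "followers, then leader" within each value-iteration sweep is likewise an assumption. All names (ScalarLQGame, stage_policies, hierarchical_value_iteration) and parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ScalarLQGame:
    """Hypothetical toy game (not from the paper): x' = a*x + b0*u0 + sum_i b[i]*u[i]."""
    a: float          # open-loop state coefficient
    b0: float         # leader input gain
    q0: float         # leader state weight
    r0: float         # leader control weight
    b: list[float]    # follower input gains
    q: list[float]    # follower state weights
    r: list[float]    # follower control weights


def stage_policies(game, P0, P):
    """Solve one stagewise Stackelberg problem given next-step values P0*x^2, P[i]*x^2.

    Returns (K0, K, m): leader gain (u0 = -K0*x), follower gains (u_i = -K[i]*x)
    and the closed-loop factor m (x' = m*x).
    """
    # Stage 1 (followers): the coupled first-order conditions
    # r_i*u_i + P_i*b_i*x' = 0 collapse to x' = c*(a*x + b0*u0), c = 1/(1+g).
    g = sum(Pi * bi**2 / ri for Pi, bi, ri in zip(P, game.b, game.r))
    c = 1.0 / (1.0 + g)
    # Stage 2 (leader): minimize q0*x^2 + r0*u0^2 + P0*x'^2 while anticipating
    # the followers' Nash response through x' = c*(a*x + b0*u0).
    K0 = P0 * c**2 * game.b0 * game.a / (game.r0 + P0 * c**2 * game.b0**2)
    m = c * (game.a - game.b0 * K0)
    # Followers' Nash responses evaluated at the leader's chosen input.
    K = [Pi * bi * m / ri for Pi, bi, ri in zip(P, game.b, game.r)]
    return K0, K, m


def hierarchical_value_iteration(game, iters=500, tol=1e-12):
    """Value iteration on the quadratic value coefficients, starting from V = 0."""
    P0, P = 0.0, [0.0] * len(game.b)
    for _ in range(iters):
        K0, K, m = stage_policies(game, P0, P)
        # Bellman update: value of following the current stage policies.
        P0_new = game.q0 + game.r0 * K0**2 + P0 * m**2
        P_new = [qi + ri * Ki**2 + Pi * m**2
                 for qi, ri, Ki, Pi in zip(game.q, game.r, K, P)]
        delta = max([abs(P0_new - P0)] + [abs(new - old) for new, old in zip(P_new, P)])
        P0, P = P0_new, P_new
        if delta < tol:
            break
    return P0, P, stage_policies(game, P0, P)


if __name__ == "__main__":
    game = ScalarLQGame(a=1.05, b0=1.0, q0=1.0, r0=1.0,
                        b=[0.5, 0.5], q=[1.0, 1.0], r=[1.0, 1.0])
    P0, P, (K0, K, m) = hierarchical_value_iteration(game)
    print(f"leader P0={P0:.4f}, K0={K0:.4f}; followers P={P}, K={K}; m={m:.4f}")
```

The scalar linear-quadratic case is chosen only because the followers' coupled first-order conditions then have a closed form. In the nonlinear, continuous-time setting described in the abstract, each value function would instead have to be approximated, for example with neural-network basis functions.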
Authors
ZHU Junyan (朱俊彦)
YANG Ya (杨亚)
WANG Wei (王伟)
ZHU Hongxu (朱鸿绪)
SUN Ran (孙然)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093; Shanghai Electromechanical Engineering Institute, Shanghai 201109; School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai 200240
Source
Flight Control & Detection (《飞控与探测》)
2025, No. 6, pp. 95-106 (12 pages)
Funding
General Program of the National Natural Science Foundation of China (52272408).
Keywords
UAV formation
formation control
Stackelberg game
reinforcement learning
adaptive dynamic programming