Moving Target Defense (MTD) necessitates scientifically effective decision-making methodologies for defensive technology implementation. While most MTD decision studies focus on accurately identifying optimal strategies, the issue of optimal defense timing remains underexplored. Current default approaches, such as periodic or overly frequent MTD triggers, lead to suboptimal trade-offs among system security, performance, and cost. The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead, yet existing frameworks inadequately address this temporal dimension. To bridge this gap, this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces, thereby capturing the dynamic security-state evolution of MTD systems. We introduce a belief factor to quantify information asymmetry during adversarial interactions, enhancing the precision of MTD trigger timing. Leveraging this game-theoretic foundation, we employ Multi-Agent Reinforcement Learning (MARL) to derive adaptive temporal strategies, optimized via a novel four-dimensional reward function that holistically balances security, performance, cost, and timing. Experimental validation using IP address mutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response, significantly improving cybersecurity affordability and effectiveness.
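The alternating-control dynamic that FlipIt formalizes can be illustrated with a small simulation (a hedged sketch of the generic game, not the paper's Stackelberg-FlipIt model): a defender flips the attack surface periodically, an attacker's takeovers arrive at random exponential times, and the flip period trades control time against trigger cost. All parameter values below are hypothetical.

```python
import random

def flipit_defender_control(period, attack_rate, horizon, seed=0):
    """Estimate the fraction of time the defender controls the resource in a
    FlipIt-style game: the defender flips (e.g., mutates an IP address) every
    `period` time units; the attacker flips at random times with exponential
    inter-arrival intervals of rate `attack_rate`."""
    rng = random.Random(seed)
    t, defender_time, controlled_by_defender = 0.0, 0.0, True
    next_defense = period
    next_attack = rng.expovariate(attack_rate)
    while t < horizon:
        nxt = min(next_defense, next_attack, horizon)
        if controlled_by_defender:
            defender_time += nxt - t           # credit elapsed control time
        t = nxt
        if t == horizon:
            break
        if next_defense <= next_attack:
            controlled_by_defender = True      # defensive flip regains control
            next_defense += period
        else:
            controlled_by_defender = False     # attacker takes over
            next_attack = t + rng.expovariate(attack_rate)
    return defender_time / horizon

# More frequent flips raise the defender's control time, at higher trigger cost.
fast = flipit_defender_control(period=1.0, attack_rate=0.5, horizon=10000.0)
slow = flipit_defender_control(period=5.0, attack_rate=0.5, horizon=10000.0)
```

The timing question the paper studies is precisely where on this curve the flip period should sit once performance and cost enter the reward.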
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet-sampling strategies while attackers adjust their tactics, reaching equilibrium rapidly. Furthermore, dynamic deception techniques reduce the time for which attacks remain concealed as well as the overall system burden. A lightweight behavioral-fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) than existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
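The leader-commitment logic underlying any Stackelberg-based defense can be sketched on a small bimatrix game: the defender (leader) commits to a row, the attacker (follower) best-responds with a column, and the defender picks the commitment whose induced response it values most. The payoff matrices below are hypothetical, not taken from the paper.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Strong Stackelberg equilibrium over pure leader commitments:
    for each committed row, find the follower's best-response columns
    (ties broken in the leader's favor) and keep the best commitment."""
    best = None
    for i, (lrow, frow) in enumerate(zip(leader_payoff, follower_payoff)):
        fbest = max(frow)
        responses = [j for j, v in enumerate(frow) if v == fbest]
        value = max(lrow[j] for j in responses)   # tie-break favors leader
        if best is None or value > best[0]:
            best = (value, i)
    return best  # (leader's value, committed row index)

# Hypothetical payoffs: rows = defender sampling strategies,
# columns = attacker tactics.
L = [[3, 1], [2, 4]]
F = [[1, 2], [3, 0]]
value, row = stackelberg_pure(L, F)   # commitment to row 1 yields value 2
```

Committing to row 0 would induce column 1 (defender payoff 1), so the defender prefers row 1 even though its best joint cell (payoff 4) is unreachable under the attacker's best response.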
The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random-interference environment. It considers the risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants' manipulative behaviors on the players' state changes. A nonlinear mathematical model is established to describe the participants' strategy decision-making process in this scenario. Subsequently, the strategy-selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors affect the selection and evolutionary speed of the agents' strategies in different ways. Especially in a highly uncertain environment, even small information asymmetries or miscalculations may significantly affect decision-making. This confirms the feasibility and effectiveness of the proposed method, which better explains the agents' behavioral decision-making during interaction. The study provides feasibility-analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of game-system models.
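Strategy evolution under random interference is commonly studied with replicator dynamics plus a disturbance term. The sketch below uses the standard two-strategy replicator equation with additive noise; the payoff matrix and all parameters are hypothetical and stand in for, not reproduce, the paper's nonlinear model.

```python
import random

def replicator(x0, payoff, steps=2000, dt=0.01, noise=0.0, seed=0):
    """Euler-integrated two-strategy replicator dynamics
    x' = x(1-x)(f_A - f_B), optionally perturbed by additive Gaussian
    noise to mimic random environmental interference."""
    rng = random.Random(seed)
    (a, b), (c, d) = payoff
    x = x0
    for _ in range(steps):
        fa = a * x + b * (1 - x)          # fitness of strategy A
        fb = c * x + d * (1 - x)          # fitness of strategy B
        x += dt * x * (1 - x) * (fa - fb)
        if noise:
            x += rng.gauss(0.0, noise)
        x = min(max(x, 0.0), 1.0)         # keep the share in [0, 1]
    return x

# Stag-hunt-like payoffs (hypothetical): the basin boundary sits at x = 0.5,
# so small perturbations near it can flip which equilibrium is reached.
high = replicator(0.7, [(3, 0), (1, 2)])
low = replicator(0.3, [(3, 0), (1, 2)])
```

Starting on either side of the boundary drives the population to opposite pure-strategy equilibria, which is why small miscalculations near the boundary matter so much.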
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Multi-agent reinforcement learning holds tremendous potential for revolutionizing intelligent systems across diverse domains. However, it is also accompanied by a set of formidable challenges, including the effective allocation of credit to each agent, real-time collaboration among heterogeneous agents, and an appropriate reward function to guide agent behavior. To handle these issues, we propose an innovative solution named the Graph Attention Counterfactual Multiagent Actor-Critic algorithm (GACMAC). This algorithm encompasses several key components. First, it employs a multi-agent actor-critic framework along with counterfactual baselines to assess the individual actions of each agent. Second, it integrates a graph attention network to enhance real-time collaboration among agents, enabling heterogeneous agents to share information effectively while handling tasks. Third, it incorporates prior human knowledge through a potential-based reward-shaping method, thereby improving the convergence speed and stability of the algorithm. We tested our algorithm on the StarCraft Multi-Agent Challenge (SMAC) platform, a recognized benchmark for multi-agent algorithms, where it achieved a win rate of over 95%, comparable to current state-of-the-art multi-agent controllers.
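The counterfactual-baseline idea (COMA-style) fits in a few lines: an agent's advantage for its chosen action is the joint-action value minus the policy-weighted average over that agent's alternatives, holding the other agents' actions fixed. The critic outputs and policy below are hypothetical numbers for illustration.

```python
def counterfactual_advantage(q_values, policy, taken):
    """COMA-style counterfactual baseline for one agent: advantage of the
    taken action = Q(s, u) minus the expectation of Q over this agent's own
    actions under its policy, with all other agents' actions held fixed."""
    baseline = sum(p * q for p, q in zip(policy, q_values))
    return q_values[taken] - baseline

# Hypothetical critic outputs for agent i's three candidate actions
# (other agents' actions fixed) and agent i's current policy.
q = [1.0, 2.0, 4.0]
pi = [0.2, 0.5, 0.3]
adv = counterfactual_advantage(q, pi, taken=2)   # 4.0 - 2.4 = 1.6
```

Because the baseline depends only on the agent's own marginalized actions, subtracting it does not bias the policy gradient but isolates that agent's individual contribution.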
This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. Firstly, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Secondly, by combining a memory-based adaptive event-triggered mechanism with neural networks, the nonlinear terms in the networked system are approximated while system resources are efficiently conserved. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, a multi-unmanned ground vehicle (Multi-UGV) system is constructed to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
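The resource-saving role of an event-triggered mechanism can be sketched with a generic relative-threshold trigger: a sample is retransmitted only when it has drifted sufficiently from the last transmitted value. The rule and threshold below are illustrative, not the paper's exact memory-based adaptive condition.

```python
def event_triggered_samples(signal, sigma):
    """Relative-threshold event trigger: transmit sample k only when its
    deviation from the last transmitted value exceeds sigma times the
    current magnitude; otherwise the receiver keeps using the old value."""
    sent = [0]                    # the first sample is always transmitted
    last = signal[0]
    for k, x in enumerate(signal[1:], start=1):
        if abs(x - last) > sigma * abs(x):
            sent.append(k)
            last = x
    return sent

# A slowly drifting state triggers far fewer transmissions than the
# 100 a purely time-triggered scheme would use.
sig = [1.0 + 0.01 * k for k in range(100)]
events = event_triggered_samples(sig, sigma=0.05)
```

Only a small fraction of the 100 samples is sent, which is the mechanism's point: communication happens when the state information is actually stale.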
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environment to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Networks (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs that establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes and learns attention weights adaptively, allowing agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework integrating centralized training with distributed execution facilitates model advancement. Simulation results show that the proposed method outperforms existing MARL methods in search rate and tracking performance with fewer collisions. The experiments also show that the proposed method can be extended to applications with a larger number of agents, providing a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
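A single-head graph-attention layer in the standard GAT formulation that such models build on can be sketched in NumPy: attention logits are computed only over adjacent nodes, softmax-normalized per neighborhood, and used to aggregate projected features. The dimensions and weights below are random placeholders, not the paper's trained model.

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """One single-head graph-attention layer: logits
    e_ij = LeakyReLU(a^T [W h_i || W h_j]) over edges only, softmax per
    node neighborhood, then attention-weighted feature aggregation."""
    z = h @ W                                  # (N, F') projected features
    n = z.shape[0]
    logits = np.full((n, n), -np.inf)          # non-edges get zero attention
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                e = np.concatenate([z[i], z[j]]) @ a
                logits[i, j] = np.where(e > 0, e, alpha * e)  # LeakyReLU
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True) # softmax over neighbors
    return att @ z

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))                    # 4 nodes, 3 input features
adj = np.array([[1, 1, 0, 0],                  # path graph with self-loops
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]])
out = gat_layer(h, adj, rng.normal(size=(3, 2)), rng.normal(size=(4,)))
```

Restricting attention to adjacent nodes is what lets each UAV weigh nearby agents, obstacles, and targets without attending to the whole swarm.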
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address problems such as credit assignment, low collaboration efficiency, and weak strategy generalization in cooperative pursuit tasks involving multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance as the scale expands. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: critics share a multi-head attention (MHA) mechanism during centralized training to solve the credit assignment problem. This enables the pursuers to identify highly correlated interactions with their teammates, eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function combining a formation reward with a distance reward is devised to help the pursuers encircle the escapee, incentivizing UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, scalability experiments demonstrate that the algorithm is suitable for large-scale multi-UAV systems.
The motion planning problem for multi-agent systems becomes particularly challenging when humans or human-controlled robots are present in a mixed environment. To address this challenge, this paper presents an interaction-aware motion planning approach based on game theory in a receding-horizon manner. Leveraging the framework provided by dynamic potential games for handling interactions among agents, this approach formulates the multi-agent motion planning problem as a differential potential game, highlighting the effectiveness of constrained potential games in facilitating interactive motion planning among agents. Furthermore, online learning techniques are incorporated to dynamically learn the unknown preferences and models of humans or human-controlled robots through analysis of observed data. To evaluate the proposed approach, numerical simulations are conducted, demonstrating its capability to generate interactive trajectories for all agents, including humans and human-controlled agents, operating within the mixed environment. The simulation results illustrate the effectiveness of the proposed approach in handling the complexities of multi-agent motion planning in real-world scenarios.
Constructing a cross-border power energy system with multi-agent power energy as an alliance is important for studying cross-border power-trading markets. This study considers multiple neighboring countries in the form of alliances, introduces the neighboring countries' exchange rates into the cross-border multi-agent power-trading market, and proposes a method to study each agent's dynamic decision-making behavior based on evolutionary game theory. To this end, the study uses three national agents as examples, constructs a tripartite evolutionary game model, and analyzes the evolution of each member state's decision-making behavior under the alliance's initial willingness value, cost of payment, and additional revenue. This research helps realize cross-border energy operations so that the transaction agents can achieve greater trade profits, and it provides a theoretical basis for cooperation and stability among multiple agents.
This paper focuses on the velocity-constrained consensus problem of discrete-time heterogeneous multi-agent systems with nonconvex constraints and arbitrarily switching topologies, where each agent has first-order or second-order dynamics. To solve this problem, a distributed algorithm based on a contraction operator is proposed. By employing the properties of the stochastic matrix, it is shown that all agents' position states converge to a common point, and second-order agents' velocity states remain in the corresponding nonconvex constraint sets and converge to zero, as long as the joint communication topology has a directed spanning tree. Finally, numerical simulation results are provided to verify the effectiveness of the proposed algorithm.
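The stochastic-matrix argument can be illustrated for first-order agents: iterating x(k+1) = A x(k) with a row-stochastic weight matrix whose graph contains a directed spanning tree drives all position states to a common point. The weight matrix below is a hypothetical 3-agent example, not the paper's system.

```python
def consensus_steps(positions, weights, iters=200):
    """Iterate x(k+1) = A x(k) for a row-stochastic weight matrix A.
    With a directed spanning tree rooted at agent 0, all states
    converge to a common value (here, agent 0's initial position)."""
    x = list(positions)
    n = len(x)
    for _ in range(iters):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# Row-stochastic weights: agent 1 listens to agent 0, agent 2 to agent 1.
A = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5]]
final = consensus_steps([4.0, 0.0, -2.0], A)   # all entries approach 4.0
```

The contraction operator in the paper additionally projects second-order velocity states into their nonconvex constraint sets at each step; this sketch shows only the underlying stochastic-matrix convergence.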
This paper focuses on the problem of leader-following consensus for nonlinear cascaded multi-agent systems. The control strategies for these systems are transformed into successive control problems for lower-order error subsystems. A distributed consensus analysis of the corresponding error systems is conducted by employing recursive methods and virtual controllers, accompanied by a series of Lyapunov functions devised throughout the iterative process, which solves the leader-following consensus problem for a class of nonlinear cascaded multi-agent systems. Specific simulation examples illustrate the effectiveness of the proposed control algorithm.
This article investigates the time-varying output group formation tracking control (GFTC) problem for heterogeneous multi-agent systems (HMASs) under switching topologies. The objective is to design a distributed control strategy that enables the followers' outputs to form the desired sub-formations and track the leader's output in each subgroup. Firstly, novel distributed observers are developed to estimate the leaders' states under switching topologies. Then, GFTC protocols are designed based on the proposed observers. It is shown that with the distributed protocol, the GFTC problem for HMASs under switching topologies is solved if the average dwell time associated with the switching topologies is larger than a fixed threshold. Finally, an example is provided to illustrate the effectiveness of the proposed control strategy.
Formation control in multi-agent systems has become a critical area of interest due to its wide-ranging applications in robotics, autonomous transportation, and surveillance. While various studies have explored distributed cooperative control, this review focuses on the theoretical foundations and recent developments in formation control strategies. The paper categorizes and analyzes key formation types, including formation maintenance, group or cluster formation, bipartite formations, event-triggered formations, finite-time convergence, and constrained formations. A significant portion of the review addresses formation control under constrained dynamics, presenting both model-based and model-free approaches that consider practical limitations such as actuator bounds, communication delays, and nonholonomic constraints. Additionally, the paper discusses emerging trends, including the integration of event-driven mechanisms and AI-enhanced coordination strategies. Comparative evaluations highlight the trade-offs among various methodologies regarding scalability, robustness, and real-world feasibility. Practical implementations are reviewed across diverse platforms, and the review identifies current achievements and unresolved challenges in the field. The paper concludes by outlining promising research directions, such as adaptive control for dynamic environments, energy-efficient coordination, and learning-based control under uncertainty. This review synthesizes the current state of the art and provides a road map for future investigation, making it a valuable reference for researchers and practitioners aiming to advance formation control in multi-agent systems.
Continuous control protocols are extensively utilized in traditional multi-agent systems (MASs), in which information must be transmitted among agents consecutively, resulting in excessive consumption of limited resources. To decrease the control cost, several leader-following consensus (LFC) problems based on intermittent sampled control (ISC) are investigated for second-order MASs without and with time delay, respectively. Firstly, an intermittent sampled controller is designed, and a sufficient and necessary condition is derived under which the state errors between the leader and all followers approach zero asymptotically. Considering that time delay is inevitable, a new protocol is proposed to deal with the time-delay situation. The error system's stability is analyzed using the Schur stability theorem, and sufficient and necessary conditions for LFC are obtained, which are closely associated with the coupling gain, the system parameters, and the network structure. Furthermore, for the case where current position and velocity information are not available, a distributed protocol is designed that depends only on sampled position information, and the sufficient and necessary conditions for LFC are also given. The results show that second-order MASs can achieve LFC if and only if the system parameters satisfy the inequalities proposed in the paper. Finally, the correctness of the obtained results is verified by numerical simulations.
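The flavor of such sampling-dependent necessary-and-sufficient conditions can be shown on a first-order simplification (not the paper's second-order analysis): with a zero-order-hold sampled update, the tracking error obeys e(k+1) = (1 - gain*h) e(k), which is stable if and only if 0 < gain*h < 2. All numbers below are hypothetical.

```python
def sampled_leader_following(x0, leader, gain, h, steps=400):
    """First-order leader-following under sampling: the follower sees the
    leader's position only at sampling instants (period h) and holds it
    between samples.  Error dynamics: e(k+1) = (1 - gain*h) * e(k)."""
    x = x0
    for _ in range(steps):
        x = x + h * gain * (leader - x)   # zero-order-hold sampled update
    return x

# gain*h = 0.5 satisfies 0 < gain*h < 2: the follower converges.
stable = sampled_leader_following(5.0, leader=1.0, gain=1.0, h=0.5)
# gain*h = 2.5 violates the condition: the error diverges.
unstable = sampled_leader_following(5.0, leader=1.0, gain=1.0, h=2.5)
```

The paper's conditions play the same role for second-order dynamics, tying achievability of LFC to inequalities in the coupling gain, sampling pattern, and network structure.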
In recent years, significant research attention has been directed toward swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper applies Multi-Agent Reinforcement Learning to simulate fish schooling behavior, overcoming the challenges of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. On this foundation, a novel GCN-Critic MADDPG algorithm leveraging Graph Convolutional Networks (GCN) is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared to traditional single-agent algorithms, the proposed method not only exhibits significant advantages in convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish-school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
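Milling versus simple alignment is commonly quantified with two order parameters: polarization (how aligned the headings are) and a normalized angular momentum about the group centroid (how strongly the school rotates). The definitions below are the standard ones from collective-behavior studies, not measures specific to this paper.

```python
import math

def order_parameters(positions, velocities):
    """Polarization and milling order parameters for a 2-D school, both in
    [0, 1].  High milling with low polarization indicates the vortex-like
    milling state; the reverse indicates parallel swimming."""
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    pol_x = pol_y = mill = 0.0
    for (x, y), (vx, vy) in zip(positions, velocities):
        s = math.hypot(vx, vy)
        ux, uy = vx / s, vy / s           # unit heading
        pol_x += ux
        pol_y += uy
        rx, ry = x - cx, y - cy           # offset from centroid
        r = math.hypot(rx, ry)
        if r > 0:
            mill += (rx * uy - ry * ux) / r   # normalized angular momentum
    return math.hypot(pol_x, pol_y) / n, abs(mill) / n

# Four agents on a unit circle, all moving tangentially: pure milling.
pos = [(1, 0), (0, 1), (-1, 0), (0, -1)]
vel = [(0, 1), (-1, 0), (0, -1), (1, 0)]
polarization, milling = order_parameters(pos, vel)   # ~0.0 and ~1.0
```

Tracking these two scalars over training is a simple way to check whether a learned policy has actually produced milling rather than mere clustering.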
In the islanded operation of distribution networks, mismatched line impedance at the inverter outputs causes conventional droop control to share power inaccurately with respect to capacity, resulting in voltage and frequency fluctuations under minor external disturbances. To address this issue, this paper introduces an enhanced scheme for power sharing and voltage-frequency control. First, to solve the power distribution problem, we propose an adaptive virtual impedance control based on multi-agent consensus, which allows precise active and reactive power allocation without requiring knowledge of the feeder impedances. Moreover, a novel consensus-based voltage and frequency control is proposed to correct the voltage deviation inherent in droop control and virtual impedance methods. This strategy maintains voltage and frequency stability even during communication disruptions and enhances system robustness. Additionally, a small-signal model is established for system stability analysis, and the control parameters are optimized. Simulation results validate the effectiveness of the proposed control scheme.
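The conventional frequency-droop law the scheme improves on can be sketched directly: choosing droop slopes inversely proportional to inverter ratings makes each unit carry load in proportion to its capacity at the common steady-state frequency, which is exactly the property that impedance mismatch disturbs on the voltage/reactive side. The ratings and base slope below are hypothetical.

```python
def droop_setpoints(ratings, total_p, f0=50.0, m_base=0.01):
    """Ideal frequency droop f = f0 - m_i * P_i with slopes m_i chosen
    inversely proportional to each inverter's rating, so m_i * S_i is
    constant and active power splits in proportion to capacity."""
    slopes = [m_base / s for s in ratings]
    # At steady state all units share one frequency f, so P_i = (f0 - f)/m_i;
    # solve sum(P_i) = total_p for f.
    f = f0 - total_p / sum(1.0 / m for m in slopes)
    powers = [(f0 - f) / m for m in slopes]
    return f, powers

# Two inverters rated 2:1 (hypothetical) sharing a 30 kW island load.
f, p = droop_setpoints([2.0, 1.0], total_p=30.0)   # split 20 kW / 10 kW
```

Reactive-power sharing obeys an analogous V-Q droop, but there the unequal feeder impedances break the proportional split, motivating the adaptive virtual impedance in the paper.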
The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks but is limited by co-channel interference and malicious jamming. Without prior environmental knowledge, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) to maximize the long-term expected system throughput. Then, to improve exploration of the optimal policy, we optimize a MARL objective function with a mutual-information (MI) regularizer between states and actions, which dynamically assigns probability to actions frequently used by the optimal policy. Next, by sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs learn their policies cooperatively, relying only on previously observed information and predictions of others' actions. Our simulation results show that for both sweep-jamming and Markov-jamming patterns, the proposed scheme outperforms the benchmark schemes in throughput, convergence, and stability for different numbers of jammers, channels, and UAV pairs.
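One common form of an MI-regularized policy (a sketch of the general idea, not the paper's exact objective) weights a Boltzmann distribution over soft Q-values by a marginal action prior, so channels the policy already uses often get a head start during exploration. The Q-values and prior below are hypothetical.

```python
import math

def mi_regularised_policy(q_values, prior, tau):
    """Channel-selection policy under a mutual-information regularizer:
    p(a|s) proportional to p(a) * exp(Q(s,a)/tau), where p(a) is the
    marginal action distribution and tau the temperature."""
    w = [p * math.exp(q / tau) for p, q in zip(prior, q_values)]
    z = sum(w)
    return [x / z for x in w]

# Hypothetical soft Q-values over 3 channels with a uniform marginal prior;
# the highest-valued channel gets the largest selection probability.
pi = mi_regularised_policy([1.0, 2.0, 0.5], [1/3, 1/3, 1/3], tau=1.0)
```

As the marginal prior sharpens around frequently chosen actions across states, the regularizer concentrates exploration on them, which is the dynamic probability assignment the abstract describes.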
The Cybertwin-enabled 6th Generation (6G) network is envisioned to support artificial intelligence-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which destroys the standard Markov property and increases the decision reaction time, degrading task offloading performance. To address this problem, we propose a pipelining task offloading method to lower the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. Firstly, the state space is augmented using the last-received state and historical actions to restore the Markov property. Secondly, Gate Transformer-XL is introduced to capture the importance of historical actions and to maintain a consistent input dimension despite the dynamic changes caused by random transmission delays. Thirdly, a sampling method and a new loss function, based on the difference between the current and target state values and the difference between the real and augmented state-action values, are designed to obtain state transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods are effective in reducing reaction time and improving task offloading performance in random-delay Cybertwin-enabled 6G networks.
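The state-augmentation step can be sketched directly: concatenate the last-received state with a fixed-length window of recent actions, zero-padded at episode start, so the agent's input is Markov with respect to what it can actually observe under delay. The layout and padding convention below are illustrative, not the paper's exact encoding.

```python
from collections import deque

def augment_state(last_state, action_history, horizon):
    """Delay-aware state augmentation: last-received state concatenated
    with the most recent `horizon` actions (zero-padded when fewer have
    been taken), restoring the Markov property under observation delay."""
    pad = [0.0] * max(0, horizon - len(action_history))
    recent = list(action_history)[-horizon:]
    return list(last_state) + pad + recent

# Three actions taken so far; the augmentation window covers four.
history = deque(maxlen=4)
for a in [0.1, 0.2, 0.3]:
    history.append(a)
aug = augment_state([1.0, -1.0], history, horizon=4)
```

Fixing `horizon` keeps the network's input dimension constant regardless of the realized delay, which is what lets the Gate Transformer-XL stage weigh the historical actions' importance.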
Funding (MTD defense-timing paper): National Natural Science Foundation of China, Grant No. 62302520.
Funding (IIoT ZSG-MAD3PG paper): in part by the Humanities and Social Sciences Planning Foundation of the Ministry of Education of China under Grant No. 24YJAZH123, the National Undergraduate Innovation and Entrepreneurship Training Program of China under Grant No. 202510347069, and the Huzhou Science and Technology Planning Foundation under Grant No. 2023GZ04.
Abstract: The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of the participants' maneuvering behaviors on the players' state changes. A nonlinear mathematical model is established to describe the participants' strategy decision-making process in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agents' strategies. Especially in a highly uncertain environment, even small information asymmetries or miscalculations may have a significant impact on decision-making. This confirms the feasibility and effectiveness of the proposed method, which better explains the agents' behavioral decision-making during interaction. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of game system models.
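A minimal sketch of strategy evolution under random interference, using two-strategy replicator dynamics with an additive noise term. The payoffs, step size, and noise model are illustrative assumptions, not the paper's nonlinear model:

```python
import random

# One Euler step of x' = x(1 - x)(payoff_a - payoff_b) plus a random
# disturbance, where x is the population share of strategy A.

def replicator_step(x, payoff_a, payoff_b, dt=0.01, noise=0.0, rng=random):
    drift = x * (1.0 - x) * (payoff_a - payoff_b)
    x_new = x + dt * drift + noise * rng.gauss(0.0, dt ** 0.5)
    return min(1.0, max(0.0, x_new))  # keep the share a valid proportion

# Without noise, the share of the higher-payoff strategy grows steadily.
x = 0.5
for _ in range(1000):
    x = replicator_step(x, payoff_a=2.0, payoff_b=1.0)
```

Raising `noise` lets one probe how disturbances slow or destabilize convergence, mirroring the paper's observation that small perturbations can matter greatly in uncertain environments.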
Funding: The National Natural Science Foundation of China (62136008, 62293541), the Beijing Natural Science Foundation (4232056), and the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and in state transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration of large language models (LLMs) into multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Abstract: Multi-agent reinforcement learning holds tremendous potential for revolutionizing intelligent systems across diverse domains. However, it also faces a set of formidable challenges, including the effective allocation of credit to each agent, real-time collaboration among heterogeneous agents, and an appropriate reward function to guide agent behavior. To handle these issues, we propose an innovative solution named the Graph Attention Counterfactual Multiagent Actor–Critic algorithm (GACMAC). This algorithm encompasses several key components. First, it employs a multi-agent actor–critic framework along with counterfactual baselines to assess the individual actions of each agent. Second, it integrates a graph attention network to enhance real-time collaboration among agents, enabling heterogeneous agents to effectively share information while handling tasks. Third, it incorporates prior human knowledge through a potential-based reward shaping method, thereby improving the convergence speed and stability of the algorithm. We tested our algorithm on the StarCraft Multi-Agent Challenge (SMAC) platform, a recognized testbed for multi-agent algorithms, where it achieved a win rate of over 95%, comparable to current state-of-the-art multi-agent controllers.
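The potential-based reward shaping mentioned above follows the standard form F(s, s') = γΦ(s') − Φ(s), which is known to preserve the optimal policy. The sketch below assumes a hypothetical distance-to-goal potential; the paper's actual potential function encoding human knowledge is not specified here:

```python
# Potential-based shaping: add gamma * Phi(s') - Phi(s) to the
# environment reward. The potential values here are illustrative.

def shaped_reward(reward, phi_s, phi_s_next, gamma=0.99):
    """Return the environment reward plus the shaping term."""
    return reward + gamma * phi_s_next - phi_s

# Moving to a higher-potential state (e.g., closer to the goal)
# yields a bonus; moving away yields a penalty.
bonus = shaped_reward(0.0, phi_s=1.0, phi_s_next=2.0)
penalty = shaped_reward(0.0, phi_s=2.0, phi_s_next=1.0)
```

Because the shaping term telescopes along trajectories, it speeds up learning without changing which policies are optimal.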
Funding: The National Natural Science Foundation of China (W2431048), the Science and Technology Research Program of the Chongqing Municipal Education Commission, China (KJZDK202300807), and the Chongqing Natural Science Foundation, China (CSTB2024NSCQQCXMX0052).
Abstract: This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. Firstly, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Secondly, by combining a memory-based adaptive event-triggered mechanism with neural networks, the paper approximates the nonlinear terms in the networked system while efficiently conserving system resources. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, a multi-unmanned ground vehicle (Multi-UGV) system is constructed to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
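The resource-saving idea behind event-triggered mechanisms can be sketched with a simple static-threshold rule: an agent re-broadcasts its state only when it deviates enough from the last broadcast value. This is a simplification; the paper's mechanism is adaptive and memory-based, and the threshold here is an illustrative assumption:

```python
# Event-triggered transmission: send only when the deviation from the
# last sent value exceeds a threshold, instead of at every sample.

def event_triggered_updates(states, threshold):
    """Return the indices at which the state is re-broadcast."""
    triggers = [0]                  # always send the initial state
    last_sent = states[0]
    for k, x in enumerate(states[1:], start=1):
        if abs(x - last_sent) > threshold:
            triggers.append(k)
            last_sent = x
    return triggers

# A steadily drifting scalar state triggers far fewer transmissions
# than the number of samples.
trace = list(range(20))
events = event_triggered_updates(trace, threshold=2.5)
```

Fewer triggers means fewer network transmissions, which is precisely the resource economy that event-triggered control targets.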
基金supported by the National Natural Science Foundation of China(Nos.12272104,U22B2013).
Abstract: This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Networks (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs that establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework integrating centralized training and distributed execution facilitates model advancement. Simulation results show that the proposed method outperforms existing MARL methods in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, providing a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
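The core of the GAT aggregation described above is attention over a node's neighbors. The sketch below uses a plain dot-product score instead of the learned parameterization of a real GAT, so it is an illustrative simplification:

```python
import math

# Single attention head: score each neighbor against the query node's
# feature vector, normalize with softmax, and aggregate.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, neighbor_feats):
    """Return (aggregated feature vector, attention weights)."""
    scores = [sum(q * n for q, n in zip(query, nf)) for nf in neighbor_feats]
    weights = softmax(scores)
    dim = len(query)
    agg = [sum(w * nf[i] for w, nf in zip(weights, neighbor_feats))
           for i in range(dim)]
    return agg, weights

# The neighbor most aligned with the query receives the largest weight.
agg, w = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Restricting attention to adjacent nodes, as the paper does, keeps the computation local and lets each UAV weight nearby agents, obstacles, and targets by relevance.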
基金supported by the National Research and Development Program of China under Grant JCKY2018607C019in part by the Key Laboratory Fund of UAV of Northwestern Polytechnical University under Grant 2021JCJQLB0710L.
Abstract: This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address problems such as credit assignment, low collaboration efficiency, and weak strategy generalization in cooperative pursuit tasks of multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance as the scale expands. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism, i.e., sharing a multi-head attention (MHA) mechanism among critics during centralized training to solve the credit assignment problem. This method enables the pursuers to identify highly correlated interactions with their teammates, effectively eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function combining a formation reward with a distance reward is devised to help the pursuers encircle the escapee, incentivizing the UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, scalability experiments demonstrate that the algorithm is suitable for large-scale multi-UAV systems.
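The formation-plus-distance reward described above can be sketched as follows: a distance term pulls each pursuer toward the escapee, while a formation term rewards even angular spacing around it. The weights and both terms are illustrative assumptions, not the paper's reward design:

```python
import math

# Hypothetical encirclement reward for a 2-D pursuit scenario.

def pursuit_reward(pursuers, escapee, w_dist=1.0, w_form=0.5):
    ex, ey = escapee
    n = len(pursuers)
    dists = [math.hypot(px - ex, py - ey) for px, py in pursuers]
    angles = sorted(math.atan2(py - ey, px - ex) for px, py in pursuers)
    ideal = 2 * math.pi / n                       # even angular spacing
    gaps = [(angles[(i + 1) % n] - angles[i]) % (2 * math.pi)
            for i in range(n)]
    spread_err = sum(abs(g - ideal) for g in gaps)
    return -w_dist * sum(dists) - w_form * spread_err

# An even encirclement outscores a one-sided cluster at similar range.
surround = [(1.0, 0.0), (-0.5, 0.866), (-0.5, -0.866)]
cluster = [(1.0, 0.0), (0.9, 0.1), (1.1, -0.1)]
r_surround = pursuit_reward(surround, (0.0, 0.0))
r_cluster = pursuit_reward(cluster, (0.0, 0.0))
```

Rewarding spread as well as proximity discourages all pursuers from approaching along the same bearing, which would leave an escape corridor open.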
基金supported by the A*STAR under its"RIE2025 IAF-PP Advanced ROS2-native Platform Technologies for Cross sectorial Robotics Adoption(M21K1a0104)"programme.
Abstract: The motion planning problem for multi-agent systems becomes particularly challenging when humans or human-controlled robots are present in a mixed environment. To address this challenge, this paper presents an interaction-aware motion planning approach based on game theory in a receding-horizon manner. Leveraging the framework provided by dynamic potential games for handling interactions among agents, this approach formulates the multi-agent motion planning problem as a differential potential game, highlighting the effectiveness of constrained potential games in facilitating interactive motion planning among agents. Furthermore, online learning techniques are incorporated to dynamically learn the unknown preferences and models of humans or human-controlled robots through analysis of observed data. To evaluate the effectiveness of the proposed approach, numerical simulations are conducted, demonstrating its capability to generate interactive trajectories for all agents, including humans and human-controlled agents, operating within the mixed environment. The simulation results illustrate the effectiveness of the proposed approach in handling the complexities of multi-agent motion planning in real-world scenarios.
基金National Key R&D Program of China(Grant No.2022YFB2703500)National Natural Science Foundation of China(Grant No.52277104)+2 种基金National Key R&D Program of Yunnan Province(202303AC100003)Applied Basic Research Foundation of Yunnan Province (202301AT070455, 202101AT070080)Revitalizing Talent Support Program of Yunnan Province (KKRD202204024).
Abstract: Constructing a cross-border power energy system with multiple power-energy agents forming an alliance is important for studying cross-border power-trading markets. This study considers multiple neighboring countries in the form of alliances, introduces the neighboring countries' exchange rates into the cross-border multi-agent power-trading market, and proposes a method to study each agent's dynamic decision-making behavior based on evolutionary game theory. Taking three national agents as examples, this study constructs a tripartite evolutionary game model and analyzes the evolution of each member state's decision-making behavior under different initial willingness values, payment costs, and additional alliance revenues. This research helps realize cross-border energy operations so that the transaction agents can achieve greater trade profits, and it provides a theoretical basis for cooperation and stability among multiple agents.
基金2024 Jiangsu Province Youth Science and Technology Talent Support Project2024 Yancheng Key Research and Development Plan(Social Development)projects,“Research and Application of Multi Agent Offline Distributed Trust Perception Virtual Wireless Sensor Network Algorithm”and“Research and Application of a New Type of Fishery Ship Safety Production Monitoring Equipment”。
Abstract: This paper focuses on the velocity-constrained consensus problem of discrete-time heterogeneous multi-agent systems with nonconvex constraints and arbitrarily switching topologies, where each agent has first-order or second-order dynamics. To solve this problem, a distributed algorithm is proposed based on a contraction operator. By employing the properties of the stochastic matrix, it is shown that all agents' position states converge to a common point, and the second-order agents' velocity states remain in the corresponding nonconvex constraint sets and converge to zero, as long as the joint communication topology has a directed spanning tree. Finally, numerical simulation results are provided to verify the effectiveness of the proposed algorithm.
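A contraction operator of the kind mentioned above can be sketched as a direction-preserving shrink: a velocity vector is scaled down just enough to respect a magnitude bound that may depend on direction (which is what makes the constraint set nonconvex). The bound function below is an illustrative assumption, not the paper's constraint:

```python
import math

# Shrink `velocity` onto the set {v : |v| <= bound(direction of v)}
# without changing its direction.

def contract(velocity, bound):
    vx, vy = velocity
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        return (0.0, 0.0)
    limit = bound(math.atan2(vy, vx))
    if norm <= limit:
        return velocity               # already feasible
    scale = limit / norm
    return (vx * scale, vy * scale)

# Hypothetical direction-dependent bound: tighter when moving "upward".
bound = lambda theta: 1.0 if math.sin(theta) <= 0 else 0.5
v = contract((3.0, 4.0), bound)       # shrunk to magnitude 0.5
```

Because the operator only scales, it never pushes a feasible velocity out of the constraint set, which is the property the consensus analysis relies on.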
基金National Natural Science Foundation of China(No.12071370)。
Abstract: This paper focuses on the problem of leader-following consensus for nonlinear cascaded multi-agent systems. The control strategies for these systems are transformed into successive control problems for lower-order error subsystems. A distributed consensus analysis of the corresponding error systems is conducted by employing recursive methods and virtual controllers, accompanied by a series of Lyapunov functions devised throughout the iterative process, which solves the leader-following consensus problem for a class of nonlinear cascaded multi-agent systems. Simulation examples illustrate the effectiveness of the proposed control algorithm.
Abstract: This article investigates the time-varying output group formation tracking control (GFTC) problem for heterogeneous multi-agent systems (HMASs) under switching topologies. The objective is to design a distributed control strategy that enables the outputs of the followers to form the desired sub-formations and track the output of the leader in each subgroup. Firstly, novel distributed observers are developed to estimate the states of the leaders under switching topologies. Then, GFTC protocols are designed based on the proposed observers. It is shown that with the distributed protocol, the GFTC problem for HMASs under switching topologies is solved if the average dwell time associated with the switching topologies is larger than a fixed threshold. Finally, an example is provided to illustrate the effectiveness of the proposed control strategy.
基金supported in part by the National Natural Science Foundation of China under Grant 6237319in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX230479.
Abstract: Formation control in multi-agent systems has become a critical area of interest due to its wide-ranging applications in robotics, autonomous transportation, and surveillance. While various studies have explored distributed cooperative control, this review focuses on the theoretical foundations and recent developments in formation control strategies. The paper categorizes and analyzes key formation types, including formation maintenance, group or cluster formation, bipartite formations, event-triggered formations, finite-time convergence, and constrained formations. A significant portion of the review addresses formation control under constrained dynamics, presenting both model-based and model-free approaches that consider practical limitations such as actuator bounds, communication delays, and nonholonomic constraints. Additionally, the paper discusses emerging trends, including the integration of event-driven mechanisms and AI-enhanced coordination strategies. Comparative evaluations highlight the trade-offs among various methodologies regarding scalability, robustness, and real-world feasibility. Practical implementations are reviewed across diverse platforms, and the review identifies current achievements and unresolved challenges in the field. The paper concludes by outlining promising research directions, such as adaptive control for dynamic environments, energy-efficient coordination, and learning-based control under uncertainty. This review synthesizes the current state of the art and provides a road map for future investigation, making it a valuable reference for researchers and practitioners aiming to advance formation control in multi-agent systems.
基金supported by the National Natural Science Foundation of China under Grants 62476138 and 42375016.
Abstract: Continuous control protocols are extensively utilized in traditional multi-agent systems (MASs), in which information must be transmitted among agents consecutively, resulting in excessive consumption of limited resources. To decrease the control cost, several leader-following consensus (LFC) problems based on intermittent sampled control (ISC) are investigated for second-order MASs without and with time delay, respectively. Firstly, an intermittent sampled controller is designed, and a sufficient and necessary condition is derived under which the state errors between the leader and all followers approach zero asymptotically. Considering that time delay is inevitable, a new protocol is proposed to handle the time-delay case. The error system's stability is analyzed using the Schur stability theorem, and sufficient and necessary conditions for LFC are obtained, which are closely associated with the coupling gain, the system parameters, and the network structure. Furthermore, for the case where current position and velocity information are unavailable, a distributed protocol is designed that depends only on sampled position information; the corresponding sufficient and necessary conditions for LFC are also given. The results show that second-order MASs can achieve LFC if and only if the system parameters satisfy the inequalities proposed in the paper. Finally, the correctness of the obtained results is verified by numerical simulations.
基金supported by the National Natural Science Foundation of China under Grant 62273351 and Grant 62303020.
Abstract: In recent years, significant research attention has been directed towards swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper studies multi-agent reinforcement learning for simulating fish schooling behavior, overcoming the challenges of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. On this foundation, a novel GCN-Critic MADDPG algorithm leveraging Graph Convolutional Networks (GCN) is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared to traditional single-agent algorithms, the proposed method not only exhibits significant advantages in convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish-school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
基金supported by the National Natural Science Foundation of China(52007009)Natural Science Foundation of Excellent Youth Project of Hunan Province of China(2023JJ20039)Science and Technology Projects of State Grid Hunan Provincial Electric Power Co.,Ltd.(5216A522001K,SGHNDK00PWJS2310173).
Abstract: In the islanded operation of distribution networks, due to the mismatch of line impedance at the inverter outputs, conventional droop control leads to inaccurate power sharing according to capacity, resulting in voltage and frequency fluctuations under minor external disturbances. To address this issue, this paper introduces an enhanced scheme for power sharing and voltage-frequency control. First, to solve the power distribution problem, we propose an adaptive virtual impedance control based on multi-agent consensus, which allows precise active and reactive power allocation without requiring knowledge of the feeder impedance. Moreover, a novel consensus-based voltage and frequency control is proposed to correct the voltage deviation inherent in droop control and virtual impedance methods. This strategy maintains voltage and frequency stability even during communication disruptions and enhances system robustness. Additionally, a small-signal model is established for system stability analysis, and the control parameters are optimized. Simulation results validate the effectiveness of the proposed control scheme.
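The two building blocks above, conventional droop relations and a consensus update among inverter agents, can be sketched as follows. The droop coefficients, setpoints, and the fully connected averaging step are illustrative assumptions, not the paper's adaptive scheme:

```python
# Conventional P-f / Q-V droop: frequency and voltage sag linearly
# with the measured active and reactive power output.

def droop(p_out, q_out, f0=50.0, v0=311.0, kp=0.001, kq=0.01):
    """Return (frequency, voltage) commanded for a given power output."""
    return f0 - kp * p_out, v0 - kq * q_out

def consensus_step(values, alpha=0.2):
    """One consensus iteration: each agent moves toward the average
    (a fully connected sketch of the communication graph)."""
    avg = sum(values) / len(values)
    return [v + alpha * (avg - v) for v in values]

f, v = droop(1000.0, 100.0)

# Two inverters with unequal per-capacity power ratios converge to
# agreement, the property the adaptive virtual impedance builds on.
ratios = [0.8, 1.2]
for _ in range(50):
    ratios = consensus_step(ratios)
```

Driving the per-capacity power ratios to a common value is what "accurate power sharing according to capacity" means operationally.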
基金supported in part by the National Natural Science Foundation of China under Grants 62001225,62071236,62071234 and U22A2002in part by the Major Science and Technology plan of Hainan Province under Grant ZDKJ2021022+1 种基金in part by the Scientific Research Fund Project of Hainan University under Grant KYQD(ZR)-21008in part by the Key Technologies R&D Program of Jiangsu(Prospective and Key Technologies for Industry)under Grants BE2023022 and BE2023022-2.
Abstract: The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks but is limited by co-channel interference and malicious jamming. Without prior environmental knowledge, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) to maximize the long-term expected system throughput. Then, to improve exploration of the optimal policy, we optimize a MARL objective function with a mutual-information (MI) regularizer between states and actions, which dynamically assigns higher probability to actions frequently used by the optimal policy. Next, by sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs can learn their policies cooperatively, relying only on previously observed information and predictions of the others' actions. Our simulation results show that for both sweep and Markov jamming patterns, the proposed scheme outperforms the benchmark schemes in throughput, convergence, and stability for different numbers of jammers, channels, and UAV pairs.
Funding: Funded by the National Key Research and Development Program of China under Grant 2019YFB1803301, and the Beijing Natural Science Foundation (L202002).
Abstract: The Cybertwin-enabled 6th Generation (6G) network is envisioned to support artificial intelligence-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which destroys the standard Markov property and increases the decision reaction time, degrading the performance of the task offloading strategy. To address this problem, we propose a pipelined task offloading method to lower the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. Firstly, the state space is augmented with the lastly received state and historical actions to restore the Markov property. Secondly, a Gated Transformer-XL is introduced to capture the importance of historical actions and to maintain a consistent input dimension despite the dynamic changes caused by random transmission delays. Thirdly, a sampling method and a new loss function, built on the difference between the current and target state values and the difference between the real and augmented state-action values, are designed to obtain state transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods effectively reduce reaction time and improve task offloading performance in random-delay Cybertwin-enabled 6G networks.
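The state augmentation described above can be sketched as follows: the agent's effective observation is the lastly received (possibly stale) state concatenated with the actions issued since that state arrived, which is what restores the Markov property under transmission delay. The class name and history length are illustrative assumptions:

```python
from collections import deque

# Delay-aware observation: stale state + actions taken since it arrived.

class DelayAwareObservation:
    def __init__(self, history_len=3):
        self.actions = deque(maxlen=history_len)
        self.last_state = None

    def receive_state(self, state):
        """Called whenever a (delayed) state update finally arrives."""
        self.last_state = state
        self.actions.clear()      # the history restarts at a fresh state

    def act(self, action):
        """Record an action issued while waiting for the next state."""
        self.actions.append(action)

    def augmented(self):
        """The observation actually fed to the policy network."""
        return (self.last_state, tuple(self.actions))

obs = DelayAwareObservation()
obs.receive_state((0.1, 0.2))
obs.act(1)
obs.act(0)
```

Conditioning on the in-flight actions lets the agent account for state changes its own commands have already caused but that the delayed feedback has not yet reported.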