Moving Target Defense (MTD) necessitates scientifically effective decision-making methodologies for defensive technology implementation. While most MTD decision studies focus on accurately identifying optimal strategies, the issue of optimal defense timing remains underexplored. Current default approaches (periodic or overly frequent MTD triggers) lead to suboptimal trade-offs among system security, performance, and cost. The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead, yet existing frameworks inadequately address this temporal dimension. To bridge this gap, this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces, thereby capturing the dynamic security state evolution of MTD systems. We introduce a belief factor to quantify information asymmetry during adversarial interactions, enhancing the precision of MTD trigger timing. Leveraging this game-theoretic foundation, we employ Multi-Agent Reinforcement Learning (MARL) to derive adaptive temporal strategies, optimized via a novel four-dimensional reward function that holistically balances security, performance, cost, and timing. Experimental validation using IP address mutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response, significantly improving cybersecurity affordability and effectiveness.
This paper presents a comprehensive overview of distributed Nash equilibrium (NE) seeking algorithms in non-cooperative games for multiagent systems (MASs), with a distinct emphasis on the dynamic control perspective. It specifically focuses on the research addressing distributed NE seeking problems in which agents are governed by heterogeneous dynamics. The paper begins by introducing fundamental concepts of general non-cooperative games and the NE, along with definitions of specific game structures such as aggregative games and multi-cluster games. It then systematically reviews existing studies on distributed NE seeking for various classes of MASs from the viewpoint of agent dynamics, including first-order, second-order, high-order, linear, and Euler-Lagrange (EL) systems. Furthermore, the paper highlights practical applications of these theoretical advances in cooperative control scenarios involving autonomous systems with complex dynamics, such as autonomous surface vessels, autonomous aerial vehicles, and other autonomous vehicles. Finally, the paper outlines several promising directions for future research.
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet sampling strategies, while attackers adjust their tactics to reach rapid equilibrium. Furthermore, dynamic deception techniques reduce the time required for the concealment of attacks and the overall system burden. A lightweight behavioral fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) compared to existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
For flexible manufacturing systems with multiple machining and assembly equipment, a new scheduling algorithm is proposed that decomposes the assembly structure of the products, thus obtaining simple scheduling problems and forming the corresponding agents. The importance and the restrictions of each agent are then considered in order to obtain an ordering of the simple scheduling problems based on cooperative game theory. With this ordering, the sub-problems are scheduled in terms of rules, and near-optimal scheduling results that satisfy the restrictions can be obtained. Experimental results verify the effectiveness of the proposed scheduling algorithm.
The multi-agent system is the optimal solution to complex intelligent problems. In accordance with game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. To verify the feasibility of the method, the cyclic neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously, and as the training time increases, the complexity of the learned strategies rises gradually. The model is verified to be realizable through examples of obstacle avoidance, firepower distribution, and cooperative cover. Under the same resource background, the model exhibits better convergence than other deep learning training networks and is less prone to falling into local endless loops. Furthermore, the ability of the learned strategy is stronger than that of the rule-based training model, which is of great practical value.
In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision cause uncertain effects on individual decision-making and game results. In the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory, to solve the existing problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, in order to evaluate the evolutionary game results of the group in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the consideration of the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative feedback factor values and different group sizes, so as to improve the group intelligence level.
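As a rough illustration of how a penalty term can enter a tabular Q-learning update, here is a minimal sketch. The abstract does not give the form of the tax, so `penalised_reward`, its `factor` and `defect_share` parameters, and the action names are hypothetical; only the Q-learning update itself is the standard rule.

```python
def penalised_reward(payoff, action, defect_share, factor):
    """Hypothetical tax: defectors pay more as defection becomes more common."""
    if action == "defect":
        return payoff - factor * defect_share
    return payoff


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q(state, action)."""
    best_next = max(q[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next    # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Single-state toy table with two actions (names are assumptions).
q_table = {"s": {"cooperate": 0.0, "defect": 0.0}}
r = penalised_reward(1.0, "defect", defect_share=0.5, factor=0.4)
new_value = q_update(q_table, "s", "defect", r, "s")
```

With a larger `factor`, the learned value of defection shrinks relative to cooperation, which is one plausible reading of how such a mechanism could steer strategy selection.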
In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators in the weight tuning laws, initial stabilizing control policies are no longer required. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
Multi-agent systems can solve scientific issues related to complex systems that are difficult or impossible for a single agent to solve through mutual collaboration and cooperation optimization. In a multi-agent system, agents with a certain degree of autonomy generate complex interactions due to the correlation and coordination, which is manifested as cooperative/competitive behavior. This survey focuses on multi-agent cooperative optimization and cooperative/non-cooperative games. Starting from cooperative optimization, the studies on distributed optimization and federated optimization are summarized. The survey mainly focuses on distributed online optimization and its application in privacy protection, and overviews federated optimization from the perspective of privacy protection mechanisms. Then, cooperative games and non-cooperative games are introduced to expand the cooperative optimization problems from two aspects of minimizing global costs and minimizing individual costs, respectively. Multi-agent cooperative and non-cooperative behaviors are modeled by games from both static and dynamic aspects, according to whether each player can make decisions based on the information of other players. Finally, future directions for cooperative optimization, cooperative/non-cooperative games, and their applications are discussed.
The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates the issue of strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants' manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agent's strategies. Especially in a highly uncertain environment, even small information asymmetry or miscalculation may have a significant impact on decision-making. This also confirms the feasibility and effectiveness of the method proposed in the paper, which can better explain the behavioral decision-making process of the agent in the interaction process. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of the game system model.
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Multi-agent reinforcement learning holds tremendous potential for revolutionizing intelligent systems across diverse domains. However, it is also concomitant with a set of formidable challenges, which include the effective allocation of credit values to each agent, real-time collaboration among heterogeneous agents, and an appropriate reward function to guide agent behavior. To handle these issues, we propose an innovative solution named the Graph Attention Counterfactual Multiagent Actor-Critic algorithm (GACMAC). This algorithm encompasses several key components: First, it employs a multiagent actor-critic framework along with counterfactual baselines to assess the individual actions of each agent. Second, it integrates a graph attention network to enhance real-time collaboration among agents, enabling heterogeneous agents to effectively share information during handling tasks. Third, it incorporates prior human knowledge through a potential-based reward shaping method, thereby elevating the convergence speed and stability of the algorithm. We tested our algorithm on the StarCraft Multi-Agent Challenge (SMAC) platform, which is a recognized platform for testing multiagent algorithms, and our algorithm achieved a win rate of over 95% on the platform, comparable to the current state-of-the-art multi-agent controllers.
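The potential-based reward shaping mentioned in this abstract can be sketched generically as follows. This is the textbook form, in which the shaping term is gamma * Phi(s') - Phi(s); the potential values used here are made up for illustration and are not taken from GACMAC.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Environment reward plus the potential-based shaping term."""
    return reward + gamma * phi_next - phi_s


def discounted_shaping_sum(potentials, gamma):
    """Discounted sum of the shaping terms alone along one trajectory."""
    total = 0.0
    for t in range(len(potentials) - 1):
        total += (gamma ** t) * (gamma * potentials[t + 1] - potentials[t])
    return total


# Hypothetical potentials Phi(s_0), ..., Phi(s_3) for a four-state trajectory.
phis = [0.0, 1.0, 3.0, 2.0]
gamma = 0.5
total_bonus = discounted_shaping_sum(phis, gamma)
# The sum telescopes to gamma**T * Phi(s_T) - Phi(s_0).
telescoped = gamma ** 3 * phis[-1] - phis[0]
```

Because the bonus telescopes to a quantity that depends only on the first and last states, shaping of this form injects prior knowledge without changing which policies are optimal, which is why it is a common way to speed up convergence.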
This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. Firstly, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Secondly, by combining a memory-based adaptive event-triggered mechanism with neural networks, the paper aims to approximate the nonlinear terms in the networked system and efficiently conserve system resources. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, this paper constructs a multi-unmanned ground vehicle (Multi-UGV) system to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on a Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs, which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. The models are trained within a framework that integrates centralized training and distributed execution. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments also show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Consensus theory and noncooperative game theory respectively deal with cooperative and noncooperative interactions among multiple players/agents. They provide a natural framework for road pricing design, since each motorist may myopically optimize his or her own utility as a function of road price while communicating with his or her friends and neighbors about the traffic situation at the same time. This paper considers road pricing design using game theory and consensus theory. For the case where a system supervisor broadcasts information on the overall system to each agent, we present a variant of standard fictitious play called average strategy fictitious play (ASFP) for large-scale repeated congestion games. Only a weighted running average of all other players' actions is assumed to be available to each player. The ASFP reduces the burden of both information gathering and information processing for each player. Compared to the joint strategy fictitious play (JSFP) studied in the literature, the updating process of utility functions for each player is avoided. We prove that there exists at least one pure strategy Nash equilibrium for the congestion game under investigation, and that the players' actions generated by the ASFP with inertia (players' reluctance to change their previous actions) converge to a Nash equilibrium almost surely. For the case without broadcasting, a consensus protocol is introduced for individual agents to estimate the percentage of players choosing each resource, and the convergence property of the players' action profile is still ensured. The results are applied to road pricing design to achieve socially local optimal trip timing. Simulation results are provided based on real traffic data for the Singapore case study.
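The existence of a pure strategy Nash equilibrium in a congestion game can be illustrated on a toy instance. The sketch below uses plain sequential best-response dynamics, which is not the ASFP algorithm described in the abstract; the linear cost function, the player count, and the resource count are all assumptions chosen to keep the example small.

```python
from collections import Counter


def cost(resource, load):
    """Hypothetical congestion cost: equals the number of users on the resource."""
    return load


def best_response_dynamics(n_players=6, n_resources=2, max_rounds=100):
    """Let players switch one at a time to a strictly cheaper resource."""
    actions = [0] * n_players            # everyone starts on resource 0
    for _ in range(max_rounds):
        changed = False
        for i in range(n_players):
            loads = Counter(actions)
            current = actions[i]
            best, best_cost = current, cost(current, loads[current])
            for r in range(n_resources):
                if r == current:
                    continue
                c = cost(r, loads[r] + 1)  # load if player i joins resource r
                if c < best_cost:
                    best, best_cost = r, c
            if best != current:
                actions[i] = best
                changed = True
        if not changed:                  # nobody wants to deviate
            break
    return actions


def is_pure_nash(actions, n_resources):
    """True if no player can lower its cost by switching resources alone."""
    loads = Counter(actions)
    return all(
        cost(r, loads[r] + 1) >= cost(a, loads[a])
        for a in actions
        for r in range(n_resources)
        if r != a
    )


profile = best_response_dynamics()
```

In this instance the players settle on an even split across the two resources, and `is_pure_nash(profile, 2)` confirms that no unilateral switch is profitable.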
This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. In order to obtain the optimal strategy of each agent, it is necessary to solve a set of coupled Hamilton-Jacobi-Bellman (HJB) equations. It is very difficult to solve HJB equations by the traditional method. The relevant game problem will become more complex if the control input of each agent in the dynamic graphical game is constrained. In this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without the need for drift dynamics of agents. Actually, this algorithm is to find the optimal solution of Bellman equations online. This solution employs a distributed policy iteration process, using only the local information available to each agent. It can be proved that under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system will reach Nash equilibrium. In the process of algorithm implementation, for each agent, two layers of neural networks are used to fit the value function and control strategy, respectively. Finally, a simulation example is given to show the effectiveness of our method.
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address problems such as credit assignment, low collaboration efficiency, and weak strategy generalization in the cooperative pursuit tasks of multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to effectively identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance when the scale expands. To tackle these issues, based on the proximal policy optimization (PPO) algorithm, MA2PPO adopts the centralized training with decentralized execution (CTDE) framework and introduces a dynamic decoupling mechanism, that is, sharing the multi-head attention (MHA) mechanism for critics during centralized training to solve the credit assignment problem. This method enables the pursuers to identify highly correlated interactions with their teammates, effectively eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing the collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function has been devised to help the pursuers encircle the escapee by combining a formation reward with a distance reward, which incentivizes UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, experiments on scalability have demonstrated that the algorithm is suitable for large-scale multi-UAV systems.
A soccer robot system (HIT 1) was built to participate in MIROSOT_China99, held at Harbin Institute of Technology. The robot soccer game is a very complex robot application that incorporates a real-time vision system, robot control, wireless communication, and control of multiple robots. This paper presents the design, hardware architecture, and software architecture of our distributed multiple robot system.
The motion planning problem for multi-agent systems becomes particularly challenging when humans or human-controlled robots are present in a mixed environment. To address this challenge, this paper presents an interaction-aware motion planning approach based on game theory in a receding-horizon manner. Leveraging the framework provided by dynamic potential games for handling the interactions among agents, this approach formulates the multi-agent motion planning problem as a differential potential game, highlighting the effectiveness of constrained potential games in facilitating interactive motion planning among agents. Furthermore, online learning techniques are incorporated to dynamically learn the unknown preferences and models of humans or human-controlled robots through the analysis of observed data. To evaluate the effectiveness of the proposed approach, numerical simulations are conducted, demonstrating its capability to generate interactive trajectories for all agents, including humans and human-controlled agents, operating within the mixed environment. The simulation results illustrate the effectiveness of the proposed approach in handling the complexities of multi-agent motion planning in real-world scenarios.
Constructing a cross-border power energy system with multiagent power energy as an alliance is important for studying cross-border power-trading markets. This study considers multiple neighboring countries in the form of alliances, introduces neighboring countries' exchange rates into the cross-border multi-agent power-trading market, and proposes a method to study each agent's dynamic decision-making behavior based on evolutionary game theory. To this end, this study uses three national agents as examples, constructs a tripartite evolutionary game model, and analyzes the evolution process of the decision-making behavior of each agent member state under the initial willingness value, cost of payment, and additional revenue of the alliance. This research helps realize cross-border energy operations so that the transaction agent can achieve greater trade profits and provides a theoretical basis for cooperation and stability between multiple agents.
Funding (Moving Target Defense timing study): National Natural Science Foundation of China, No. 62302520.
Funding (distributed Nash equilibrium seeking overview): National Natural Science Foundation of China (62325304).
Funding (ZSG-MAD3PG study): funded in part by the Humanities and Social Sciences Planning Foundation of the Ministry of Education of China under Grant No. 24YJAZH123, the National Undergraduate Innovation and Entrepreneurship Training Program of China under Grant No. 202510347069, and the Huzhou Science and Technology Planning Foundation under Grant No. 2023GZ04.
Abstract: For flexible manufacturing systems with multiple machining and assembly equipment, a new scheduling algorithm is proposed that decomposes the assembly structure of the products, yielding simple scheduling problems and forming the corresponding agents. The importance and the restrictions of each agent are then considered in order to rank the simple scheduling problems on the basis of cooperative game theory. Following this order, the sub-problems are scheduled according to rules, and near-optimal scheduling results that satisfy the restrictions can be obtained. Experimental results verify the effectiveness of the proposed scheduling algorithm.
Funding: Supported by the National Natural Science Foundation of China (61503407, 61806219, 61703426, 61876189, 61703412) and the China Postdoctoral Science Foundation (2016M602996).
Abstract: The multi-agent system is the optimal solution to complex intelligent problems. Drawing on game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and to build the logical architecture of the multi-agent system. To verify the feasibility of the method, the cyclic neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously, and as the training time increases, the complexity of the learned strategies rises gradually. Obstacle avoidance, firepower distribution, and cooperative cover are used as examples to demonstrate that the model is realizable. Given the same resources, the model exhibits better convergence than other deep learning training networks and is less prone to getting stuck in a local endless loop. Furthermore, its strategy-learning ability is stronger than that of rule-based training models, which is of great practical value.
Funding: Supported by the National Key R&D Program of China (2017YFB1400105).
Abstract: In the evolutionary game of the same task for groups, changes in game rules, personal interests, crowd size, and external supervision have uncertain effects on individual decision-making and game results. Within the Markov decision framework, a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game. The model can improve the result of an evolutionary game and facilitate the completion of the task. First, based on multi-agent theory and to solve the existing problems in the original model, a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group. In addition, to evaluate the evolutionary game results of the group in the model, a calculation method for the group intelligence level is defined. Second, the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism. In the model, the selection strategy of the Q-learning algorithm is improved, and a bounded rationality evolutionary game strategy is proposed based on the rules of evolutionary games and the bounded rationality of individuals. Finally, simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies that are beneficial to task completion and stability under different negative feedback factor values and different group sizes, thereby improving the group intelligence level.
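As a rough illustration of how Q-learning and a negative feedback tax penalty could interact, the sketch below runs single-state tabular Q-learning for a group of agents choosing between cooperation and defection. The abstract does not give the payoff structure or the improved selection strategy, so `taxed_reward`, its parameters (`base_gain`, `temptation`, `tax_factor`), and the plain epsilon-greedy selection are all assumptions made for illustration, not the paper's model.

```python
import random

ACTIONS = ("cooperate", "defect")


def taxed_reward(action, defect_fraction, base_gain=1.0, temptation=1.5, tax_factor=2.0):
    """Hypothetical payoff: defecting pays more up front, but is taxed in
    proportion to the current share of defectors (the negative feedback)."""
    if action == "cooperate":
        return base_gain
    return temptation - tax_factor * defect_fraction


def q_update(q, action, reward, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for a single-state (repeated) game."""
    best_next = max(q.values())
    q[action] += alpha * (reward + gamma * best_next - q[action])


def train(n_agents=20, rounds=500, epsilon=0.1, seed=0):
    """Let every agent learn its own Q-table with epsilon-greedy selection."""
    rng = random.Random(seed)
    tables = [{a: 0.0 for a in ACTIONS} for _ in range(n_agents)]
    for _ in range(rounds):
        choices = []
        for q in tables:
            if rng.random() < epsilon:
                choices.append(rng.choice(ACTIONS))      # explore
            else:
                choices.append(max(ACTIONS, key=q.get))  # exploit
        defect_fraction = choices.count("defect") / n_agents
        for q, action in zip(tables, choices):
            q_update(q, action, taxed_reward(action, defect_fraction))
    return tables
```

Because the tax grows with the share of defectors, defection becomes less attractive as it spreads; how strongly this steers the group depends entirely on the placeholder parameters chosen here.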
Abstract: In this paper, an online optimal distributed learning algorithm is proposed to solve the leader-synchronization problem of nonlinear multi-agent differential graphical games. Each player approximates its optimal control policy using single-network approximate dynamic programming (ADP), in which only one critic neural network (NN) is employed instead of the typical actor-critic structure composed of two NNs. The proposed distributed weight tuning laws for the critic NNs guarantee stability in the sense of uniform ultimate boundedness (UUB) and convergence of the control policies to the Nash equilibrium. By introducing novel distributed local operators in the weight tuning laws, initial stabilizing control policies are no longer required. Furthermore, the overall closed-loop system stability is guaranteed by Lyapunov stability analysis. Finally, simulation results show the effectiveness of the proposed algorithm.
Funding: Supported in part by the National Natural Science Foundation of China (Basic Science Center Program: 61988101), the Sino-German Center for Research Promotion (M-0066), the International (Regional) Cooperation and Exchange Project (61720106008), the Programme of Introducing Talents of Discipline to Universities (the 111 Project) (B17017), and the Program of Shanghai Academic Research Leader (20XD1401300).
Abstract: Through mutual collaboration and cooperative optimization, multi-agent systems can solve scientific problems in complex systems that are difficult or impossible for a single agent to solve. In a multi-agent system, agents with a certain degree of autonomy generate complex interactions through correlation and coordination, which manifest as cooperative or competitive behavior. This survey focuses on multi-agent cooperative optimization and cooperative/non-cooperative games. Starting from cooperative optimization, it summarizes studies on distributed optimization and federated optimization. The survey mainly covers distributed online optimization and its application in privacy protection, and it reviews federated optimization from the perspective of privacy protection mechanisms. Cooperative games and non-cooperative games are then introduced to extend cooperative optimization problems in two directions: minimizing global costs and minimizing individual costs, respectively. Multi-agent cooperative and non-cooperative behaviors are modeled by games from both static and dynamic aspects, according to whether each player can make decisions based on the information of other players. Finally, future directions for cooperative optimization, cooperative/non-cooperative games, and their applications are discussed.
Abstract: The strategy evolution process of game players is highly uncertain due to random emergent situations and other external disturbances. This paper investigates strategy interaction and behavioral decision-making among game players in simulated confrontation scenarios within a random interference environment. It considers the possible risks that random disturbances may pose to the autonomous decision-making of game players, as well as the impact of participants' manipulative behaviors on the state changes of the players. A nonlinear mathematical model is established to describe the strategy decision-making process of the participants in this scenario. Subsequently, the strategy selection interaction relationship, strategy evolution stability, and dynamic decision-making process of the game players are investigated and verified by simulation experiments. The results show that maneuver-related parameters and random environmental interference factors have different effects on the selection and evolutionary speed of the agent's strategies. Especially in a highly uncertain environment, even small information asymmetry or miscalculation may have a significant impact on decision-making. This also confirms the feasibility and effectiveness of the method proposed in the paper, which can better explain the behavioral decision-making process of the agent during interaction. This study provides feasibility analysis ideas and theoretical references for improving multi-agent interactive decision-making and the interpretability of the game system model.
Funding: The National Natural Science Foundation of China (62136008, 62293541); the Beijing Natural Science Foundation (4232056); the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration of large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Abstract: Multi-agent reinforcement learning holds tremendous potential for revolutionizing intelligent systems across diverse domains. However, it also comes with a set of formidable challenges, which include the effective allocation of credit values to each agent, real-time collaboration among heterogeneous agents, and the design of an appropriate reward function to guide agent behavior. To handle these issues, we propose an innovative solution named the Graph Attention Counterfactual Multiagent Actor–Critic algorithm (GACMAC). This algorithm encompasses several key components. First, it employs a multiagent actor–critic framework along with counterfactual baselines to assess the individual actions of each agent. Second, it integrates a graph attention network to enhance real-time collaboration among agents, enabling heterogeneous agents to effectively share information while handling tasks. Third, it incorporates prior human knowledge through a potential-based reward shaping method, thereby improving the convergence speed and stability of the algorithm. We tested our algorithm on the StarCraft Multi-Agent Challenge (SMAC) platform, which is a recognized platform for testing multiagent algorithms, and our algorithm achieved a win rate of over 95% on the platform, comparable to the current state-of-the-art multi-agent controllers.
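Potential-based reward shaping, mentioned above as the channel for prior human knowledge, has a compact standard form: the shaping term is F(s, s') = γΦ(s') − Φ(s) for some potential function Φ over states. The sketch below shows only that general form. The abstract does not say which potential GACMAC uses, so the distance-based `potential` here is a hypothetical stand-in, as is the convention of treating the terminal potential as zero.

```python
def potential(distance_to_goal):
    """Hypothetical potential: states closer to the goal score higher."""
    return -distance_to_goal


def shaped_reward(reward, phi_s, phi_next, gamma=0.99, done=False):
    """Environment reward plus the shaping term F = gamma * Phi(s') - Phi(s).

    The potential of a terminal state is taken as zero (an assumed convention).
    """
    next_potential = 0.0 if done else phi_next
    return reward + gamma * next_potential - phi_s


def total_shaping(distances, gamma=1.0):
    """Sum the shaping terms along a trajectory of goal distances (zero env reward)."""
    total = 0.0
    for d, d_next in zip(distances, distances[1:]):
        total += shaped_reward(0.0, potential(d), potential(d_next), gamma=gamma)
    return total
```

With `gamma=1.0` the shaping terms telescope, so `total_shaping` depends only on the first and last states of the trajectory. That is the reason this form of shaping speeds up learning without changing which policies are optimal.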
Funding: The National Natural Science Foundation of China (W2431048); the Science and Technology Research Program of Chongqing Municipal Education Commission, China (KJZDK202300807); the Chongqing Natural Science Foundation, China (CSTB2024NSCQQCXMX0052).
Abstract: This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. First, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Second, a memory-based adaptive event-triggered mechanism is combined with neural networks to approximate the nonlinear terms in the networked system and to conserve system resources efficiently. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, the paper constructs a multi-unmanned ground vehicle (Multi-UGV) system to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
Funding: Supported by the National Natural Science Foundation of China (Nos. 12272104, U22B2013).
Abstract: This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on a Graph Neural Network (GNN). First, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs that establish essential relationships among agents, obstacles, and targets. Second, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework that integrates centralized training and distributed execution facilitates model training. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance, with fewer collisions. The experiments also show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Abstract: Consensus theory and noncooperative game theory deal with cooperative and noncooperative interactions, respectively, among multiple players/agents. They provide a natural framework for road pricing design, since each motorist may myopically optimize his or her own utility as a function of road price while at the same time communicating with friends and neighbors about the traffic situation. This paper considers road pricing design using game theory and consensus theory. For the case where a system supervisor broadcasts information on the overall system to each agent, we present a variant of standard fictitious play called average strategy fictitious play (ASFP) for large-scale repeated congestion games. Only a weighted running average of all other players' actions is assumed to be available to each player. The ASFP reduces the burden of both information gathering and information processing for each player. Compared to the joint strategy fictitious play (JSFP) studied in the literature, the updating process of utility functions for each player is avoided. We prove that there exists at least one pure strategy Nash equilibrium for the congestion game under investigation, and that the players' actions generated by the ASFP with inertia (players' reluctance to change their previous actions) converge to a Nash equilibrium almost surely. For the case without broadcasting, a consensus protocol is introduced for individual agents to estimate the percentage of players choosing each resource, and the convergence property of the players' action profile is still ensured. The results are applied to road pricing design to achieve socially local optimal trip timing. Simulation results are provided based on real traffic data for the Singapore case study.
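For the case without broadcasting, the idea of agents estimating the percentage of players on each resource by talking only to neighbors can be sketched with a standard average-consensus iteration. The abstract does not specify the paper's protocol, so what follows is a generic illustration under assumptions: an undirected, connected communication graph given as adjacency lists, Metropolis weights, and synchronous updates. The function names are hypothetical.

```python
def metropolis_weights(neighbors):
    """Symmetric averaging weights for an undirected graph (adjacency lists,
    no self-loops). Each row sums to one."""
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        w[i][i] = 1.0 - sum(w[i])  # remaining weight stays on the agent itself
    return w


def estimate_fractions(choices, n_resources, neighbors, steps=200):
    """Each agent starts from a one-hot vector of its own choice and repeatedly
    averages with its neighbors; estimates approach the true fractions."""
    n = len(choices)
    w = metropolis_weights(neighbors)
    est = [[1.0 if choices[i] == r else 0.0 for r in range(n_resources)]
           for i in range(n)]
    for _ in range(steps):
        est = [[sum(w[i][j] * est[j][r] for j in range(n))
                for r in range(n_resources)]
               for i in range(n)]
    return est
```

For example, on a ring of six agents with `neighbors = [[(i - 1) % 6, (i + 1) % 6] for i in range(6)]`, every agent's estimate converges to the same vector of resource shares even though no agent ever sees more than two others.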
Funding: Supported by the National Natural Science Foundation of China (Nos. 61773241, 61973183) and the Shandong Provincial Natural Science Foundation (No. ZR2019MF041).
Abstract: This paper studies an online iterative algorithm for solving discrete-time multi-agent dynamic graphical games with input constraints. To obtain the optimal strategy of each agent, it is necessary to solve a set of coupled Hamilton-Jacobi-Bellman (HJB) equations, which is very difficult with traditional methods. The game problem becomes more complex still if the control input of each agent in the dynamic graphical game is constrained. In this paper, an online iterative algorithm is proposed to find the online solution to the dynamic graphical game without requiring the drift dynamics of the agents. In effect, the algorithm finds the optimal solution of the Bellman equations online. The solution employs a distributed policy iteration process that uses only the local information available to each agent. It can be proved that, under certain conditions, when each agent updates its own strategy simultaneously, the whole multi-agent system reaches a Nash equilibrium. In the implementation of the algorithm, two layers of neural networks are used for each agent to fit the value function and the control strategy, respectively. Finally, a simulation example is given to show the effectiveness of the method.
Funding: Supported by the National Research and Development Program of China under Grant JCKY2018607C019 and in part by the Key Laboratory Fund of UAV of Northwestern Polytechnical University under Grant 2021JCJQLB0710L.
Abstract: This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address problems such as credit assignment, low collaboration efficiency, and weak strategy generalization in cooperative pursuit tasks involving multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance as the scale expands. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: a multi-head attention (MHA) mechanism is shared among the critics during centralized training to solve the credit assignment problem. This method enables the pursuers to identify highly correlated interactions with their teammates, eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function that combines a formation reward with a distance reward is devised to help the pursuers encircle the escapee, which incentivizes the UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, scalability experiments demonstrate that the algorithm is suitable for large-scale multi-UAV systems.
Funding: Supported by the High Technology Research and Development Program of China.
Abstract: A soccer robot system (HIT 1) was built to participate in MIROSOT_China99, held at Harbin Institute of Technology. The robot soccer game is a very complex robot application that incorporates a real-time vision system, robot control, wireless communication, and control of multiple robots. In this paper, we present the design, hardware architecture, and software architecture of our distributed multiple robot system.
Funding: Supported by A*STAR under its "RIE2025 IAF-PP Advanced ROS2-native Platform Technologies for Cross sectorial Robotics Adoption (M21K1a0104)" programme.
Abstract: The motion planning problem for multi-agent systems becomes particularly challenging when humans or human-controlled robots are present in a mixed environment. To address this challenge, this paper presents an interaction-aware motion planning approach based on game theory in a receding-horizon manner. Leveraging the framework provided by dynamic potential games for handling the interactions among agents, the approach formulates the multi-agent motion planning problem as a differential potential game, highlighting the effectiveness of constrained potential games in facilitating interactive motion planning among agents. Furthermore, online learning techniques are incorporated to dynamically learn the unknown preferences and models of humans or human-controlled robots from observed data. To evaluate the proposed approach, numerical simulations are conducted, demonstrating its capability to generate interactive trajectories for all agents, including humans and human-controlled agents, operating within the mixed environment. The simulation results illustrate the effectiveness of the proposed approach in handling the complexities of multi-agent motion planning in real-world scenarios.
Funding: National Key R&D Program of China (Grant No. 2022YFB2703500); National Natural Science Foundation of China (Grant No. 52277104); National Key R&D Program of Yunnan Province (202303AC100003); Applied Basic Research Foundation of Yunnan Province (202301AT070455, 202101AT070080); Revitalizing Talent Support Program of Yunnan Province (KKRD202204024).
Abstract: Constructing a cross-border power energy system in which multiple power energy agents form an alliance is important for studying cross-border power-trading markets. This study considers multiple neighboring countries in the form of alliances, introduces the neighboring countries' exchange rates into the cross-border multi-agent power-trading market, and proposes a method to study each agent's dynamic decision-making behavior based on evolutionary game theory. To this end, the study uses three national agents as examples, constructs a tripartite evolutionary game model, and analyzes how the decision-making behavior of each member state evolves under the initial willingness value, the cost of payment, and the additional revenue of the alliance. This research helps realize cross-border energy operations so that the transaction agents can achieve greater trade profits, and it provides a theoretical basis for cooperation and stability among multiple agents.
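A tripartite evolutionary game of this kind is commonly analyzed with replicator dynamics, in which each agent's probability of joining the alliance rises when joining pays more than staying out. The sketch below integrates such dynamics with a simple Euler scheme. The abstract names the inputs (initial willingness, cost, additional revenue, exchange rates) but not the payoff matrix, so `payoff_gap` and every numeric default are invented placeholders, not the study's model.

```python
def payoff_gap(i, x, revenue, cost, rate):
    """Hypothetical payoff of joining minus staying out for agent i.

    The alliance revenue is scaled by agent i's exchange rate and by the
    average willingness of the other agents; joining incurs a fixed cost.
    """
    others = [x[j] for j in range(len(x)) if j != i]
    return rate[i] * revenue * sum(others) / len(others) - cost[i]


def evolve(x0, revenue=2.0, cost=(0.5, 0.5, 0.5), rate=(1.0, 1.0, 1.0),
           dt=0.01, steps=5000):
    """Euler integration of the replicator equation
    dx_i/dt = x_i * (1 - x_i) * payoff_gap_i for every agent at once."""
    x = list(x0)
    for _ in range(steps):
        gaps = [payoff_gap(i, x, revenue, cost, rate) for i in range(len(x))]
        x = [xi + dt * xi * (1.0 - xi) * g for xi, g in zip(x, gaps)]
    return x
```

With these placeholder numbers and identical agents, the tipping point is a willingness of 0.25 (cost divided by rate times revenue): `evolve([0.8, 0.8, 0.8])` drifts toward full participation, while `evolve([0.1, 0.1, 0.1])` drifts toward none, which illustrates why the initial willingness value matters.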