This paper focuses on the leader-following positive consensus problems of heterogeneous switched multi-agent systems.First,a state-feedback controller with dynamic compensation is introduced to achieve positive consen...This paper focuses on the leader-following positive consensus problems of heterogeneous switched multi-agent systems.First,a state-feedback controller with dynamic compensation is introduced to achieve positive consensus under average dwell time switching.Then sufficient conditions are derived to guarantee the positive consensus.The gain matrices of the control protocol are described using a matrix decomposition approach and the corresponding computational complexity is reduced by resorting to linear programming and co-positive Lyapunov functions.Finally,two numerical examples are provided to illustrate the results obtained.展开更多
With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier...With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.展开更多
This paper investigates the consensus tracking control problem for high order nonlinear multi-agent systems subject to non-affine faults,partial measurable states,uncertain control coefficients,and unknown external di...This paper investigates the consensus tracking control problem for high order nonlinear multi-agent systems subject to non-affine faults,partial measurable states,uncertain control coefficients,and unknown external disturbances.Under the directed topology conditions,an observer-based finite-time control strategy based on adaptive backstepping and is proposed,in which a neural network-based state observer is employed to approximate the unmeasurable system state variables.To address the complexity explosion problem associated with the backstepping method,a finite-time command filter is incorporated,with error compensation signals designed to mitigate the filter-induced errors.Additionally,the Butterworth low-pass filter is introduced to avoid the algebraic ring problem in the design of the controller.The finite-time stability of the closed-loop system is rigorously analyzed with the finite-time Lyapunov stability criterion,validating that all closed-loop signals of the system remain bounded within a finite time.Finally,the effectiveness of the proposed control strategy is verified through a simulation example.展开更多
Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making p...Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making problems,significantly enhancing swarm intelligence in maneuvering.However,applying MARL to unmanned swarms presents two primary challenges.First,defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries.Second,current algorithms aim to maximize global or individual rewards,making them sensitive to fluctuations in enemy strategies and environmental changes,especially when rewards are sparse.To tackle these issues,we propose an algorithm of MultiAgent Reinforcement Learning with Layered Autonomy and Collaboration(MARL-LAC)for collaborative confrontations.This algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients.Furthermore,MARL-LAC employs layered autonomy and collaboration to address multi-objective problems,specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents.Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents,outperforming the existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems.The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.展开更多
Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities.Current approaches struggle with crossmodal ...Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities.Current approaches struggle with crossmodal alignment,temporal consistency,and robust handling of noisy or incomplete inputs across multiple modalities.We propose Multi Agent-Chain of Thought(CoT),a novel multi-agent chain-of-thought reasoning framework where specialized agents for text,vision,and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms.Our architecture incorporates self-reflection modules,conflict resolution protocols,and dynamic rationale alignment to enhance consistency,factual accuracy,and user engagement.The framework employs a hierarchical attention mechanism with cross-modal fusion and implements adaptive reasoning depth based on dialogue complexity.Comprehensive evaluations on Situated Interactive Multi-Modal Conversations(SIMMC)2.0,VisDial v1.0,and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy(p<0.01),chain-of-thought interpretability,and robustness to adversarial inputs compared to state-of-the-art monolithic transformer baselines and existing multi-agent approaches.展开更多
The Internet of Unmanned Aerial Vehicles(I-UAVs)is expected to execute latency-sensitive tasks,but limited by co-channel interference and malicious jamming.In the face of unknown prior environmental knowledge,defendin...The Internet of Unmanned Aerial Vehicles(I-UAVs)is expected to execute latency-sensitive tasks,but limited by co-channel interference and malicious jamming.In the face of unknown prior environmental knowledge,defending against jamming and interference through spectrum allocation becomes challenging,especially when each UAV pair makes decisions independently.In this paper,we propose a cooperative multi-agent reinforcement learning(MARL)-based anti-jamming framework for I-UAVs,enabling UAV pairs to learn their own policies cooperatively.Specifically,we first model the problem as a modelfree multi-agent Markov decision process(MAMDP)to maximize the long-term expected system throughput.Then,for improving the exploration of the optimal policy,we resort to optimizing a MARL objective function with a mutual-information(MI)regularizer between states and actions,which can dynamically assign the probability for actions frequently used by the optimal policy.Next,through sharing their current channel selections and local learning experience(their soft Q-values),the UAV pairs can learn their own policies cooperatively relying on only preceding observed information and predicting others’actions.Our simulation results show that for both sweep jamming and Markov jamming patterns,the proposed scheme outperforms the benchmarkers in terms of throughput,convergence and stability for different numbers of jammers,channels and UAV pairs.展开更多
Moving Target Defense(MTD)necessitates scientifically effective decision-making methodologies for defensive technology implementation.While most MTD decision studies focus on accurately identifying optimal strategies,...Moving Target Defense(MTD)necessitates scientifically effective decision-making methodologies for defensive technology implementation.While most MTD decision studies focus on accurately identifying optimal strategies,the issue of optimal defense timing remains underexplored.Current default approaches—periodic or overly frequent MTD triggers—lead to suboptimal trade-offs among system security,performance,and cost.The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead,yet existing frameworks inadequately address this temporal dimension.To bridge this gap,this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces,thereby capturing the dynamic security state evolution of MTD systems.We introduce a belief factor to quantify information asymmetry during adversarial interactions,enhancing the precision of MTD trigger timing.Leveraging this game-theoretic foundation,we employMulti-Agent Reinforcement Learning(MARL)to derive adaptive temporal strategies,optimized via a novel four-dimensional reward function that holistically balances security,performance,cost,and timing.Experimental validation using IP addressmutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response,significantly improving cybersecurity affordability and effectiveness.展开更多
The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high co...The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.展开更多
This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-ofservice(DoS)attacks.Firstly,an observer-based state feedback control method...This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-ofservice(DoS)attacks.Firstly,an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time.Secondly,by combining a memory-based adaptive eventtriggered mechanism with neural networks,the paper aims to approximate the nonlinear terms in the networked system and efficiently conserve system resources.Finally,based on a two-degree-of-freedom model of a vehicle affected by crosswinds,this paper constructs a multi-unmanned ground vehicle(Multi-UGV)system to validate the effectiveness of the proposed method.Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications,ensuring the stability and reliable operation of the Multi-UGV system.展开更多
Multi-agent systems(MASs)have demonstrated significant achievements in a wide range of tasks,leveraging their capacity for coordination and adaptation within complex environments.Moreover,the enhancement of their inte...Multi-agent systems(MASs)have demonstrated significant achievements in a wide range of tasks,leveraging their capacity for coordination and adaptation within complex environments.Moreover,the enhancement of their intelligent functionalities is crucial for tackling increasingly challenging tasks.This goal resonates with a paradigm shift within the artificial intelligence(AI)community,from“internet AI”to“embodied AI”,and the MASs with embodied AI are referred to as embodied multi-agent systems(EMASs).An EMAS has the potential to acquire generalized competencies through interactions with environments,enabling it to effectively address a variety of tasks and thereby make a substantial contribution to the quest for artificial general intelligence.Despite the burgeoning interest in this domain,a comprehensive review of EMAS has been lacking.This paper offers analysis and synthesis for EMASs from a control perspective,conceptualizing each embodied agent as an entity equipped with a“brain”for decision and a“body”for environmental interaction.System designs are classified into open-loop,closed-loop,and double-loop categories,and EMAS implementations are discussed.Additionally,the current applications and challenges faced by EMASs are summarized and potential avenues for future research in this field are provided.展开更多
In the evolutionary game of the same task for groups,the changes in game rules,personal interests,the crowd size,and external supervision cause uncertain effects on individual decision-making and game results.In the M...In the evolutionary game of the same task for groups,the changes in game rules,personal interests,the crowd size,and external supervision cause uncertain effects on individual decision-making and game results.In the Markov decision framework,a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game.The model can improve the result of a evolutionary game and facilitate the completion of the task.First,based on the multi-agent theory,to solve the existing problems in the original model,a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group.In addition,in order to evaluate the evolutionary game results of the group in the model,a calculation method of the group intelligence level is defined.Secondly,the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism.In the model,the selection strategy of the Q-learning algorithm is improved and a bounded rationality evolutionary game strategy is proposed based on the rule of evolutionary games and the consideration of the bounded rationality of individuals.Finally,simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies which are beneficial to task completion and stability under different negative feedback factor values and different group sizes,so as to improve the group intelligence level.展开更多
For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means to calculate the optimal shared state path from state trajectories with c...For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means to calculate the optimal shared state path from state trajectories with cycles. A state list extracting algorithm checks cyclic state lists of a current state in the state trajectory, condensing the optimal action set of the current state. By reinforcing the optimal action selected, the action policy of cyclic states is optimized gradually. The state list extracting is repeatedly learned and used as the experience knowledge which is shared by teams. Agents speed up the rate of convergence by experience sharing. Competition games of preys and predators are used for the experiments. The results of experiments prove that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning and improve the performance.展开更多
The multi-agent system is the optimal solution to complex intelligent problems. In accordance with the game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual incom...The multi-agent system is the optimal solution to complex intelligent problems. In accordance with the game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and build the logical architecture of the multi-agent system. Besides, to verify the feasibility of the method, the cyclic neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously. Also,as the training time increases, the complexity of learning strategies rises gradually. Strategies such as obstacle avoidance, firepower distribution and collaborative cover are adopted to demonstrate the achievability of the model. The model is verified to be realizable by the examples of obstacle avoidance, fire distribution and cooperative cover. Under the same resource background, the model exhibits better convergence than other deep learning training networks, and it is not easy to fall into the local endless loop.Furthermore, the ability of the learning strategy is stronger than that of the training model based on rules, which is of great practical values.展开更多
Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-...Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.展开更多
This paper studies the consensus control of multiagent systems with binary-valued observations.An algorithm alternating estimation and control is proposed.Each agent estimates the states of its neighbors based on a pr...This paper studies the consensus control of multiagent systems with binary-valued observations.An algorithm alternating estimation and control is proposed.Each agent estimates the states of its neighbors based on a projected empirical measure method for a holding time.Based on the estimates,each agent designs the consensus control with a constant gain at some skipping time.The states of the system are updated by the designed control,and the estimation and control design will be repeated.For the estimation,the projected empirical measure method is proposed for the binary-valued observations.The algorithm can ensure the uniform boundedness of the estimates and the mean square error of the estimation is proved to be at the order of the reciprocal of the holding time(the same order as that in the case of accurate outputs).For the consensus control,a constant gain is designed instead of the stochastic approximation based gain in the existing literature for binary-valued observations.And,there is no need to make modification for control since the uniform boundedness of the estimates ensures the uniform boundedness of the agents’states.Finally,the systems updated by the designed control are proved to achieve consensus and the consensus speed is faster than that in the existing literature.Simulations are given to demonstrate the theoretical results.展开更多
基金supported by the National Natural Science Foundation of China(62463007,62463005)the Natural Science Foundation of Hainan Province(625RC710,625MS047)+1 种基金the System Control and Information Processing Education Ministry Key Laboratory Open Funding,China(Scip20240119)the Science Research Funding of Hainan University,China(KYQD(ZR)22180,KYQD(ZR)23180).
文摘This paper focuses on the leader-following positive consensus problems of heterogeneous switched multi-agent systems.First,a state-feedback controller with dynamic compensation is introduced to achieve positive consensus under average dwell time switching.Then sufficient conditions are derived to guarantee the positive consensus.The gain matrices of the control protocol are described using a matrix decomposition approach and the corresponding computational complexity is reduced by resorting to linear programming and co-positive Lyapunov functions.Finally,two numerical examples are provided to illustrate the results obtained.
文摘With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.
基金supported in part by the Beijing Natural Science Foundation under Grant 4252050in part by the National Science Fund for Distinguished Young Scholars under Grant 62425304in part by the Basic Science Center Programs of NSFC under Grant 62088101.
文摘This paper investigates the consensus tracking control problem for high order nonlinear multi-agent systems subject to non-affine faults,partial measurable states,uncertain control coefficients,and unknown external disturbances.Under the directed topology conditions,an observer-based finite-time control strategy based on adaptive backstepping and is proposed,in which a neural network-based state observer is employed to approximate the unmeasurable system state variables.To address the complexity explosion problem associated with the backstepping method,a finite-time command filter is incorporated,with error compensation signals designed to mitigate the filter-induced errors.Additionally,the Butterworth low-pass filter is introduced to avoid the algebraic ring problem in the design of the controller.The finite-time stability of the closed-loop system is rigorously analyzed with the finite-time Lyapunov stability criterion,validating that all closed-loop signals of the system remain bounded within a finite time.Finally,the effectiveness of the proposed control strategy is verified through a simulation example.
基金co-supported by the National Natural Science Foundation of China(Nos.72371052 and 71871042).
文摘Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge.Multi-Agent Reinforcement Learning(MARL)provides an effective framework for tackling sequential decision-making problems,significantly enhancing swarm intelligence in maneuvering.However,applying MARL to unmanned swarms presents two primary challenges.First,defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries.Second,current algorithms aim to maximize global or individual rewards,making them sensitive to fluctuations in enemy strategies and environmental changes,especially when rewards are sparse.To tackle these issues,we propose an algorithm of MultiAgent Reinforcement Learning with Layered Autonomy and Collaboration(MARL-LAC)for collaborative confrontations.This algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients.Furthermore,MARL-LAC employs layered autonomy and collaboration to address multi-objective problems,specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents.Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents,outperforming the existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems.The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
文摘Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities.Current approaches struggle with crossmodal alignment,temporal consistency,and robust handling of noisy or incomplete inputs across multiple modalities.We propose Multi Agent-Chain of Thought(CoT),a novel multi-agent chain-of-thought reasoning framework where specialized agents for text,vision,and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms.Our architecture incorporates self-reflection modules,conflict resolution protocols,and dynamic rationale alignment to enhance consistency,factual accuracy,and user engagement.The framework employs a hierarchical attention mechanism with cross-modal fusion and implements adaptive reasoning depth based on dialogue complexity.Comprehensive evaluations on Situated Interactive Multi-Modal Conversations(SIMMC)2.0,VisDial v1.0,and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy(p<0.01),chain-of-thought interpretability,and robustness to adversarial inputs compared to state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
基金supported in part by the National Natural Science Foundation of China under Grants 62001225,62071236,62071234 and U22A2002in part by the Major Science and Technology plan of Hainan Province under Grant ZDKJ2021022+1 种基金in part by the Scientific Research Fund Project of Hainan University under Grant KYQD(ZR)-21008in part by the Key Technologies R&D Program of Jiangsu(Prospective and Key Technologies for Industry)under Grants BE2023022 and BE2023022-2.
文摘The Internet of Unmanned Aerial Vehicles(I-UAVs)is expected to execute latency-sensitive tasks,but limited by co-channel interference and malicious jamming.In the face of unknown prior environmental knowledge,defending against jamming and interference through spectrum allocation becomes challenging,especially when each UAV pair makes decisions independently.In this paper,we propose a cooperative multi-agent reinforcement learning(MARL)-based anti-jamming framework for I-UAVs,enabling UAV pairs to learn their own policies cooperatively.Specifically,we first model the problem as a modelfree multi-agent Markov decision process(MAMDP)to maximize the long-term expected system throughput.Then,for improving the exploration of the optimal policy,we resort to optimizing a MARL objective function with a mutual-information(MI)regularizer between states and actions,which can dynamically assign the probability for actions frequently used by the optimal policy.Next,through sharing their current channel selections and local learning experience(their soft Q-values),the UAV pairs can learn their own policies cooperatively relying on only preceding observed information and predicting others’actions.Our simulation results show that for both sweep jamming and Markov jamming patterns,the proposed scheme outperforms the benchmarkers in terms of throughput,convergence and stability for different numbers of jammers,channels and UAV pairs.
基金funded by National Natural Science Foundation of China No.62302520.
文摘Moving Target Defense(MTD)necessitates scientifically effective decision-making methodologies for defensive technology implementation.While most MTD decision studies focus on accurately identifying optimal strategies,the issue of optimal defense timing remains underexplored.Current default approaches—periodic or overly frequent MTD triggers—lead to suboptimal trade-offs among system security,performance,and cost.The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead,yet existing frameworks inadequately address this temporal dimension.To bridge this gap,this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces,thereby capturing the dynamic security state evolution of MTD systems.We introduce a belief factor to quantify information asymmetry during adversarial interactions,enhancing the precision of MTD trigger timing.Leveraging this game-theoretic foundation,we employMulti-Agent Reinforcement Learning(MARL)to derive adaptive temporal strategies,optimized via a novel four-dimensional reward function that holistically balances security,performance,cost,and timing.Experimental validation using IP addressmutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response,significantly improving cybersecurity affordability and effectiveness.
基金supported by The National Key R&D Program of China(2020YFB0905900):Research on artificial intelligence application of power internet of things.
文摘The coordinated optimization problem of the electricity-gas-heat integrated energy system(IES)has the characteristics of strong coupling,non-convexity,and nonlinearity.The centralized optimization method has a high cost of communication and complex modeling.Meanwhile,the traditional numerical iterative solution cannot deal with uncertainty and solution efficiency,which is difficult to apply online.For the coordinated optimization problem of the electricity-gas-heat IES in this study,we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient.Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization,dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy.Compared with centralized optimization,the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication.The proposed method considers the dual uncertainty of renewable energy and load in the training.Compared with the traditional iterative solution method,it can better cope with uncertainty and realize real-time decision making of the system,which is conducive to the online application.Finally,we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
基金The National Natural Science Foundation of China(W2431048)The Science and Technology Research Program of Chongqing Municipal Education Commission,China(KJZDK202300807)The Chongqing Natural Science Foundation,China(CSTB2024NSCQQCXMX0052).
文摘This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-ofservice(DoS)attacks.Firstly,an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time.Secondly,by combining a memory-based adaptive eventtriggered mechanism with neural networks,the paper aims to approximate the nonlinear terms in the networked system and efficiently conserve system resources.Finally,based on a two-degree-of-freedom model of a vehicle affected by crosswinds,this paper constructs a multi-unmanned ground vehicle(Multi-UGV)system to validate the effectiveness of the proposed method.Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications,ensuring the stability and reliable operation of the Multi-UGV system.
基金supported in part by National Natural Science Foundation of China(62495095,62088101).
文摘Multi-agent systems(MASs)have demonstrated significant achievements in a wide range of tasks,leveraging their capacity for coordination and adaptation within complex environments.Moreover,the enhancement of their intelligent functionalities is crucial for tackling increasingly challenging tasks.This goal resonates with a paradigm shift within the artificial intelligence(AI)community,from“internet AI”to“embodied AI”,and the MASs with embodied AI are referred to as embodied multi-agent systems(EMASs).An EMAS has the potential to acquire generalized competencies through interactions with environments,enabling it to effectively address a variety of tasks and thereby make a substantial contribution to the quest for artificial general intelligence.Despite the burgeoning interest in this domain,a comprehensive review of EMAS has been lacking.This paper offers analysis and synthesis for EMASs from a control perspective,conceptualizing each embodied agent as an entity equipped with a“brain”for decision and a“body”for environmental interaction.System designs are classified into open-loop,closed-loop,and double-loop categories,and EMAS implementations are discussed.Additionally,the current applications and challenges faced by EMASs are summarized and potential avenues for future research in this field are provided.
基金supported by the National Key R&D Program of China(2017YFB1400105).
文摘In the evolutionary game of the same task for groups,the changes in game rules,personal interests,the crowd size,and external supervision cause uncertain effects on individual decision-making and game results.In the Markov decision framework,a single-task multi-decision evolutionary game model based on multi-agent reinforcement learning is proposed to explore the evolutionary rules in the process of a game.The model can improve the result of a evolutionary game and facilitate the completion of the task.First,based on the multi-agent theory,to solve the existing problems in the original model,a negative feedback tax penalty mechanism is proposed to guide the strategy selection of individuals in the group.In addition,in order to evaluate the evolutionary game results of the group in the model,a calculation method of the group intelligence level is defined.Secondly,the Q-learning algorithm is used to improve the guiding effect of the negative feedback tax penalty mechanism.In the model,the selection strategy of the Q-learning algorithm is improved and a bounded rationality evolutionary game strategy is proposed based on the rule of evolutionary games and the consideration of the bounded rationality of individuals.Finally,simulation results show that the proposed model can effectively guide individuals to choose cooperation strategies which are beneficial to task completion and stability under different negative feedback factor values and different group sizes,so as to improve the group intelligence level.
基金supported by the National Natural Science Foundation of China (61070143 61173088)
文摘For multi-agent reinforcement learning in Markov games, knowledge extraction and sharing are key research problems. State list extracting means to calculate the optimal shared state path from state trajectories with cycles. A state list extracting algorithm checks cyclic state lists of a current state in the state trajectory, condensing the optimal action set of the current state. By reinforcing the optimal action selected, the action policy of cyclic states is optimized gradually. The state list extracting is repeatedly learned and used as the experience knowledge which is shared by teams. Agents speed up the rate of convergence by experience sharing. Competition games of preys and predators are used for the experiments. The results of experiments prove that the proposed algorithms overcome the lack of experience in the initial stage, speed up learning and improve the performance.
基金supported by the National Natural Science Foundation of China(61503407,61806219,61703426,61876189,61703412)the China Postdoctoral Science Foundation(2016 M602996)。
文摘The multi-agent system is the optimal solution to complex intelligent problems. In accordance with the game theory, the concept of loyalty is introduced to analyze the relationship between agents' individual income and global benefits and build the logical architecture of the multi-agent system. Besides, to verify the feasibility of the method, the cyclic neural network is optimized, the bi-directional coordination network is built as the training network for deep learning, and specific training scenes are simulated as the training background. After a certain number of training iterations, the model can learn simple strategies autonomously. Also,as the training time increases, the complexity of learning strategies rises gradually. Strategies such as obstacle avoidance, firepower distribution and collaborative cover are adopted to demonstrate the achievability of the model. The model is verified to be realizable by the examples of obstacle avoidance, fire distribution and cooperative cover. Under the same resource background, the model exhibits better convergence than other deep learning training networks, and it is not easy to fall into the local endless loop.Furthermore, the ability of the learning strategy is stronger than that of the training model based on rules, which is of great practical values.
基金The National Natural Science Foundation of China(62136008,62293541)The Beijing Natural Science Foundation(4232056)The Beijing Nova Program(20240484514).
文摘Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
基金supported by the National Natural Science Foundation of China(61803370,61622309)the China Postdoctoral Science Foundation(2018M630216)the National Key Research and Development Program of China(2016YFB0901902)
文摘This paper studies the consensus control of multiagent systems with binary-valued observations.An algorithm alternating estimation and control is proposed.Each agent estimates the states of its neighbors based on a projected empirical measure method for a holding time.Based on the estimates,each agent designs the consensus control with a constant gain at some skipping time.The states of the system are updated by the designed control,and the estimation and control design will be repeated.For the estimation,the projected empirical measure method is proposed for the binary-valued observations.The algorithm can ensure the uniform boundedness of the estimates and the mean square error of the estimation is proved to be at the order of the reciprocal of the holding time(the same order as that in the case of accurate outputs).For the consensus control,a constant gain is designed instead of the stochastic approximation based gain in the existing literature for binary-valued observations.And,there is no need to make modification for control since the uniform boundedness of the estimates ensures the uniform boundedness of the agents’states.Finally,the systems updated by the designed control are proved to achieve consensus and the consensus speed is faster than that in the existing literature.Simulations are given to demonstrate the theoretical results.