Multi-agent reinforcement learning (MARL) has proven effective in cooperative multi-agent systems (MASs) but still suffers from the curse of dimensionality and low learning efficiency. The main difficulty stems from the strong inter-agent coupling inherent in MARL problems, which existing algorithms have yet to fully exploit. In this work, we identify a learning graph that characterizes the dependence between individual rewards and individual policies. We then propose a graph-based reward aggregation (GRA) method, which uses the inherent coupling among agents to eliminate redundant information. Specifically, GRA passes information among cooperating agents through graph attention networks to obtain aggregated rewards that aid the fitting of the value function, so that each agent learns a cooperative policy that can be executed in a decentralized manner. In addition, we propose a variant of GRA, named GRA-decen, which achieves decentralized training and decentralized execution (DTDE) when each agent can access information from only a subset of agents during learning. Experiments in several environments demonstrate the practicality and scalability of our algorithms.
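The idea of aggregating rewards over a learning graph can be illustrated with a minimal sketch. This is not the authors' GRA network: it assumes a fixed neighbour list and replaces the trained graph attention layer with a plain dot-product score followed by a softmax over each agent's neighbourhood (including itself). The function name `aggregate_rewards` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_rewards(features, rewards, neighbours):
    """Attention-weighted reward aggregation over a learning graph.

    features:   per-agent feature vectors (lists of floats)
    rewards:    per-agent individual rewards
    neighbours: neighbours[i] lists the agents whose rewards agent i attends to
    Returns one aggregated reward per agent.
    """
    aggregated = []
    for i, nbrs in enumerate(neighbours):
        group = [i] + [j for j in nbrs if j != i]  # each agent also attends to itself
        # Dot-product scores stand in for a learned graph-attention scoring function.
        scores = [sum(a * b for a, b in zip(features[i], features[j])) for j in group]
        weights = softmax(scores)
        aggregated.append(sum(w * rewards[j] for w, j in zip(weights, group)))
    return aggregated

if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rews = [1.0, 0.0, 0.5]
    graph = [[2], [2], [0, 1]]  # agents 0 and 1 are coupled only through agent 2
    print(aggregate_rewards(feats, rews, graph))
```

Each aggregated reward is a convex combination of the rewards in the agent's neighbourhood, so agents that are not coupled in the graph never contribute to each other's learning signal.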
This paper presents an adaptive multi-agent coordination (AMAC) strategy suitable for complex scenarios, which requires information exchange only between neighbouring robots. Unlike traditional multi-agent coordination methods solved by neural dynamics, the proposed strategy offers greater flexibility, adaptability, and scalability. Furthermore, the AMAC strategy is reformulated as a time-varying complex-valued matrix equation. By introducing a dynamic error function, a fixed-time convergent zeroing neural network (FTCZNN) model is designed to solve the AMAC strategy online, and an upper bound on its convergence time is derived theoretically. Finally, numerical simulations and physical experiments demonstrate the effectiveness and applicability of the coordination control method. Numerical results indicate that the method reduces the formation error to the order of 10⁻⁶ within 1.8 s.
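The zeroing-neural-network idea (define an error e(t) = a(t)x(t) - b(t) and force de/dt = -γΦ(e)) can be sketched on a scalar instance of a time-varying complex-valued equation. The coefficients a(t), b(t), the gain γ, and the sign-bi-power activation below are illustrative assumptions commonly used for fixed-time ZNNs, not the paper's FTCZNN model; the activation is applied separately to the real and imaginary parts.

```python
import math

def sig(v, r):
    """Signed power |v|^r * sign(v) for a real scalar."""
    return math.copysign(abs(v) ** r, v)

def fixed_time_activation(e, k1=1.0, k2=1.0, p=0.5, q=1.5):
    """Sign-bi-power activation (0 < p < 1 < q) on real and imaginary parts."""
    re = k1 * sig(e.real, p) + k2 * sig(e.real, q)
    im = k1 * sig(e.imag, p) + k2 * sig(e.imag, q)
    return complex(re, im)

# Hypothetical time-varying complex coefficients and their exact derivatives.
def a(t):
    return complex(2.0 + math.sin(t), 0.5 * math.cos(t))

def da(t):
    return complex(math.cos(t), -0.5 * math.sin(t))

def b(t):
    return complex(math.cos(t), math.sin(t))  # e^{jt}

def db(t):
    return complex(-math.sin(t), math.cos(t))

def znn_solve(t_end=2.0, dt=1e-4, gamma=10.0, x0=0j):
    """Euler-integrate x(t) so that e = a x - b obeys de/dt = -gamma * Phi(e)."""
    x = x0
    steps = int(round(t_end / dt))
    for k in range(steps):
        t = k * dt
        e = a(t) * x - b(t)
        # d/dt(a x - b) = -gamma*Phi(e)  =>  a x' = b' - a' x - gamma*Phi(e)
        xdot = (db(t) - da(t) * x - gamma * fixed_time_activation(e)) / a(t)
        x += dt * xdot
    return x, steps * dt
```

Because the error dynamics are imposed directly, x(t) tracks the time-varying solution b(t)/a(t) rather than re-solving the equation at each instant.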
This paper addresses the synchronization of follower agents' state vectors with that of a leader in high-order nonlinear multi-agent systems. The proposed low-complexity control scheme employs high-gain observers to estimate higher-order synchronization errors, so that the controller relies solely on relative output measurements. This approach significantly reduces the dependence on full-state information, which is often unavailable or costly to obtain in practical engineering applications. An output-feedback control strategy is developed to overcome these limitations while ensuring robust and effective synchronization. Simulation results demonstrate the effectiveness of the proposed approach and validate the theoretical findings.
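To show how a high-gain observer recovers unmeasured derivatives from an output alone, here is a textbook second-order observer for a single signal. It is an assumption-laden sketch, not the paper's multi-agent observer for synchronization errors: the gains `alpha1`, `alpha2` and the small parameter `eps` are illustrative choices.

```python
import math

def high_gain_observer(y, t_end=2.0, dt=1e-4, eps=0.01, alpha1=2.0, alpha2=1.0):
    """Estimate y and dy/dt from samples of y(t) alone.

    Observer: xh1' = xh2 + (alpha1/eps)(y - xh1),  xh2' = (alpha2/eps**2)(y - xh1).
    s^2 + alpha1*s + alpha2 must be Hurwitz; a smaller eps converges faster
    but produces larger transient peaking.
    """
    xh1, xh2 = 0.0, 0.0
    steps = int(round(t_end / dt))
    for k in range(steps):
        innov = y(k * dt) - xh1  # output estimation error
        d1 = xh2 + (alpha1 / eps) * innov
        d2 = (alpha2 / eps ** 2) * innov
        xh1 += dt * d1
        xh2 += dt * d2
    return xh1, xh2

if __name__ == "__main__":
    est_y, est_dy = high_gain_observer(math.sin)
    print(est_y, math.sin(2.0), est_dy, math.cos(2.0))
```

The derivative estimate error scales with `eps`, which is why such schemes can rely on relative output measurements while still feeding higher-order error estimates to the controller.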
In recent years, researchers have leveraged single-agent reinforcement learning to boost educational outcomes and deliver personalized interventions; yet this paradigm provides no capacity for inter-agent interaction. Multi-agent reinforcement learning (MARL) overcomes this limitation by allowing several agents to learn simultaneously within a shared environment, each choosing actions that maximize its own or the group's rewards. By explicitly modeling and exploiting agent-to-agent dynamics, MARL can align those interactions with pedagogical goals such as peer tutoring, collaborative problem-solving, or gamified competition, thus opening richer avenues for adaptive and socially informed learning experiences. This survey investigates the impact of MARL on educational outcomes by examining evidence of its effectiveness in enhancing learner performance, engagement, and equity and in reducing teacher workload, compared with single-agent or traditional approaches. It explores the educational domains and pedagogical problems addressed by MARL, identifies the algorithmic families used, and analyzes their influence on learning. The review also assesses experimental settings and evaluation metrics to determine ecological validity, and outlines current challenges and future research directions in applying MARL to education.
Numerical simulations and theoretical models are developed in this paper for Detonation-Wave/Boundary-Layer Interactions (DWBLIs) under reflection. Transient flow fields demonstrate the high non-stationarity of DWBLIs when Mach Reflection (MR) occurs, and subsequent analyses show that the subsonic region introduced by the boundary layer exacerbates the instability. Further quantitative analyses show that viscosity has little effect on propulsive performance and that the separation wave can be treated as an oblique detonation wave. Parameters influencing DWBLIs, such as combustion chamber height, incoming Mach number, equivalence ratio, and inlet channel length, are categorized and studied. Beyond simulations, theoretical analytical models are established for Regular Reflection (RR) and MR of DWBLIs. Multiple formulas for the separation zone length are obtained from mass conservation for the different transition types between inviscid and viscous reflections. Comparison with the numerical simulations verifies the validity of the model, which can be further generalized to curved DWBLIs. The developed model makes a theoretical solution of DWBLIs possible and provides a key foundation for further analysis and solution.
Multi-Agent Systems (MAS), which consist of multiple interacting agents, are crucial in Cyber-Physical Systems (CPS) because they improve system adaptability, efficiency, and robustness through parallel processing and collaboration. However, most existing unsupervised meta-learning methods are centralized and unsuitable for multi-agent systems, where data are stored in a distributed manner and are not accessible to all agents. Meta-GMVAE, based on the Variational Autoencoder (VAE) and set-level variational inference, is a sophisticated unsupervised meta-learning model that improves generative performance by efficiently learning data representations across tasks, increasing adaptability and reducing sample requirements. Inspired by these advances, we propose a novel Distributed Unsupervised Meta-Learning (DUML) framework based on Meta-GMVAE and a fusion strategy. Furthermore, we present a DUML algorithm based on a Gaussian Mixture Model (DUMLGMM), in which the parameters of the Gaussian mixture are solved by an Expectation-Maximization algorithm. Simulations on the Omniglot and miniImageNet datasets show that DUMLGMM achieves the performance of the corresponding centralized algorithm and outperforms a non-cooperative algorithm.
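A distributed Gaussian-mixture fit of the kind described can be sketched with one EM step per agent on its own data followed by a fusion step. This is a simplified illustration under stated assumptions: it works on 1-D data rather than in Meta-GMVAE's latent space, and the fusion rule (plain averaging of parameters across agents, with components aligned by index through a shared initialization) is an assumed stand-in, since the abstract does not specify the paper's fusion strategy.

```python
import math

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def local_em_step(data, pis, mus, vars_):
    """One EM iteration for a 1-D Gaussian mixture on one agent's local data."""
    K = len(pis)
    # E-step: responsibilities r[n][k] = P(component k | x_n)
    resp = []
    for x in data:
        w = [pis[k] * gauss_pdf(x, mus[k], vars_[k]) for k in range(K)]
        s = sum(w) + 1e-300  # guard against all-zero likelihoods
        resp.append([wk / s for wk in w])
    # M-step: re-estimate mixing weights, means, and variances
    new_pis, new_mus, new_vars = [], [], []
    for k in range(K):
        nk = sum(r[k] for r in resp)
        mu = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
        new_pis.append(nk / len(data))
        new_mus.append(mu)
        new_vars.append(max(var, 1e-6))  # variance floor to avoid collapse
    return new_pis, new_mus, new_vars

def fuse(param_sets):
    """Assumed fusion rule: average each parameter across agents."""
    n = len(param_sets)
    fused = []
    for i in range(3):  # 0: weights, 1: means, 2: variances
        K = len(param_sets[0][i])
        fused.append([sum(p[i][k] for p in param_sets) / n for k in range(K)])
    return tuple(fused)

def distributed_em(agent_data, pis, mus, vars_, rounds=20):
    """Each agent only touches its own data; only parameters are exchanged."""
    params = (pis, mus, vars_)
    for _ in range(rounds):
        local = [local_em_step(d, *params) for d in agent_data]
        params = fuse(local)
    return params
```

Only mixture parameters leave each agent, which mirrors the constraint that raw data are distributed and not accessible to all agents.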
This paper focuses on the leader-following positive consensus problem for heterogeneous switched multi-agent systems. First, a state-feedback controller with dynamic compensation is introduced to achieve positive consensus under average dwell time switching. Sufficient conditions guaranteeing positive consensus are then derived. The gain matrices of the control protocol are described using a matrix decomposition approach, and the corresponding computational complexity is reduced by resorting to linear programming and co-positive Lyapunov functions. Finally, two numerical examples are provided to illustrate the results.
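The reason linear programming suffices here is that positive systems admit linear co-positive Lyapunov functions V(x) = vᵀx. The sketch below only verifies a candidate vector v for a set of switching modes x' = A_i x (finding v is an LP feasibility problem); it is a basic building block, not the paper's conditions, which additionally involve dynamic compensation and average dwell time. The example matrices are hypothetical.

```python
def is_metzler(A):
    """True if all off-diagonal entries are nonnegative (a positive-system generator)."""
    n = len(A)
    return all(A[i][j] >= 0 for i in range(n) for j in range(n) if i != j)

def is_copositive_certificate(v, modes):
    """Check that V(x) = v^T x is a common linear co-positive Lyapunov function.

    Requires v > 0 and A^T v < 0 elementwise for every Metzler mode A, which gives
    dV/dt = v^T A x < 0 for every nonzero x >= 0.
    """
    if any(vi <= 0 for vi in v):
        return False
    for A in modes:
        if not is_metzler(A):
            return False
        n = len(A)
        for j in range(n):
            if sum(A[i][j] * v[i] for i in range(n)) >= 0:  # (A^T v)_j must be negative
                return False
    return True

if __name__ == "__main__":
    A1 = [[-2.0, 1.0], [0.5, -1.0]]
    A2 = [[-1.0, 0.2], [0.3, -1.5]]
    print(is_copositive_certificate([1.0, 2.0], [A1, A2]))
```

Because every condition is linear in v, searching for v (and, in richer formulations, for controller gains after a suitable decomposition) can be handed to an LP solver instead of a semidefinite program.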
The advancement of next-generation high-frequency communication systems and stealth detection technologies necessitates efficient, multi-spectrum-compatible shielding materials. However, achieving high efficiency and low reflectivity simultaneously across the microwave, terahertz, and infrared spectra remains a formidable challenge. Herein, a carbonized MXene/polyimide (C-MXene/PI) aerogel integrating a spatially coupled, hierarchically anisotropic structure with stepwise conductivity gradients was constructed. Electromagnetic waves propagating through the top-down vertical-disordered-horizontal architecture and the progressive conductivity gradient of the C-MXene/PI aerogel undergo stepwise absorption-dissipation-re-dissipation. The C-MXene/PI aerogel exhibits an average electromagnetic interference (EMI) shielding effectiveness of 91.0 dB in the X-band with a reflection coefficient of 0.40. In the terahertz band, the average EMI shielding effectiveness reaches 66.2 dB with a reflection coefficient of 0.33. Furthermore, the heterolayered porous architecture of C-MXene/PI aerogels exhibits low thermal conductivity and reduced infrared emissivity, enabling exceptional infrared stealth across the 2-16 μm wavelength range. This study provides a feasible strategy for constructing low-reflectivity, multi-spectrum-compatible shielding materials.
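The reported figures can be related through the usual power-balance convention R + A + T = 1, with SE_T = -10 log₁₀T, SE_R = -10 log₁₀(1 - R), and SE_A = SE_T - SE_R. The sketch assumes that the reported "reflection coefficient" is the power reflectance R (that is, |S11|²); if the paper uses a different definition, the split changes.

```python
import math

def shielding_breakdown(se_total_db, reflectance):
    """Split total shielding effectiveness into reflection and absorption parts.

    Assumes power coefficients with R + A + T = 1 and
    SE_T = -10 log10(T), SE_R = -10 log10(1 - R), SE_A = SE_T - SE_R.
    """
    T = 10 ** (-se_total_db / 10)            # transmitted power fraction
    A = 1 - reflectance - T                  # absorbed power fraction
    se_r = -10 * math.log10(1 - reflectance)
    se_a = se_total_db - se_r                # equals -10 log10(T / (1 - R))
    return {"T": T, "A": A, "SE_R": se_r, "SE_A": se_a}

if __name__ == "__main__":
    print(shielding_breakdown(91.0, 0.40))   # X-band figures from the abstract
```

Under this assumption, R = 0.40 accounts for only about 2.2 dB of the 91.0 dB total, so nearly all of the shielding comes from absorption.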
With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically involves fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in such highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward that enables adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation studies and baseline comparisons. The results show that the proposed method achieves better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm performs more robustly across key metrics.
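The "twin Q-value estimates" refer to TD3's clipped double-Q target, which MATD3 applies per agent with centralized critics. A minimal sketch of the standard TD3 target for a scalar action follows; the callables, noise settings, and action bounds are illustrative assumptions, and the paper's ALN, RO, and dynamic-temperature modifications are not shown.

```python
import random

def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """Standard TD3 critic target for one transition with a scalar action."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    a = target_actor(next_state)
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    a_next = max(act_low, min(act_high, a + noise))
    # Clipped double-Q: take the smaller of the two target critics to curb overestimation.
    q_next = min(target_q1(next_state, a_next), target_q2(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_next

if __name__ == "__main__":
    y = td3_target(1.0, None, 0.0,
                   target_actor=lambda s: 0.5,
                   target_q1=lambda s, a: 10.0 + a,
                   target_q2=lambda s, a: 8.0 + a,
                   noise_std=0.0)
    print(y)
```

Taking the minimum over the twin critics is what makes their correlation matter: if both estimates err in the same direction, the minimum no longer corrects overestimation, which is the motivation the abstract gives for decorrelating them.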
To maximize the profits of power grid operators (GOs), load aggregators (LAs), and electricity customers (ECs), this paper proposes a hierarchical demand response (HDR) framework that considers competing interactions, based on the multi-agent deep deterministic policy gradient (MaDDPG). The ECs are divided into conventional ECs and electric vehicles (EVs), which are managed by an EC agent (ECA) and an EV agent (EVA), respectively, to exploit the flexibility of the HDR framework. Thus, the HDR is a tri-layer model determined by five types of agents engaging in competing interactions to maximize their own profits. To address the limitations of mathematical expression and participation scale of the Stackelberg game within the HDR model, a dynamic interaction mechanism is adopted. Moreover, to handle an HDR involving various entities, MaDDPG develops multiple agents that simulate the dynamic competing interactions among the subjects and solve the continuous-action control problem. Furthermore, MaDDPG adopts soft target updates and prioritized experience replay to ensure stable and effective training, and uses exploration noise to make the exploration strategy comprehensive. Simulation studies verify the performance of MaDDPG with the dynamic interaction mechanism in multilayer multi-agent continuous action control, compared with the double deep Q network (DDQN), deep Q network (DQN), and dueling DQN. Additionally, the proposed HDR is compared with price-based DR (PBDR) and incentive-based DR (IBDR) to investigate its flexibility.
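The two training stabilizers named above have standard forms: Polyak (soft) target updates and proportional prioritized experience replay. The sketch below uses flat parameter lists and default hyperparameters (`tau`, `alpha`, `beta`) chosen for illustration; it is not taken from the paper.

```python
import random

def soft_update(target, online, tau=0.01):
    """Polyak averaging: theta_target <- tau*theta + (1 - tau)*theta_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha, p_i = |delta_i| + eps."""
    prios = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(prios)
    return [p / total for p in prios]

def importance_weights(probs, beta=0.4):
    """Bias-correcting weights w_i = (N * P(i))^-beta, normalized by the largest weight."""
    n = len(probs)
    w = [(n * p) ** (-beta) for p in probs]
    m = max(w)
    return [wi / m for wi in w]

def sample_batch(probs, batch_size, rng=random):
    """Draw transition indices with replacement according to their priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

A small `tau` makes the target networks trail the online networks slowly, while the importance weights down-weight the frequently sampled high-error transitions so the gradient stays approximately unbiased.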
This paper investigates the consensus tracking control problem for high-order nonlinear multi-agent systems subject to non-affine faults, partially measurable states, uncertain control coefficients, and unknown external disturbances. Under directed topology conditions, an observer-based finite-time control strategy based on adaptive backstepping is proposed, in which a neural-network-based state observer is employed to approximate the unmeasurable system state variables. To address the explosion-of-complexity problem associated with the backstepping method, a finite-time command filter is incorporated, with error compensation signals designed to mitigate the filter-induced errors. Additionally, a Butterworth low-pass filter is introduced to avoid the algebraic loop problem in the controller design. The finite-time stability of the closed-loop system is rigorously analyzed with the finite-time Lyapunov stability criterion, showing that all closed-loop signals of the system remain bounded within a finite time. Finally, the effectiveness of the proposed control strategy is verified through a simulation example.
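In designs of this kind, a low-pass-filtered copy of the control signal is typically fed to the approximator instead of the signal itself, which breaks the circular dependence of the control input on itself (the algebraic loop). The abstract gives neither the filter order nor the cutoff, so the sketch below assumes a second-order Butterworth filter with an illustrative cutoff `wc`, integrated with forward Euler.

```python
import math

class Butterworth2:
    """Second-order Butterworth low-pass filter H(s) = wc^2 / (s^2 + sqrt(2)*wc*s + wc^2).

    Integrated with forward Euler; dt should be much smaller than 1/wc.
    """

    def __init__(self, wc):
        self.wc = wc
        self.y = 0.0     # filtered output
        self.ydot = 0.0  # time derivative of the filtered output

    def step(self, u, dt):
        """Advance the filter by dt with input u and return the filtered value."""
        yddot = self.wc ** 2 * (u - self.y) - math.sqrt(2.0) * self.wc * self.ydot
        self.y += dt * self.ydot
        self.ydot += dt * yddot
        return self.y

if __name__ == "__main__":
    f = Butterworth2(wc=20.0)
    outs = [f.step(1.0, 1e-4) for _ in range(20000)]  # 2 s unit-step response
    print(outs[-1], max(outs))
```

The Butterworth damping ratio of 1/√2 keeps the step overshoot near 4.3%, so the filtered signal follows the control input closely without introducing large oscillations.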
Finding optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, which makes them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC) for collaborative confrontation. The algorithm integrates dual twin critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and underscoring the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors show that agents using MARL-LAC maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context-modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose Multi-Agent Chain of Thought (CoT), a novel multi-agent chain-of-thought reasoning framework in which specialized agents for the text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms. Our architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. The framework employs a hierarchical attention mechanism with cross-modal fusion and adapts its reasoning depth to dialogue complexity. Comprehensive evaluations on Situated Interactive Multi-Modal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
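The abstract does not specify the consensus voting rule, so the following is only a hypothetical sketch of one simple option: each modality agent proposes an answer with a confidence, votes are summed per answer, and a low winning share is reported as a conflict that a resolution step (for example, another self-reflection round) would need to handle. The function name and the `min_agreement` threshold are assumptions.

```python
from collections import defaultdict

def consensus_vote(proposals, min_agreement=0.5):
    """Confidence-weighted vote over (agent, answer, confidence) proposals.

    Returns (winning_answer, agreement), where agreement is the winner's share of
    the total confidence; the answer is None (a conflict) when agreement falls
    below min_agreement.
    """
    scores = defaultdict(float)
    for _agent, answer, conf in proposals:
        scores[answer] += conf
    total = sum(scores.values())
    winner = max(scores, key=scores.get)
    agreement = scores[winner] / total if total > 0 else 0.0
    return (winner if agreement >= min_agreement else None), agreement

if __name__ == "__main__":
    print(consensus_vote([("text", "red mug", 0.9),
                          ("vision", "red mug", 0.8),
                          ("speech", "blue mug", 0.4)]))
```

Returning the agreement score alongside the answer lets a caller decide whether to accept the consensus or trigger conflict resolution, rather than silently adopting a weak majority.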
This paper presents Dual Adaptive Neural Topology (Dual ANT), a distributed dual-network meta-adaptive framework that enhances ant-colony-based multi-agent coordination with online introspection, adaptive parameter control, and privacy-preserving interactions. The approach augments standard Ant Colony Optimization (ACO) with two lightweight neural components: a forward network that estimates swarm efficiency in real time and an inverse network that converts these descriptors into parameter adaptations. To preserve the privacy of individual trajectories in shared pheromone maps, we introduce a locally differentially private pheromone update mechanism that adds calibrated noise to each agent's pheromone deposit while preserving the efficacy of the global pheromone signal. The resulting system enables agents to adapt their coordination strategies dynamically and autonomously under challenging and changing conditions, including varying obstacle layouts, uncertain target locations, and time-varying disturbances. Extensive simulations of large grid-based search tasks demonstrate that Dual ANT achieves faster convergence, higher robustness, and better scalability than advanced baselines such as Multi-Strategy ACO and Hierarchical ACO. The meta-adaptive feedback loop compensates for the performance degradation caused by privacy noise and prevents premature stagnation by triggering Lévy flight exploration only when necessary.
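A locally differentially private deposit can be sketched with the Laplace mechanism: each agent perturbs its own deposit with Laplace noise of scale Δ/ε before sharing it, where Δ bounds how much one agent's deposit can vary. The evaporation rate `rho`, the positivity floor `tau_min`, and the grid-cell dictionary representation are illustrative assumptions, not Dual ANT's exact update.

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) by inverse-CDF sampling."""
    u = 0.0
    while u == 0.0:
        u = rng.random()  # uniform on (0, 1)
    u -= 0.5              # uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_deposit(true_deposit, epsilon, sensitivity, rng=random):
    """Agent-side perturbation before sharing: Laplace scale = sensitivity / epsilon."""
    return true_deposit + laplace_noise(sensitivity / epsilon, rng)

def update_pheromone(tau, deposits, rho=0.1, tau_min=1e-3):
    """Evaporate, add the (noisy) deposits, and clip so the map stays positive."""
    cells = set(tau) | set(deposits)
    return {c: max(tau_min, (1.0 - rho) * tau.get(c, tau_min) + deposits.get(c, 0.0))
            for c in cells}
```

Because the noise is zero-mean, it largely averages out when many agents deposit on the same cells, which is why the global pheromone signal can remain useful even though each individual deposit is masked.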
The wireless cloud robotic system (WCRS), which fully integrates sensing, communication, computing, and control capabilities as an intelligent agent, is a promising route to intelligent manufacturing owing to its easy deployment and flexible expansion. However, high-precision control of a WCRS requires deterministic wireless communication, which is always challenging in a complex and dynamic radio environment. This paper employs a reconfigurable intelligent surface (RIS) to establish a novel RIS-assisted WCRS architecture, in which the radio channel is controlled to achieve ultra-reliable, low-delay, and low-jitter communication for high-precision closed-loop motion control. However, control and communication are strongly coupled and should be co-optimized. Fully considering the constraints on control input threshold, control delay deadline, beam phase, antenna power, and information distortion, we formulate a stability maximization problem that jointly optimizes control input compensation, RIS phase shift, and beamforming. Here, a new jitter-oriented system stability objective with respect to control error and communication jitter is defined, and a closed-form expression for the control delay deadline is derived based on Jensen's inequality and a Lyapunov-Krasovskii functional. Because the channel and robot states are time-varying and partially observable, we model the problem as a partially observable Markov decision process (POMDP). To solve this complex problem, we propose a multi-agent transfer reinforcement learning algorithm named LSTM-PPO-MATRL, in which LSTM-enhanced proximal policy optimization (PPO) is designed to approximate an optimal solution and option-guided policy transfer learning is proposed to facilitate the learning process. With centralized training and decentralized execution, LSTM-PPO-MATRL is validated by extensive experiments on MuJoCo tasks for both low-mobility and high-mobility robotic control scenarios. The results demonstrate that LSTM-PPO-MATRL not only achieves high learning efficiency but also supports low-delay, low-jitter communication for low-error control, achieving a 71.9% improvement in control accuracy and a 68.7% reduction in delay jitter compared with the PPO-MADRL baseline.
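The PPO component optimizes the standard clipped surrogate objective, sketched below over a batch of log-probabilities and advantage estimates. The clip range `clip_eps = 0.2` is a common default, not a value from the paper; the LSTM policy and option-guided transfer are not shown.

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    ratio_t = exp(logp_new - logp_old)
    L = -mean(min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A))
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)  # pessimistic (lower) bound
    return -total / len(advantages)
```

Taking the minimum of the clipped and unclipped terms removes any incentive to push the probability ratio far outside [1 - ε, 1 + ε], which keeps each policy update conservative.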
This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. First, an observer-based state-feedback control method is employed to achieve secure control by estimating the system state in real time. Second, a memory-based adaptive event-triggered mechanism is combined with neural networks to approximate the nonlinear terms in the networked system and to conserve system resources efficiently. Finally, based on a two-degree-of-freedom model of a vehicle subject to crosswinds, a multi-unmanned ground vehicle (Multi-UGV) system is constructed to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy effectively handles external disturbances such as crosswinds in practical applications, ensuring stable and reliable operation of the Multi-UGV system.
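The resource saving of an event-triggered mechanism comes from transmitting a sample only when it has drifted far enough from the last transmitted one. The sketch uses a common static relative threshold ‖x_k - x_last‖² > σ‖x_k‖²; the paper's memory-based adaptive threshold is not reproduced, and DoS attacks are modeled only as a hypothetical set of time indices at which transmission fails.

```python
def event_triggered_releases(samples, sigma=0.1, dos_blocked=frozenset()):
    """Indices at which a relative-threshold event trigger transmits a sample.

    samples:     list of state vectors (lists of floats) at successive sampling instants
    sigma:       relative threshold parameter
    dos_blocked: indices at which a DoS attack blocks transmission (assumed model)
    """
    def sqnorm(v):
        return sum(c * c for c in v)

    released = [0]  # the first sample is always transmitted
    last = samples[0]
    for k in range(1, len(samples)):
        x = samples[k]
        err = [a - b for a, b in zip(x, last)]
        if sqnorm(err) > sigma * sqnorm(x) and k not in dos_blocked:
            released.append(k)
            last = x  # controller now holds this sample until the next release
    return released

if __name__ == "__main__":
    traj = [[1.0], [1.01], [1.02], [2.0], [2.05]]
    print(event_triggered_releases(traj))
    print(event_triggered_releases(traj, dos_blocked={3}))
```

When an attack blocks a release, the held sample grows staler, so the trigger fires again at the next unblocked instant; the control design must tolerate this extra staleness.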
Multi-agent systems (MASs) have achieved significant results in a wide range of tasks, leveraging their capacity for coordination and adaptation within complex environments. Moreover, enhancing their intelligent functionalities is crucial for tackling increasingly challenging tasks. This goal resonates with a paradigm shift within the artificial intelligence (AI) community from “internet AI” to “embodied AI”, and MASs with embodied AI are referred to as embodied multi-agent systems (EMASs). An EMAS can potentially acquire generalized competencies through interactions with its environment, enabling it to address a variety of tasks effectively and thereby contribute substantially to the quest for artificial general intelligence. Despite burgeoning interest in this domain, a comprehensive review of EMASs has been lacking. This paper analyzes and synthesizes EMASs from a control perspective, conceptualizing each embodied agent as an entity equipped with a “brain” for decision-making and a “body” for environmental interaction. System designs are classified into open-loop, closed-loop, and double-loop categories, and EMAS implementations are discussed. Additionally, the current applications and challenges of EMASs are summarized, and potential avenues for future research in this field are provided.
Electromagnetic interference (EMI) shielding materials with high shielding efficiency and low reflection hold promise for electronic components, precision instruments, and fifth-generation communication equipment. In this study, multistage microcellular waterborne polyurethane (WPU) composites were constructed via gradient induction, layer-by-layer casting, and supercritical carbon dioxide foaming. The gradient-structured WPU/iron-cobalt-loaded reduced graphene oxide (FeCo@rGO) foam serves as an impedance-matched absorption layer, while the highly conductive WPU/silver-loaded glass microsphere (Ag@GM) layer serves as a reflection layer. Owing to the asymmetric structure and the gradient and porous configurations, the composite foam demonstrates excellent conductivity, outstanding EMI shielding effectiveness (SE, 74.9 dB), and low reflection (35.28%) at 8.2-12.4 GHz, meaning that more than 99.99999% of electromagnetic (EM) waves are blocked and only 35.28% are reflected to the external environment. Interestingly, the reflectivity of the composite foam drops to 0.41% at 10.88 GHz owing to resonance between incident and reflected EM waves. Beyond that, the composite foam has a low density (0.47 g/cm³) and highly stable EMI shielding properties. This work offers a viable approach for crafting lightweight, highly shielding, and minimally reflective EMI shielding composites.
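The "more than 99.99999% blocked" figure follows directly from the total shielding effectiveness, using the power convention SE_T = -10 log₁₀T and R + A + T = 1, with the reported reflected fraction R = 0.3528:

```latex
T = 10^{-\mathrm{SE}_T/10} = 10^{-74.9/10} = 10^{-7.49} \approx 3.2\times10^{-8},
\qquad 1 - T \approx 0.99999997 > 0.9999999,
\qquad A = 1 - R - T \approx 1 - 0.3528 = 0.6472 .
```

So roughly 65% of the incident power is absorbed, about 35% is reflected, and the transmitted fraction is negligible.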
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to broaden its application scope further. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration of large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems in multi-agent settings, offering finer-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Funding (graph-based reward aggregation paper): supported in part by the National Natural Science Foundation of China (grants 62203073 and 62573068) and the Natural Science Foundation of Chongqing, China (grant CSTB2022NSCQMSX0577).
Funding (adaptive multi-agent coordination paper): supported by the National Natural Science Foundation of China under grants 61962023, 61562029, and 62466019.
文摘This paper presents an adaptive multi-agent coordination(AMAC)strategy suitable for complex scenarios,which only requires information exchange between neighbouring robots.Unlike traditional multi-agent coordination methods that are solved by neural dynamics,the proposed strategy displays greater flexibility,adaptability and scalability.Furthermore,the proposed AMAC strategy is reconstructed as a time-varying complex-valued matrix equation.By introducing a dynamic error function,a fixed-time convergent zeroing neural network(FTCZNN)model is designed for the online solution of the AMAC strategy,with its convergence time upper bound derived theoretically.Finally,the effectiveness and applicability of the coordination control method are demonstrated by numerical simulations and physical experiments.Numerical results indicate that this method can reduce the formation error to the order of 10^(-6)within 1.8 s.
文摘This paper addresses the synchronization of follower agents’state vectors with that of a leader in high-order nonlinear multi-agent systems.The proposed low-complexity control scheme employs high-gain observers to estimate higher-order synchronization errors,enabling the controller to rely solely on relative output measurements.This approach significantly reduces the dependence on full-state information,which is often infeasible or costly in practical engineering applications.An output feedback control strategy is developed to overcome these limitations while ensuring robust and effective synchronization.Simulation results are provided to demonstrate the effectiveness of the proposed approach and validate the theoretical findings.
Abstract: In recent years, researchers have leveraged single-agent reinforcement learning to boost educational outcomes and deliver personalized interventions; yet this paradigm provides no capacity for inter-agent interaction. Multi-agent reinforcement learning (MARL) overcomes this limitation by allowing several agents to learn simultaneously within a shared environment, each choosing actions that maximize its own or the group's rewards. By explicitly modeling and exploiting agent-to-agent dynamics, MARL can align those interactions with pedagogical goals such as peer tutoring, collaborative problem-solving, or gamified competition, thus opening richer avenues for adaptive and socially informed learning experiences. This survey investigates the impact of MARL on educational outcomes by examining evidence of its effectiveness in enhancing learner performance, engagement, and equity, and in reducing teacher workload, compared with single-agent or traditional approaches. It explores the educational domains and pedagogical problems addressed by MARL, identifies the algorithmic families used, and analyzes their influence on learning. The review also assesses experimental settings and evaluation metrics to determine ecological validity, and outlines current challenges and future research directions in applying MARL to education.
Funding: Supported by the National Natural Science Foundation of China (Nos. U20A2069, U21B6003, 12302389 and 12472337) and the Advanced Aero-Power Innovation Workstation, China (No. HKCX2024-01-017).
Abstract: Numerical simulations and theoretical models are developed in this paper for Detonation-Wave/Boundary-Layer Interactions (DWBLIs) under reflection. Transient flow fields demonstrate the high non-stationarity of DWBLIs when Mach Reflection (MR) occurs, and subsequent analyses show that the subsonic region introduced by the boundary layer exacerbates the instability. Further quantitative analyses show that viscosity has little effect on propulsive performance and that the separation wave can be treated as an oblique detonation wave. Parameters influencing DWBLIs, such as combustion chamber height, incoming Mach number, equivalence ratio, and inlet channel length, are categorized and studied. Besides simulations, theoretical analytical models are established for Regular Reflection (RR) and MR of DWBLIs. Multiple formulas for the separation zone length are obtained from mass conservation for the different transition types between inviscid and viscous reflections. Comparison with the numerical simulations verifies the validity of the model, which can be further generalized to curved DWBLIs. The developed model makes a theoretical solution of DWBLIs possible and provides a key foundation for further analysis and solution.
Funding: Supported by the National Natural Science Foundation of China Youth Fund (No. 62101579).
Abstract: Multi-Agent Systems (MAS), which consist of multiple interacting agents, are crucial in Cyber-Physical Systems (CPS) because they improve system adaptability, efficiency, and robustness through parallel processing and collaboration. However, most existing unsupervised meta-learning methods are centralized and unsuitable for multi-agent systems, where data are stored in a distributed manner and are not accessible to all agents. Meta-GMVAE, based on the Variational Autoencoder (VAE) and set-level variational inference, is a sophisticated unsupervised meta-learning model that improves generative performance by efficiently learning data representations across various tasks, increasing adaptability and reducing sample requirements. Inspired by these advances, we propose a novel Distributed Unsupervised Meta-Learning (DUML) framework based on Meta-GMVAE and a fusion strategy. Furthermore, we present a DUML algorithm based on the Gaussian Mixture Model (DUMLGMM), in which the parameters of the Gaussian mixture are solved by an Expectation-Maximization algorithm. Simulations on the Omniglot and Mini-ImageNet datasets show that DUMLGMM achieves the performance of the corresponding centralized algorithm and outperforms the non-cooperative algorithm.
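The abstract states only that DUMLGMM solves the Gaussian-mixture parameters with Expectation-Maximization. For reference, a plain one-dimensional EM loop is sketched below; it runs on a single agent's data and does not model Meta-GMVAE's latent space or the DUML fusion strategy. The function name, the evenly spread initialization, and the variance floor are assumptions.

```python
import math

def em_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture to xs with Expectation-Maximization."""
    lo, hi = min(xs), max(xs)
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]  # spread initial means
    variances = [1.0] * k
    weights = [1.0 / k] * k
    n = len(xs)
    for _ in range(iters):
        # E-step: responsibility of each component j for each point x
        resp = []
        for x in xs:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300           # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)         # floor avoids collapse onto one point
            weights[j] = nj / n
    return weights, means, variances
```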
Funding: Supported by the National Natural Science Foundation of China (62463007, 62463005); the Natural Science Foundation of Hainan Province (625RC710, 625MS047); the System Control and Information Processing Education Ministry Key Laboratory Open Funding, China (Scip20240119); and the Science Research Funding of Hainan University, China (KYQD(ZR)22180, KYQD(ZR)23180).
Abstract: This paper focuses on the leader-following positive consensus problem for heterogeneous switched multi-agent systems. First, a state-feedback controller with dynamic compensation is introduced to achieve positive consensus under average dwell time switching. Then, sufficient conditions are derived to guarantee positive consensus. The gain matrices of the control protocol are described using a matrix decomposition approach, and the corresponding computational complexity is reduced by resorting to linear programming and co-positive Lyapunov functions. Finally, two numerical examples illustrate the results obtained.
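As background for the co-positive Lyapunov functions mentioned above: for a single positive mode dx/dt = Ax with Metzler A, a linear co-positive Lyapunov function V(x) = v^T x exists if and only if there is v > 0 with A^T v < 0, which holds exactly when A is Hurwitz. Linear programming is the usual way to search for such v (for switched systems under average dwell time, one v per mode plus coupling constraints). The sketch below covers only the single-mode case and, instead of an LP solver, uses the fact that v = -A^{-T} 1 is a feasible point whenever one exists. The helper names are hypothetical.

```python
def is_metzler(A):
    """True if every off-diagonal entry of A is non-negative."""
    n = len(A)
    return all(A[i][j] >= 0 for i in range(n) for j in range(n) if i != j)

def solve(M, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(M)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def linear_copositive_lyapunov(A):
    """Return v > 0 with A^T v = -1 (hence A^T v < 0) for Metzler A, or None if none exists."""
    if not is_metzler(A):
        raise ValueError("A must be Metzler")
    n = len(A)
    AT = [[A[j][i] for j in range(n)] for i in range(n)]
    try:
        v = solve(AT, [-1.0] * n)
    except ValueError:
        return None                              # singular A cannot be Hurwitz
    return v if all(vi > 0 for vi in v) else None
```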
Funding: Supported by the Fundamental Research Funds for the Central Universities under No. 2024KQ130 and the National Natural Science Foundation of China (No. 52373259).
Abstract: The advancement of next-generation high-frequency communication systems and stealth detection technologies necessitates the development of efficient, multi-spectrum compatible shielding materials. However, achieving high efficiency and low reflectivity simultaneously across the microwave, terahertz, and infrared spectra remains a formidable challenge. Herein, a carbonized MXene/polyimide (C-MXene/PI) aerogel integrating a spatially coupled, hierarchically anisotropic structure with stepwise conductivity gradients was constructed. Electromagnetic waves propagating through the top-down vertical-disordered-horizontal architecture and progressive conductivity gradient of the C-MXene/PI aerogel undergo stepwise absorption-dissipation-re-dissipation processes. The C-MXene/PI aerogel exhibits an average electromagnetic interference (EMI) shielding effectiveness of 91.0 dB in the X-band with a reflection coefficient of 0.40. In the terahertz band, the average EMI shielding effectiveness reaches 66.2 dB with a reflection coefficient of 0.33. Furthermore, the heterolayered porous architecture of the C-MXene/PI aerogels exhibits low thermal conductivity and reduced infrared emissivity, enabling exceptional infrared stealth capability across the 2-16 μm wavelength range. This study provides a feasible strategy for constructing low-reflectivity, multi-spectrum compatible shielding materials.
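The reported shielding effectiveness and reflection coefficients can be related through the usual power balance R + A + T = 1, with SE_T = -10 log10(T), SE_R = -10 log10(1 - R) and SE_A = SE_T - SE_R. The sketch below performs that split. It assumes the quoted reflection coefficient is the reflected power fraction R = |S11|^2, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math

def shielding_breakdown(se_total_db, reflection_coeff):
    """Split total shielding effectiveness into reflection and absorption parts.

    Power balance R + A + T = 1 with
        SE_T = -10 log10(T), SE_R = -10 log10(1 - R), SE_A = SE_T - SE_R.
    reflection_coeff is interpreted as the reflected power fraction R (assumption).
    """
    T = 10 ** (-se_total_db / 10)             # transmitted power fraction
    R = reflection_coeff
    A = 1.0 - R - T                           # absorbed power fraction
    se_r = -10 * math.log10(1.0 - R)          # reflection loss, dB
    se_a = se_total_db - se_r                 # absorption loss, dB
    return {"T": T, "A": A, "SE_R": se_r, "SE_A": se_a}
```

Under that assumption, `shielding_breakdown(91.0, 0.40)` attributes about 2.2 dB to reflection and about 88.8 dB to absorption in the X-band; the terahertz figures can be checked the same way with `shielding_breakdown(66.2, 0.33)`.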
Abstract: With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation studies and baseline comparisons. The results show that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
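MATD3 builds on TD3, whose two critics are trained toward a clipped double-Q target; these are the "twin Q-value estimates" mentioned above. The sketch below computes that target for one transition with a scalar action, including target-policy smoothing. It omits the paper's ALN, RO and dynamic-temperature reward, and in a multi-agent setting each critic would condition on joint observations and actions; the function and parameter names are assumptions.

```python
def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, noise=0.0, noise_clip=0.5, a_low=-1.0, a_high=1.0):
    """Clipped double-Q target as used in TD3 (applied per agent in MATD3-style methods).

    y = r + gamma * (1 - done) * min(Q1'(s', a'), Q2'(s', a')),
    a' = clip(mu'(s') + clip(noise, -c, c), a_low, a_high)   (target policy smoothing).
    The noise sample is passed in so the function stays deterministic.
    """
    eps = max(-noise_clip, min(noise_clip, noise))             # clip the smoothing noise
    a_next = max(a_low, min(a_high, target_actor(next_obs) + eps))
    q_min = min(target_q1(next_obs, a_next), target_q2(next_obs, a_next))
    return reward + gamma * (1.0 - done) * q_min
```

Taking the minimum of the two target critics counteracts the overestimation bias of a single critic, which is why reducing the correlation between the twin estimates matters.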
Funding: Supported by the National Natural Science Foundation of China (No. 52477097), the Guangdong Basic and Applied Basic Research Foundation (2023A1515240014) and the State Key Laboratory of Advanced Electromagnetic Technology (Grant No. AET 2024KF005).
Abstract: To maximize the profits of power grid operators (GOs), load aggregators (LAs) and electricity customers (ECs), this paper proposes a hierarchical demand response (HDR) framework that considers competing interactions, based on the multi-agent deep deterministic policy gradient (MaDDPG). The ECs are divided into conventional ECs and electric vehicles (EVs), managed by an EC agent (ECA) and an EV agent (EVA), respectively, to exploit the flexibility of the HDR framework. Thus, the HDR is a tri-layer model determined by five types of agents engaging in competing interactions to maximize their own profits. To address the limitations of mathematical expression and participation scale of the Stackelberg game within the HDR model, a dynamic interaction mechanism is adopted. Moreover, to handle an HDR involving various entities, MaDDPG develops multiple agents to simulate the dynamic competing interactions among the participants and to solve the continuous action control problem. Furthermore, MaDDPG adopts soft target updates and prioritized experience replay to ensure stable and effective training, and makes exploration comprehensive by using exploration noise. Simulation studies verify the performance of MaDDPG with the dynamic interaction mechanism in handling multilayer multi-agent continuous action control, compared with the double deep Q network (DDQN), deep Q network (DQN) and dueling DQN. Additionally, the proposed HDR is compared with price-based DR (PBDR) and incentive-based DR (IBDR) to investigate the flexibility of the HDR.
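Two of the training mechanisms named above, soft target updates and prioritized experience replay, are standard and fit in a few lines. The sketch below shows Polyak averaging of parameter vectors and proportional prioritization with importance-sampling weights; the hyperparameter defaults (`tau`, `alpha`, `beta`) are common illustrative values, not those used for MaDDPG.

```python
def soft_update(target, online, tau=0.01):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]

def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha, p_i = |delta_i| + eps."""
    prios = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(prios)
    return [p / total for p in prios]

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^-beta, normalised by their maximum."""
    n = len(probs)
    ws = [(n * p) ** (-beta) for p in probs]
    m = max(ws)
    return [w / m for w in ws]
```

A small `tau` makes the target networks drift slowly, which stabilizes the bootstrapped targets, while the importance weights correct the bias introduced by sampling high-error transitions more often.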
Funding: Supported in part by the Beijing Natural Science Foundation under Grant 4252050, in part by the National Science Fund for Distinguished Young Scholars under Grant 62425304, and in part by the Basic Science Center Programs of NSFC under Grant 62088101.
Abstract: This paper investigates the consensus tracking control problem for high-order nonlinear multi-agent systems subject to non-affine faults, partially measurable states, uncertain control coefficients, and unknown external disturbances. Under directed topology conditions, an observer-based finite-time control strategy based on adaptive backstepping and is proposed, in which a neural-network-based state observer is employed to approximate the unmeasurable state variables. To address the explosion of complexity associated with the backstepping method, a finite-time command filter is incorporated, with error compensation signals designed to mitigate filter-induced errors. Additionally, a Butterworth low-pass filter is introduced to avoid the algebraic loop problem in the controller design. The finite-time stability of the closed-loop system is rigorously analyzed using the finite-time Lyapunov stability criterion, verifying that all closed-loop signals remain bounded within a finite time. Finally, the effectiveness of the proposed control strategy is verified through a simulation example.
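For readers unfamiliar with the filter mentioned above, the sketch simulates the step response of a second-order Butterworth low-pass filter, H(s) = wc^2 / (s^2 + sqrt(2) wc s + wc^2), in state-space form. Passing a signal through such a filter yields a smoothed copy and its derivative without direct feedthrough, which is why low-pass filters are a common way to break algebraic loops. How the paper places the filter is not stated; the cutoff `wc` and the function name are assumptions.

```python
import math

def butterworth2_step(wc=10.0, T=2.0, dt=1e-4):
    """Step response of H(s) = wc^2 / (s^2 + sqrt(2) wc s + wc^2), forward-Euler simulation."""
    z1, z2 = 0.0, 0.0          # z1: filtered output, z2: its time derivative
    u = 1.0                    # unit-step input
    for _ in range(int(round(T / dt))):
        dz1 = z2
        dz2 = wc ** 2 * (u - z1) - math.sqrt(2.0) * wc * z2
        z1, z2 = z1 + dt * dz1, z2 + dt * dz2
    return z1, z2
```

The filter output depends on the input only through the integrator states, so a control law that uses `z1` instead of the raw signal no longer requires the signal at the same instant.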
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 72371052 and 71871042).
Abstract: Determining optimal confrontation strategies in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, making them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC) for collaborative confrontation. The algorithm integrates dual twin critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, specifically by learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and underscoring the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors show that agents using MARL-LAC maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
Abstract: Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose Multi Agent-Chain of Thought (CoT), a novel multi-agent chain-of-thought reasoning framework in which specialized agents for the text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting mechanisms. Our architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. The framework employs a hierarchical attention mechanism with cross-modal fusion and implements adaptive reasoning depth based on dialogue complexity. Comprehensive evaluations on Situated Interactive Multi-Modal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
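The abstract does not define its consensus voting rule. One plausible, hypothetical form is a confidence-weighted vote over the answers proposed by the text, vision and speech agents, with a quorum check that could hand off to a conflict-resolution step when no answer dominates. The sketch below implements only that vote; the tuple format, `quorum` and the function name are assumptions.

```python
from collections import defaultdict

def consensus_vote(proposals, quorum=0.5):
    """Hypothetical confidence-weighted vote over agent answers.

    proposals: list of (agent_name, answer, confidence in [0, 1]).
    Returns (winning answer, its share of total confidence, whether it exceeds the quorum).
    """
    scores = defaultdict(float)
    for _agent, answer, conf in proposals:
        scores[answer] += conf                 # accumulate confidence per distinct answer
    total = sum(scores.values())
    if total == 0:
        return None, 0.0, False
    best = max(scores, key=scores.get)         # ties go to the first answer proposed
    share = scores[best] / total
    return best, share, share > quorum
```

A `False` quorum flag is where a conflict-resolution protocol, such as another reasoning round, would take over.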
Funding: Funded by the Deanship of Scientific Research at Northern Border University, Arar, Saudi Arabia, under project number NBU-FFR-2026-2441-02.
Abstract: This paper presents Dual Adaptive Neural Topology (Dual ANT), a distributed dual-network meta-adaptive framework that enhances ant-colony-based multi-agent coordination with online introspection, adaptive parameter control, and privacy-preserving interactions. This approach augments standard Ant Colony Optimization (ACO) with two lightweight neural components: a forward network that estimates swarm efficiency in real time and an inverse network that converts these descriptors into parameter adaptations. To preserve the privacy of individual trajectories in shared pheromone maps, we introduce a locally differentially private pheromone update mechanism that adds calibrated noise to each agent's pheromone deposit while preserving the efficacy of the global pheromone signal. The resulting system enables agents to adapt their coordination strategies dynamically and autonomously under challenging and dynamic conditions, including varying obstacle layouts, uncertain target locations, and time-varying disturbances. Extensive simulations of large grid-based search tasks demonstrated that Dual ANT achieved faster convergence, higher robustness, and improved scalability compared with advanced baselines such as Multi-Strategy ACO and Hierarchical ACO. The meta-adaptive feedback loop compensates for the performance degradation caused by privacy noise and prevents premature stagnation by triggering Lévy flight exploration only when necessary.
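A common way to realize a locally differentially private deposit is the Laplace mechanism: each agent perturbs its own deposit with Laplace noise of scale sensitivity/epsilon before it reaches the shared map. The sketch below combines that with standard ACO evaporation, tau <- (1 - rho) tau. The noise model, the zero clamp and all names are assumptions, since the abstract does not specify them, and the Lévy-flight trigger is not shown. For brevity the noise is added inside one function rather than on each agent.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials with mean `scale`."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_pheromone_update(tau, deposits, rho=0.1, epsilon=1.0, sensitivity=1.0, rng=None):
    """Evaporate the map, then add each agent's deposit perturbed by the Laplace mechanism.

    tau:      dict cell -> pheromone level
    deposits: list of (cell, amount) pairs, one per agent
    """
    rng = rng or random.Random()
    new_tau = {cell: (1.0 - rho) * level for cell, level in tau.items()}
    for cell, amount in deposits:
        noisy = amount + laplace_noise(sensitivity / epsilon, rng)  # local perturbation
        new_tau[cell] = new_tau.get(cell, 0.0) + noisy
    # clamp at zero so the map stays a non-negative pheromone field (post-processing)
    return {cell: max(level, 0.0) for cell, level in new_tau.items()}
```

Because the noise has zero mean, it largely cancels when many agents deposit on the same cell, which is one way the global pheromone signal can stay useful despite per-agent privacy noise.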
Funding: Supported in part by the National Natural Science Foundation of China (62522320, 92267108, 62173322), the Liaoning Revitalization Talents Program (XLYC2403062) and the Science and Technology Program of Liaoning Province (2023JH3/10200004, 2022JH25/10100005).
Abstract: The wireless cloud robotic system (WCRS), which fully integrates sensing, communication, computing, and control capabilities as an intelligent agent, is a promising way to achieve intelligent manufacturing owing to its easy deployment and flexible expansion. However, high-precision control of a WCRS requires deterministic wireless communication, which is always challenging in complex and dynamic radio environments. This paper employs a reconfigurable intelligent surface (RIS) to establish a novel RIS-assisted WCRS architecture, in which the radio channel is controlled to achieve ultra-reliable, low-delay, and low-jitter communication for high-precision closed-loop motion control. However, control and communication are strongly coupled and should be co-optimized. Fully considering the constraints of the control input threshold, control delay deadline, beam phase, antenna power, and information distortion, we formulate a stability maximization problem that jointly optimizes control input compensation, RIS phase shift, and beamforming. Herein, a new jitter-oriented system stability objective with respect to control error and communication jitter is defined, and a closed-form expression of the control delay deadline is derived based on Jensen's inequality and a Lyapunov-Krasovskii functional. Owing to the time-varying and partially observable channel and robot states, we model the problem as a partially observable Markov decision process (POMDP). To solve this complex problem, we propose a multi-agent transfer reinforcement learning algorithm named LSTM-PPO-MATRL, in which LSTM-enhanced proximal policy optimization (PPO) is designed to approximate an optimal solution and option-guided policy transfer learning is proposed to facilitate the learning process. With centralized training and decentralized execution, LSTM-PPO-MATRL is validated by extensive experiments on MuJoCo tasks in both low-mobility and high-mobility robotic control scenarios. The results demonstrate that LSTM-PPO-MATRL not only achieves high learning efficiency but also supports low-delay, low-jitter communication for low-error control, achieving a 71.9% improvement in control accuracy and a 68.7% reduction in delay jitter compared with the PPO-MADRL baseline.
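LSTM-PPO-MATRL builds on proximal policy optimization, whose core is the clipped surrogate objective. The sketch below evaluates that objective from per-sample log-probabilities and advantage estimates; the LSTM encoder, the option-guided transfer and multi-agent aspects are outside its scope, and `clip_eps = 0.2` is only a common default. The function name is hypothetical.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate L = E[min(r A, clip(r, 1 - eps, 1 + eps) A)],
    with probability ratio r = exp(logp_new - logp_old). Maximise L (or minimise -L)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                               # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)                # pessimistic bound
    return total / len(advantages)
```

Taking the minimum removes any incentive to push the ratio outside [1 - eps, 1 + eps], which keeps each policy update close to the data-collecting policy.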
Funding: The National Natural Science Foundation of China (W2431048); the Science and Technology Research Program of Chongqing Municipal Education Commission, China (KJZDK202300807); and the Chongqing Natural Science Foundation, China (CSTB2024NSCQQCXMX0052).
Abstract: This paper addresses the consensus problem of nonlinear multi-agent systems subject to external disturbances and uncertainties under denial-of-service (DoS) attacks. First, an observer-based state feedback control method is employed to achieve secure control by estimating the system's state in real time. Second, a memory-based adaptive event-triggered mechanism is combined with neural networks to approximate the nonlinear terms in the networked system and to conserve system resources efficiently. Finally, based on a two-degree-of-freedom model of a vehicle affected by crosswinds, a multi-unmanned ground vehicle (Multi-UGV) system is constructed to validate the effectiveness of the proposed method. Simulation results show that the proposed control strategy can effectively handle external disturbances such as crosswinds in practical applications, ensuring the stability and reliable operation of the Multi-UGV system.
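The abstract does not give the triggering condition. As a hypothetical illustration of a memory-based event trigger, the sketch below transmits a new scalar state only when its deviation from the last transmitted state exceeds a threshold built from the average of a few recently transmitted states, rather than from the current state alone. The rule, `sigma`, `memory` and the function name are assumptions; the adaptive adjustment of the threshold and the DoS handling are not modeled.

```python
def event_triggered_samples(states, sigma=0.1, memory=3):
    """Hypothetical memory-based relative event trigger for a scalar state sequence.

    Sample k is released when
        (x_k - x_last)^2 > sigma * mean(x_j^2 over the last `memory` released samples).
    Returns the indices of the released (transmitted) samples.
    """
    released = [0]                             # always transmit the first sample
    history = [states[0] ** 2]
    for k in range(1, len(states)):
        err = (states[k] - states[released[-1]]) ** 2
        recent = history[-memory:]             # memory of recently released samples
        threshold = sigma * sum(recent) / len(recent)
        if err > threshold:
            released.append(k)
            history.append(states[k] ** 2)
    return released
```

For a slowly drifting signal only a handful of the samples are transmitted, which is the resource saving that event triggering aims for.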
Funding: Supported in part by the National Natural Science Foundation of China (62495095, 62088101).
Abstract: Multi-agent systems (MASs) have demonstrated significant achievements in a wide range of tasks, leveraging their capacity for coordination and adaptation within complex environments. Moreover, enhancing their intelligent functionalities is crucial for tackling increasingly challenging tasks. This goal resonates with a paradigm shift within the artificial intelligence (AI) community from “internet AI” to “embodied AI”, and MASs with embodied AI are referred to as embodied multi-agent systems (EMASs). An EMAS has the potential to acquire generalized competencies through interactions with environments, enabling it to effectively address a variety of tasks and thereby make a substantial contribution to the quest for artificial general intelligence. Despite burgeoning interest in this domain, a comprehensive review of EMASs has been lacking. This paper offers an analysis and synthesis of EMASs from a control perspective, conceptualizing each embodied agent as an entity equipped with a “brain” for decision-making and a “body” for environmental interaction. System designs are classified into open-loop, closed-loop, and double-loop categories, and EMAS implementations are discussed. Additionally, the current applications and challenges of EMASs are summarized, and potential avenues for future research in this field are provided.
Funding: Supported by the Natural Science Foundation of Anhui Province (Nos. 2308085QE146 and 2208085ME116), the National Natural Science Foundation of China (No. 52173039), the Natural Science Foundation of Jiangsu Province (No. BK20210894) and the Anhui Provincial Universities Outstanding Youth Research Project (No. 2023AH020018).
Abstract: Electromagnetic interference (EMI) shielding materials with superior shielding efficiency and low-reflection properties hold promising potential for use in electronic components, precision instruments, and fifth-generation communication equipment. In this study, multistage microcellular waterborne polyurethane (WPU) composites were constructed via gradient induction, layer-by-layer casting, and supercritical carbon dioxide foaming. The gradient-structured WPU/iron-cobalt-loaded reduced graphene oxide (FeCo@rGO) foam serves as an impedance-matched absorption layer, while the highly conductive WPU/silver-loaded glass microsphere (Ag@GM) layer serves as a reflection layer. Owing to the asymmetric structure and the introduced gradient and porous configurations, the composite foam demonstrates excellent conductivity, outstanding EMI SE (74.9 dB), and minimal reflection (35.28%) in the 8.2-12.4 GHz range, implying that more than 99.99999% of electromagnetic (EM) waves were blocked and only 35.28% were reflected to the external environment. Interestingly, the reflectivity of the composite foam is reduced to 0.41% at 10.88 GHz owing to resonance between incident and reflected EM waves. Beyond that, the composite foam features low density (0.47 g/cm³) and highly stable EMI shielding properties. This work offers a viable approach for crafting lightweight, highly shielding, and minimally reflective EMI shielding composites.
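The 0.41% reflectivity dip at 10.88 GHz is attributed to interference between incident and reflected waves. A standard way to reason about such dips is the transmission-line model of an absorbing layer backed by a conductor, sketched below. Treating the conductive Ag@GM layer as an ideal backing reflector and the foam as one homogeneous layer with an assumed permittivity is a strong simplification of the gradient structure; the material values in the usage note are illustrative, not measured, and the function name is hypothetical.

```python
import cmath
import math

C0 = 299_792_458.0          # speed of light in vacuum, m/s

def reflection_loss_db(freq_hz, thickness_m, eps_r, mu_r=1.0 + 0j):
    """Reflection loss of a single absorbing layer on a perfect conductor (e^{jwt} convention).

    Z_in / Z0 = sqrt(mu_r / eps_r) * tanh(j * 2*pi*f*d / c * sqrt(mu_r * eps_r))
    RL = 20 log10 |(Z_in - Z0) / (Z_in + Z0)|    (more negative = less reflection)
    """
    z_norm = cmath.sqrt(mu_r / eps_r) * cmath.tanh(
        1j * 2 * math.pi * freq_hz * thickness_m / C0 * cmath.sqrt(mu_r * eps_r))
    gamma = (z_norm - 1) / (z_norm + 1)          # reflection coefficient at the surface
    return 20 * math.log10(abs(gamma))
```

For an assumed lossy layer with eps_r = 10 - 3j and a thickness of 2.4 mm, sweeping `reflection_loss_db` over 8.2-12.4 GHz shows a reflection dip inside the band. On this scale, a reflectivity of 0.41% corresponds to about -23.9 dB.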
Funding: The National Natural Science Foundation of China (62136008, 62293541); the Beijing Natural Science Foundation (4232056); and the Beijing Nova Program (20240484514).
Abstract: Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and state transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration of large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems in multi-agent settings, offering finer-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.