Funding: National Natural Science Foundation of China, Grant/Award Number: 61872171; The Belt and Road Special Foundation of the State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Grant/Award Number: 2021490811.
Abstract: Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
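The idea of rewarding actions that share information with the environment state can be illustrated with a small sketch. This is not MioDSC's estimator, which the abstract does not specify. It assumes discrete states and actions, uses a simple plug-in estimate of I(S; A) from sampled pairs, and the names `mutual_information`, `shaped_reward`, and the weight `beta` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(S; A) in nats from (state, action) samples.

    Assumption: states and actions are discrete and hashable.
    """
    n = len(pairs)
    joint = Counter(pairs)
    state_counts = Counter(s for s, _ in pairs)
    action_counts = Counter(a for _, a in pairs)
    mi = 0.0
    for (s, a), c in joint.items():
        # p(s,a) * log(p(s,a) / (p(s) * p(a))), written in terms of counts
        mi += (c / n) * math.log(c * n / (state_counts[s] * action_counts[a]))
    return mi


def shaped_reward(extrinsic, pairs, beta=0.1):
    """Extrinsic reward plus a hypothetical MI-based intrinsic bonus."""
    return extrinsic + beta * mutual_information(pairs)


# Actions fully determined by the state: I(S; A) = log 2
dependent = [(0, "left"), (1, "right")] * 50
# Actions independent of the state: I(S; A) = 0
independent = [(0, "left"), (0, "right"), (1, "left"), (1, "right")] * 25
```

In this toy setting a policy whose actions depend on the state earns a larger bonus than one that acts independently of it. In continuous spaces a learned estimator would be needed instead of counting.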
Abstract: In recent years, researchers have leveraged single-agent reinforcement learning to boost educational outcomes and deliver personalized interventions; yet this paradigm provides no capacity for inter-agent interaction. Multi-agent reinforcement learning (MARL) overcomes this limitation by allowing several agents to learn simultaneously within a shared environment, each choosing actions that maximize its own or the group's rewards. By explicitly modeling and exploiting agent-to-agent dynamics, MARL can align those interactions with pedagogical goals such as peer tutoring, collaborative problem-solving, or gamified competition, thus opening richer avenues for adaptive and socially informed learning experiences. This survey investigates the impact of MARL on educational outcomes by examining evidence of its effectiveness in enhancing learner performance, engagement, and equity and in reducing teacher workload, compared to single-agent or traditional approaches. It explores the educational domains and pedagogical problems addressed by MARL, identifies the algorithmic families used, and analyzes their influence on learning. The review also assesses experimental settings and evaluation metrics to determine ecological validity, and outlines current challenges and future research directions in applying MARL to education.
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 72371052 and 71871042).
Abstract: Addressing optimal confrontation methods in multi-agent attack-defense scenarios is a complex challenge. Multi-Agent Reinforcement Learning (MARL) provides an effective framework for tackling sequential decision-making problems, significantly enhancing swarm intelligence in maneuvering. However, applying MARL to unmanned swarms presents two primary challenges. First, defensive agents must balance autonomy with collaboration under limited perception while coordinating against adversaries. Second, current algorithms aim to maximize global or individual rewards, making them sensitive to fluctuations in enemy strategies and environmental changes, especially when rewards are sparse. To tackle these issues, we propose Multi-Agent Reinforcement Learning with Layered Autonomy and Collaboration (MARL-LAC), an algorithm for collaborative confrontations. This algorithm integrates dual twin Critics to mitigate the high variance associated with policy gradients. Furthermore, MARL-LAC employs layered autonomy and collaboration to address multi-objective problems, specifically learning a global reward function for the swarm alongside local reward functions for individual defensive agents. Experimental results demonstrate that MARL-LAC enhances decision-making and collaborative behaviors among agents, outperforming existing algorithms and emphasizing the importance of layered autonomy and collaboration in multi-agent systems. The observed adversarial behaviors demonstrate that agents using MARL-LAC effectively maintain cohesive formations that conceal their intentions by confusing the offensive agent while successfully encircling the target.
Funding: Supported in part by the National Natural Science Foundation of China (62276119), the Natural Science Foundation of Jiangsu Province (BK20241764), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_2860).
Abstract: Dear Editor, This letter investigates predefined-time optimization problems (OPs) of multi-agent systems (MASs), where the agents of the MASs are subject to inequality constraints and the team objective function accounts for impulse effects. First, to address the inequality constraints, the penalty method is introduced. Then, a novel optimization strategy is developed, which only requires that the team objective function be strongly convex.
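The penalty method named in this abstract can be sketched on a toy problem. The sketch below is not the letter's algorithm: it is centralized, uses plain gradient descent, and ignores the predefined-time and impulsive aspects. It assumes a strongly convex team objective f(x) = sum_i (x - c_i)^2 with one scalar decision variable and one inequality constraint x <= upper. The function names, the quadratic penalty form, and the weight `rho` are all illustrative assumptions.

```python
def penalized_gradient(x, targets, upper, rho):
    """Gradient of f(x) + (rho / 2) * max(0, x - upper)**2."""
    grad_f = sum(2.0 * (x - c) for c in targets)  # strongly convex team objective
    violation = max(0.0, x - upper)               # zero when the constraint holds
    return grad_f + rho * violation


def solve_with_penalty(targets, upper=1.0, rho=1000.0, steps=5000, x0=0.0):
    """Gradient descent on the penalized objective (illustrative only)."""
    # Step size 1/L, where L = 2 * len(targets) + rho bounds the curvature.
    lr = 1.0 / (2.0 * len(targets) + rho)
    x = x0
    for _ in range(steps):
        x -= lr * penalized_gradient(x, targets, upper, rho)
    return x


# Unconstrained minimizer is the mean of the targets (2.0); the constraint x <= 1 is active.
x_star = solve_with_penalty([1.0, 2.0, 3.0])
```

With a finite `rho` the result sits slightly outside the feasible set, at (12 + rho) / (6 + rho) for these targets, and approaches the constrained minimizer x = 1 as `rho` grows.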
Funding: Supported by the Natural Science Foundation of Jiangsu Province (BK20240009), the National Natural Science Foundation of China (62373105, 62373262), and the Jiangsu Provincial Scientific Research Center of Applied Mathematics (BK20233002).
Abstract: Dear Editor, This letter studies a real-world issue in leader-follower multi-agent systems (MASs) named open topology, which permits variations in the agent set and network connections. Specifically, a novel transition process is developed to explain how the involved variation of network scale affects the dynamic behavior of the MASs. From a resource-limited perspective, distributed saturated impulsive control is then designed, under which some sufficient criteria are integrated into local quasi-consensus performance. We also provide a combined optimization algorithm for all agents to make the estimated domain of initial errors closer to the real one, thereby resulting in less conservativeness. Finally, a numerical example validates our results.
Funding: Supported in part by the National Natural Science Foundation of China (62273255, 62350003, 62088101), the Shanghai Science and Technology Cooperation Project (22510712000, 21550760900), the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100), and the Fundamental Research Funds for the Central Universities.
Abstract: Dear Editor, This letter is concerned with the problem of time-varying formation tracking for heterogeneous multi-agent systems (MASs) under directed switching networks. For this purpose, our first step is to present some sufficient conditions for the exponential stability of a particular category of switched systems.
Funding: Supported by the National Natural Science Foundation of China (62325304, U22B2046, 62073079, 62376029), the Jiangsu Provincial Scientific Research Center of Applied Mathematics (BK20233002), and the China Postdoctoral Science Foundation (2023M730255, 2024T171123).
Abstract: Dear Editor, This letter studies the bipartite consensus tracking problem for heterogeneous multi-agent systems with actuator faults and a leader's unknown time-varying control input. To handle this problem, a continuous fault-tolerant control protocol via observer design is developed. In addition, it is rigorously proved that the multi-agent system driven by the designed controllers can still achieve bipartite consensus tracking after faults occur.
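Bipartite consensus means the agents agree in magnitude while two antagonistic groups settle on opposite signs. The sketch below only illustrates that notion and is not the letter's protocol: it assumes single-integrator agents on an undirected, structurally balanced signed graph, with no leader, faults, or observer. The function name, the Euler step `dt`, and the example graph are illustrative assumptions.

```python
def simulate_bipartite_consensus(adj, x0, dt=0.1, steps=500):
    """Euler simulation of dx_i/dt = sum_j (a_ij * x_j - |a_ij| * x_i).

    adj[i][j] > 0 marks a cooperative link, adj[i][j] < 0 an antagonistic one.
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        dx = [
            sum(adj[i][j] * x[j] - abs(adj[i][j]) * x[i] for j in range(n))
            for i in range(n)
        ]
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Agents 0 and 1 form one group, agents 2 and 3 the other.
# Links inside a group are +1, links across groups are -1.
adjacency = [
    [0, 1, 0, -1],
    [1, 0, -1, 0],
    [0, -1, 0, 1],
    [-1, 0, 1, 0],
]
final_states = simulate_bipartite_consensus(adjacency, [1.0, 2.0, 3.0, -4.0])
```

For this example the first group converges to 1 and the second to -1, since flipping the sign of the second group's states reduces the dynamics to ordinary average consensus.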