For flexible manufacturing systems with multiple machining and assembly stations, a new scheduling algorithm is proposed that decomposes the assembly structure of the products into simple scheduling problems, each represented by a corresponding agent. The importance and constraints of each agent are then considered in order to rank the simple scheduling problems using cooperative game theory. Following this order, the sub-problems are scheduled according to rules, yielding near-optimal schedules that satisfy the constraints. Experimental results verify the effectiveness of the proposed scheduling algorithm.
In multi-agent systems, autonomous agents may form coalitions to increase the efficiency of problem solving. However, current coalition formation algorithms are very complex and cannot satisfy optimality and stability simultaneously. To solve this problem, a coalition formation algorithm based on the mechanism of distribution according to work is presented, which can achieve a globally optimal and stable solution in subadditive task-oriented domains. The validity of the algorithm is demonstrated both experimentally and theoretically.
A general multi-agent architecture is proposed for intelligent decision support systems (MAIDSS). Each agent in MAIDSS is built on an extension of the BDI framework. Several agents form a team that works together on a decision problem, and several agent teams are defined to represent the interests of different people in the real world. The decision-making process is based on multi-agent cooperation, and a logical framework in which a team of agents cooperates to create the solution to the decision problem is discussed in detail.
With the liberalization of the electricity retail side, large numbers of small-capacity distributed generation units are being connected to the distribution side, forming multiple types of market entities such as microgrids, integrated energy systems, and virtual power plants. With the large-scale integration of distributed energy, the energy market under the energy internet differs from that of a traditional transmission grid. It is developing toward diversified entities and commodities, a flat structure, and a flexible and competitive multi-agent market mechanism. In this context, this study analyzes the value of combining blockchain with the electricity market and presents the design of a blockchain trading framework for multi-agent cooperation and sharing in the energy internet. The nodes in market transactions are modeled through power system modeling in the physical layer and a transaction consensus strategy in the cyber layer; the nodes are then verified on a modified IEEE 13-node test feeder of a distribution network. A transaction example is demonstrated using the multi-agent cooperation and sharing transaction platform based on an Ethereum private blockchain.
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and in state transition functions; and scalability-related multi-task, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration of large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Experts and officials shared their insights on poverty reduction cooperation and sustainable development during the 2025 International Seminar on Global Poverty Reduction Partnerships.
The development of chassis active safety control technology has improved vehicle stability under extreme conditions. However, its cross-system and multi-functional characteristics make it difficult for the controller to achieve cooperative goals. In addition, the chassis system, with its high complexity, numerous subsystems, and strong coupling, leads to low computing efficiency and poor control performance. Therefore, this paper proposes a scenario-driven hybrid distributed model predictive control algorithm with variable control topology. The algorithm divides the vehicle's β−γ phase plane into multiple stability regions, forming a mapping between the control structure and the vehicle's state. A control input fusion mechanism within the transition domain is designed to mitigate the system state oscillation and control input jitter caused by switching control structures. Then, a distributed state-space equation with state coupling and input coupling is constructed, and a weighted local agent cost function in quadratic programming form is derived. Through cost coupling, local agents can coordinate global performance goals. Finally, Simulink/CarSim joint simulation and hardware-in-the-loop (HIL) tests validate that the proposed algorithm improves vehicle stability while ensuring trajectory tracking accuracy and is well suited to multi-objective coordinated control. This paper combines the advantages of distributed MPC and decentralized MPC, achieving a balance between approximating the globally optimal result and solution efficiency.
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address credit assignment, low collaboration efficiency, and weak policy generalization in cooperative pursuit tasks of multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance as the scale expands. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: a multi-head attention (MHA) mechanism shared among the critics during centralized training to solve the credit assignment problem. This method enables the pursuers to identify highly correlated interactions with their teammates, eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function combining a formation reward with a distance reward is devised to help the pursuers encircle the escapee, which incentivizes UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, scalability experiments demonstrate that the algorithm is suitable for large-scale multi-UAV systems.
The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks, but is limited by co-channel interference and malicious jamming. Without prior knowledge of the environment, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their own policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) to maximize the long-term expected system throughput. Then, to improve the exploration of the optimal policy, we optimize a MARL objective function with a mutual-information (MI) regularizer between states and actions, which can dynamically assign probability to actions frequently used by the optimal policy. Next, by sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs can learn their own policies cooperatively, relying only on previously observed information and on predictions of others' actions. Our simulation results show that for both sweep jamming and Markov jamming patterns, the proposed scheme outperforms the benchmarks in terms of throughput, convergence, and stability for different numbers of jammers, channels, and UAV pairs.
Within the context of ground-air cooperation, the distributed formation trajectory tracking control problem for Heterogeneous Multi-Agent Systems (HMASs) is studied. First, considering external disturbances and model uncertainties, a graph theory-based formation control protocol is designed for HMASs consisting of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs). Subsequently, a formation trajectory tracking control strategy employing an adaptive Fractional-Order Sliding Mode Control (FOSMC) method is developed, and a Feedback Multilayer Fuzzy Neural Network (FMFNN) is introduced to estimate the lumped uncertainties. This approach enables HMASs to adaptively follow the expected trajectory and adopt the designated formation configuration, even in the presence of various uncertainties. Additionally, an event-triggered mechanism is incorporated into the controller to reduce its update frequency and minimize communication among the agents, and the absence of Zeno behavior is rigorously demonstrated by an integral inequality analysis. Finally, numerical simulations are presented to confirm the effectiveness of the proposed formation control protocol.
Solving a complex problem usually involves cooperation between participants with different knowledge backgrounds. This paper identifies the interdependency between sub-problems (obtained through problem decomposition) as the major factor that influences cooperative relations in multi-agent systems, and on this basis proposes an efficient means of measuring the cooperation coefficient (degree) between agents. Cognitive cooperation between agents is then analyzed, with the aim of gathering the wisdom of the cognitive community into a systematic solution to the overall problem.
In this paper, rough set theory is introduced into the interface multi-agent system (MAS) of an industrial supervisory system. Taking advantage of rough sets in data mining, a cooperation model for the MAS is built, and rules for avoiding cooperation conflicts are deduced. An optimization algorithm is used to enhance the security and real-time attributes of the system. An application based on the proposed algorithm and rules is given.
Reinforcement learning has been widely applied in multi-agent systems in recent years. In a multi-agent system, an agent cooperates with other agents to accomplish a given task, and one agent's behavior usually affects the others' behaviors. In traditional reinforcement learning, an agent observes only the other agents' locations, so it is difficult to account for their behavior, which decreases the learning efficiency. This paper proposes cooperative multi-agent reinforcement learning based on eligibility traces, i.e., one agent estimates the other agent's behavior from the other agent's eligibility traces. The simulation results demonstrate the validity of the proposed learning method.
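For readers unfamiliar with eligibility traces, the following is a minimal sketch of a standard tabular SARSA(λ) update with accumulating traces. It is not the paper's method: the abstract does not specify how one agent reads another agent's traces, so only the single-agent trace mechanism is shown. The function name `sarsa_lambda_step` and the parameters `alpha`, `gamma`, and `lam` are illustrative assumptions.

```python
from collections import defaultdict


def sarsa_lambda_step(q, traces, state, action, reward, next_state, next_action,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """Apply one tabular SARSA(lambda) update with accumulating traces.

    q and traces are dicts keyed by (state, action); missing keys count as 0.0.
    All names and default values here are hypothetical, chosen for illustration.
    """
    # Temporal-difference error for the observed transition.
    td_error = reward + gamma * q[(next_state, next_action)] - q[(state, action)]

    # Mark the visited state-action pair as eligible for credit.
    traces[(state, action)] += 1.0

    # Every eligible pair shares the TD error in proportion to its trace,
    # after which the trace decays by gamma * lambda.
    for key in list(traces):
        q[key] += alpha * td_error * traces[key]
        traces[key] *= gamma * lam

    return td_error


if __name__ == "__main__":
    q = defaultdict(float)
    traces = defaultdict(float)
    delta = sarsa_lambda_step(q, traces, "s0", "a0", 1.0, "s1", "a1")
    print(delta, dict(q), dict(traces))
```

Because a trace records which state-action pairs an agent has visited recently, a table like `traces` is the kind of quantity a second agent could inspect to estimate the first agent's behavior, which is the idea the abstract describes.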
The cooperative control and stability analysis problems for multi-agent systems with sampled communication are investigated. Distributed state feedback controllers are adopted for the cooperation of networked agents. A theorem in the form of linear matrix inequalities (LMIs) is derived to analyze the system stability. Another theorem, in the form of an optimization problem subject to LMI constraints, is proposed to design the controller, and the corresponding algorithm is then presented. The simulation results verify the validity and effectiveness of the proposed approach.
Taking into account the new characteristics of global cooperation in supply chains, this paper studies a hybrid model of the cooperative negotiation process for order distribution in a supply chain. After reviewing and analyzing the main domestic and international work on cooperative negotiation modeling in supply chains, several problems are pointed out. For example, traditional simple multi-agent system (MAS) frameworks have limitations and are not suitable for modeling complex systems. To solve these problems, drawing on multi-agent structures and complex system modeling, the manufacturing supply chain is taken as an example, and a time Petri net production model is adopted to decompose the materials. A cooperative negotiation model for order distribution in the supply chain is then constructed by combining multi-agent techniques with time Petri net modeling. The simulation results show that this model helps solve the problems of cooperative negotiation in supply chains.
Aim: To design and implement a multi-agent cooperative problem-solving expert system tool. Methods: A blackboard system was adopted as the data sharing and information exchange center to coordinate complex cooperative problem solving. The system was developed in a mixed UNIX and MS Windows 95 TCP/IP network environment. Results and Conclusion: A prototype of a multi-agent cooperative expert system tool is implemented. The experiment demonstrates that the fundamental functions of a cooperative expert system are realized.
Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value in applications for Unmanned Aerial Vehicle (UAV) swarms. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most existing methods do not scale well to decentralized UAV swarms because of their computational complexity or their need for global information. This paper proposes a decentralized MADRL method that uses the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. The method reshapes each UAV's reward with a regularization term defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and its neighbors. The dependence between UAVs can be captured directly by a Pointwise Mutual Information (PMI) neural network without complicated aggregation statistics. Then, the experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is proposed to learn the cooperative sharing policy for all homogeneous UAVs. Experiments demonstrate that the proposed algorithm improves the UAVs' cooperation more effectively than the baseline algorithms and can stimulate a rich variety of cooperative tracking behaviors in UAV swarms. In addition, the learned policy scales better to other scenarios with more UAVs and targets.
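The reward reshaping described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `reciprocal_reward` and the scaling factor `weight` are hypothetical, and the dependency values are passed in as plain numbers rather than produced by a PMI network.

```python
def reciprocal_reward(own_reward, neighbor_rewards, dependencies, weight=1.0):
    """Reshape one UAV's reward with a reciprocal regularization term.

    The regularizer is the dot product of the neighbors' reward vector and
    the dependency vector between this UAV and those neighbors. `weight` is
    an assumed scaling factor that the abstract does not mention.
    """
    if len(neighbor_rewards) != len(dependencies):
        raise ValueError("need exactly one dependency value per neighbor")

    # Dot product of neighbor rewards and dependency values.
    regularizer = sum(r * d for r, d in zip(neighbor_rewards, dependencies))
    return own_reward + weight * regularizer


if __name__ == "__main__":
    # Two neighbors with rewards 2.0 and 4.0 and dependencies 0.5 and 0.25.
    print(reciprocal_reward(1.0, [2.0, 4.0], [0.5, 0.25]))
```

With these inputs the regularizer is 2.0 × 0.5 + 4.0 × 0.25 = 2.0, so the reshaped reward is 3.0. A UAV whose actions strongly influence a well-rewarded neighbor therefore receives a larger reshaped reward.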
To address the problem of cooperative air defense for surface warship formations, this paper maps the cooperative air-defense system of systems (SoS) for surface warship formation (CASoSSWF) to the biological immune system (BIS), based on the similarity of their defense mechanisms and characteristics. It then designs the component models and the architecture for a monitoring agent, a regulating agent, a killer agent, a pre-warning agent, and a communicating agent, drawing on the theories and methods of the artificial immune system, the multi-agent system (MAS), the vaccine, and danger theory (DT). Moreover, a new immune multi-agent model using vaccine based on DT (IMMUVBDT) for the cooperative air-defense SoS is put forward. The immune response and immune mechanism of the CASoSSWF are analyzed. The model is capable of memory, evolution, and self-learning, adapts well to dynamic environments, and adequately embodies the cooperative air-defense mechanism of the CASoSSWF. It therefore offers a novel approach to the CASoSSWF and can provide conceptual models for a surface warship formation operation simulation system.
This article studies the traffic signal control problem for multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance than state-of-the-art models.
Funding (blockchain trading framework for the energy internet): Smart Grid Joint Fund of the National Natural Science Foundation of China (No. U2066209); Science and Technology Project of the China Electric Power Research Institute (No. AI83-20-002).
Funding (review of multi-task cooperative MARL): National Natural Science Foundation of China (62136008, 62293541); Beijing Natural Science Foundation (4232056); Beijing Nova Program (20240484514).
Funding (hybrid distributed MPC for chassis control): National Natural Science Foundation of China (Grant Nos. 52225212, 52272418, U22A20100); National Key Research and Development Program of China (Grant No. 2022YFB2503302).
Funding (MA2PPO multi-UAV cooperative pursuit): National Research and Development Program of China (Grant JCKY2018607C019); Key Laboratory Fund of UAV of Northwestern Polytechnical University (Grant 2021JCJQLB0710L).
Funding (MARL-based anti-jamming for I-UAVs): National Natural Science Foundation of China (Grants 62001225, 62071236, 62071234, and U22A2002); Major Science and Technology Plan of Hainan Province (Grant ZDKJ2021022); Scientific Research Fund Project of Hainan University (Grant KYQD(ZR)-21008); Key Technologies R&D Program of Jiangsu (Prospective and Key Technologies for Industry) (Grants BE2023022 and BE2023022-2).
Funding: Supported by the Beijing Municipal Science & Technology Commission, China (No. Z19111000270000) and the National Natural Science Foundation of China (Nos. 62203050, 51774042).
Abstract: Within the context of ground-air cooperation, the distributed formation trajectory tracking control problem for Heterogeneous Multi-Agent Systems (HMASs) is studied. First, considering external disturbances and model uncertainties, a graph theory-based formation control protocol is designed for HMASs consisting of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs). Subsequently, a formation trajectory tracking control strategy employing an adaptive Fractional-Order Sliding Mode Control (FOSMC) method is developed, and a Feedback Multilayer Fuzzy Neural Network (FMFNN) is introduced to estimate the lumped uncertainties. This approach enables HMASs to adaptively follow the expected trajectory and adopt the designated formation configuration, even in the presence of various uncertainties. Additionally, an event-triggered mechanism is incorporated into the controller to reduce its update frequency and minimize communication among the agents, and the absence of Zeno behavior is rigorously demonstrated by an integral inequality analysis. Finally, numerical simulations are presented to confirm the effectiveness of the proposed formation control protocol.
Funding: Supported by the National Natural Science Foundation of China (60303025) and the Natural Science Foundation of Jiangsu Province for Youth Scholar (BK2004411).
Abstract: Solving a complex problem usually involves cooperation between participants with different knowledge backgrounds. This paper identifies interdependency between subproblems (obtained through problem decomposition) as the major factor that influences cooperative relations in multi-Agent systems, and on this basis proposes an efficient means to measure the cooperation coefficient (degree) between Agents. Cognitive cooperation between Agents is then analyzed, which aims at collecting the wisdom of the cognitive community for a systematic solution to the overall problem.
Funding: Project supported by the Science Foundation of Shanghai Municipal Commission of Science and Technology (Grant Nos. 025111052, 04JC14038).
Abstract: In this paper, rough set theory is introduced into the interface multi-agent system (MAS) for an industrial supervisory system. Taking advantage of rough sets in data mining, a cooperation model for the MAS is built. Rules for avoiding cooperation conflicts are deduced. An optimization algorithm is used to enhance the security and real-time attributes of the system. An application based on the proposed algorithm and rules is given.
Abstract: Reinforcement learning has been widely applied in multi-agent systems in recent years. In a multi-agent system, an agent cooperates with other agents to accomplish a given task, and one agent's behavior usually affects the others' behaviors. In traditional reinforcement learning, an agent takes only the others' locations into account, so it is difficult to consider the others' behavior, which decreases learning efficiency. This paper proposes multi-agent reinforcement learning with cooperation based on eligibility traces, i.e., one agent estimates another agent's behavior from that agent's eligibility traces. Simulation results demonstrate the validity of the proposed learning method.
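For readers unfamiliar with eligibility traces, the sketch below shows the standard single-agent tabular TD(λ) update with accumulating traces. It is background only and not the cooperative method of the paper, which the abstract does not specify; the state names, the function name `td_lambda_step`, and the parameter values are illustrative assumptions.

```python
def td_lambda_step(values, traces, state, next_state, reward,
                   alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular TD(lambda) update with accumulating eligibility traces."""
    td_error = reward + gamma * values[next_state] - values[state]
    # Mark the state that was just visited.
    traces[state] = traces.get(state, 0.0) + 1.0
    for s in list(traces):
        # Credit every recently visited state in proportion to its trace.
        values[s] += alpha * td_error * traces[s]
        # Decay the trace so that older visits receive less credit.
        traces[s] *= gamma * lam
    return td_error


# Illustrative three-state chain A -> B -> C with a reward on the last step.
values = {"A": 0.0, "B": 0.0, "C": 0.0}
traces = {}
td_lambda_step(values, traces, "A", "B", reward=0.0)
td_lambda_step(values, traces, "B", "C", reward=1.0)
```

After the second step, state A also receives part of the credit for the reward even though it was visited one step earlier. A trace therefore acts as a compact record of an agent's recent behavior.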
Funding: Supported by the National Natural Science Foundation of China (91016017) and the National Aviation Fund of China (20115868009).
Abstract: The cooperative control and stability analysis problems for multi-agent systems with sampled communication are investigated. Distributed state feedback controllers are adopted for the cooperation of networked agents. A theorem in the form of linear matrix inequalities (LMIs) is derived to analyze the system stability. Another theorem, in the form of an optimization problem subject to LMI constraints, is proposed to design the controller, and the corresponding algorithm is then presented. The simulation results verify the validity and effectiveness of the proposed approach.
Funding: The National Natural Science Foundation of China (No. 70401013) and the National Key Technology R&D Program of China during the 11th Five-Year Plan Period (No. 2006BAH02A06).
Abstract: Taking into account the new characteristics of global cooperation in supply chains, a hybrid model of the cooperative negotiation process for order distribution in supply chains is studied. After reviewing and analyzing the main domestic and overseas progress in cooperative negotiation modeling for supply chains, some problems are pointed out. For example, traditional simple multi-agent system (MAS) frameworks have limitations and are not suitable for modeling complex systems. To solve these problems, and drawing on multi-agent structures and complex system modeling, the manufacturing supply chain is taken as an example, and a time Petri net production model is adopted to decompose the materials. A cooperative negotiation model for order distribution in the supply chain is then constructed by combining multi-agent techniques with time Petri net modeling. The simulation results reveal that the model helps solve the problems of cooperative negotiation in supply chains.
Abstract: Aim: To design and implement a multi-agent cooperative problem-solving expert system tool. Methods: A blackboard system was adopted as a data sharing and information exchange center to coordinate complex cooperative problem solving. The system was developed in a mixed UNIX and MS Windows 95 TCP/IP network environment. Results and Conclusion: A prototype of a multi-agent cooperative expert system tool is implemented. The experiment demonstrates that the fundamental functions of a cooperative expert system are realized.
Funding: Funded by the Science and Technology Innovation 2030-Key Project of “New Generation Artificial Intelligence”, China (No. 2020AAA0108200) and the National Natural Science Foundation of China (No. 61906209).
Abstract: Multi-Target Tracking Guidance (MTTG) in unknown environments has great potential value in applications for Unmanned Aerial Vehicle (UAV) swarms. Although Multi-Agent Deep Reinforcement Learning (MADRL) is a promising technique for learning cooperation, most existing methods cannot scale well to decentralized UAV swarms because of their computational complexity or their need for global information. This paper proposes a decentralized MADRL method that uses the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms. The method reshapes each UAV's reward with a regularization term defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and its neighbors. The dependence between UAVs can be captured directly by a Pointwise Mutual Information (PMI) neural network without complicated aggregation statistics. Then, the experience-sharing Reciprocal Reward Multi-Agent Actor-Critic (MAAC-R) algorithm is proposed to learn a shared cooperative policy for all homogeneous UAVs. Experiments demonstrate that the proposed algorithm improves the UAVs' cooperation more effectively than the baseline algorithms and stimulates a rich variety of cooperative tracking behaviors in UAV swarms. In addition, the learned policy scales better to other scenarios with more UAVs and targets.
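The reward reshaping described in this abstract can be illustrated with a minimal Python sketch. The function name `reshape_reward`, the weighting coefficient `alpha`, and the sample numbers are assumptions for illustration; the abstract gives neither the exact formula nor the way the dependency vector is obtained from the PMI network.

```python
def reshape_reward(own_reward, neighbor_rewards, dependency, alpha=1.0):
    """Add a reciprocal regularization term to one UAV's reward.

    The term is the dot product of the neighbors' reward vector and the
    dependency vector between this UAV and each neighbor. `alpha` is a
    hypothetical weighting coefficient, not taken from the paper.
    """
    if len(neighbor_rewards) != len(dependency):
        raise ValueError("one dependency value is needed per neighbor")
    reciprocal_term = sum(r * d for r, d in zip(neighbor_rewards, dependency))
    return own_reward + alpha * reciprocal_term


# Illustrative values: two neighbors, the first more strongly dependent.
shaped = reshape_reward(1.0, [2.0, -1.0], [0.5, 0.25])
print(shaped)  # 1.75
```

With an empty neighbor list the function returns the UAV's own reward unchanged, so an isolated UAV falls back to its individual objective.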
Abstract: To address cooperative air defense for surface warship formations, this paper maps the cooperative air-defense system of systems (SoS) for surface warship formation (CASoSSWF) to the biological immune system (BIS), based on the similarity of their defense mechanisms and characteristics. It then designs the component models and the architecture for a monitoring agent, a regulating agent, a killer agent, a pre-warning agent, and a communicating agent, using the theories and methods of the artificial immune system, the multi-agent system (MAS), the vaccine, and the danger theory (DT). Moreover, a new immune multi-agent model using vaccine based on DT (IMMUVBDT) for the cooperative air-defense SoS is proposed. The immune response and immune mechanism of the CASoSSWF are analyzed. The model is capable of memory, evolution, good adaptability to dynamic environments, and self-learning, and it adequately embodies the cooperative air-defense mechanism of the CASoSSWF. It therefore offers a novel idea for the CASoSSWF and can provide conceptual models for a surface warship formation operation simulation system.
Funding: Supported by the National Science and Technology Major Project (2021ZD0112702), the National Natural Science Foundation (NNSF) of China (62373100, 62233003), and the Natural Science Foundation of Jiangsu Province of China (BK20202006).
Abstract: This article studies the traffic signal control problem for multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm called RegionSTLight is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed, which can equivalently decompose the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation inside each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and obtains better asymptotic performance than state-of-the-art models.
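The idea of decomposing a global Q value into regional values can be sketched as follows. This example assumes the simplest additive form, in which the global value is the sum of the regional values; the abstract does not state the actual decomposition used by RegionSTLight, and the Q tables and phase names are hypothetical.

```python
from itertools import product


def global_q(regional_q, joint_action):
    """Global Q value as the sum of regional Q values (assumed additive form)."""
    return sum(q[a] for q, a in zip(regional_q, joint_action))


def greedy_joint_action(regional_q):
    """With an additive decomposition, each region can maximize independently."""
    return tuple(max(q, key=q.get) for q in regional_q)


# Hypothetical Q tables for two regions, keyed by a regional signal phase.
regional_q = [
    {"NS_green": 1.2, "EW_green": 0.7},
    {"NS_green": 0.3, "EW_green": 0.9},
]
best = greedy_joint_action(regional_q)

# Brute-force search over all joint actions, for comparison.
brute = max(product(*[q.keys() for q in regional_q]),
            key=lambda joint: global_q(regional_q, joint))
```

Under the additive assumption, the per-region greedy choice matches the brute-force search over joint actions, so the cost of action selection grows with the number of regions rather than with the size of the joint action space.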