Journal Articles
3,899 articles found
1. MARCS: A Mobile Crowdsensing Framework Based on Data Shapley Value Enabled Multi-Agent Deep Reinforcement Learning
Authors: Yiqin Wang, Yufeng Wang, Jianhua Ma, Qun Jin. Computers, Materials & Continua, 2025, Issue 3, pp. 4431-4449 (19 pages)
Opportunistic mobile crowdsensing (MCS) non-intrusively exploits human mobility trajectories, and the participants' smart devices as sensors have become promising paradigms for various urban data acquisition tasks. However, in practice, opportunistic MCS faces several challenges from the perspectives of both MCS participants and the data platform. On the one hand, participants face uncertainties in conducting MCS tasks, including their mobility and implicit interactions among participants, and participants' economic returns given by the MCS data platform are determined not only by their own actions but also by other participants' strategic actions. On the other hand, the platform can only observe the participants' uploaded sensing data, which depends on the unknown effort/action exerted by participants, while, to optimize its overall objective, the platform needs to properly reward certain participants to incentivize them to provide high-quality data. To address the challenge of balancing individual incentives and platform objectives in MCS, this paper proposes MARCS, an online sensing policy based on multi-agent deep reinforcement learning (MADRL) with centralized training and decentralized execution (CTDE). Specifically, the interactions between MCS participants and the data platform are modeled as a partially observable Markov game, where participants, acting as agents, use DRL-based policies to make decisions based on local observations, such as task trajectories and platform payments. To align individual and platform goals effectively, the platform leverages the Shapley value to estimate the contribution of each participant's sensed data, using these estimates as immediate rewards to guide agent training. Experimental results on real mobility trajectory datasets indicate that the revenue of MARCS is almost 35%, 53%, and 100% higher than that of DDPG, Actor-Critic, and model predictive control (MPC), respectively, on the participant side, with similar results on the platform side, showing superior performance compared to the baselines.
Keywords: mobile crowdsensing, online data acquisition, data Shapley value, multi-agent deep reinforcement learning, centralized training and decentralized execution (CTDE)
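The reward signal described in this abstract is the data Shapley value: a participant's reward is its average marginal contribution to the platform's utility over all orders in which participants could join. A minimal sketch of the exact computation is below, feasible only for a handful of participants; the paper estimates these values rather than enumerating permutations, and the `platform_value` utility here (diminishing returns in total data quality) is an illustrative assumption, not the paper's model.

```python
import itertools
import math

def shapley_values(qualities, value_fn):
    """Exact Shapley value of each participant's data: average the
    marginal contribution of participant i over every join order."""
    n = len(qualities)
    phi = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        coalition = []
        prev = value_fn([qualities[j] for j in coalition])
        for i in perm:
            coalition.append(i)
            cur = value_fn([qualities[j] for j in coalition])
            phi[i] += cur - prev  # marginal contribution of i in this order
            prev = cur
    return [p / len(perms) for p in phi]

# Hypothetical platform utility: diminishing returns in total data quality.
platform_value = lambda qs: math.sqrt(sum(qs))

rewards = shapley_values([4.0, 1.0, 4.0], platform_value)
```

By construction the rewards sum to the value of the full data set (efficiency) and equally useful participants receive equal pay (symmetry), which is why the Shapley value is attractive as an incentive-aligned reward.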
2. Multi-agent deep reinforcement learning based resource management in heterogeneous V2X networks
Authors: Junhui Zhao, Fajin Hu, Jiahang Li, Yiwen Nie. Digital Communications and Networks, 2025, Issue 1, pp. 182-190 (9 pages)
In Heterogeneous Vehicle-to-Everything Networks (HVNs), multiple users such as vehicles and handheld devices, as well as infrastructure, can communicate with each other to obtain more advanced services. However, the increasing number of entities accessing HVNs presents a huge technical challenge in allocating the limited wireless resources. Traditional model-driven resource allocation approaches are no longer applicable because of rich data and the interference problem of multiple communication modes reusing resources in HVNs. In this paper, we investigate a wireless resource allocation scheme, including power control and spectrum allocation, based on a resource block reuse strategy. To meet the high-capacity requirement of cellular users and the high-reliability requirement of Vehicle-to-Vehicle (V2V) user pairs, we propose a data-driven Multi-Agent Deep Reinforcement Learning (MADRL) resource allocation scheme for the HVN. Simulation results demonstrate that, compared to existing algorithms, the proposed MADRL-based scheme achieves a high sum capacity and probability of successful V2V transmission, while providing close-to-limit performance.
Keywords: data-driven, deep reinforcement learning, resource allocation, V2X communications
3. UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (Cited: 1)
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 2, pp. 430-445 (16 pages)
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles makes it challenging for vehicles to independently make avatar migration decisions based on current and future vehicle status. To address these challenges, in this paper we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome the slow convergence resulting from the curse of dimensionality and the non-stationarity issues caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Keywords: avatar, blockchain, Metaverses, multi-agent deep reinforcement learning, transformer, UAVs
4. Service Function Chain Deployment Algorithm Based on Multi-Agent Deep Reinforcement Learning
Authors: Wanwei Huang, Qiancheng Zhang, Tao Liu, Yaoli Xu, Dalei Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4875-4893 (19 pages)
To address the rapid growth of network services, which leads to long service request processing times and high deployment costs in the deployment of network function virtualization service function chains (SFCs) under 5G networks, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model that enhances the request acceptance rate while minimizing SFC latency and deployment cost is constructed for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and they achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the request acceptance rate compared to the comparison algorithms, reducing end-to-end latency by 4.942% and deployment cost by 8.045%.
Keywords: network function virtualization, service function chain, Markov decision process, multi-agent reinforcement learning
5. Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
Authors: Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, Dongqi Wang, Jie Song. Computers, Materials & Continua (SCIE, EI), 2023, Issue 9, pp. 3585-3603 (19 pages)
Mobile-edge computing (MEC) is a promising technology for fifth-generation (5G) and sixth-generation (6G) architectures, providing resourceful computing capabilities for Internet of Things (IoT) applications such as virtual reality, mobile devices, and smart cities. In general, these IoT applications bring higher energy consumption than traditional applications and are usually energy-constrained. To provide persistent energy, many studies have investigated the offloading problem to save energy. However, the dynamic environment dramatically increases the difficulty of optimizing the offloading decision. In this paper, we aim to minimize the energy consumption of the entire MEC system under a latency constraint while fully considering the dynamic environment. Under Markov games, we propose a multi-agent deep reinforcement learning approach based on a bi-level actor-critic learning structure to jointly optimize the offloading decision and resource allocation, which solves the combinatorial optimization problem using an asymmetric method and computes the Stackelberg equilibrium as a better convergence point than the Nash equilibrium in terms of Pareto superiority. Our method adapts better to a dynamic environment during data transmission than single-agent strategies and effectively tackles the coordination problem in the multi-agent environment. Simulation results show that the proposed method decreases the total computational overhead by 17.8% compared to the actor-critic-based method, and by 31.3%, 36.5%, and 44.7% compared with random offloading, all-local execution, and all-offloading execution, respectively.
Keywords: computation offloading, multi-agent deep reinforcement learning, mobile-edge computing, latency, energy efficiency
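The abstract's convergence point is a Stackelberg equilibrium: one player (the leader) commits first, anticipating the other's best response, which is the asymmetry the bi-level actor-critic exploits. The sketch below computes a pure-strategy Stackelberg equilibrium on tiny payoff tables; the matrices `L` and `F` are illustrative numbers, not from the paper.

```python
def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium on small payoff tables:
    the leader commits to an action first, anticipating that the
    follower will best-respond to it."""
    best = None
    for a in range(len(leader_payoff)):
        # Follower best-responds to the leader's committed action a.
        b = max(range(len(follower_payoff[a])),
                key=lambda j: follower_payoff[a][j])
        if best is None or leader_payoff[a][b] > leader_payoff[best[0]][best[1]]:
            best = (a, b)
    return best

# Illustrative payoffs (rows = leader actions, columns = follower actions).
L = [[2, 4],
     [1, 3]]
F = [[1, 0],
     [0, 1]]
eq = stackelberg(L, F)
```

Here the leader never obtains its best cell (payoff 4), because the follower would not cooperate with that commitment; anticipating the response steers the leader to the (1, 1) outcome instead.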
6. UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
Authors: Cui Zhang, En Wang, Funing Yang, Yongjian Yang, Nan Jiang. Computer Science (CSCD, PKU Core), 2023, Issue 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Point-of-Interests (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider coverage requirements, geographic fairness, and energy charging simultaneously. We consider the complex interactions among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams using a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy performs better than three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV crowdsensing, frequency coverage, grouping multi-agent deep reinforcement learning
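The core trick in G-MADDPG is collapsing many UAVs into a few team-level agents via distance-based clustering. The abstract does not spell out DCA's procedure, so the sketch below shows only one plausible assignment step, assigning each UAV to its nearest team center; the positions and centers are made-up values.

```python
def group_by_distance(positions, centers):
    """Assign each UAV to its nearest team center (one assignment step
    of a distance-based clustering scheme; illustrative only, since the
    abstract does not specify DCA's exact procedure)."""
    def d2(p, c):
        # Squared Euclidean distance; avoids an unnecessary sqrt.
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return [min(range(len(centers)), key=lambda k: d2(p, centers[k]))
            for p in positions]

# Four UAVs, two team centers: the first pair clusters to team 0,
# the second pair to team 1.
teams = group_by_distance([(0, 0), (1, 0), (9, 9), (10, 10)],
                          centers=[(0, 0), (10, 10)])
```

Fewer teams mean fewer agents to train (faster convergence), at the cost of coarser per-UAV control, which is exactly the trade-off the paper says the number of teams tunes.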
7. Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster (Cited: 1)
Authors: Jargalsaikhan Narantuya, Jun-Sik Shin, Sun Park, JongWon Kim. Computers, Materials & Continua (SCIE, EI), 2022, Issue 9, pp. 4375-4395 (21 pages)
As the complexity of deep learning (DL) networks and training data grows enormously, methods that scale with computation are becoming the future of artificial intelligence (AI) development. In this regard, the interplay between machine learning (ML) and high-performance computing (HPC) is an innovative paradigm to speed up the efficiency of AI research and development. However, building and operating an HPC/AI converged system requires broad knowledge to leverage the latest computing, networking, and storage technologies. Moreover, an HPC-based AI computing environment needs an appropriate resource allocation and monitoring strategy to efficiently utilize the system resources. In this regard, we introduce a technique for building and operating a high-performance AI computing environment with the latest technologies. Specifically, an HPC/AI converged system, called the GIST AI-X computing cluster, is configured inside the Gwangju Institute of Science and Technology (GIST); it is built by leveraging the latest Nvidia DGX servers, high-performance storage and networking devices, and various open-source tools. It can therefore serve as a good reference for building a small or middle-sized HPC/AI converged system for research and educational institutes. In addition, we propose a resource allocation method for DL jobs that efficiently utilizes computing resources with multi-agent deep reinforcement learning (mDRL). Through extensive simulations and experiments, we validate that the proposed mDRL algorithm helps the HPC/AI converged cluster improve both system utilization and power consumption. By deploying the proposed resource allocation method on the system, total job completion time is reduced by around 20% and inefficient power consumption by around 40%.
Keywords: deep learning, HPC/AI converged cluster, reinforcement learning
8. Exploring Local Chemical Space in De Novo Molecular Generation Using Multi-Agent Deep Reinforcement Learning (Cited: 2)
Authors: Wei Hu. Natural Science, 2021, Issue 9, pp. 412-424 (13 pages)
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single agents have also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve previous steps while working on the current step. In this paper, we propose applying the multi-agent RL approach to molecular research, which can optimize all sites of a molecule simultaneously. To demonstrate the validity of our approach, we chose the chemical compound Favipiravir and explored its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase and is one of the compounds currently being used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments reveal the collaborative learning of a team of deep RL agents, as well as the learning of each individual agent, in the exploration of Favipiravir. In particular, our multi-agents discovered not only the molecules near Favipiravir in chemical space but also the learnability of each site in the string representation of Favipiravir, critical information for understanding the underlying mechanisms that support machine learning of molecules.
Keywords: multi-agent reinforcement learning, actor-critic, molecule design, SARS-CoV-2, COVID-19
9. Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2023, Issue 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low thanks to regional cooperation based on a graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability across different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network, cross-layer scheduling, multi-agent deep reinforcement learning, interference elimination, power control, queue scheduling, actor-critic methods, Markov decision process
10. Multi-objective optimization of hybrid electric vehicle energy management using a multi-agent deep reinforcement learning framework
Authors: Xiaoyu Li, Zaihang Zhou, Changyin Wei, Xiao Gao, Yibo Zhang. Energy and AI, 2025, Issue 2, pp. 287-297 (11 pages)
Hybrid electric vehicles (HEVs) have the advantages of lower emissions and less noise pollution than traditional fuel vehicles. Developing reasonable energy management strategies (EMSs) can effectively reduce fuel consumption and improve the fuel economy of HEVs. However, current EMSs still face problems such as complex multi-objective optimization and poor algorithm robustness. Here, a multi-agent deep reinforcement learning (MADRL) framework based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is proposed to solve these problems. Specifically, a vehicle model and dynamics model are established, and on this basis a multi-objective EMS is developed that considers fuel economy, maintaining the battery State of Charge (SOC), and reducing battery degradation. Secondly, the proposed strategy regards the engine and battery as two agents that cooperate to realize optimal power distribution and achieve the optimal control strategy. Finally, the WLTC and HWFET driving cycles are employed to verify the performance of the proposed method; fuel consumption decreases by 26.91% and 8.41% on average compared to the other strategies. The simulation results demonstrate that the proposed strategy has remarkable superiority in multi-objective optimization.
Keywords: energy management strategy, hybrid electric vehicle, reinforcement learning, multi-agent deep deterministic policy gradient
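A multi-objective EMS like the one described typically folds its objectives (fuel economy, SOC maintenance, battery degradation) into a single scalar per-step reward shared by the engine and battery agents. The sketch below is one common weighted form; the weights, units, and quadratic SOC penalty are illustrative assumptions, not the paper's reward.

```python
def ems_reward(fuel_rate, soc, soc_target, battery_wear,
               w_fuel=1.0, w_soc=0.5, w_wear=0.2):
    """Hypothetical per-step reward for HEV energy-management agents:
    penalize fuel use, SOC drift from its target, and battery wear.
    The weights and functional form are illustrative only."""
    return -(w_fuel * fuel_rate            # fuel economy objective
             + w_soc * (soc - soc_target) ** 2  # SOC-maintenance objective
             + w_wear * battery_wear)      # degradation objective

# A step at target SOC with no wear is penalized only for fuel;
# a step with more fuel, SOC drift, and wear scores strictly worse.
r_good = ems_reward(fuel_rate=1.0, soc=0.60, soc_target=0.60, battery_wear=0.0)
r_bad = ems_reward(fuel_rate=2.0, soc=0.45, soc_target=0.60, battery_wear=0.3)
```

Tuning the `w_*` weights is how such a strategy trades fuel savings against battery health, which is the "complex multi-objective optimization" the abstract refers to.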
11. Multi-agent deep reinforcement learning with traffic flow for traffic signal control
Authors: Liang Hou, Dailin Huang, Jie Cao, Jialin Ma. Journal of Control and Decision, 2025, Issue 1, pp. 81-92 (12 pages)
Multi-agent Reinforcement Learning (MARL) has become one of the best methods in Adaptive Traffic Signal Control (ATSC). Traffic flow is a highly regular traffic volume that is critical to the signal control policy. However, dynamic control policies directly affect traffic flow formation, so the original traffic flow prediction cannot provide an adequate observation. This paper proposes a method for estimating traffic flow over a time window during Reinforcement Learning (RL) training. The method is verified on both a regular road network and a real road network. Our method further reduces intersection delay and queue length compared with the original method.
Keywords: multi-agent reinforcement learning, actor-critic, adaptive traffic signal control, traffic flow
12. Multi-agent Deep Reinforcement Learning Approach for Temporally Coordinated Demand Response in Microgrids
Authors: Chunchao Hu, Zexiang Cai, Yanxu Zhang. CSEE Journal of Power and Energy Systems, 2025, Issue 4, pp. 1512-1522 (11 pages)
Price-based and incentive-based demand response (DR) are both recognized as promising solutions to address the increasing uncertainties of renewable energy sources (RES) in microgrids. However, since the temporal optimization horizons of price-based and incentive-based DR differ, few existing methods consider their coordination. In this paper, a multi-agent deep reinforcement learning (MA-DRL) approach is proposed for temporally coordinated DR in microgrids. The proposed method enhances microgrid operation revenue by coordinating day-ahead price-based demand response (PBDR) and hourly direct load control (DLC). The operations at different time scales are decided by different DRL agents and optimized by a multi-agent deep deterministic policy gradient (MA-DDPG) using a shared critic to guide the agents toward a global objective. The effectiveness of the proposed approach is validated on a modified IEEE 33-bus distribution system and a modified, heavily loaded 69-bus distribution system.
Keywords: day-ahead price-based demand response, demand response, hourly direct load control, microgrid, multi-agent deep reinforcement learning
13. A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network
Authors: Haiwen Niu, Luhan Wang, Keliang Du, Zhaoming Lu, Xiangming Wen, Yu Liu. Digital Communications and Networks, 2025, Issue 1, pp. 92-105 (14 pages)
The Cybertwin-enabled 6th Generation (6G) network is envisioned to support AI-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, related works do not consider the random transmission delay between Cybertwin-driven agents and the underlying networks, which destroys the standard Markov property and increases the decision reaction time, degrading task offloading performance. To address this problem, we propose a pipelining task offloading method that lowers the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). We then design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. First, the state space is augmented with the last received state and historical actions to rebuild the Markov property. Second, Gate Transformer-XL is introduced to capture the importance of historical actions and maintain a consistent input dimension despite random transmission delays. Third, a sampling method and a new loss function, based on the difference between the current and target state values and the difference between the real and augmented state-action values, are designed to obtain state-transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods effectively reduce reaction time and improve task offloading performance in random-delay Cybertwin-enabled 6G networks.
Keywords: Cybertwin, multi-agent deep reinforcement learning (MADRL), task offloading, pipelining, delay-aware
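The first step the abstract names, augmenting the state with the last received state plus the actions issued since, is a standard remedy for delayed observations: the stale state plus the action history together determine the true current state distribution, restoring (approximate) Markovianity. A minimal sketch of that augmentation idea follows; the fixed-length action window and tuple encoding are simplifying assumptions, not the paper's exact construction.

```python
from collections import deque

class DelayAwareObs:
    """Build an augmented observation under random transmission delay:
    concatenate the last-received (possibly stale) state with a sliding
    window of the actions taken since it was observed."""
    def __init__(self, history_len=2):
        # Bounded window keeps the observation dimension constant.
        self.actions = deque(maxlen=history_len)

    def observe(self, last_received_state, action_taken):
        self.actions.append(action_taken)
        return tuple(last_received_state) + tuple(self.actions)

obs = DelayAwareObs(history_len=2)
o1 = obs.observe((1, 2), 0)   # stale state, first action appended
o2 = obs.observe((1, 2), 1)   # same stale state, action history grows
o3 = obs.observe((3, 4), 2)   # fresh state arrives; window keeps last 2 actions
```

A fixed window keeps the input dimension stable, which is the same concern the paper addresses with Gate Transformer-XL when delays (and hence useful history lengths) vary randomly.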
14. Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs (Cited: 2)
Authors: Bocheng Zhao, Mingying Huo, Zheng Li, Wenyu Feng, Ze Yu, Naiming Qi, Shaohai Wang. Chinese Journal of Aeronautics, 2025, Issue 3, pp. 109-123 (15 pages)
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Networks (GNNs). First, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs that establish the essential relationships among agents, obstacles, and targets. Second, a Graph Attention network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively, and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. A framework integrating centralized training and distributed execution facilitates model advancement. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with larger numbers of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic, unknown environments.
Keywords: unmanned aerial vehicle (UAV), multi-agent reinforcement learning (MARL), graph attention network (GAT), tracking, dynamic and unknown environment
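The "focuses exclusively on adjacent nodes" property of a GAT layer comes from a masked softmax: raw compatibility logits are normalized only over a node's neighbors, so non-adjacent nodes receive zero weight. A minimal sketch of that normalization step follows, with made-up logits; a full GAT layer would also compute the logits from learned linear maps, which is omitted here.

```python
import math

def gat_attention(logits, neighbors):
    """Attention weights restricted to adjacent nodes (masked softmax),
    the core normalization step of a graph attention layer; `logits`
    maps node id -> raw compatibility score for one focal node."""
    exp = {j: math.exp(logits[j]) for j in neighbors}
    z = sum(exp.values())
    return {j: v / z for j, v in exp.items()}

# Node 2 exists in the graph but is not adjacent, so it gets no weight.
w = gat_attention({0: 2.0, 1: 1.0, 2: 5.0}, neighbors=[0, 1])
```

Because the normalization only ranges over neighbors, the layer's cost scales with edge count rather than node count, which is what makes it practical as the graph of UAVs, obstacles, and targets changes each step.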
15. A Survey of Cooperative Multi-agent Reinforcement Learning for Multi-task Scenarios (Cited: 1)
Authors: Jiajun Chai, Zijie Zhao, Yuanheng Zhu, Dongbin Zhao. Artificial Intelligence Science and Engineering, 2025, Issue 2, pp. 98-121 (24 pages)
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address three types of multi-task problems: reward-related multi-task problems, arising from different reward functions; multi-domain multi-task problems, caused by differences in state and action spaces and state-transition functions; and scalability-related multi-task problems, resulting from dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the three major types of multi-task problems under multi-agent settings, offering fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future research directions, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Keywords: multi-task, multi-agent reinforcement learning, large language models
16. Achievement of Fish School Milling Motion Based on Distributed Multi-agent Reinforcement Learning
Authors: Jincun Liu, Yinjie Ren, Yang Liu, Yan Meng, Dong An, Yaoguang Wei. Journal of Bionic Engineering, 2025, Issue 4, pp. 1683-1701 (19 pages)
In recent years, significant research attention has been directed towards swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper applies multi-agent reinforcement learning to simulate fish schooling behavior, overcoming the challenge of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. On this foundation, a novel GCN-Critic MADDPG algorithm leveraging Graph Convolutional Networks (GCNs) is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared to traditional single-agent algorithms, the proposed method not only exhibits significant advantages in convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish-school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
Keywords: collective motion; collective behavior; self-organization; fish school; multi-agent reinforcement learning
Read Online / Download PDF
Multi-Agent Reinforcement Learning for Moving Target Defense Temporal Decision-Making Approach Based on Stackelberg-FlipIt Games
17
Authors: Rongbo Sun, Jinlong Fei, Yuefei Zhu, Zhongyu Guo. Computers, Materials & Continua, 2025, Issue 8, pp. 3765-3786 (22 pages)
Moving Target Defense (MTD) necessitates scientifically effective decision-making methodologies for implementing defensive technologies. While most MTD decision studies focus on accurately identifying optimal strategies, the question of optimal defense timing remains underexplored. Current default approaches, such as periodic or overly frequent MTD triggers, lead to suboptimal trade-offs among system security, performance, and cost. The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead, yet existing frameworks inadequately address this temporal dimension. To bridge this gap, this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces, thereby capturing the dynamic security-state evolution of MTD systems. We introduce a belief factor to quantify information asymmetry during adversarial interactions, improving the precision of MTD trigger timing. Leveraging this game-theoretic foundation, we employ Multi-Agent Reinforcement Learning (MARL) to derive adaptive temporal strategies, optimized via a novel four-dimensional reward function that holistically balances security, performance, cost, and timing. Experimental validation using IP address mutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response, significantly improving cybersecurity affordability and effectiveness.
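The four-dimensional reward described above could take many forms; one common pattern is a weighted scalarization of the four terms, with cost entering negatively. The weights, scaling, and `mtd_reward` helper below are assumptions for illustration, not the paper's actual reward function.

```python
def mtd_reward(security, performance, cost, timing,
               w=(0.4, 0.3, 0.2, 0.1)):
    """Hypothetical four-dimensional MTD reward: a weighted sum of a
    security gain, a performance score, a (negated) trigger cost, and
    a timing-quality score, each assumed pre-scaled to [0, 1]."""
    return w[0] * security + w[1] * performance - w[2] * cost + w[3] * timing

# A trigger that buys high security at moderate cost
r = mtd_reward(security=0.9, performance=0.7, cost=0.5, timing=0.8)
# 0.4*0.9 + 0.3*0.7 - 0.2*0.5 + 0.1*0.8 = 0.55
```

In a MARL setting these weights would typically be tuned so that no single dimension (e.g., security) dominates and drives the agent toward overly frequent mutation.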
Keywords: cyber security; moving target defense; multi-agent reinforcement learning; security metrics; game theory
Read Online / Download PDF
Defending Against Jamming and Interference for Internet of UAVs Using Cooperative Multi-Agent Reinforcement Learning with Mutual Information
18
Authors: Lin Yan, Wu Zhijuan, Peng Nuoheng, Zhao Tianyu, Zhang Yijin, Shu Feng, Li Jun. China Communications, 2025, Issue 5, pp. 220-237 (18 pages)
The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks but is limited by co-channel interference and malicious jamming. Without prior knowledge of the environment, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) that maximizes the long-term expected system throughput. Then, to improve exploration of the optimal policy, we optimize a MARL objective function with a mutual-information (MI) regularizer between states and actions, which dynamically assigns higher probability to actions frequently used by the optimal policy. Next, by sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs can learn their policies cooperatively, relying only on previously observed information and predictions of others' actions. Our simulation results show that for both sweep jamming and Markov jamming patterns, the proposed scheme outperforms the benchmark schemes in terms of throughput, convergence, and stability for different numbers of jammers, channels, and UAV pairs.
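The MI regularizer above rewards policies whose actions carry information about the state. As a minimal sketch (not the paper's estimator), the empirical mutual information I(S; A) over discrete (state, action) pairs can be computed from joint and marginal frequencies; the `mutual_information` helper and the toy channel-selection traces are assumptions.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Empirical I(S; A) in nats from observed (state, action) pairs:
    sum over (s, a) of p(s, a) * log(p(s, a) / (p(s) * p(a)))."""
    n = len(pairs)
    p_sa = Counter(pairs)
    p_s = Counter(s for s, _ in pairs)
    p_a = Counter(a for _, a in pairs)
    return sum((c / n) * math.log((c / n) / ((p_s[s] / n) * (p_a[a] / n)))
               for (s, a), c in p_sa.items())

# A state-dependent channel policy has maximal MI (= log 2 here);
# a state-independent policy has zero MI.
dependent   = [(0, "ch1"), (1, "ch2"), (0, "ch1"), (1, "ch2")]
independent = [(0, "ch1"), (0, "ch2"), (1, "ch1"), (1, "ch2")]
```

Adding a lambda-weighted I(S; A) term to the MARL objective would then push exploration toward channel choices that are consistently tied to the observed jamming state.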
Keywords: anti-jamming communication; Internet of UAVs; multi-agent reinforcement learning; spectrum allocation
Read Online / Download PDF
Dynamic Decoupling-Driven Cooperative Pursuit for Multi-UAV Systems:A Multi-Agent Reinforcement Learning Policy Optimization Approach
19
Authors: Lei Lei, Chengfu Wu, Huaimin Chen. Computers, Materials & Continua, 2025, Issue 10, pp. 1339-1363 (25 pages)
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm to address problems in cooperative pursuit tasks for multiple unmanned aerial vehicles (UAVs), including credit assignment, low collaboration efficiency, and weak policy generalization. Traditional algorithms often fail to identify the critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance as the scale grows. To tackle these issues, MA2PPO builds on the proximal policy optimization (PPO) algorithm, adopts the centralized training with decentralized execution (CTDE) framework, and introduces a dynamic decoupling mechanism: critics share a multi-head attention (MHA) mechanism during centralized training to solve the credit assignment problem. This lets the pursuers identify highly correlated interactions with their teammates, effectively discard irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function combining a formation reward with a distance reward has been devised to help the pursuers encircle the escapee, which incentivizes UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, scalability experiments demonstrate that the algorithm is suitable for large-scale multi-UAV systems.
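The attention-based critic above weights teammate interactions by relevance, which is the mechanism that lets it discard weakly relevant agents. A single-head version of that building block (the full method uses multi-head attention) can be sketched as scaled dot-product attention; the shapes, the `attention` helper, and the random features are assumptions for illustration.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: weight each teammate's feature
    vector by its softmax-normalized relevance to the query agent."""
    d = query.shape[-1]
    scores = keys @ query / np.sqrt(d)
    w = np.exp(scores - scores.max())     # numerically stable softmax
    w /= w.sum()
    return w @ values, w

rng = np.random.default_rng(0)
query  = rng.normal(size=4)          # embedding of the evaluated pursuer
keys   = rng.normal(size=(5, 4))     # embeddings of 5 teammates
values = rng.normal(size=(5, 4))
context, weights = attention(query, keys, values)   # critic input
```

Teammates with near-zero attention weight contribute almost nothing to the context vector, which is one way to read the paper's "dynamic decoupling" of weakly relevant interactions.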
Keywords: multi-agent reinforcement learning; multi-UAV systems; pursuit-evasion games
Read Online / Download PDF
A sample selection mechanism for multi-UCAV air combat policy training using multi-agent reinforcement learning
20
Authors: Zihui YAN, Xiaolong LIANG, Yueqi HOU, Aiwu YANG, Jiaqiang ZHANG, Ning WANG. Chinese Journal of Aeronautics, 2025, Issue 6, pp. 501-516 (16 pages)
Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning (MARL) in multiple Unmanned Combat Aerial Vehicle (UCAV) air combat scenarios. In view of this, this paper proposes a novel Dominant and Non-dominant strategy sample selection (DoNot) mechanism and a Local Observation Enhanced Multi-Agent Proximal Policy Optimization (LOE-MAPPO) algorithm to train the multi-UCAV air combat policy and improve its generalization. Specifically, the LOE-MAPPO algorithm adopts a mixed state that concatenates the global state with each agent's local observation to enable efficient value-function learning in multi-UCAV air combat. The DoNot mechanism classifies opponents as dominant-strategy or non-dominant-strategy opponents and samples from easier to more challenging opponents to form an adaptive training curriculum. Empirical results demonstrate that the proposed LOE-MAPPO algorithm outperforms baseline MARL algorithms in multi-UCAV air combat scenarios, and that the DoNot mechanism yields stronger policy generalization when facing diverse opponents. The results pave the way for fast generation of cooperative strategies for air combat agents with MARL algorithms.
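The easy-to-hard opponent sampling in the DoNot mechanism can be sketched as curriculum-weighted sampling over an opponent pool: as training progresses, sampling weight shifts from non-dominant (easier) toward dominant (harder) opponents. The opponent pool, difficulty scores, and `sample_opponent` helper are assumptions for illustration, not the paper's mechanism.

```python
import random

def sample_opponent(opponents, progress, rng=random.Random(0)):
    """Curriculum sampling: with training progress in [0, 1], weight
    each opponent by how close its difficulty (also in [0, 1]) is to
    the current progress, so easy opponents dominate early and hard
    ones late."""
    weights = [1.0 - abs(diff - progress) for diff in opponents.values()]
    return rng.choices(list(opponents), weights=weights, k=1)[0]

# Hypothetical pool: name -> difficulty score
pool = {"random": 0.1, "scripted": 0.5, "dominant": 0.9}
early = sample_opponent(pool, progress=0.0)   # likely an easy opponent
late  = sample_opponent(pool, progress=1.0)   # likely a hard opponent
```

A schedule like this keeps early training signals dense (wins against weak opponents) while still exposing the policy to dominant-strategy opponents before training ends.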
Keywords: unmanned combat aerial vehicle; air combat; sample selection; multi-agent reinforcement learning; proximal policy optimization
Full-Text Delivery