期刊文献+
共找到388篇文章
< 1 2 20 >
每页显示 20 50 100
Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs 被引量:2
1
作者 Bocheng ZHAO Mingying HUO +4 位作者 Zheng LI Wenyu FENG Ze YU Naiming QI Shaohai WANG 《Chinese Journal of Aeronautics》 2025年第3期109-123,共15页
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary obj... This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph AttenTion network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with less collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments. 展开更多
关键词 Unmanned aerial vehicle(UAV) multi-agent reinforcement learning(MARL) graph attention network(GAT) Tracking Dynamic and unknown environment
原文传递
A Survey of Cooperative Multi-agent Reinforcement Learning for Multi-task Scenarios 被引量:1
2
作者 Jiajun CHAI Zijie ZHAO +1 位作者 Yuanheng ZHU Dongbin ZHAO 《Artificial Intelligence Science and Engineering》 2025年第2期98-121,共24页
Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-... Cooperative multi-agent reinforcement learning(MARL)is a key technology for enabling cooperation in complex multi-agent systems.It has achieved remarkable progress in areas such as gaming,autonomous driving,and multi-robot control.Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope.In multi-task scenarios,cooperative MARL algorithms need to address 3 types of multi-task problems:reward-related multi-task,arising from different reward functions;multi-domain multi-task,caused by differences in state and action spaces,state transition functions;and scalability-related multi-task,resulting from the dynamic variation in the number of agents.Most existing studies focus on scalability-related multitask problems.However,with the increasing integration between large language models(LLMs)and multi-agent systems,a growing number of LLM-based multi-agent systems have emerged,enabling more complex multi-task cooperation.This paper provides a comprehensive review of the latest advances in this field.By combining multi-task reinforcement learning with cooperative MARL,we categorize and analyze the 3 major types of multi-task problems under multi-agent settings,offering more fine-grained classifications and summarizing key insights for each.In addition,we summarize commonly used benchmarks and discuss future directions of research in this area,which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world. 展开更多
关键词 MULTI-TASK multi-agent reinforcement learning large language models
在线阅读 下载PDF
Achievement of Fish School Milling Motion Based on Distributed Multi-agent Reinforcement Learning
3
作者 Jincun Liu Yinjie Ren +3 位作者 Yang Liu Yan Meng Dong An Yaoguang Wei 《Journal of Bionic Engineering》 2025年第4期1683-1701,共19页
In recent years,significant research attention has been directed towards swarm intelligence.The Milling behavior of fish schools,a prime example of swarm intelligence,shows how simple rules followed by individual agen... In recent years,significant research attention has been directed towards swarm intelligence.The Milling behavior of fish schools,a prime example of swarm intelligence,shows how simple rules followed by individual agents lead to complex collective behaviors.This paper studies Multi-Agent Reinforcement Learning to simulate fish schooling behavior,overcoming the challenges of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments.Based on this foundation,a novel Graph Convolutional Networks(GCN)-Critic MADDPG algorithm leveraging GCN is proposed to enhance cooperation among agents in a multi-agent system.Simulation experiments demonstrate that,compared to traditional single-agent algorithms,the proposed method not only exhibits significant advantages in terms of convergence speed and stability but also achieves tighter group formations and more naturally aligned Milling behavior.Additionally,a fish school self-organizing behavior research platform based on an event-triggered mechanism has been developed,providing a robust tool for exploring dynamic behavioral changes under various conditions. 展开更多
关键词 Collective motion Collective behavior SELF-ORGANIZATION Fish school multi-agent reinforcement learning
在线阅读 下载PDF
A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network
4
作者 Haiwen Niu Luhan Wang +3 位作者 Keliang Du Zhaoming Lu Xiangming Wen Yu Liu 《Digital Communications and Networks》 2025年第1期92-105,共14页
Cybertwin-enabled 6th Generation(6G)network is envisioned to support artificial intelligence-native management to meet changing demands of 6G applications.Multi-Agent Deep Reinforcement Learning(MADRL)technologies dri... Cybertwin-enabled 6th Generation(6G)network is envisioned to support artificial intelligence-native management to meet changing demands of 6G applications.Multi-Agent Deep Reinforcement Learning(MADRL)technologies driven by Cybertwins have been proposed for adaptive task offloading strategies.However,the existence of random transmission delay between Cybertwin-driven agents and underlying networks is not considered in related works,which destroys the standard Markov property and increases the decision reaction time to reduce the task offloading strategy performance.In order to address this problem,we propose a pipelining task offloading method to lower the decision reaction time and model it as a delay-aware Markov Decision Process(MDP).Then,we design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption.Firstly,the state space is augmented using the lastly-received state and historical actions to rebuild the Markov property.Secondly,Gate Transformer-XL is introduced to capture historical actions'importance and maintain the consistent input dimension dynamically changed due to random transmission delays.Thirdly,a sampling method and a new loss function with the difference between the current and target state value and the difference between real state-action value and augmented state-action value are designed to obtain state transition trajectories close to the real ones.Numerical results demonstrate that the proposed methods are effective in reducing reaction time and improving the task offloading performance in the random-delay Cybertwin-enabled 6G networks. 展开更多
关键词 Cybertwin multi-agent Deep reinforcement learning(MADRL) Task offloading PIPELINING Delay-aware
在线阅读 下载PDF
Multi-Agent Reinforcement Learning for Moving Target Defense Temporal Decision-Making Approach Based on Stackelberg-FlipIt Games
5
作者 Rongbo Sun Jinlong Fei +1 位作者 Yuefei Zhu Zhongyu Guo 《Computers, Materials & Continua》 2025年第8期3765-3786,共22页
Moving Target Defense(MTD)necessitates scientifically effective decision-making methodologies for defensive technology implementation.While most MTD decision studies focus on accurately identifying optimal strategies,... Moving Target Defense(MTD)necessitates scientifically effective decision-making methodologies for defensive technology implementation.While most MTD decision studies focus on accurately identifying optimal strategies,the issue of optimal defense timing remains underexplored.Current default approaches—periodic or overly frequent MTD triggers—lead to suboptimal trade-offs among system security,performance,and cost.The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead,yet existing frameworks inadequately address this temporal dimension.To bridge this gap,this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces,thereby capturing the dynamic security state evolution of MTD systems.We introduce a belief factor to quantify information asymmetry during adversarial interactions,enhancing the precision of MTD trigger timing.Leveraging this game-theoretic foundation,we employMulti-Agent Reinforcement Learning(MARL)to derive adaptive temporal strategies,optimized via a novel four-dimensional reward function that holistically balances security,performance,cost,and timing.Experimental validation using IP addressmutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response,significantly improving cybersecurity affordability and effectiveness. 展开更多
关键词 Cyber security moving target defense multi-agent reinforcement learning security metrics game theory
在线阅读 下载PDF
Novel multi-agent action masked deep reinforcement learning for general industrial assembly lines balancing problems
6
作者 Ali M.Ali Luca Tirel Hashim A.Hashim 《Journal of Automation and Intelligence》 2025年第4期299-311,共13页
Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards,prevent project constraint violations,and achieve cost-effective operations.While exact solutions to... Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards,prevent project constraint violations,and achieve cost-effective operations.While exact solutions to such challenges can be obtained through Integer Programming(IP),the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios.Heuristic methods,such as Genetic Algorithms,can also be applied,but they frequently produce suboptimal solutions in extensive cases.This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process(MDP),without imposing assumptions on the type of assembly line a notable distinction from most existing models.The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning(DRL)agents to optimize task and resource scheduling.To enhance the efficiency of agent training,the paper proposes two innovative tools.The first is an action-masking technique,which ensures the agent selects only feasible actions,thereby reducing training time.The second is a multi-agent approach,where each workstation is managed by an individual agent,as a result,the state and action spaces were reduced.A centralized training framework with decentralized execution is adopted,offering a scalable learning architecture for optimizing industrial assembly lines.This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action.The effectiveness of the proposed scheme is validated through numerical simulations,demonstrating significantly faster convergence to the optimal solution compared to a comparable model-based approach. 展开更多
关键词 Artificial intelligence in industrial engineering Autonomous decision making Distributed multi-agent learning reinforcement learning
在线阅读 下载PDF
Defending Against Jamming and Interference for Internet of UAVs Using Cooperative Multi-Agent Reinforcement Learning with Mutual Information
7
作者 Lin Yan Wu Zhijuan +4 位作者 Peng Nuoheng Zhao Tianyu Zhang Yijin Shu Feng Li Jun 《China Communications》 2025年第5期220-237,共18页
The Internet of Unmanned Aerial Vehicles(I-UAVs)is expected to execute latency-sensitive tasks,but limited by co-channel interference and malicious jamming.In the face of unknown prior environmental knowledge,defendin... The Internet of Unmanned Aerial Vehicles(I-UAVs)is expected to execute latency-sensitive tasks,but limited by co-channel interference and malicious jamming.In the face of unknown prior environmental knowledge,defending against jamming and interference through spectrum allocation becomes challenging,especially when each UAV pair makes decisions independently.In this paper,we propose a cooperative multi-agent reinforcement learning(MARL)-based anti-jamming framework for I-UAVs,enabling UAV pairs to learn their own policies cooperatively.Specifically,we first model the problem as a modelfree multi-agent Markov decision process(MAMDP)to maximize the long-term expected system throughput.Then,for improving the exploration of the optimal policy,we resort to optimizing a MARL objective function with a mutual-information(MI)regularizer between states and actions,which can dynamically assign the probability for actions frequently used by the optimal policy.Next,through sharing their current channel selections and local learning experience(their soft Q-values),the UAV pairs can learn their own policies cooperatively relying on only preceding observed information and predicting others’actions.Our simulation results show that for both sweep jamming and Markov jamming patterns,the proposed scheme outperforms the benchmarkers in terms of throughput,convergence and stability for different numbers of jammers,channels and UAV pairs. 展开更多
关键词 anti-jamming communication internet of UAVs multi-agent reinforcement learning spectrum allocation
在线阅读 下载PDF
Dynamic Decoupling-Driven Cooperative Pursuit for Multi-UAV Systems:A Multi-Agent Reinforcement Learning Policy Optimization Approach
8
作者 Lei Lei Chengfu Wu Huaimin Chen 《Computers, Materials & Continua》 2025年第10期1339-1363,共25页
This paper proposes a Multi-Agent Attention Proximal Policy Optimization(MA2PPO)algorithm aiming at the problems such as credit assignment,low collaboration efficiency and weak strategy generalization ability existing... This paper proposes a Multi-Agent Attention Proximal Policy Optimization(MA2PPO)algorithm aiming at the problems such as credit assignment,low collaboration efficiency and weak strategy generalization ability existing in the cooperative pursuit tasks of multiple unmanned aerial vehicles(UAVs).Traditional algorithms often fail to effectively identify critical cooperative relationships in such tasks,leading to low capture efficiency and a significant decline in performance when the scale expands.To tackle these issues,based on the proximal policy optimization(PPO)algorithm,MA2PPO adopts the centralized training with decentralized execution(CTDE)framework and introduces a dynamic decoupling mechanism,that is,sharing the multi-head attention(MHA)mechanism for critics during centralized training to solve the credit assignment problem.This method enables the pursuers to identify highly correlated interactions with their teammates,effectively eliminate irrelevant and weakly relevant interactions,and decompose large-scale cooperation problems into decoupled sub-problems,thereby enhancing the collaborative efficiency and policy stability among multiple agents.Furthermore,a reward function has been devised to facilitate the pursuers to encircle the escapee by combining a formation reward with a distance reward,which incentivizes UAVs to develop sophisticated cooperative pursuit strategies.Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs.Moreover,experiments on scalability have demonstrated that the algorithm is suitable for large-scale multi-UAV systems. 展开更多
关键词 multi-agent reinforcement learning multi-UAV systems pursuit-evasion games
在线阅读 下载PDF
A sample selection mechanism for multi-UCAV air combat policy training using multi-agent reinforcement learning
9
作者 Zihui YAN Xiaolong LIANG +3 位作者 Yueqi HOU Aiwu YANG Jiaqiang ZHANG Ning WANG 《Chinese Journal of Aeronautics》 2025年第6期501-516,共16页
Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning(MARL)in multiple Unmanned Combat Aerial Vehicle(UCAV)air combat scenarios.In view of this,this paper proposes... Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning(MARL)in multiple Unmanned Combat Aerial Vehicle(UCAV)air combat scenarios.In view of this,this paper proposes a novel Dominant and Non-dominant strategy sample selection(DoNot)mechanism and a Local Observation Enhanced Multi-Agent Proximal Policy Optimization(LOE-MAPPO)algorithm to train the multi-UCAV air combat policy and improve its generalization.Specifically,the LOE-MAPPO algorithm adopts a mixed state that concatenates the global state and individual agent's local observation to enable efficient value function learning in multi-UCAV air combat.The DoNot mechanism classifies opponents into dominant or non-dominant strategy opponents,and samples from easier to more challenging opponents to form an adaptive training curriculum.Empirical results demonstrate that the proposed LOE-MAPPO algorithm outperforms baseline MARL algorithms in multi-UCAV air combat scenarios,and the DoNot mechanism leads to stronger policy generalization when facing diverse opponents.The results pave the way for the fast generation of cooperative strategies for air combat agents with MARLalgorithms. 展开更多
关键词 Unmanned combat aerial vehicle Air combat Sample selection multi-agent reinforcement learning Policyproximal optimization
原文传递
AoI and TTC Based Resource Allocation in C-V2X Sidelink via Multi-Agent Reinforcement Learning
10
作者 Tong Xiaolu Shi Yan +2 位作者 Xu Yaqi Chen Shanzhi Ge Yuming 《China Communications》 2025年第8期281-297,共17页
The rapid development of the Internet of Vehicles(IoVs)underscores the importance of Vehicle-to-Everything(V2X)communication for ensuring driving safety.V2X supports control systems by providing reliable and real-time... The rapid development of the Internet of Vehicles(IoVs)underscores the importance of Vehicle-to-Everything(V2X)communication for ensuring driving safety.V2X supports control systems by providing reliable and real-time information,while the control system's decisions,in turn,affect the communication topology and channel state.Depending on the coupling between communication and control,radio resource allocation(RRA)should be controlaware.However,current RRA methods often focus on optimizing communication metrics,neglecting the needs of the control system.To promote the co-design of communication and control,this paper proposes a novel RRA method that integrates both communication and control considerations.From the communication perspective,the Age of Information(AoI)is introduced to measure the freshness of packets.From the control perspective,a weighted utility function based on Time-to-Collision(TTC)and driving distance is designed,emphasizing the neighboring importance and potentially dangerous vehicles.By synthesizing these two metrics,an optimization objective minimizing weighted AoI based on TTC and driving distance is formulated.The RRA process is modeled as a partially observable Markov decision process,and a multi-agent reinforcement learning algorithm incorporating positional encoding and attention mechanisms(PAMARL)is proposed.Simulation results show that PAMARL can reduce Collision Risk(CR)with better Packet Delivery Ratio(PDR)than others. 展开更多
关键词 age of information multi-agent reinforcement learning resource allocation time to collision
在线阅读 下载PDF
Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks
11
作者 Jian-Dong Yao Wen-Bin Hao +3 位作者 Zhi-Gao Meng Bo Xie Jian-Hua Chen Jia-Qi Wei 《Journal of Electronic Science and Technology》 2025年第1期35-59,共25页
This paper presents a novel approach to dynamic pricing and distributed energy management in virtual power plant(VPP)networks using multi-agent reinforcement learning(MARL).As the energy landscape evolves towards grea... This paper presents a novel approach to dynamic pricing and distributed energy management in virtual power plant(VPP)networks using multi-agent reinforcement learning(MARL).As the energy landscape evolves towards greater decentralization and renewable integration,traditional optimization methods struggle to address the inherent complexities and uncertainties.Our proposed MARL framework enables adaptive,decentralized decision-making for both the distribution system operator and individual VPPs,optimizing economic efficiency while maintaining grid stability.We formulate the problem as a Markov decision process and develop a custom MARL algorithm that leverages actor-critic architectures and experience replay.Extensive simulations across diverse scenarios demonstrate that our approach consistently outperforms baseline methods,including Stackelberg game models and model predictive control,achieving an 18.73%reduction in costs and a 22.46%increase in VPP profits.The MARL framework shows particular strength in scenarios with high renewable energy penetration,where it improves system performance by 11.95%compared with traditional methods.Furthermore,our approach demonstrates superior adaptability to unexpected events and mis-predictions,highlighting its potential for real-world implementation. 展开更多
关键词 Distributed energy management Dynamic pricing multi-agent reinforcement learning Renewable energy integration Virtual power plants
在线阅读 下载PDF
Multi-hop UAV relay covert communication:A multi-agent reinforcement learning approach
12
作者 Hengzhi BAI Haichao WANG +4 位作者 Rongrong HE Jiatao DU Guoxin LI Yuhua XU Yutao JIAO 《Chinese Journal of Aeronautics》 2025年第10期120-133,共14页
Due to the characteristics of line-of-sight(LoS)communication in unmanned aerial vehicle(UAV)networks,these systems are highly susceptible to eavesdropping and surveillance.To effectively address the security concerns... Due to the characteristics of line-of-sight(LoS)communication in unmanned aerial vehicle(UAV)networks,these systems are highly susceptible to eavesdropping and surveillance.To effectively address the security concerns in UAV communication,covert communication methods have been adopted.This paper explores the joint optimization problem of trajectory and transmission power in a multi-hop UAV relay covert communication system.Considering the communication covertness,power constraints,and trajectory limitations,an algorithm based on multi-agent proximal policy optimization(MAPPO),named covert-MAPPO(C-MAPPO),is proposed.The proposed method leverages the strengths of both optimization algorithms and reinforcement learning to analyze and make joint decisions on the transmission power and flight trajectory strategies for UAVs to achieve cooperation.Simulation results demonstrate that the proposed method can maximize the system throughput while satisfying covertness constraints,and it outperforms benchmark algorithms in terms of system throughput and reward convergence speed. 展开更多
关键词 Covert communication Unmanned aerial vehicle(UAV) Power optimization Trajectory planning multi-agent reinforcement learning(MARL)
原文传递
Autonomous Conflict Resolution(AutoCR)Based on Improved Multi-agent Reinforcement Learning
13
作者 HUANG Xiao TIAN Yong +1 位作者 LI Jiangchen ZHANG Naizhong 《Transactions of Nanjing University of Aeronautics and Astronautics》 2025年第S1期91-101,共11页
Conflict resolution(CR)is a fundamental component of air traffic management,where recent progress in artificial intelligence has led to the effective application of deep reinforcement learning(DRL)techniques to enhanc... Conflict resolution(CR)is a fundamental component of air traffic management,where recent progress in artificial intelligence has led to the effective application of deep reinforcement learning(DRL)techniques to enhance CR strategies.However,existing DRL models applied to CR are often limited to simple scenarios.This approach frequently leads to the neglect of the high risks associated with multiple intersections in the high-density and multi-airport system terminal area(MAS-TMA),and suffers from poor interpretability.This paper addresses the aforementioned gap by introducing an improved multi-agent DRL model that adopted to autonomous CR(AutoCR)within MAS-TMA.Specifically,dynamic weather conditions are incorporated into the state space to enhance adaptability.In the action space,the flight intent is considered and transformed into optimal maneuvers according to overload,thus improving interpretability.On these bases,the deep Q-network(DQN)algorithm is further improved to address the AutoCR problem in MAS-TMA.Simulation experiments conducted in the“Guangdong-Hong Kong-Macao”greater bay area(GBA)MAS-TMA demonstrate the effectiveness of the proposed method,successfully resolving over eight potential conflicts and performing robustly across various air traffic densities. 展开更多
关键词 air traffic management conflict resolution multi-airport system terminal area(MAS-TMA) multi-agent reinforcement learning
在线阅读 下载PDF
MARCS:A Mobile Crowdsensing Framework Based on Data Shapley Value Enabled Multi-Agent Deep Reinforcement Learning
14
作者 Yiqin Wang Yufeng Wang +1 位作者 Jianhua Ma Qun Jin 《Computers, Materials & Continua》 2025年第3期4431-4449,共19页
Opportunistic mobile crowdsensing(MCS)non-intrusively exploits human mobility trajectories,and the participants’smart devices as sensors have become promising paradigms for various urban data acquisition tasks.Howeve... Opportunistic mobile crowdsensing(MCS)non-intrusively exploits human mobility trajectories,and the participants’smart devices as sensors have become promising paradigms for various urban data acquisition tasks.However,in practice,opportunistic MCS has several challenges from both the perspectives of MCS participants and the data platform.On the one hand,participants face uncertainties in conducting MCS tasks,including their mobility and implicit interactions among participants,and participants’economic returns given by the MCS data platform are determined by not only their own actions but also other participants’strategic actions.On the other hand,the platform can only observe the participants’uploaded sensing data that depends on the unknown effort/action exerted by participants to the platform,while,for optimizing its overall objective,the platform needs to properly reward certain participants for incentivizing them to provide high-quality data.To address the challenge of balancing individual incentives and platform objectives in MCS,this paper proposes MARCS,an online sensing policy based on multi-agent deep reinforcement learning(MADRL)with centralized training and decentralized execution(CTDE).Specifically,the interactions between MCS participants and the data platform are modeled as a partially observable Markov game,where participants,acting as agents,use DRL-based policies to make decisions based on local observations,such as task trajectories and platform payments.To align individual and platform goals effectively,the platform leverages Shapley value to estimate the contribution of each participant’s sensed data,using these estimates as immediate rewards to guide agent training.The experimental results on real mobility trajectory datasets indicate that the revenue of MARCS reaches almost 35%,53%,and 100%higher than DDPG,Actor-Critic,and model predictive control(MPC)respectively on the participant side and similar results on the platform side,which show superior performance compared to baselines. 展开更多
关键词 Mobile crowdsensing online data acquisition data Shapley value multi-agent deep reinforcement learning centralized training and decentralized execution(CTDE)
在线阅读 下载PDF
Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management 被引量:1
15
作者 FENG Bingyi FENG Mingxiao +2 位作者 WANG Minrui ZHOU Wengang LI Houqiang 《ZTE Communications》 2023年第3期11-21,共11页
The increasing adoption of renewable energy has posed challenges for voltage regulation in power distribution networks.Gridaware energy management,which includes the control of smart inverters and energy management sy... The increasing adoption of renewable energy has posed challenges for voltage regulation in power distribution networks.Gridaware energy management,which includes the control of smart inverters and energy management systems,is a trending way to mitigate this problem.However,existing multi-agent reinforcement learning methods for grid-aware energy management have not sufficiently considered the importance of agent cooperation and the unique characteristics of the grid,which leads to limited performance.In this study,we propose a new approach named multi-agent hierarchical graph attention reinforcement learning framework(MAHGA)to stabilize the voltage.Specifically,under the paradigm of centralized training and decentralized execution,we model the power distribution network as a novel hierarchical graph containing the agent-level topology and the bus-level topology.Then a hierarchical graph attention model is devised to capture the complex correlation between agents.Moreover,we incorporate graph contrastive learning as an auxiliary task in the reinforcement learning process to improve representation learning from graphs.Experiments on several real-world scenarios reveal that our approach achieves the best performance and can reduce the number of voltage violations remarkably. 展开更多
关键词 demand-side management graph neural networks multi-agent reinforcement learning voltage regulation
在线阅读 下载PDF
Task Offloading and Resource Allocation in NOMA-VEC:A Multi-Agent Deep Graph Reinforcement Learning Algorithm
16
作者 Hu Yonghui Jin Zuodong +1 位作者 Qi Peng Tao Dan 《China Communications》 SCIE CSCD 2024年第8期79-88,共10页
Vehicular edge computing(VEC)is emerging as a promising solution paradigm to meet the requirements of compute-intensive applications in internet of vehicle(IoV).Non-orthogonal multiple access(NOMA)has advantages in im... Vehicular edge computing(VEC)is emerging as a promising solution paradigm to meet the requirements of compute-intensive applications in internet of vehicle(IoV).Non-orthogonal multiple access(NOMA)has advantages in improving spectrum efficiency and dealing with bandwidth scarcity and cost.It is an encouraging progress combining VEC and NOMA.In this paper,we jointly optimize task offloading decision and resource allocation to maximize the service utility of the NOMA-VEC system.To solve the optimization problem,we propose a multiagent deep graph reinforcement learning algorithm.The algorithm extracts the topological features and relationship information between agents from the system state as observations,outputs task offloading decision and resource allocation simultaneously with local policy network,which is updated by a local learner.Simulation results demonstrate that the proposed method achieves a 1.52%∼5.80%improvement compared with the benchmark algorithms in system service utility. 展开更多
关键词 edge computing graph convolutional network reinforcement learning task offloading
在线阅读 下载PDF
A knowledge graph-based reinforcement learning approach for cooperative caching in MEC-enabled heterogeneous networks
17
作者 Dan Wang Yalu Bai Bin Song 《Digital Communications and Networks》 2025年第4期1236-1244,共9页
Existing wireless networks are flooded with video data transmissions,and the demand for high-speed and low-latency video services continues to surge.This has brought with it challenges to networks in the form of conge... Existing wireless networks are flooded with video data transmissions,and the demand for high-speed and low-latency video services continues to surge.This has brought with it challenges to networks in the form of congestion as well as the need for more resources and more dedicated caching schemes.Recently,Multi-access Edge Computing(MEC)-enabled heterogeneous networks,which leverage edge caches for proximity delivery,have emerged as a promising solution to all of these problems.Designing an effective edge caching scheme is critical to its success,however,in the face of limited resources.We propose a novel Knowledge Graph(KG)-based Dueling Deep Q-Network(KG-DDQN)for cooperative caching in MEC-enabled heterogeneous networks.The KGDDQN scheme leverages a KG to uncover video relations,providing valuable insights into user preferences for the caching scheme.Specifically,the KG guides the selection of related videos as caching candidates(i.e.,actions in the DDQN),thus providing a rich reference for implementing a personalized caching scheme while also improving the decision efficiency of the DDQN.Extensive simulation results validate the convergence effectiveness of the KG-DDQN,and it also outperforms baselines regarding cache hit rate and service delay. 展开更多
关键词 Multi-access edge computing Cooperative caching Resource allocation Knowledge graph reinforcement learning
在线阅读 下载PDF
Container cluster placement in edge computing based on reinforcement learning incorporating graph convolutional networks scheme
18
作者 Zhuo Chen Bowen Zhu Chuan Zhou 《Digital Communications and Networks》 2025年第1期60-70,共11页
Container-based virtualization technology has been more widely used in edge computing environments recently due to its advantages of lighter resource occupation, faster startup capability, and better resource utilizat... Container-based virtualization technology has been more widely used in edge computing environments recently due to its advantages of lighter resource occupation, faster startup capability, and better resource utilization efficiency. To meet the diverse needs of tasks, it usually needs to instantiate multiple network functions in the form of containers interconnect various generated containers to build a Container Cluster(CC). Then CCs will be deployed on edge service nodes with relatively limited resources. However, the increasingly complex and timevarying nature of tasks brings great challenges to optimal placement of CC. This paper regards the charges for various resources occupied by providing services as revenue, the service efficiency and energy consumption as cost, thus formulates a Mixed Integer Programming(MIP) model to describe the optimal placement of CC on edge service nodes. Furthermore, an Actor-Critic based Deep Reinforcement Learning(DRL) incorporating Graph Convolutional Networks(GCN) framework named as RL-GCN is proposed to solve the optimization problem. The framework obtains an optimal placement strategy through self-learning according to the requirements and objectives of the placement of CC. Particularly, through the introduction of GCN, the features of the association relationship between multiple containers in CCs can be effectively extracted to improve the quality of placement.The experiment results show that under different scales of service nodes and task requests, the proposed method can obtain the improved system performance in terms of placement error ratio, time efficiency of solution output and cumulative system revenue compared with other representative baseline methods. 展开更多
关键词 Edge computing Network virtualization Container cluster Deep reinforcement learning graph convolutional network
在线阅读 下载PDF
Priority-Aware Resource Allocation for VNF Deployment in Service Function Chains Based on Graph Reinforcement Learning
19
作者 Seyha Ros Seungwoo Kang +3 位作者 Taikuong Iv Inseok Song Prohim Tam Seokhoon Kim 《Computers, Materials & Continua》 2025年第5期1649-1665,共17页
Recently,Network Functions Virtualization(NFV)has become a critical resource for optimizing capability utilization in the 5G/B5G era.NFV decomposes the network resource paradigm,demonstrating the efficient utilization... Recently,Network Functions Virtualization(NFV)has become a critical resource for optimizing capability utilization in the 5G/B5G era.NFV decomposes the network resource paradigm,demonstrating the efficient utilization of Network Functions(NFs)to enable configurable service priorities and resource demands.Telecommunications Service Providers(TSPs)face challenges in network utilization,as the vast amounts of data generated by the Internet of Things(IoT)overwhelm existing infrastructures.IoT applications,which generate massive volumes of diverse data and require real-time communication,contribute to bottlenecks and congestion.In this context,Multiaccess Edge Computing(MEC)is employed to support resource and priority-aware IoT applications by implementing Virtual Network Function(VNF)sequences within Service Function Chaining(SFC).This paper proposes the use of Deep Reinforcement Learning(DRL)combined with Graph Neural Networks(GNN)to enhance network processing,performance,and resource pooling capabilities.GNN facilitates feature extraction through Message-Passing Neural Network(MPNN)mechanisms.Together with DRL,Deep Q-Networks(DQN)are utilized to dynamically allocate resources based on IoT network priorities and demands.Our focus is on minimizing delay times for VNF instance execution,ensuring effective resource placement,and allocation in SFC deployments,offering flexibility to adapt to real-time changes in priority and workload.Simulation results demonstrate that our proposed scheme outperforms reference models in terms of reward,delay,delivery,service drop ratios,and average completion ratios,proving its potential for IoT applications. 展开更多
关键词 Deep reinforcement learning graph neural network multi-access edge computing network functions virtualization software-defined networking
在线阅读 下载PDF
Improving multi-target cooperative tracking guidance for UAV swarms using multi-agent reinforcement learning 被引量:12
20
作者 Wenhong ZHOU Jie LI +1 位作者 Zhihong LIU Lincheng SHEN 《Chinese Journal of Aeronautics》 SCIE EI CAS CSCD 2022年第7期100-112,共13页
Multi-Target Tracking Guidance(MTTG)in unknown environments has great potential values in applications for Unmanned Aerial Vehicle(UAV)swarms.Although Multi-Agent Deep Reinforcement Learning(MADRL)is a promising techn... Multi-Target Tracking Guidance(MTTG)in unknown environments has great potential values in applications for Unmanned Aerial Vehicle(UAV)swarms.Although Multi-Agent Deep Reinforcement Learning(MADRL)is a promising technique for learning cooperation,most of the existing methods cannot scale well to decentralized UAV swarms due to their computational complexity or global information requirement.This paper proposes a decentralized MADRL method using the maximum reciprocal reward to learn cooperative tracking policies for UAV swarms.This method reshapes each UAV’s reward with a regularization term that is defined as the dot product of the reward vector of all neighbor UAVs and the corresponding dependency vector between the UAV and the neighbors.And the dependence between UAVs can be directly captured by the Pointwise Mutual Information(PMI)neural network without complicated aggregation statistics.Then,the experience sharing Reciprocal Reward Multi-Agent Actor-Critic(MAAC-R)algorithm is proposed to learn the cooperative sharing policy for all homogeneous UAVs.Experiments demonstrate that the proposed algorithm can improve the UAVs’cooperation more effectively than the baseline algorithms,and can stimulate a rich form of cooperative tracking behaviors of UAV swarms.Besides,the learned policy can better scale to other scenarios with more UAVs and targets. 展开更多
关键词 Decentralized cooperation Maximum reciprocal reward multi-agent actor-critic Pointwise mutual information reinforcement learning
原文传递
上一页 1 2 20 下一页 到第
使用帮助 返回顶部