8,555 articles found
Graph-based multi-agent reinforcement learning for collaborative search and tracking of multiple UAVs (Cited by: 2)
1
Authors: Bocheng ZHAO, Mingying HUO, Zheng LI, Wenyu FENG, Ze YU, Naiming QI, Shaohai WANG. Chinese Journal of Aeronautics. 2025, Issue 3, pp. 109-123 (15 pages).
This paper investigates the challenges associated with Unmanned Aerial Vehicle (UAV) collaborative search and target tracking in dynamic and unknown environments characterized by a limited field of view. The primary objective is to explore the unknown environments to locate and track targets effectively. To address this problem, we propose a novel Multi-Agent Reinforcement Learning (MARL) method based on Graph Neural Network (GNN). Firstly, a method is introduced for encoding continuous-space multi-UAV problem data into spatial graphs which establish essential relationships among agents, obstacles, and targets. Secondly, a Graph Attention Network (GAT) model is presented, which focuses exclusively on adjacent nodes, learns attention weights adaptively and allows agents to better process information in dynamic environments. Reward functions are specifically designed to tackle exploration challenges in environments with sparse rewards. By introducing a framework that integrates centralized training and distributed execution, the advancement of models is facilitated. Simulation results show that the proposed method outperforms the existing MARL method in search rate and tracking performance with fewer collisions. The experiments show that the proposed method can be extended to applications with a larger number of agents, which provides a potential solution to the challenging problem of multi-UAV autonomous tracking in dynamic unknown environments.
Keywords: Unmanned aerial vehicle (UAV); multi-agent reinforcement learning (MARL); Graph attention network (GAT); Tracking; Dynamic and unknown environment
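The attention step described in the abstract above, where each agent weighs only its adjacent nodes, can be illustrated with a minimal sketch. It follows the standard single-head GAT scoring rule (LeakyReLU of a learned linear score, then a softmax over neighbors) and is not taken from the paper: the identity feature projection, the fixed vectors `a_src` and `a_dst`, and the node names are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attention_weights(scores):
    """Numerically stable softmax over one node's neighbor scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gat_aggregate(features, neighbors, a_src, a_dst):
    """One attention head over adjacent nodes only.

    Assumption: the learned projection W is the identity, and a_src / a_dst
    are fixed here although they would be trained in a real model.
    features:  dict node -> list[float]
    neighbors: dict node -> list of adjacent nodes (add the node itself for a self-loop)
    """
    out = {}
    for i, nbrs in neighbors.items():
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in nbrs
        ]
        alphas = attention_weights(scores)
        dim = len(features[i])
        out[i] = [
            sum(a * features[j][k] for a, j in zip(alphas, nbrs))
            for k in range(dim)
        ]
    return out


# Hypothetical spatial graph: the obstacle is not adjacent to uav0, so it is ignored.
features = {
    "uav0": [1.0, 0.0],
    "uav1": [0.0, 1.0],
    "target": [1.0, 1.0],
    "obstacle": [5.0, 5.0],
}
neighbors = {"uav0": ["uav0", "uav1", "target"]}
print(gat_aggregate(features, neighbors, a_src=[0.5, 0.5], a_dst=[1.0, -1.0]))
```

Because the weights are a softmax over adjacent nodes, the aggregated feature is a convex combination of neighbor features, and nodes outside the adjacency list have no influence on it.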
A Survey of Cooperative Multi-agent Reinforcement Learning for Multi-task Scenarios (Cited by: 1)
2
Authors: Jiajun CHAI, Zijie ZHAO, Yuanheng ZHU, Dongbin ZHAO. Artificial Intelligence Science and Engineering. 2025, Issue 2, pp. 98-121 (24 pages).
Cooperative multi-agent reinforcement learning (MARL) is a key technology for enabling cooperation in complex multi-agent systems. It has achieved remarkable progress in areas such as gaming, autonomous driving, and multi-robot control. Empowering cooperative MARL with multi-task decision-making capabilities is expected to further broaden its application scope. In multi-task scenarios, cooperative MARL algorithms need to address 3 types of multi-task problems: reward-related multi-task, arising from different reward functions; multi-domain multi-task, caused by differences in state and action spaces and in state transition functions; and scalability-related multi-task, resulting from the dynamic variation in the number of agents. Most existing studies focus on scalability-related multi-task problems. However, with the increasing integration between large language models (LLMs) and multi-agent systems, a growing number of LLM-based multi-agent systems have emerged, enabling more complex multi-task cooperation. This paper provides a comprehensive review of the latest advances in this field. By combining multi-task reinforcement learning with cooperative MARL, we categorize and analyze the 3 major types of multi-task problems under multi-agent settings, offering more fine-grained classifications and summarizing key insights for each. In addition, we summarize commonly used benchmarks and discuss future directions of research in this area, which hold promise for further enhancing the multi-task cooperation capabilities of multi-agent systems and expanding their practical applications in the real world.
Keywords: Multi-task; multi-agent reinforcement learning; large language models
Achievement of Fish School Milling Motion Based on Distributed Multi-agent Reinforcement Learning
3
Authors: Jincun Liu, Yinjie Ren, Yang Liu, Yan Meng, Dong An, Yaoguang Wei. Journal of Bionic Engineering. 2025, Issue 4, pp. 1683-1701 (19 pages).
In recent years, significant research attention has been directed towards swarm intelligence. The milling behavior of fish schools, a prime example of swarm intelligence, shows how simple rules followed by individual agents lead to complex collective behaviors. This paper studies Multi-Agent Reinforcement Learning to simulate fish schooling behavior, overcoming the challenges of tuning parameters in traditional models and addressing the limitations of single-agent methods in multi-agent environments. Based on this foundation, a novel Graph Convolutional Networks (GCN)-Critic MADDPG algorithm leveraging GCN is proposed to enhance cooperation among agents in a multi-agent system. Simulation experiments demonstrate that, compared to traditional single-agent algorithms, the proposed method not only exhibits significant advantages in terms of convergence speed and stability but also achieves tighter group formations and more naturally aligned milling behavior. Additionally, a fish school self-organizing behavior research platform based on an event-triggered mechanism has been developed, providing a robust tool for exploring dynamic behavioral changes under various conditions.
Keywords: Collective motion; Collective behavior; Self-organization; Fish school; multi-agent reinforcement learning
Multi-Agent Reinforcement Learning for Moving Target Defense Temporal Decision-Making Approach Based on Stackelberg-FlipIt Games
4
Authors: Rongbo Sun, Jinlong Fei, Yuefei Zhu, Zhongyu Guo. Computers, Materials & Continua. 2025, Issue 8, pp. 3765-3786 (22 pages).
Moving Target Defense (MTD) necessitates scientifically effective decision-making methodologies for defensive technology implementation. While most MTD decision studies focus on accurately identifying optimal strategies, the issue of optimal defense timing remains underexplored. Current default approaches (periodic or overly frequent MTD triggers) lead to suboptimal trade-offs among system security, performance, and cost. The timing of MTD strategy activation critically impacts both defensive efficacy and operational overhead, yet existing frameworks inadequately address this temporal dimension. To bridge this gap, this paper proposes a Stackelberg-FlipIt game model that formalizes asymmetric cyber conflicts as alternating control over attack surfaces, thereby capturing the dynamic security state evolution of MTD systems. We introduce a belief factor to quantify information asymmetry during adversarial interactions, enhancing the precision of MTD trigger timing. Leveraging this game-theoretic foundation, we employ Multi-Agent Reinforcement Learning (MARL) to derive adaptive temporal strategies, optimized via a novel four-dimensional reward function that holistically balances security, performance, cost, and timing. Experimental validation using IP address mutation against scanning attacks demonstrates stable strategy convergence and accelerated defense response, significantly improving cybersecurity affordability and effectiveness.
Keywords: Cyber security; moving target defense; multi-agent reinforcement learning; security metrics; game theory
A sample selection mechanism for multi-UCAV air combat policy training using multi-agent reinforcement learning
5
Authors: Zihui YAN, Xiaolong LIANG, Yueqi HOU, Aiwu YANG, Jiaqiang ZHANG, Ning WANG. Chinese Journal of Aeronautics. 2025, Issue 6, pp. 501-516 (16 pages).
Policy training against diverse opponents remains a challenge when using Multi-Agent Reinforcement Learning (MARL) in multiple Unmanned Combat Aerial Vehicle (UCAV) air combat scenarios. In view of this, this paper proposes a novel Dominant and Non-dominant strategy sample selection (DoNot) mechanism and a Local Observation Enhanced Multi-Agent Proximal Policy Optimization (LOE-MAPPO) algorithm to train the multi-UCAV air combat policy and improve its generalization. Specifically, the LOE-MAPPO algorithm adopts a mixed state that concatenates the global state and each agent's local observation to enable efficient value function learning in multi-UCAV air combat. The DoNot mechanism classifies opponents into dominant or non-dominant strategy opponents, and samples from easier to more challenging opponents to form an adaptive training curriculum. Empirical results demonstrate that the proposed LOE-MAPPO algorithm outperforms baseline MARL algorithms in multi-UCAV air combat scenarios, and the DoNot mechanism leads to stronger policy generalization when facing diverse opponents. The results pave the way for the fast generation of cooperative strategies for air combat agents with MARL algorithms.
Keywords: Unmanned combat aerial vehicle; Air combat; Sample selection; multi-agent reinforcement learning; Proximal policy optimization
Adaptive multi-agent reinforcement learning for dynamic pricing and distributed energy management in virtual power plant networks
6
Authors: Jian-Dong Yao, Wen-Bin Hao, Zhi-Gao Meng, Bo Xie, Jian-Hua Chen, Jia-Qi Wei. Journal of Electronic Science and Technology. 2025, Issue 1, pp. 35-59 (25 pages).
This paper presents a novel approach to dynamic pricing and distributed energy management in virtual power plant (VPP) networks using multi-agent reinforcement learning (MARL). As the energy landscape evolves towards greater decentralization and renewable integration, traditional optimization methods struggle to address the inherent complexities and uncertainties. Our proposed MARL framework enables adaptive, decentralized decision-making for both the distribution system operator and individual VPPs, optimizing economic efficiency while maintaining grid stability. We formulate the problem as a Markov decision process and develop a custom MARL algorithm that leverages actor-critic architectures and experience replay. Extensive simulations across diverse scenarios demonstrate that our approach consistently outperforms baseline methods, including Stackelberg game models and model predictive control, achieving an 18.73% reduction in costs and a 22.46% increase in VPP profits. The MARL framework shows particular strength in scenarios with high renewable energy penetration, where it improves system performance by 11.95% compared with traditional methods. Furthermore, our approach demonstrates superior adaptability to unexpected events and mis-predictions, highlighting its potential for real-world implementation.
Keywords: Distributed energy management; Dynamic pricing; multi-agent reinforcement learning; Renewable energy integration; Virtual power plants
Defending Against Jamming and Interference for Internet of UAVs Using Cooperative Multi-Agent Reinforcement Learning with Mutual Information
7
Authors: Lin Yan, Wu Zhijuan, Peng Nuoheng, Zhao Tianyu, Zhang Yijin, Shu Feng, Li Jun. China Communications. 2025, Issue 5, pp. 220-237 (18 pages).
The Internet of Unmanned Aerial Vehicles (I-UAVs) is expected to execute latency-sensitive tasks, but is limited by co-channel interference and malicious jamming. In the face of unknown prior environmental knowledge, defending against jamming and interference through spectrum allocation becomes challenging, especially when each UAV pair makes decisions independently. In this paper, we propose a cooperative multi-agent reinforcement learning (MARL)-based anti-jamming framework for I-UAVs, enabling UAV pairs to learn their own policies cooperatively. Specifically, we first model the problem as a model-free multi-agent Markov decision process (MAMDP) to maximize the long-term expected system throughput. Then, for improving the exploration of the optimal policy, we resort to optimizing a MARL objective function with a mutual-information (MI) regularizer between states and actions, which can dynamically assign the probability for actions frequently used by the optimal policy. Next, through sharing their current channel selections and local learning experience (their soft Q-values), the UAV pairs can learn their own policies cooperatively, relying only on previously observed information and on predictions of others' actions. Our simulation results show that for both sweep jamming and Markov jamming patterns, the proposed scheme outperforms the benchmarks in terms of throughput, convergence and stability for different numbers of jammers, channels and UAV pairs.
Keywords: anti-jamming communication; internet of UAVs; multi-agent reinforcement learning; spectrum allocation
AoI and TTC Based Resource Allocation in C-V2X Sidelink via Multi-Agent Reinforcement Learning
8
Authors: Tong Xiaolu, Shi Yan, Xu Yaqi, Chen Shanzhi, Ge Yuming. China Communications. 2025, Issue 8, pp. 281-297 (17 pages).
The rapid development of the Internet of Vehicles (IoVs) underscores the importance of Vehicle-to-Everything (V2X) communication for ensuring driving safety. V2X supports control systems by providing reliable and real-time information, while the control system's decisions, in turn, affect the communication topology and channel state. Given this coupling between communication and control, radio resource allocation (RRA) should be control-aware. However, current RRA methods often focus on optimizing communication metrics, neglecting the needs of the control system. To promote the co-design of communication and control, this paper proposes a novel RRA method that integrates both communication and control considerations. From the communication perspective, the Age of Information (AoI) is introduced to measure the freshness of packets. From the control perspective, a weighted utility function based on Time-to-Collision (TTC) and driving distance is designed, emphasizing the importance of neighboring and potentially dangerous vehicles. By synthesizing these two metrics, an optimization objective minimizing weighted AoI based on TTC and driving distance is formulated. The RRA process is modeled as a partially observable Markov decision process, and a multi-agent reinforcement learning algorithm incorporating positional encoding and attention mechanisms (PAMARL) is proposed. Simulation results show that PAMARL can reduce Collision Risk (CR) with better Packet Delivery Ratio (PDR) than others.
Keywords: age of information; multi-agent reinforcement learning; resource allocation; time to collision
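The weighted-AoI objective mentioned in the abstract above can be made concrete with a small sketch. The AoI update is the usual discrete-time rule (age grows by one slot, and resets to the delivered packet's own age on reception). The weight function is purely hypothetical: the abstract does not give the paper's utility function, so `link_weight` below only assumes that smaller TTC and shorter distance should mean higher priority.

```python
def next_aoi(age, now, delivered_gen_time=None):
    """Age of Information at slot now + 1.

    Without a delivery the age grows by one slot. With a delivery in slot
    `now`, the age becomes the time elapsed since that packet was generated.
    """
    if delivered_gen_time is None:
        return age + 1
    return now + 1 - delivered_gen_time


def link_weight(ttc, distance):
    """Hypothetical priority weight: larger for small TTC and short distance.

    This form is an assumption for illustration, not the paper's utility.
    """
    return 1.0 / (1.0 + ttc) + 1.0 / (1.0 + distance)


def weighted_aoi(ages, weights):
    """Objective to minimize: sum of per-link AoI scaled by its weight."""
    return sum(a * w for a, w in zip(ages, weights))


# Two links: the first is about to collide, so its staleness costs more.
ages = [4, 4]
weights = [link_weight(ttc=1.0, distance=10.0), link_weight(ttc=8.0, distance=60.0)]
print(weighted_aoi(ages, weights))
```

Under such an objective, an allocator that can serve only one link per slot would prefer the link whose stale information is most dangerous, which is the control-aware behavior the abstract argues for.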
Autonomous Conflict Resolution(AutoCR)Based on Improved Multi-agent Reinforcement Learning
9
Authors: HUANG Xiao, TIAN Yong, LI Jiangchen, ZHANG Naizhong. Transactions of Nanjing University of Aeronautics and Astronautics. 2025, Issue S1, pp. 91-101 (11 pages).
Conflict resolution (CR) is a fundamental component of air traffic management, where recent progress in artificial intelligence has led to the effective application of deep reinforcement learning (DRL) techniques to enhance CR strategies. However, existing DRL models applied to CR are often limited to simple scenarios. This approach frequently leads to the neglect of the high risks associated with multiple intersections in the high-density and multi-airport system terminal area (MAS-TMA), and suffers from poor interpretability. This paper addresses the aforementioned gap by introducing an improved multi-agent DRL model adapted to autonomous CR (AutoCR) within MAS-TMA. Specifically, dynamic weather conditions are incorporated into the state space to enhance adaptability. In the action space, the flight intent is considered and transformed into optimal maneuvers according to overload, thus improving interpretability. On these bases, the deep Q-network (DQN) algorithm is further improved to address the AutoCR problem in MAS-TMA. Simulation experiments conducted in the "Guangdong-Hong Kong-Macao" Greater Bay Area (GBA) MAS-TMA demonstrate the effectiveness of the proposed method, successfully resolving over eight potential conflicts and performing robustly across various air traffic densities.
Keywords: air traffic management; conflict resolution; multi-airport system terminal area (MAS-TMA); multi-agent reinforcement learning
Dynamic Decoupling-Driven Cooperative Pursuit for Multi-UAV Systems:A Multi-Agent Reinforcement Learning Policy Optimization Approach
10
Authors: Lei Lei, Chengfu Wu, Huaimin Chen. Computers, Materials & Continua. 2025, Issue 10, pp. 1339-1363 (25 pages).
This paper proposes a Multi-Agent Attention Proximal Policy Optimization (MA2PPO) algorithm aimed at problems such as credit assignment, low collaboration efficiency and weak strategy generalization ability in the cooperative pursuit tasks of multiple unmanned aerial vehicles (UAVs). Traditional algorithms often fail to effectively identify critical cooperative relationships in such tasks, leading to low capture efficiency and a significant decline in performance when the scale expands. To tackle these issues, based on the proximal policy optimization (PPO) algorithm, MA2PPO adopts the centralized training with decentralized execution (CTDE) framework and introduces a dynamic decoupling mechanism, that is, sharing the multi-head attention (MHA) mechanism for critics during centralized training to solve the credit assignment problem. This method enables the pursuers to identify highly correlated interactions with their teammates, effectively eliminate irrelevant and weakly relevant interactions, and decompose large-scale cooperation problems into decoupled sub-problems, thereby enhancing the collaborative efficiency and policy stability among multiple agents. Furthermore, a reward function has been devised to help the pursuers encircle the escapee by combining a formation reward with a distance reward, which incentivizes UAVs to develop sophisticated cooperative pursuit strategies. Experimental results demonstrate the effectiveness of the proposed algorithm in achieving multi-UAV cooperative pursuit and inducing diverse cooperative pursuit behaviors among UAVs. Moreover, experiments on scalability have demonstrated that the algorithm is suitable for large-scale multi-UAV systems.
Keywords: multi-agent reinforcement learning; multi-UAV systems; pursuit-evasion games
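One way to read the reward described in the abstract above (a formation term plus a distance term that together favor encirclement) is sketched below. The abstract gives no formulas, so everything here is an assumption: the distance term is the negative mean pursuer-to-escapee distance, the formation term scores how evenly the pursuers' bearings are spread around the escapee, and the weights `w_formation` and `w_distance` are hypothetical.

```python
import math


def distance_reward(pursuers, evader):
    """Negative mean Euclidean distance from the pursuers to the evader."""
    return -sum(math.dist(p, evader) for p in pursuers) / len(pursuers)


def formation_reward(pursuers, evader):
    """1.0 when bearings are evenly spread around the evader, 0.0 when all coincide."""
    n = len(pursuers)
    if n < 2:
        raise ValueError("encirclement needs at least two pursuers")
    angles = sorted(
        math.atan2(p[1] - evader[1], p[0] - evader[0]) for p in pursuers
    )
    # Angular gaps between consecutive pursuers; the last gap wraps around.
    gaps = [angles[(k + 1) % n] - angles[k] for k in range(n)]
    gaps[-1] += 2 * math.pi
    ideal = 2 * math.pi / n
    return 1.0 - (max(gaps) - ideal) / (2 * math.pi - ideal)


def pursuit_reward(pursuers, evader, w_formation=1.0, w_distance=0.1):
    """Weighted sum of the two terms; both weights are illustrative values."""
    return (
        w_formation * formation_reward(pursuers, evader)
        + w_distance * distance_reward(pursuers, evader)
    )


surrounding = [(1, 0), (-1, 0), (0, 1), (0, -1)]
chasing_in_line = [(1, 0), (2, 0), (3, 0), (4, 0)]
print(pursuit_reward(surrounding, (0, 0)), pursuit_reward(chasing_in_line, (0, 0)))
```

With a distance term alone, pursuers tend to trail the target in a line; adding a term that penalizes a large uncovered angular gap is one plausible way to reward closing the escape routes.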
A pipelining task offloading strategy via delay-aware multi-agent reinforcement learning in Cybertwin-enabled 6G network
11
Authors: Haiwen Niu, Luhan Wang, Keliang Du, Zhaoming Lu, Xiangming Wen, Yu Liu. Digital Communications and Networks. 2025, Issue 1, pp. 92-105 (14 pages).
The Cybertwin-enabled 6th Generation (6G) network is envisioned to support artificial intelligence-native management to meet the changing demands of 6G applications. Multi-Agent Deep Reinforcement Learning (MADRL) technologies driven by Cybertwins have been proposed for adaptive task offloading strategies. However, the existence of random transmission delay between Cybertwin-driven agents and underlying networks is not considered in related works; this delay destroys the standard Markov property and increases the decision reaction time, reducing the performance of the task offloading strategy. In order to address this problem, we propose a pipelining task offloading method to lower the decision reaction time and model it as a delay-aware Markov Decision Process (MDP). Then, we design a delay-aware MADRL algorithm to minimize the weighted sum of task execution latency and energy consumption. Firstly, the state space is augmented using the most recently received state and historical actions to rebuild the Markov property. Secondly, Gate Transformer-XL is introduced to capture the importance of historical actions and to keep the input dimension consistent even though it changes dynamically with random transmission delays. Thirdly, a sampling method and a new loss function, built from the difference between the current and target state value and the difference between the real state-action value and the augmented state-action value, are designed to obtain state transition trajectories close to the real ones. Numerical results demonstrate that the proposed methods are effective in reducing reaction time and improving the task offloading performance in the random-delay Cybertwin-enabled 6G networks.
Keywords: Cybertwin; multi-agent Deep reinforcement learning (MADRL); Task offloading; Pipelining; Delay-aware
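The state augmentation described in the abstract above (most recently received state plus the actions issued since it was observed) can be sketched as follows. This is a simplified illustration, not the paper's design: the class name, the `max_delay` bound, and the fixed-length padding with `pad_action` are assumptions. The abstract instead handles the variable-length action history with a Transformer-XL variant.

```python
from collections import deque


class DelayAugmentedState:
    """Keeps the last received state and the actions taken since it was observed."""

    def __init__(self, max_delay, pad_action=-1):
        self.max_delay = max_delay      # assumed upper bound on the delay, in steps
        self.pad_action = pad_action    # filler so the vector has a fixed length
        self.last_state = None
        self.pending = deque(maxlen=max_delay)

    def on_action(self, action):
        """Record an action whose effect is not yet visible in any received state."""
        self.pending.append(action)

    def on_state(self, state, delay):
        """A state observed `delay` steps ago arrives; older actions are already reflected in it."""
        self.last_state = state
        while len(self.pending) > delay:
            self.pending.popleft()

    def vector(self):
        """Augmented state: (last received state, padded tuple of pending actions)."""
        pad = [self.pad_action] * (self.max_delay - len(self.pending))
        return (self.last_state, tuple(list(self.pending) + pad))


aug = DelayAugmentedState(max_delay=3)
aug.on_state("s0", delay=0)
aug.on_action(1)
aug.on_action(2)
print(aug.vector())
```

Conditioning the policy on this pair rather than on the stale state alone is what restores the Markov property: the pending actions carry the information needed to predict the current, not yet observed, state.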
Energy-saving control strategy for ultra-dense network base stations based on multi-agent reinforcement learning
12
Authors: Yan Zhen, Litianyi Tao, Dapeng Wu, Tong Tang, Ruyan Wang. Digital Communications and Networks. 2025, Issue 4, pp. 1006-1016 (11 pages).
Aiming at the problem of mobile data traffic surge in 5G networks, this paper proposes an effective solution combining massive multiple-input multiple-output techniques with Ultra-Dense Network (UDN) and focuses on solving the resulting challenge of increased energy consumption. A base station control algorithm based on Multi-Agent Proximal Policy Optimization (MAPPO) is designed. In the constructed 5G UDN model, each base station is considered as an agent, and the MAPPO algorithm enables inter-base station collaboration and interference management to optimize the network performance. To reduce the extra power consumption due to frequent sleep mode switching of base stations, a sleep mode switching decision algorithm is proposed. The algorithm reduces unnecessary power consumption by evaluating the network state similarity and intelligently adjusting the agent's action strategy. Simulation results show that the proposed algorithm reduces the power consumption by 24.61% compared to the no-sleep strategy and further reduces the power consumption by 5.36% compared to the traditional MAPPO algorithm under the premise of guaranteeing the quality of service of users.
Keywords: Ultra dense networks; Base station sleep; Multiple input multiple output; reinforcement learning
A survey on multi-agent reinforcement learning and its application (Cited by: 4)
13
Authors: Zepeng Ning, Lihua Xie. Journal of Automation and Intelligence. 2024, Issue 2, pp. 73-91 (19 pages).
Multi-agent reinforcement learning (MARL) has been a rapidly evolving field. This paper presents a comprehensive survey of MARL and its applications. We trace the historical evolution of MARL, highlight its progress, and discuss related survey works. Then, we review the existing works addressing inherent challenges and those focusing on diverse applications. Some representative stochastic games, MARL means, spatial forms of MARL, and task classification are revisited. We then conduct an in-depth exploration of a variety of challenges encountered in MARL applications. We also address critical operational aspects, such as hyperparameter tuning and computational complexity, which are pivotal in practical implementations of MARL. Afterward, we give a thorough overview of the applications of MARL to intelligent machines and devices, chemical engineering, biotechnology, healthcare, and societal issues, which highlights the extensive potential and relevance of MARL within both current and future technological contexts. Our survey also encompasses a detailed examination of benchmark environments used in MARL research, which are instrumental in evaluating MARL algorithms and demonstrate the adaptability of MARL to diverse application scenarios. In the end, we give our prospect for MARL and discuss its related techniques and potential future applications.
Keywords: Benchmark environments; multi-agent reinforcement learning; multi-agent systems; Stochastic games
Cooperative decision-making algorithm with efficient convergence for UCAV formation in beyond-visual-range air combat based on multi-agent reinforcement learning (Cited by: 2)
14
Authors: Yaoming ZHOU, Fan YANG, Chaoyue ZHANG, Shida LI, Yongchao WANG. Chinese Journal of Aeronautics (SCIE, EI, CAS, CSCD). 2024, Issue 8, pp. 311-328 (18 pages).
Highly intelligent Unmanned Combat Aerial Vehicle (UCAV) formation is expected to bring out strengths in Beyond-Visual-Range (BVR) air combat. Although Multi-Agent Reinforcement Learning (MARL) shows outstanding performance in cooperative decision-making, it is challenging for existing MARL algorithms to quickly converge to an optimal strategy for UCAV formation in BVR air combat where confrontation is complicated and reward is extremely sparse and delayed. Aiming to solve this problem, this paper proposes an Advantage Highlight Multi-Agent Proximal Policy Optimization (AHMAPPO) algorithm. First, at every step, the AHMAPPO records the degree to which the best formation exceeds the average of formations in parallel environments and carries out additional advantage sampling according to it. Then, the sampling result is introduced into the updating process of the actor network to improve its optimization efficiency. Finally, the simulation results reveal that compared with some state-of-the-art MARL algorithms, the AHMAPPO can obtain a better strategy using fewer sample episodes in the UCAV formation BVR air combat simulation environment built in this paper, which can reflect the critical features of BVR air combat. The AHMAPPO can significantly increase the convergence efficiency of the strategy for UCAV formation in BVR air combat, with a maximum increase of 81.5% relative to other algorithms.
Keywords: Unmanned combat aerial vehicle (UCAV) formation; Decision-making; Beyond-visual-range (BVR) air combat; Advantage highlight; multi-agent reinforcement learning (MARL)
Collision-free parking recommendation based on multi-agent reinforcement learning in vehicular crowdsensing
15
Authors: Xin Li, Xinghua Lei, Xiuwen Liu, Hang Xiao. Digital Communications and Networks (SCIE, CSCD). 2024, Issue 3, pp. 609-619 (11 pages).
The recent proliferation of Fifth-Generation (5G) networks and Sixth-Generation (6G) networks has given rise to Vehicular Crowd Sensing (VCS) systems which solve parking collisions by effectively incentivizing vehicle participation. However, instead of being an isolated module, the incentive mechanism usually interacts with other modules. Based on this, we capture this synergy and propose Collision-free Parking Recommendation (CPR), a novel VCS system framework that integrates an incentive mechanism, a non-cooperative VCS game, and a multi-agent reinforcement learning algorithm, to derive an optimal parking strategy in real time. Specifically, we utilize an LSTM method to make a rough prediction of parking areas so that recommendations are accurate. Its incentive mechanism is designed to motivate vehicle participation by considering dynamically priced parking tasks and social network effects. In order to cope with stochastic parking collisions, its non-cooperative VCS game further analyzes the uncertain interactions between vehicles in parking decision-making. Then its multi-agent reinforcement learning algorithm models the VCS campaign as a multi-agent Markov decision process that not only derives the optimal collision-free parking strategy for each vehicle independently, but also proves that the optimal parking strategy for each vehicle is Pareto-optimal. Finally, numerical results demonstrate that CPR can accomplish parking tasks at a 99.7% accuracy compared with other baselines, efficiently recommending parking spaces.
Keywords: Incentive mechanism; Non-cooperative VCS game; multi-agent reinforcement learning; Collision-free parking strategy; Vehicular crowdsensing
Unleashing the Power of Multi-Agent Reinforcement Learning for Algorithmic Trading in the Digital Financial Frontier and Enterprise Information Systems
16
Authors: Saket Sarin, Sunil K. Singh, Sudhakar Kumar, Shivam Goyal, Brij Bhooshan Gupta, Wadee Alhalabi, Varsha Arya. Computers, Materials & Continua (SCIE, EI). 2024, Issue 8, pp. 3123-3138 (16 pages).
In the rapidly evolving landscape of today's digital economy, Financial Technology (Fintech) emerges as a transformative force, propelled by the dynamic synergy between Artificial Intelligence (AI) and Algorithmic Trading. Our in-depth investigation delves into the intricacies of merging Multi-Agent Reinforcement Learning (MARL) and Explainable AI (XAI) within Fintech, aiming to refine Algorithmic Trading strategies. Through meticulous examination, we uncover the nuanced interactions of AI-driven agents as they collaborate and compete within the financial realm, employing sophisticated deep learning techniques to enhance the clarity and adaptability of trading decisions. These AI-infused Fintech platforms harness collective intelligence to unearth trends, mitigate risks, and provide tailored financial guidance, fostering benefits for individuals and enterprises navigating the digital landscape. Our research holds the potential to revolutionize finance, opening doors to fresh avenues for investment and asset management in the digital age. Additionally, our statistical evaluation yields encouraging results, with metrics such as Accuracy = 0.85, Precision = 0.88, and F1 Score = 0.86, reaffirming the efficacy of our approach within Fintech and emphasizing its reliability and innovative prowess.
Keywords: Neurodynamic Fintech; multi-agent reinforcement learning; algorithmic trading; digital financial frontier
Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning
17
Authors: Kun Jiang, Wenzhang Liu, Yuanda Wang, Lu Dong, Changyin Sun. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 7, pp. 1591-1604 (14 pages)
Efficient exploration in complex coordination tasks has been considered a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult for tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack a clear representation of latent variables and an effective evaluation of the influence of latent variables on the agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. It is called the multi-agent soft actor-critic with latent variable (MASAC-LV) algorithm, which uses variational inference theory to infer a compact latent variable representation space from a large amount of offline experience. In addition, we derive the counterfactual policy, whose input has no latent variables, and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference is treated as an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared to other baseline algorithms.
Keywords: Latent variable model; maximum entropy; multi-agent reinforcement learning (MARL); multi-agent system
Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
18
Authors: DONG Yubo, CUI Tao, ZHOU Yufan, SONG Xun, ZHU Yue, DONG Peng. Journal of Shanghai Jiaotong University (Science) (EI), 2024, Issue 4, pp. 646-655 (10 pages)
Multi-agent reinforcement learning has recently been applied to solve pursuit problems. However, it suffers from a large number of time steps per training episode and therefore struggles to converge effectively, resulting in low rewards and an inability for agents to learn strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design approach to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training effect of agents in long episodes. Then, we eliminate the non-monotonic behavior in the reward function introduced by the trigonometric functions in the traditional 2D polar coordinate observation representation. Experimental results demonstrate that this method outperforms the traditional single reward function mechanism in the pursuit scenario by improving agents’ policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long-episode pursuit problems, leading to improved model training performance.
Keywords: multi-agent reinforcement learning; deep reinforcement learning (DRL); long episode; reward function
Safety-Constrained Multi-Agent Reinforcement Learning for Power Quality Control in Distributed Renewable Energy Networks
19
Authors: Yongjiang Zhao, Haoyi Zhong, Chang Cyoon Lim. Computers, Materials & Continua (SCIE, EI), 2024, Issue 4, pp. 449-471 (23 pages)
This paper examines the difficulties of managing distributed power systems, notably due to the increasing use of renewable energy sources, and focuses on voltage control challenges exacerbated by their variable nature in modern power grids. To tackle the unique challenges of voltage control in distributed renewable energy networks, researchers are increasingly turning towards multi-agent reinforcement learning (MARL). However, MARL raises safety concerns because agent actions are unpredictable during the exploration phase, which can lead to unsafe control measures. To mitigate these safety concerns in MARL-based voltage control, our study introduces a novel approach: Safety-Constrained Multi-Agent Reinforcement Learning (SC-MARL). This approach incorporates a specialized safety constraint module designed for voltage control within the MARL framework. This module ensures that the MARL agents carry out voltage control actions safely. The experiments demonstrate that, in the 33-bus, 141-bus, and 322-bus power systems, employing SC-MARL for voltage control reduced the Voltage Out of Control Rate (%V.out) from 0.43, 0.24, and 2.95 to 0, 0.01, and 0.03, respectively. Additionally, the Reactive Power Loss (Q loss) decreased from 0.095, 0.547, and 0.017 to 0.062, 0.452, and 0.016 in the corresponding systems.
Keywords: Power quality control; multi-agent reinforcement learning; safety-constrained MARL
Tactical reward shaping for large-scale combat by multi-agent reinforcement learning
20
Authors: DUO Nanxun, WANG Qinzhao, LYU Qiang, WANG Wei. Journal of Systems Engineering and Electronics (CSCD), 2024, Issue 6, pp. 1516-1529 (14 pages)
Future unmanned battles desperately require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, due to the complexity of combat operations and the large size of the combat group, this task suffers from the credit assignment problem more than other reinforcement learning tasks. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions would not change the Nash Equilibrium in stochastic games, providing theoretical support for their use. According to the characteristics of combat operations, we propose tactical reward shaping (TRS) that comprises maneuver shaping advice and threat assessment-based attack shaping advice. Then, we investigate the effects of different types and combinations of shaping advice on combat policies through experiments. The results show that TRS improves both the efficiency and attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance compared with that of the baseline strategy.
Keywords: deep reinforcement learning; multi-agent reinforcement learning; multi-agent combat; unmanned battle; reward shaping
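The claim that several shaping functions can be combined without changing the equilibrium is easiest to see for potential-based shaping, where each bonus has the form gamma * phi(s') - phi(s). The abstract does not say which form the paper uses, so the sketch below is an illustration under that assumption and not the authors' TRS method. The two potentials, the scalar states, and all names are hypothetical.

```python
from typing import Callable, Sequence

# A potential maps a state to a scalar. States are plain floats here for brevity.
Potential = Callable[[float], float]


def maneuver_potential(state: float) -> float:
    """Hypothetical maneuver advice: larger when the agent is nearer position 0."""
    return -abs(state)


def threat_potential(state: float) -> float:
    """Hypothetical threat-based attack advice (illustrative only)."""
    return 0.5 * state


def shaping_term(potentials: Sequence[Potential], state: float,
                 next_state: float, gamma: float) -> float:
    """Sum of potential-based bonuses, gamma * phi(s') - phi(s), over all potentials."""
    return sum(gamma * phi(next_state) - phi(state) for phi in potentials)


def discounted_return(rewards: Sequence[float], states: Sequence[float],
                      gamma: float, potentials: Sequence[Potential] = ()) -> float:
    """Discounted return of a trajectory; states has one more entry than rewards."""
    total = 0.0
    for t, reward in enumerate(rewards):
        bonus = shaping_term(potentials, states[t], states[t + 1], gamma)
        total += gamma ** t * (reward + bonus)
    return total


def total_potential(potentials: Sequence[Potential], state: float) -> float:
    """Combined potential, i.e. the sum of the individual potentials."""
    return sum(phi(state) for phi in potentials)


gamma = 0.9
potentials = [maneuver_potential, threat_potential]

# Two different trajectories that share the same start and end states.
states_a, rewards_a = [4.0, 3.0, 1.0, 0.0], [0.0, 0.0, 1.0]
states_b, rewards_b = [4.0, 5.0, 2.0, 0.0], [0.0, 0.5, 0.0]

gap_a = (discounted_return(rewards_a, states_a, gamma, potentials)
         - discounted_return(rewards_a, states_a, gamma))
gap_b = (discounted_return(rewards_b, states_b, gamma, potentials)
         - discounted_return(rewards_b, states_b, gamma))

# The bonuses telescope, so the gap depends only on the endpoints.
expected_gap = (gamma ** 3 * total_potential(potentials, 0.0)
                - total_potential(potentials, 4.0))

print(gap_a, gap_b, expected_gap)
```

Both trajectories gain the same amount from shaping because the per-step bonuses telescope to gamma^T * Phi(s_T) - Phi(s_0). Since the gap does not depend on the path taken, the ranking of policies from a given start state is unaffected, which is the intuition behind equilibrium-preserving shaping.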