Journal Articles
8 articles found
1. Optimization methods in fully cooperative scenarios: a review of multiagent reinforcement learning
Authors: Tao YANG, Xinhao SHI, Qinghan ZENG, Yulin YANG, Cheng XU, Hongzhe LIU. Frontiers of Information Technology & Electronic Engineering, 2025, No. 4, pp. 479-509 (31 pages).
Abstract: Multiagent reinforcement learning (MARL) has become a dazzling new star in the field of reinforcement learning in recent years, demonstrating its immense potential across many application scenarios. The reward function directs agents to explore their environments and make optimal decisions within them by establishing evaluation criteria and feedback mechanisms. Concurrently, cooperative objectives at the macro level provide a trajectory for agents' learning, ensuring alignment between individual behavioral strategies and the overarching system goals. The interplay between reward structures and cooperative objectives not only bolsters the effectiveness of individual agents but also fosters interagent collaboration, offering both momentum and direction for the development of swarm intelligence and the harmonious operation of multiagent systems. This review delves deeply into methods for designing reward structures and optimizing cooperative objectives in MARL, along with the most recent scientific advancements in this field. The article also reviews the application of simulation environments in cooperative scenarios and discusses future trends and potential research directions, providing a forward-looking perspective and inspiration for subsequent research efforts.
Keywords: multiagent reinforcement learning (MARL); cooperative framework; reward function; cooperative objective optimization
2. Multi-UAV Cooperative Pursuit Strategy With Limited Visual Field in Urban Airspace: A Multi-Agent Reinforcement Learning Approach
Authors: Zhe Peng, Guohua Wu, Biao Luo, Ling Wang. IEEE/CAA Journal of Automatica Sinica, 2025, No. 7, pp. 1350-1367 (18 pages).
Abstract: The application of multiple unmanned aerial vehicles (UAVs) for the pursuit and capture of unauthorized UAVs has emerged as a novel approach to ensuring the safety of urban airspace. However, pursuit UAVs must use their own sensors to proactively gather information about the unauthorized UAV. Considering the restricted sensing range of sensors, this paper proposes a multi-UAV with limited visual field pursuit-evasion (MUV-PE) problem. Each pursuer has a visual field characterized by limited perception distance and viewing angle, potentially obstructed by buildings. Only when the unauthorized UAV, i.e., the evader, enters the visual field of a pursuer can its position be acquired. The objective of the pursuers is to capture the evader as soon as possible without collision. To address this problem, we propose the normalizing flow actor with graph attention critic (NAGC) algorithm, a multi-agent reinforcement learning (MARL) approach. NAGC uses normalizing flows to augment the flexibility of the policy network, enabling each agent to sample actions from more intricate distributions rather than common ones. To enhance the capability of simultaneously comprehending spatial relationships among multiple UAVs and environmental obstacles, NAGC integrates "obstacle-target" graph attention networks, significantly aiding pursuers in search and pursuit activities. Extensive experiments conducted in a high-precision simulator validate the promising performance of the NAGC algorithm.
Keywords: graph attention network; limited visual field; multiagent reinforcement learning (MARL); normalizing flow; pursuit-evasion
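The flow-based policy idea above can be illustrated with a minimal sketch: one affine normalizing-flow layer that maps a standard Gaussian base sample to an action and applies the change-of-variables correction to the log-probability. This is not the paper's NAGC implementation; the function and parameter names (`affine_flow_sample`, `mu`, `log_scale`) are illustrative assumptions, and a practical policy would stack richer flow layers.

```python
import numpy as np

def affine_flow_sample(mu, log_scale, rng):
    """Sample an action through one affine flow layer and return its log-prob.

    A base sample z ~ N(0, I) is pushed through a = mu + exp(log_scale) * z;
    the change of variables subtracts log|det J| = sum(log_scale).
    """
    z = rng.standard_normal(np.shape(mu))
    action = mu + np.exp(log_scale) * z
    base_logp = -0.5 * np.sum(z ** 2 + np.log(2.0 * np.pi))
    log_prob = base_logp - np.sum(log_scale)
    return action, log_prob

# Draw one 2-D action from a flow with mean 0 and scale 0.5 per dimension.
rng = np.random.default_rng(0)
action, log_prob = affine_flow_sample(np.zeros(2), np.log(np.full(2, 0.5)), rng)
```

With a single affine layer the result is still Gaussian; stacking nonlinear layers is what yields the "more intricate distributions" the abstract refers to.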
3. Multi-agent Deep Reinforcement Learning Approach for Temporally Coordinated Demand Response in Microgrids
Authors: Chunchao Hu, Zexiang Cai, Yanxu Zhang. CSEE Journal of Power and Energy Systems, 2025, No. 4, pp. 1512-1522 (11 pages).
Abstract: Price-based and incentive-based demand response (DR) are both recognized as promising solutions to address the increasing uncertainties of renewable energy sources (RES) in microgrids. However, since the temporal optimization horizons of price-based and incentive-based DR differ, few existing methods consider their coordination. In this paper, a multi-agent deep reinforcement learning (MA-DRL) approach is proposed for temporally coordinated DR in microgrids. The proposed method enhances microgrid operation revenue by coordinating day-ahead price-based demand response (PBDR) and hourly direct load control (DLC). Operation at each time scale is decided by a different DRL agent, and the agents are optimized by a multiagent deep deterministic policy gradient (MA-DDPG) with a shared critic that guides them toward a global objective. The effectiveness of the proposed approach is validated on a modified IEEE 33-bus distribution system and a modified heavily loaded 69-bus distribution system.
Keywords: day-ahead price-based demand response; demand response; hourly direct load control; microgrid; multiagent deep reinforcement learning
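The shared-critic idea can be sketched in miniature: a single linear critic scores a joint feature vector built from both agents' decisions (day-ahead price level and hourly DLC amount) and is updated by TD(0). This is a deliberately simplified assumption-based illustration, not the paper's MA-DDPG; the feature layout and names are invented for the example.

```python
import numpy as np

def td_update(w, feats, reward, next_feats, alpha=0.1, gamma=0.95):
    """One TD(0) step on a shared linear critic V = w . feats.

    feats encodes the joint decision of both agents, so a single critic
    evaluates (and therefore coordinates) actions taken at both time scales.
    """
    td_error = reward + gamma * w @ next_feats - w @ feats
    return w + alpha * td_error * feats, td_error

# Toy joint feature vector: [bias, day-ahead price level, hourly DLC amount].
w = np.zeros(3)
feats = np.array([1.0, 0.8, 0.3])
w, err = td_update(w, feats, reward=1.0, next_feats=feats)
```

In the actual MA-DDPG setting the critic is a neural network over joint observations and actions, and each actor follows its own deterministic policy gradient through that critic.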
4. An Improved Multi-Actor Hybrid Attention Critic Algorithm for Cooperative Navigation in Urban Low-Altitude Logistics Environments
Authors: Chao Li, Quanzhi Feng, Caichang Ding, Zhiwei Ye. Computers, Materials & Continua, 2025, No. 8, pp. 3605-3621 (17 pages).
Abstract: The increasing adoption of unmanned aerial vehicles (UAVs) in urban low-altitude logistics systems, particularly for time-sensitive applications like parcel delivery and supply distribution, necessitates sophisticated coordination mechanisms to optimize operational efficiency. However, the limited capability of UAVs to extract state-action information in complex environments poses significant challenges to achieving effective cooperation in dynamic and uncertain scenarios. To address this, we present an Improved Multi-Agent Hybrid Attention Critic (IMAHAC) framework that advances multi-agent deep reinforcement learning (MADRL) through two key innovations. First, a Temporal Difference Error and Time-based Prioritized Experience Replay (TT-PER) mechanism dynamically adjusts sample weights based on temporal relevance and prediction-error magnitude, effectively reducing interference from obsolete collaborative experiences while maintaining training stability. Second, a hybrid attention mechanism is developed, integrating a sensor-fusion layer, which aggregates features from multi-sensor data to enhance decision-making, and a dissimilarity layer that evaluates the similarity between key-value pairs and query values. By combining this hybrid attention mechanism with the Multi-Actor Attention Critic (MAAC) framework, our approach strengthens UAVs' capability to extract critical state-action features in diverse environments. Comprehensive simulations in urban air mobility scenarios demonstrate IMAHAC's superiority over conventional MADRL baselines and MAAC, achieving higher cumulative rewards, fewer collisions, and enhanced cooperative capabilities. This work provides both algorithmic advancements and empirical validation for developing robust autonomous aerial systems in smart city infrastructures.
Keywords: unmanned aerial vehicles; multiagent deep reinforcement learning; attention mechanism
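A weighting scheme in the spirit of TT-PER can be sketched as follows: sampling priority grows with TD-error magnitude and shrinks with sample age, so stale collaborative experiences are drawn less often. The exact formulas and hyperparameter names in the paper may differ; `alpha`, `decay`, and `eps` here are illustrative assumptions.

```python
import numpy as np

def tt_per_probs(td_errors, ages, alpha=0.6, decay=0.99, eps=1e-6):
    """Replay sampling probabilities from TD-error magnitude and sample age.

    priority_i = (|delta_i| + eps)^alpha * decay^age_i, normalized to sum to 1.
    """
    priority = (np.abs(td_errors) + eps) ** alpha * decay ** np.asarray(ages)
    return priority / priority.sum()

# An old low-error sample competes with two recent samples of varying error.
probs = tt_per_probs(np.array([0.1, 2.0, 0.5]), np.array([300, 5, 5]))
```

As with standard PER, a real implementation would also apply importance-sampling corrections to the gradient so the biased sampling does not skew training.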
5. Optimized Consensus for Blockchain in Internet of Things Networks via Reinforcement Learning
Authors: Yifei Zou, Zongjing Jin, Yanwei Zheng, Dongxiao Yu, Tian Lan. Tsinghua Science and Technology (SCIE, EI, CAS, CSCD), 2023, No. 6, pp. 1009-1022 (14 pages).
Abstract: Most blockchain systems currently adopt resource-consuming protocols to achieve consensus among miners, for example, the Proof-of-Work (PoW) and Practical Byzantine Fault Tolerant (PBFT) schemes, which consume substantial computing and communication resources and usually require reliable communication with bounded delay. However, these protocols may be unsuitable for Internet of Things (IoT) networks because IoT devices are usually lightweight, battery-operated, and deployed in unreliable wireless environments. Therefore, this paper studies an efficient consensus protocol for blockchain in IoT networks via reinforcement learning. Specifically, the consensus protocol is designed on the basis of the Proof-of-Communication (PoC) scheme directly in a single-hop wireless network with unreliable communication. A distributed Multi-Agent Reinforcement Learning (MARL) algorithm is proposed to improve the efficiency and fairness of consensus for miners in the blockchain system. In this algorithm, each agent uses a matrix to track the efficiency and fairness of recent consensus rounds and carefully tunes its actions and rewards in an actor-critic framework to seek effective performance. Empirical results from simulation show that the fairness of consensus in the proposed algorithm is guaranteed and that its efficiency nearly reaches that of a centralized optimal solution.
Keywords: consensus in blockchain; Proof-of-Communication (PoC); multiagent reinforcement learning (MARL); Internet of Things (IoT) networks
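The abstract does not specify how fairness of consensus is measured, so as one plausible illustration, Jain's fairness index over per-miner counts of recently won consensus rounds gives a single number between 1/n (one miner wins everything) and 1.0 (perfectly even). This metric is an assumption for the sketch, not necessarily what the paper's matrix encodes.

```python
def jains_index(shares):
    """Jain's fairness index: 1.0 for perfectly equal shares, 1/n worst case."""
    total = sum(shares)
    return total * total / (len(shares) * sum(s * s for s in shares))

# Per-miner counts of recently won consensus rounds (illustrative numbers).
fairness = jains_index([3, 4, 3, 4])
```

An agent could feed such a statistic into its reward so that policies trading all fairness for throughput are penalized.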
6. Deep Reinforcement Learning with Fuse Adaptive Weighted Demonstration Data
Authors: Baofu Fang, Taifeng Guo. 《国际计算机前沿大会会议论文集》, 2022, No. 1, pp. 163-177 (15 pages).
Abstract: Traditional multi-agent deep reinforcement learning suffers from difficulty obtaining rewards, slow convergence, and ineffective cooperation among agents in the pretraining period, owing to the large joint state space and sparse action rewards. This paper therefore examines the role of demonstration data in multiagent systems and proposes a multi-agent deep reinforcement learning algorithm that fuses adaptively weighted demonstration data. The algorithm sets the weights according to performance and uses importance sampling to correct the bias in the mixed sampled data, combining expert data obtained in the simulation environment with a distributed multi-agent reinforcement learning algorithm to address the difficult problem of global exploration and improve convergence speed. Results in the RoboCup2D soccer simulation environment show that the algorithm improves the agents' ability to hold and shoot the ball, achieving a higher goal-scoring rate and faster convergence relative to demonstration policies and mainstream multi-agent reinforcement learning algorithms.
Keywords: multiagent deep reinforcement learning; exploration; offline reinforcement learning; importance sampling
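The importance-sampling correction mentioned above can be sketched generically: when demonstration transitions generated by an expert (behavior) policy are mixed into the replay buffer, each sample is reweighted by the ratio of the current policy's probability to the behavior policy's, clipped for stability. The function name and clipping bound are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def is_weights(pi_probs, beta_probs, clip=10.0):
    """Clipped importance weights pi(a|s) / beta(a|s) for off-policy samples.

    pi_probs:   current policy's probability of each sampled action
    beta_probs: behavior (expert) policy's probability of the same actions
    """
    return np.clip(np.asarray(pi_probs) / np.asarray(beta_probs), 0.0, clip)

# Second ratio (0.9 / 0.03 = 30) exceeds the bound and is clipped.
weights = is_weights([0.5, 0.9], [0.25, 0.03])
```

Clipping trades a small bias for much lower variance, which matters when expert and learned policies diverge late in training.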
7. Safety-critical scenario test for intelligent vehicles via hybrid participation of natural and adversarial agents
Authors: Yong Wang, Daifeng Zhang, Yanqiang Li, Liguo Shuai, Zhicheng Tang, Yuxiang Hou. Journal of Intelligent and Connected Vehicles, 2025, No. 3, pp. 74-86 (13 pages).
Abstract: The intelligence levels of autonomous vehicles should be thoroughly evaluated before deployment, but vehicle tests are difficult owing to heavy experimental resource demands and the large number of cases, especially tests that include safety-critical scenarios. In this study, a new scenario generation method is proposed to accelerate testing, based on a multiagent reinforcement learning (MARL) framework incorporating the driving potential field (DPF). The framework trains background vehicles to enact high-risk and marginal scenes, with the DPF used to shape the rewards of the adversarial background agents. Other background vehicles that follow reasonable driving policies are also considered, serving as naturalistic agents to increase scenario diversity. The coexistence of naturalistic and adversarial agents enriches the experiences learned by the background cars, providing more marginal and risky scenarios for accelerating the test. Experimental results demonstrate efficient generation of high-risk and marginal scenes, with comprehensive assessment via a novel field-based dynamic risk evaluation method.
Keywords: autonomous vehicle; safety-critical scenario; natural and adversarial agents; multiagent reinforcement learning (MARL); potential fields
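Driving potential fields commonly assign each surrounding vehicle a risk value that decays smoothly with distance; a Gaussian-shaped potential is one standard form. The paper's exact DPF formulation (and its use for adversarial rewards) may differ; the shape, amplitude, and length-scale here are illustrative assumptions.

```python
import numpy as np

def risk_potential(ego_xy, other_xy, amplitude=1.0, sigma=5.0):
    """Gaussian risk potential exerted on the ego car by another vehicle.

    Peaks at `amplitude` when the vehicles coincide and decays with the
    squared separation, with `sigma` (metres) setting the influence radius.
    """
    d2 = np.sum((np.asarray(ego_xy, float) - np.asarray(other_xy, float)) ** 2)
    return amplitude * np.exp(-d2 / (2.0 * sigma ** 2))

# Risk is highest at small separation and decays smoothly with distance.
close_risk = risk_potential([0.0, 0.0], [2.0, 0.0])
far_risk = risk_potential([0.0, 0.0], [30.0, 0.0])
```

An adversarial background agent rewarded in proportion to the potential it exerts on the vehicle under test is pushed toward risky, near-miss maneuvers.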
8. Deep MARL-Based Resilient Motion Planning for Decentralized Space Manipulator
Authors: Jiawei Zhang, Chengchao Bai, C. Patrick Yue, Jifeng Guo. Space: Science & Technology, 2024, No. 1, pp. 160-169 (10 pages).
Abstract: Space manipulators play an important role in on-orbit servicing and planetary surface operations. In the extreme environment of space, they are susceptible to a variety of unknown disturbances, and maintaining resilience under failure or disturbance is a core capability for their future development. Compared with traditional motion planning, learning-based motion planning has gradually become a research hotspot. However, regardless of the research approach, the single robotic manipulator has been studied as one independent agent, which cannot provide sufficient flexibility under external force disturbances, observation noise, or mechanical failures. This paper therefore puts forward the idea of discretizing the traditional single manipulator. Different discretization forms are derived by analyzing the joint relationships of a multi-degree-of-freedom single manipulator, yielding a single-manipulator representation composed of multiple new subagents. To verify the ability of this multiagent representation to cope with interference, we adopt a centralized multiagent reinforcement learning framework and analyze in detail how the number of agents and the communication distance affect the learning-based planning results. In addition, by imposing joint-locking failures on the manipulator and introducing observation and action interference, we verify that the "multiagent robotic manipulator" obtained after discretization has stronger anti-disturbance resilience than the traditional single manipulator.
Keywords: planetary surface motion planning; multiagent reinforcement learning; space manipulators; resilient motion planning; robotic manipulator; decentralized space manipulator; deep MARL
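The communication-distance idea in the discretized-manipulator setup can be made concrete with a small sketch: on a serial kinematic chain, each joint-agent observes only the joints within a given index distance of itself. This is an assumption-based illustration of the concept, not the paper's actual observation model.

```python
def neighbor_obs(n_joints, comm_dist):
    """Observable joint indices for each joint-agent on a serial chain.

    Joint j sees every joint i with |i - j| <= comm_dist, so comm_dist
    controls how local or global each subagent's observation is.
    """
    return {
        j: [i for i in range(n_joints) if abs(i - j) <= comm_dist]
        for j in range(n_joints)
    }

# A 6-DOF arm where each joint-agent sees itself and its immediate neighbours.
obs = neighbor_obs(6, 1)
```

Sweeping `comm_dist` from 0 up to `n_joints - 1` reproduces the spectrum the paper analyzes, from fully local subagents to an observation equivalent to the traditional single-agent manipulator.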