Journal Articles
4,253 articles found
Research on UAV-MEC Cooperative Scheduling Algorithms Based on Multi-Agent Deep Reinforcement Learning
1
Authors: Yonghua Huo, Ying Liu, Anni Jiang, Yang Yang. Computers, Materials & Continua, 2026, Issue 3, pp. 1823-1850 (28 pages)
With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation and baseline comparisons. The results reveal that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
Keywords: UAV-MEC networks; multi-agent deep reinforcement learning; MATD3; task offloading
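The twin-Q idea that MATD3 inherits from TD3 can be sketched in a few lines. This is an illustrative sketch, not the paper's code; the reward, discount, and critic values below are hypothetical placeholders.

```python
# Illustrative sketch of the TD3-family twin-critic target (not the paper's code).
# Taking the minimum of the two critics' next-state estimates curbs the
# overestimation bias that a single Q-network tends to accumulate.
def td3_target(reward, gamma, q1_next, q2_next, done):
    """Bootstrapped target y = r + gamma * min(Q1', Q2'); terminal states bootstrap nothing."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)

# Hypothetical transition: the pessimistic min() picks 2.0 over 2.5.
y = td3_target(reward=1.0, gamma=0.99, q1_next=2.0, q2_next=2.5, done=False)
```

The Residual Orthogonalization the authors add aims to decorrelate these two critic estimates, which makes the min() less biased toward a single shared error.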
MARCS:A Mobile Crowdsensing Framework Based on Data Shapley Value Enabled Multi-Agent Deep Reinforcement Learning
2
Authors: Yiqin Wang, Yufeng Wang, Jianhua Ma, Qun Jin. Computers, Materials & Continua, 2025, Issue 3, pp. 4431-4449 (19 pages)
Opportunistic mobile crowdsensing (MCS), which non-intrusively exploits human mobility trajectories and uses participants' smart devices as sensors, has become a promising paradigm for various urban data acquisition tasks. However, in practice, opportunistic MCS faces several challenges from the perspectives of both MCS participants and the data platform. On the one hand, participants face uncertainties in conducting MCS tasks, including their mobility and implicit interactions among participants, and participants' economic returns given by the MCS data platform are determined not only by their own actions but also by other participants' strategic actions. On the other hand, the platform can only observe the participants' uploaded sensing data, which depends on the unknown effort/action exerted by participants, while, to optimize its overall objective, the platform needs to properly reward certain participants to incentivize them to provide high-quality data. To address the challenge of balancing individual incentives and platform objectives in MCS, this paper proposes MARCS, an online sensing policy based on multi-agent deep reinforcement learning (MADRL) with centralized training and decentralized execution (CTDE). Specifically, the interactions between MCS participants and the data platform are modeled as a partially observable Markov game, where participants, acting as agents, use DRL-based policies to make decisions based on local observations, such as task trajectories and platform payments. To align individual and platform goals effectively, the platform leverages the Shapley value to estimate the contribution of each participant's sensed data, using these estimates as immediate rewards to guide agent training. Experimental results on real mobility trajectory datasets indicate that the revenue of MARCS is roughly 35%, 53%, and 100% higher than that of DDPG, Actor-Critic, and model predictive control (MPC), respectively, on the participant side, with similar results on the platform side, demonstrating superior performance over the baselines.
Keywords: mobile crowdsensing; online data acquisition; data Shapley value; multi-agent deep reinforcement learning; centralized training and decentralized execution (CTDE)
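The Shapley value that MARCS uses as a per-participant reward can be computed exactly for small coalitions by averaging marginal contributions over all join orders. The sketch below is illustrative, not the MARCS implementation; the value function `v` is a hypothetical stand-in for the platform's utility from a set of participants' data.

```python
# Illustrative exact Shapley computation for a toy 3-participant sensing coalition.
from itertools import permutations

def shapley(players, v):
    """Average marginal contribution of each player over all join orders."""
    values = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            values[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return {p: values[p] / len(perms) for p in players}

def v(coalition):
    # Hypothetical platform utility: each participant contributes 1 unit, plus a
    # synergy bonus when 'a' and 'b' jointly cover overlapping sensing areas.
    return len(coalition) + (1.0 if {"a", "b"} <= coalition else 0.0)

phi = shapley(["a", "b", "c"], v)
```

The efficiency property (the rewards sum exactly to the grand coalition's value) is what lets the platform distribute its total utility without over- or under-paying; in practice, with many participants, the paper's "data Shapley" would be estimated rather than enumerated.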
Multi-agent deep reinforcement learning based resource management in heterogeneous V2X networks
3
Authors: Junhui Zhao, Fajin Hu, Jiahang Li, Yiwen Nie. Digital Communications and Networks, 2025, Issue 1, pp. 182-190 (9 pages)
In Heterogeneous Vehicle-to-Everything Networks (HVNs), multiple users such as vehicles and handheld devices, as well as infrastructure, can communicate with each other to obtain more advanced services. However, the increasing number of entities accessing HVNs presents a huge technical challenge for allocating the limited wireless resources. Traditional model-driven resource allocation approaches are no longer applicable because of the rich data and the interference problem of multiple communication modes reusing resources in HVNs. In this paper, we investigate a wireless resource allocation scheme, including power control and spectrum allocation, based on a resource block reuse strategy. To meet the high capacity requirements of cellular users and the high reliability requirements of Vehicle-to-Vehicle (V2V) user pairs, we propose a data-driven Multi-Agent Deep Reinforcement Learning (MADRL) resource allocation scheme for the HVN. Simulation results demonstrate that, compared to existing algorithms, the proposed MADRL-based scheme achieves a high sum capacity and probability of successful V2V transmission, while providing close-to-limit performance.
Keywords: data-driven; deep reinforcement learning; resource allocation; V2X communications
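The sum-capacity objective such schemes optimize is typically evaluated with the Shannon formula per reused resource block. This is an illustrative sketch, not the paper's scheme; the bandwidth, powers, interference, and noise figures below are hypothetical placeholder values.

```python
# Illustrative sum-capacity evaluation for cellular links whose resource blocks
# are reused by V2V pairs (interference term), using B * log2(1 + SINR).
import math

def link_capacity(bandwidth_hz, signal_power, interference, noise):
    """Shannon capacity in bits/s for one link."""
    sinr = signal_power / (interference + noise)
    return bandwidth_hz * math.log2(1.0 + sinr)

links = [
    # (signal power, interference from reusing V2V pairs, noise), all in watts
    (0.2, 0.01, 1e-3),
    (0.1, 0.05, 1e-3),
]
sum_capacity = sum(link_capacity(180e3, s, i, n) for s, i, n in links)
```

A MADRL power-control agent effectively searches over the `signal_power` and `interference` terms: raising one link's power raises its own SINR but appears as interference in every link reusing the same block, which is why joint (multi-agent) control outperforms per-link greedy tuning.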
UAV-Assisted Dynamic Avatar Task Migration for Vehicular Metaverse Services: A Multi-Agent Deep Reinforcement Learning Approach (Cited by 3)
4
Authors: Jiawen Kang, Junlong Chen, Minrui Xu, Zehui Xiong, Yutao Jiao, Luchao Han, Dusit Niyato, Yongju Tong, Shengli Xie. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2024, Issue 2, pp. 430-445 (16 pages)
Avatars, as promising digital representations and service assistants of users in Metaverses, can enable drivers and passengers to immerse themselves in 3D virtual services and spaces of UAV-assisted vehicular Metaverses. However, avatar tasks include a multitude of human-to-avatar and avatar-to-avatar interactive applications, e.g., augmented reality navigation, which consume intensive computing resources. It is inefficient and impractical for vehicles to process avatar tasks locally. Fortunately, migrating avatar tasks to the nearest roadside units (RSUs) or unmanned aerial vehicles (UAVs) for execution is a promising solution to decrease computation overhead and reduce task processing latency, while the high mobility of vehicles makes it challenging for vehicles to independently make avatar migration decisions based on current and future vehicle status. To address these challenges, in this paper, we propose a novel avatar task migration system based on multi-agent deep reinforcement learning (MADRL) to execute immersive vehicular avatar tasks dynamically. Specifically, we first formulate the problem of avatar task migration from vehicles to RSUs/UAVs as a partially observable Markov decision process that can be solved by MADRL algorithms. We then design the multi-agent proximal policy optimization (MAPPO) approach as the MADRL algorithm for the avatar task migration problem. To overcome the slow convergence resulting from the curse of dimensionality and the non-stationarity caused by shared parameters in MAPPO, we further propose a transformer-based MAPPO approach via sequential decision-making models for the efficient representation of relationships among agents. Finally, to motivate terrestrial or non-terrestrial edge servers (e.g., RSUs or UAVs) to share computation resources and to ensure traceability of the sharing records, we apply smart contracts and blockchain technologies to achieve secure sharing management. Numerical results demonstrate that the proposed approach outperforms the MAPPO approach by around 2% and reduces the latency of avatar task execution by approximately 20% in UAV-assisted vehicular Metaverses.
Keywords: avatar; blockchain; Metaverses; multi-agent deep reinforcement learning; transformer; UAVs
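The MAPPO family the authors build on is driven by PPO's clipped surrogate objective, evaluated per sample with advantages from a centralized critic. The sketch below is illustrative, not the authors' code; the ratio and advantage values are hypothetical.

```python
# Illustrative PPO/MAPPO clipped surrogate for one sample.
# ratio = pi_new(a|s) / pi_old(a|s); advantage comes from the (centralized) critic.
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """min(r * A, clip(r, 1-eps, 1+eps) * A): caps how far one update can move the policy."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return min(ratio * advantage, clipped * advantage)
```

The clip is what keeps updates stable when many agents share parameters; the paper's transformer variant changes how agent observations are encoded, not this underlying objective.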
Multi-Agent Deep Reinforcement Learning for Efficient Computation Offloading in Mobile Edge Computing
5
Authors: Tianzhe Jiao, Xiaoyue Feng, Chaopeng Guo, Dongqi Wang, Jie Song. Computers, Materials & Continua (SCIE, EI), 2023, Issue 9, pp. 3585-3603 (19 pages)
Mobile-edge computing (MEC) is a promising technology for the fifth-generation (5G) and sixth-generation (6G) architectures, which provides resourceful computing capabilities for Internet of Things (IoT) devices and applications such as virtual reality, mobile devices, and smart cities. In general, these IoT applications bring higher energy consumption than traditional applications and are usually energy-constrained. To provide persistent energy, many studies have addressed the offloading problem to save energy consumption. However, the dynamic environment dramatically increases the difficulty of optimizing the offloading decision. In this paper, we aim to minimize the energy consumption of the entire MEC system under a latency constraint while fully considering the dynamic environment. Under Markov games, we propose a multi-agent deep reinforcement learning approach based on a bi-level actor-critic learning structure to jointly optimize the offloading decision and resource allocation, which can solve the combinatorial optimization problem using an asymmetric method and compute the Stackelberg equilibrium as a better convergence point than the Nash equilibrium in terms of Pareto superiority. Our method adapts better to a dynamic environment during data transmission than single-agent strategies and can effectively tackle the coordination problem in the multi-agent environment. The simulation results show that the proposed method decreases the total computational overhead by 17.8% compared to the actor-critic-based method, and by 31.3%, 36.5%, and 44.7% compared with random offloading, all-local execution, and all-offloading execution, respectively.
Keywords: computation offloading; multi-agent deep reinforcement learning; mobile-edge computing; latency; energy efficiency
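The Stackelberg equilibrium the bi-level structure converges to can be illustrated on a tiny finite game solved by backward induction: the leader picks its action anticipating the follower's best response. This is an illustrative sketch, not the paper's bi-level actor-critic; the payoff matrices are hypothetical.

```python
# Illustrative Stackelberg equilibrium of a small bimatrix game by backward induction.
def stackelberg(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action): the leader maximizes its payoff
    subject to the follower best-responding to the observed leader action."""
    best = None
    for a in range(len(leader_payoff)):
        # Follower's best response to leader action a.
        b = max(range(len(follower_payoff[a])), key=lambda j: follower_payoff[a][j])
        if best is None or leader_payoff[a][b] > leader_payoff[best[0]][best[1]]:
            best = (a, b)
    return best

L = [[3, 1], [4, 0]]  # hypothetical leader payoffs
F = [[2, 1], [0, 3]]  # hypothetical follower payoffs
eq = stackelberg(L, F)
```

Note the leader does not pick row 1 despite its tempting payoff of 4, because the follower's best response there would leave the leader with 0; this anticipation is exactly the asymmetry the bi-level (leader/follower) learning structure exploits.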
UAV Frequency-based Crowdsensing Using Grouping Multi-agent Deep Reinforcement Learning
6
Authors: Cui Zhang, En Wang, Funing Yang, Yongjian Yang, Nan Jiang. Computer Science (计算机科学; CSCD, PKU Core), 2023, Issue 2, pp. 57-68 (12 pages)
Mobile CrowdSensing (MCS) is a promising sensing paradigm that recruits users to cooperatively perform sensing tasks. Recently, unmanned aerial vehicles (UAVs), as powerful sensing devices, have been used to replace user participation and carry out special tasks such as epidemic monitoring and earthquake rescue. In this paper, we focus on scheduling UAVs to sense task Point-of-Interests (PoIs) with different frequency coverage requirements. To accomplish the sensing task, the scheduling strategy needs to consider the coverage requirement, geographic fairness, and energy charging simultaneously. We consider the complex interaction among UAVs and propose a grouping multi-agent deep reinforcement learning approach (G-MADDPG) to schedule UAVs distributively. G-MADDPG groups all UAVs into teams by a distance-based clustering algorithm (DCA) and then regards each team as an agent. In this way, G-MADDPG solves the problem that the training time of traditional MADDPG is too long to converge when the number of UAVs is large, and the trade-off between training time and result accuracy can be controlled flexibly by adjusting the number of teams. Extensive simulation results show that our scheduling strategy performs better than three baselines and is flexible in balancing training time and result accuracy.
Keywords: UAV crowdsensing; frequency coverage; grouping multi-agent deep reinforcement learning
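The grouping step that turns many UAVs into a few agents can be illustrated with a simple distance-threshold clustering. This is an illustrative sketch, not the paper's DCA; the positions and threshold are hypothetical.

```python
# Illustrative distance-based grouping: nearby UAVs form one team, and each team
# is then treated as a single MADDPG agent, shrinking the joint action space.
def group_by_distance(positions, threshold):
    """Greedy single-linkage grouping: a UAV joins the first team that already
    contains a member within `threshold`; otherwise it starts a new team."""
    teams = []
    for p in positions:
        placed = False
        for team in teams:
            if any(((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= threshold
                   for q in team):
                team.append(p)
                placed = True
                break
        if not placed:
            teams.append([p])
    return teams

uavs = [(0, 0), (1, 0), (10, 10), (11, 10), (50, 50)]
teams = group_by_distance(uavs, threshold=2.0)
```

Loosening the threshold yields fewer, larger teams (faster training, coarser control); tightening it does the reverse, which is the training-time/accuracy trade-off the abstract describes.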
Multi-Agent Deep Reinforcement Learning-Based Resource Allocation in HPC/AI Converged Cluster (Cited by 1)
7
Authors: Jargalsaikhan Narantuya, Jun-Sik Shin, Sun Park, JongWon Kim. Computers, Materials & Continua (SCIE, EI), 2022, Issue 9, pp. 4375-4395 (21 pages)
As the complexity of deep learning (DL) networks and training data grows enormously, methods that scale with computation are becoming the future of artificial intelligence (AI) development. In this regard, the interplay between machine learning (ML) and high-performance computing (HPC) is an innovative paradigm for speeding up AI research and development. However, building and operating an HPC/AI converged system requires broad knowledge to leverage the latest computing, networking, and storage technologies. Moreover, an HPC-based AI computing environment needs an appropriate resource allocation and monitoring strategy to efficiently utilize the system resources. In this regard, we introduce a technique for building and operating a high-performance AI-computing environment with the latest technologies. Specifically, an HPC/AI converged system, called the GIST AI-X computing cluster, is configured inside the Gwangju Institute of Science and Technology (GIST); it is built by leveraging the latest Nvidia DGX servers, high-performance storage and networking devices, and various open-source tools. It can therefore serve as a good reference for building a small or middle-sized HPC/AI converged system for research and educational institutes. In addition, we propose a resource allocation method for DL jobs to efficiently utilize the computing resources with multi-agent deep reinforcement learning (mDRL). Through extensive simulations and experiments, we validate that the proposed mDRL algorithm can help the HPC/AI converged cluster improve both system utilization and power consumption. By deploying the proposed resource allocation method to the system, total job completion time is reduced by around 20% and inefficient power consumption by around 40%.
Keywords: deep learning; HPC/AI converged cluster; reinforcement learning
Multi-Agent Deep Reinforcement Learning for Cross-Layer Scheduling in Mobile Ad-Hoc Networks (Cited by 1)
8
Authors: Xinxing Zheng, Yu Zhao, Joohyun Lee, Wei Chen. China Communications (SCIE, CSCD), 2023, Issue 8, pp. 78-88 (11 pages)
Due to the fading characteristics of wireless channels and the burstiness of data traffic, how to deal with congestion in ad-hoc networks with effective algorithms remains open and challenging. In this paper, we focus on enabling congestion control to minimize network transmission delays through flexible power control. To effectively solve the congestion problem, we propose a distributed cross-layer scheduling algorithm, which is empowered by graph-based multi-agent deep reinforcement learning. The transmit power is adaptively adjusted in real time by our algorithm based only on local information (i.e., channel state information and queue length) and local communication (i.e., information exchanged with neighbors). Moreover, the training complexity of the algorithm is low due to regional cooperation based on the graph attention network. In the evaluation, we show that our algorithm can reduce the transmission delay of data flows under severe signal interference and drastically changing channel states, and we demonstrate its adaptability and stability in different topologies. The method is general and can be extended to various types of topologies.
Keywords: ad-hoc network; cross-layer scheduling; multi-agent deep reinforcement learning; interference elimination; power control; queue scheduling; actor-critic methods; Markov decision process
Exploring Local Chemical Space in De Novo Molecular Generation Using Multi-Agent Deep Reinforcement Learning (Cited by 2)
9
Author: Wei Hu. Natural Science, 2021, Issue 9, pp. 412-424 (13 pages)
Single-agent reinforcement learning (RL) is commonly used to learn how to play computer games, in which the agent makes one move before making the next in a sequential decision process. Recently, single agents have also been employed in the design of molecules and drugs. While a single agent is a good fit for computer games, it has limitations when used in molecule design: its sequential learning makes it impossible to modify or improve previous steps while working on the current step. In this paper, we propose to apply the multi-agent RL approach to the study of molecules, which can optimize all sites of a molecule simultaneously. To demonstrate the validity of our approach, we chose one chemical compound, Favipiravir, to explore its local chemical space. Favipiravir is a broad-spectrum inhibitor of viral RNA polymerase and is one of the compounds currently being used in SARS-CoV-2 (COVID-19) clinical trials. Our experiments revealed the collaborative learning of a team of deep RL agents, as well as the learning of individual agents, in the exploration of Favipiravir. In particular, our multiple agents discovered not only the molecules near Favipiravir in chemical space, but also the learnability of each site in the string representation of Favipiravir, critical information for understanding the underlying mechanism that supports machine learning of molecules.
Keywords: multi-agent reinforcement learning; actor-critic; molecule design; SARS-CoV-2; COVID-19
Service Function Chain Deployment Algorithm Based on Multi-Agent Deep Reinforcement Learning
10
Authors: Wanwei Huang, Qiancheng Zhang, Tao Liu, Yaoli Xu, Dalei Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 9, pp. 4875-4893 (19 pages)
To address the long service request processing times and high deployment costs caused by the rapid growth of network services in the deployment of network function virtualization service function chains (SFCs) under 5G networks, this paper proposes a multi-agent deep deterministic policy gradient optimization algorithm for SFC deployment (MADDPG-SD). Initially, an optimization model is devised to enhance the request acceptance rate and to minimize the latency and deployment cost of SFCs for the network resource-constrained case. Subsequently, we model the dynamic problem as a Markov decision process (MDP), facilitating adaptation to the evolving states of network resources. Finally, by allocating SFCs to different agents and adopting a collaborative deployment strategy, each agent aims to maximize the request acceptance rate or minimize latency and costs. These agents learn strategies from historical data of virtual network functions in SFCs to guide server node selection, and they achieve approximately optimal SFC deployment strategies through a cooperative framework of centralized training and distributed execution. Experimental simulation results indicate that the proposed method, while simultaneously meeting performance requirements and resource capacity constraints, effectively increases the acceptance rate of requests compared to the comparative algorithms, reducing the end-to-end latency by 4.942% and the deployment cost by 8.045%.
Keywords: network function virtualization; service function chain; Markov decision process; multi-agent reinforcement learning
Energy Optimization for Autonomous Mobile Robot Path Planning Based on Deep Reinforcement Learning
11
Authors: Longfei Gao, Weidong Wang, Dieyun Ke. Computers, Materials & Continua, 2026, Issue 1, pp. 984-998 (15 pages)
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function, centered on energy efficiency, is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling optimal low-energy trajectory generation in 3D space from the source. The incorporation of a multi-head attention mechanism allows the model to dynamically focus on energy-critical state features, such as slope gradients and obstacle density, thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
Keywords: autonomous mobile robot; deep reinforcement learning; energy optimization; multi-attention mechanism; prioritized experience replay; dueling deep Q-network
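The dueling aggregation underlying Dueling-DQN variants such as this one combines a state value with mean-centered per-action advantages. The sketch below is illustrative, not the AD-Dueling DQN network; the value and advantage numbers are hypothetical.

```python
# Illustrative dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
# Centering the advantages makes the V/A decomposition identifiable, so the
# network can learn how good a state is independently of the action chosen.
def dueling_q(value, advantages):
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

q = dueling_q(value=2.0, advantages=[1.0, 0.0, -1.0])
```

A useful sanity check of the centering: the mean of the resulting Q-values equals V(s) exactly.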
A Deep Reinforcement Learning-Based Partitioning Method for Power System Parallel Restoration
12
Authors: Changcheng Li, Weimeng Chang, Dahai Zhang, Jinghan He. Energy Engineering, 2026, Issue 1, pp. 243-264 (22 pages)
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model to maximize the modularity, and the corresponding key partitioning constraints on parallel restoration are considered. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty terms in the reward function. A soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q-network method is applied to solve the partitioning MDP model and generate partitioning schemes, with two experience replay buffers employed to speed up training. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
Keywords: partitioning method; parallel restoration; deep reinforcement learning; experience replay buffer; partitioning modularity
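The modularity objective the partitioning agent maximizes is the standard Newman modularity of an undirected graph partition. The sketch below is illustrative, not the paper's formulation; the toy graph stands in for a power network's bus-branch topology.

```python
# Illustrative Newman modularity Q = sum_c (e_c - a_c^2), where e_c is the
# fraction of edges inside community c and a_c the fraction of edge ends in c.
def modularity(edges, partition):
    """edges: list of (u, v); partition: dict mapping node -> community id."""
    m = len(edges)
    communities = set(partition.values())
    e = {c: 0 for c in communities}    # edges fully inside community c
    a = {c: 0.0 for c in communities}  # edge endpoints touching community c
    for u, v in edges:
        if partition[u] == partition[v]:
            e[partition[u]] += 1
        a[partition[u]] += 0.5
        a[partition[v]] += 0.5
    return sum(e[c] / m - (a[c] / m) ** 2 for c in communities)

# Two triangles joined by a single bridge edge (3, 4): a natural 2-way split.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
good = modularity(edges, {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1})
```

Cutting at the bridge scores well because each island keeps its dense internal wiring, which for restoration means each island can be re-energized largely independently.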
A Multi-Objective Deep Reinforcement Learning Algorithm for Computation Offloading in Internet of Vehicles
13
Authors: Junjun Ren, Guoqiang Chen, Zheng-Yi Chai, Dong Yuan. Computers, Materials & Continua, 2026, Issue 1, pp. 2111-2136 (26 pages)
Vehicle Edge Computing (VEC) and Cloud Computing (CC) significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Units (RSUs), thereby achieving lower delay and energy consumption. However, due to the limited storage capacity and energy budget of RSUs, it is challenging to meet the demands of the highly dynamic Internet of Vehicles (IoV) environment. Therefore, determining reasonable service caching and computation offloading strategies is crucial. To address this, this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading. By modeling the dynamic optimization problem as a Markov Decision Process (MDP), the scheme jointly optimizes task delay, energy consumption, load balancing, and privacy entropy to achieve better quality of service. Additionally, a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed. Each Double Deep Q-Network (DDQN) agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks (RBFNs), thereby efficiently approximating the Pareto-optimal decisions for multiple objectives. Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud, edge, and vehicles. Compared to existing algorithms, the proposed method reduces task delay and energy consumption by 10.64% and 5.1%, respectively.
Keywords: deep reinforcement learning; Internet of Vehicles; multi-objective optimization; cloud-edge computing; computation offloading; service caching
Noise-driven enhancement for exploration: Deep reinforcement learning for UAV autonomous navigation in complex environments
14
Authors: Haotian Zhang, Yiyang Li, Lingquan Cheng, Jianliang Ai. Chinese Journal of Aeronautics, 2026, Issue 1, pp. 454-471 (18 pages)
Unmanned aerial vehicles (UAVs) play a prominent role in various fields, and autonomous navigation is a crucial component of UAV intelligence. Deep Reinforcement Learning (DRL) has expanded the research avenues for addressing challenges in autonomous navigation. Nonetheless, challenges persist, including getting stuck in local optima, consuming excessive computation during action space exploration, and neglecting deterministic experience. This paper proposes a noise-driven enhancement strategy. In accordance with the overall learning phases, a global noise control method is designed, while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAVs during navigation. Both methods are integrated into a dual-model noise control scheme to regulate action space exploration. Furthermore, dual noise experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience. For uncertain environments, based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with a Long Short-Term Memory (LSTM) network and Prioritized Experience Replay (PER), a Noise-Driven Enhancement Priority Memory TD3 (NDE-PMTD3) is developed. We established a simulation environment to compare different algorithms, and the performance of the algorithms is analyzed in various scenarios. The training results indicate that the proposed algorithm accelerates convergence and enhances convergence stability. In test experiments, the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments, demonstrating superior generalization.
Keywords: action space exploration; autonomous navigation; deep reinforcement learning; twin delayed deep deterministic policy gradient; unmanned aerial vehicle
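The PER component the algorithm builds on samples transitions in proportion to their priorities (typically TD errors). The sketch below is illustrative, not the NDE-PMTD3 buffer; the priority values are hypothetical.

```python
# Illustrative proportional prioritized experience replay: transition i is
# drawn with probability p_i^alpha / sum_j p_j^alpha (alpha=0 -> uniform).
import random

def per_probabilities(priorities, alpha=0.6):
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_index(probs, rng=random.random):
    """Inverse-CDF sampling over the priority distribution."""
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

probs = per_probabilities([2.0, 1.0, 0.5])
```

High-TD-error (surprising) transitions are replayed more often, which is the mechanism this paper's dual buffers reuse to favor critical deterministic or noisy experiences.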
Deep reinforcement learning-based adaptive collision avoidance method for UAV in joint operational airspace
15
Authors: Yan Shen, Xuejun Zhang, Yan Li, Weidong Zhang. Defence Technology, 2026, Issue 2, pp. 142-159 (18 pages)
As joint operations have become a key trend in modern military development, unmanned aerial vehicles (UAVs) play an increasingly important role in enhancing the intelligence and responsiveness of combat systems. However, the heterogeneity of aircraft, partial observability, and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods. To address these issues, this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning. First, a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments. Then, to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations, a sector-based partial observation mechanism is designed. A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats along multiple dimensions, including time to closest approach, minimum separation distance, and aircraft type. Furthermore, a Hierarchical Prioritized Experience Replay (HPER) mechanism is introduced, which classifies experience samples into high, medium, and low priority levels to preferentially sample critical experiences, thereby improving learning efficiency and accelerating policy convergence. Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed, environmental adaptability, and robustness, significantly enhancing collision avoidance performance and convergence rate. Finally, transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex, real-world joint operational scenarios.
Keywords: unmanned aerial vehicle; collision avoidance; deep reinforcement learning; joint operational airspace; hierarchical prioritized experience replay
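The HPER mechanism described in the abstract above, which routes experience into high, medium, and low priority tiers and samples critical transitions preferentially, can be sketched as a simple tiered replay buffer. This is an illustrative reconstruction, not the paper's implementation: the TD-error thresholds, tier mixing ratios, and class name are all assumptions.

```python
import random
from collections import deque

class HierarchicalReplayBuffer:
    """Three-tier replay buffer: transitions are routed to high/medium/low
    priority queues by TD-error magnitude, and sampling draws preferentially
    from higher tiers. Thresholds and mixing ratios are illustrative, not
    taken from the paper."""

    def __init__(self, capacity=10000, hi_thresh=1.0, mid_thresh=0.3,
                 ratios=(0.5, 0.3, 0.2)):
        self.tiers = [deque(maxlen=capacity) for _ in range(3)]
        self.hi_thresh, self.mid_thresh = hi_thresh, mid_thresh
        self.ratios = ratios  # fraction of each batch drawn per tier

    def add(self, transition, td_error):
        # Larger TD error -> higher priority tier.
        if abs(td_error) >= self.hi_thresh:
            self.tiers[0].append(transition)
        elif abs(td_error) >= self.mid_thresh:
            self.tiers[1].append(transition)
        else:
            self.tiers[2].append(transition)

    def sample(self, batch_size):
        batch = []
        for tier, r in zip(self.tiers, self.ratios):
            k = min(int(batch_size * r), len(tier))
            batch.extend(random.sample(list(tier), k))
        # Top up from the largest tier if integer quotas fell short.
        while len(batch) < batch_size:
            tier = max(self.tiers, key=len)
            if not tier:
                break
            batch.append(random.choice(list(tier)))
        return batch
```

In a D3QN training loop, `add` would be called with the transition's TD error after each environment step, and `sample` would feed minibatches to the learner.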
AquaTree: Deep Reinforcement Learning-Driven Monte Carlo Tree Search for Underwater Image Enhancement
16
Authors: Chao Li, Jianing Wang, Caichang Ding, Zhiwei Ye 《Computers, Materials & Continua》 2026, No. 3, pp. 1444-1464 (21 pages)
Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov Decision Process (MDP) by integrating Monte Carlo Tree Search (MCTS) with deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance, correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. Our core contributions include: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features, (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency, and (3) an alternating training strategy that co-optimizes enhancement sequences and network parameters. We further propose two inference schemes: an MCTS-based approach that prioritizes accuracy at a higher computational cost, and an efficient network policy that enables real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB dataset, together with color correction and haze removal comparisons on the U45 dataset, demonstrate AquaTree's superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
Keywords: underwater image enhancement (UIE); Monte Carlo tree search (MCTS); deep reinforcement learning (DRL); Markov decision process (MDP)
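The sequential operator selection that AquaTree's abstract describes rests on a tree policy that balances exploiting operators with good observed reward against exploring under-tried ones. A minimal sketch of such a rule is the standard UCB1 score; the paper's actual tree policy is guided by its dual-branch network, so the function name, the `stats` layout, and the exploration constant here are assumptions for illustration.

```python
import math

def ucb1_select(stats, c=1.4):
    """Pick the enhancement operator maximizing the UCB1 score
    mean_reward + c * sqrt(ln(total_visits) / visits).
    `stats` maps operator name -> (visit_count, total_reward)."""
    total = sum(n for n, _ in stats.values())
    best_op, best_score = None, -math.inf
    for op, (n, w) in stats.items():
        if n == 0:
            return op  # expand unvisited operators before exploiting
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_op, best_score = op, score
    return best_op
```

Within a full MCTS loop, this selection step would be followed by applying the chosen operator to the image state, evaluating the reward, and backing the result up the tree.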
Multi-objective optimization of hybrid electric vehicles energy management using multi-agent deep reinforcement learning framework
17
Authors: Xiaoyu Li, Zaihang Zhou, Changyin Wei, Xiao Gao, Yibo Zhang 《Energy and AI》 2025, No. 2, pp. 287-297 (11 pages)
Hybrid electric vehicles (HEVs) have the advantages of lower emissions and less noise pollution than traditional fuel vehicles. Developing reasonable energy management strategies (EMSs) can effectively reduce fuel consumption and improve the fuel economy of HEVs. However, current EMSs still face problems such as complex multi-objective optimization and poor algorithm robustness. Herein, a multi-agent deep reinforcement learning (MADRL) framework based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is proposed to solve these problems. Specifically, a vehicle model and a dynamics model are established, and on this basis a multi-objective EMS is developed that considers fuel economy, maintaining the battery State of Charge (SOC), and reducing battery degradation. Secondly, the proposed strategy regards the engine and battery as two agents that cooperate to realize optimal power distribution and achieve the optimal control strategy. Finally, the WLTC and HWFET driving cycles are employed to verify the performance of the proposed method; fuel consumption decreases by 26.91% and 8.41% on average compared with the other strategies. The simulation results demonstrate that the proposed strategy has remarkable superiority in multi-objective optimization.
Keywords: energy management strategy; hybrid electric vehicle; reinforcement learning; multi-agent deep deterministic policy gradient
Multi-agent deep reinforcement learning with traffic flow for traffic signal control
18
Authors: Liang Hou, Dailin Huang, Jie Cao, Jialin Ma 《Journal of Control and Decision》 2025, No. 1, pp. 81-92 (12 pages)
Multi-agent Reinforcement Learning (MARL) has become one of the best methods in Adaptive Traffic Signal Control (ATSC). Traffic flow is a highly regular traffic volume that is critical to the signal control policy. However, dynamic control policies directly affect traffic flow formation, so observations cannot be provided by original traffic flow prediction alone. This paper proposes a method for estimating traffic flow over a time window during Reinforcement Learning (RL) training. The method is verified on both a regular road network and a real road network, and it further reduces intersection delay and queue length compared with the original method.
Keywords: multi-agent reinforcement learning; actor-critic; adaptive traffic signal control; traffic flow
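The time-window traffic flow estimation described in the abstract above can be sketched as a sliding-window counter over vehicle arrival timestamps. This is a hedged illustration only: the class name, window length, and interface are assumptions, not the paper's implementation.

```python
from collections import deque

class TimeWindowFlowEstimator:
    """Estimate traffic flow (vehicles per second) over a sliding time
    window, so an RL agent can observe recent flow rather than relying on
    an offline traffic flow prediction. Parameters are illustrative."""

    def __init__(self, window=60.0):
        self.window = window          # window length in seconds
        self.arrivals = deque()       # timestamps of recent arrivals

    def record(self, t):
        """Register a vehicle arrival at time t and evict stale entries."""
        self.arrivals.append(t)
        while self.arrivals and self.arrivals[0] < t - self.window:
            self.arrivals.popleft()

    def flow(self):
        """Current flow estimate in vehicles per second."""
        if len(self.arrivals) < 2:
            return 0.0
        return len(self.arrivals) / self.window
```

In an ATSC setting, one estimator per approach lane would feed its `flow()` value into the agent's observation vector at each control step.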
Novel multi-agent action masked deep reinforcement learning for general industrial assembly lines balancing problems
19
Authors: Ali M. Ali, Luca Tirel, Hashim A. Hashim 《Journal of Automation and Intelligence》 2025, No. 4, pp. 299-311 (13 pages)
Efficient planning of activities is essential for modern industrial assembly lines to uphold manufacturing standards, prevent project constraint violations, and achieve cost-effective operations. While exact solutions to such challenges can be obtained through Integer Programming (IP), the dependence of the search space on input parameters often makes IP computationally infeasible for large-scale scenarios. Heuristic methods, such as Genetic Algorithms, can also be applied, but they frequently produce suboptimal solutions in extensive cases. This paper introduces a novel mathematical model of a generic industrial assembly line formulated as a Markov Decision Process (MDP), without imposing assumptions on the type of assembly line, a notable distinction from most existing models. The proposed model is employed to create a virtual environment for training Deep Reinforcement Learning (DRL) agents to optimize task and resource scheduling. To enhance the efficiency of agent training, the paper proposes two innovative tools. The first is an action-masking technique, which ensures the agent selects only feasible actions, thereby reducing training time. The second is a multi-agent approach, in which each workstation is managed by an individual agent; as a result, the state and action spaces are reduced. A centralized training framework with decentralized execution is adopted, offering a scalable learning architecture for optimizing industrial assembly lines. This framework allows the agents to learn offline and subsequently provide real-time solutions during operations by leveraging a neural network that maps the current factory state to the optimal action. The effectiveness of the proposed scheme is validated through numerical simulations, demonstrating significantly faster convergence to the optimal solution compared to a comparable model-based approach.
Keywords: artificial intelligence in industrial engineering; autonomous decision making; distributed multi-agent learning; reinforcement learning
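The action-masking technique named in the abstract above, restricting the agent to feasible actions, is commonly realized by assigning infeasible actions a utility of negative infinity before the greedy argmax. A minimal sketch follows; the function name and the list-based interface are assumptions for illustration, not the paper's code.

```python
import math

def masked_greedy_action(q_values, feasible):
    """Greedy action selection under an action mask: infeasible actions are
    given -inf utility so they can never be chosen.
    q_values: per-action value estimates; feasible: per-action booleans."""
    masked = [q if ok else -math.inf for q, ok in zip(q_values, feasible)]
    return max(range(len(masked)), key=lambda i: masked[i])
```

In training, the same mask is typically applied to the policy logits as well, so that gradient updates never push probability mass onto actions that would violate scheduling constraints.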
Multi-agent Deep Reinforcement Learning Approach for Temporally Coordinated Demand Response in Microgrids
20
Authors: Chunchao Hu, Zexiang Cai, Yanxu Zhang 《CSEE Journal of Power and Energy Systems》 2025, No. 4, pp. 1512-1522 (11 pages)
Price-based and incentive-based demand response (DR) are both recognized as promising solutions to address the increasing uncertainties of renewable energy sources (RES) in microgrids. However, since the temporal optimization horizons of price-based and incentive-based DR differ, few existing methods consider their coordination. In this paper, a multi-agent deep reinforcement learning (MA-DRL) approach is proposed for temporally coordinated DR in microgrids. The proposed method enhances microgrid operation revenue by coordinating day-ahead price-based demand response (PBDR) and hourly direct load control (DLC). Operation at each time scale is decided by a different DRL agent, and the agents are optimized by a multi-agent deep deterministic policy gradient (MA-DDPG) with a shared critic that guides them toward a global objective. The effectiveness of the proposed approach is validated on a modified IEEE 33-bus distribution system and a modified, heavily loaded 69-bus distribution system.
Keywords: day-ahead price-based demand response; demand response; hourly direct load control; microgrid; multi-agent deep reinforcement learning