440 articles found
A Multi-Objective Deep Reinforcement Learning Algorithm for Computation Offloading in Internet of Vehicles
1
Authors: Junjun Ren, Guoqiang Chen, Zheng-Yi Chai, Dong Yuan. Source: Computers, Materials & Continua, 2026, Issue 1, pp. 2111-2136 (26 pages)
Vehicle Edge Computing (VEC) and Cloud Computing (CC) significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Units (RSUs), thereby achieving lower delay and energy consumption. However, due to the limited storage capacity and energy budget of RSUs, it is challenging to meet the demands of the highly dynamic Internet of Vehicles (IoV) environment. Therefore, determining reasonable service caching and computation offloading strategies is crucial. To address this, this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading. By modeling the dynamic optimization problem using Markov Decision Processes (MDP), the scheme jointly optimizes task delay, energy consumption, load balancing, and privacy entropy to achieve better quality of service. Additionally, a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed. Each Double Deep Q-Network (DDQN) agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks (RBFN), thereby efficiently approximating the Pareto-optimal decisions for multiple objectives. Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud, edge, and vehicles. Compared to existing algorithms, the proposed method reduces task delay and energy consumption by 10.64% and 5.1%, respectively.
Keywords: deep reinforcement learning; internet of vehicles; multi-objective optimization; cloud-edge computing; computation offloading; service caching
Energy Optimization for Autonomous Mobile Robot Path Planning Based on Deep Reinforcement Learning
2
Authors: Longfei Gao, Weidong Wang, Dieyun Ke. Source: Computers, Materials & Continua, 2026, Issue 1, pp. 984-998 (15 pages)
At present, energy consumption is one of the main bottlenecks in autonomous mobile robot development. To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments, this paper proposes an Attention-Enhanced Dueling Deep Q-Network (AD-Dueling DQN), which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework. A multi-objective reward function, centered on energy efficiency, is designed to comprehensively consider path length, terrain slope, motion smoothness, and obstacle avoidance, enabling optimal low-energy trajectory generation in 3D space from the source. The incorporation of a multi-head attention mechanism allows the model to dynamically focus on energy-critical state features (such as slope gradients and obstacle density), thereby significantly improving its ability to recognize and avoid energy-intensive paths. Additionally, the prioritized experience replay mechanism accelerates learning from key decision-making experiences, suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly. The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios. Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments. Moreover, the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms, highlighting its global optimization capability under energy-aware objectives in complex terrains. This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
Keywords: autonomous mobile robot; deep reinforcement learning; energy optimization; multi-attention mechanism; prioritized experience replay; dueling deep Q-Network
A Deep Reinforcement Learning-Based Partitioning Method for Power System Parallel Restoration
3
Authors: Changcheng Li, Weimeng Chang, Dahai Zhang, Jinghan He. Source: Energy Engineering, 2026, Issue 1, pp. 243-264 (22 pages)
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts. This paper proposes a novel partitioning method based on deep reinforcement learning. First, the partitioning decision process is formulated as a Markov decision process (MDP) model to maximize the modularity. Corresponding key partitioning constraints on parallel restoration are considered. Second, based on the partitioning objective and constraints, the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function. The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward. Then, the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes. Two experience replay buffers are employed to speed up the training process of the method. Finally, case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints, thereby improving the parallelism and reliability of the restoration process. Moreover, simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
Keywords: partitioning method; parallel restoration; deep reinforcement learning; experience replay buffer; partitioning modularity
Deep reinforcement learning-based adaptive collision avoidance method for UAV in joint operational airspace
4
Authors: Yan Shen, Xuejun Zhang, Yan Li, Weidong Zhang. Source: Defence Technology (防务技术), 2026, Issue 2, pp. 142-159 (18 pages)
As joint operations have become a key trend in modern military development, unmanned aerial vehicles (UAVs) play an increasingly important role in enhancing the intelligence and responsiveness of combat systems. However, the heterogeneity of aircraft, partial observability, and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods. To address these issues, this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning. First, a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments. Then, to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations, a sector-based partial observation mechanism is designed. A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions, including time to closest approach, minimum separation distance, and aircraft type. Furthermore, a Hierarchical Prioritized Experience Replay (HPER) mechanism is introduced, which classifies experience samples into high, medium, and low priority levels to preferentially sample critical experiences, thereby improving learning efficiency and accelerating policy convergence. Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed, environmental adaptability, and robustness, significantly enhancing collision avoidance performance and convergence rate. Finally, transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex, real-world joint operational scenarios.
Keywords: unmanned aerial vehicle; collision avoidance; deep reinforcement learning; joint operational airspace; hierarchical prioritized experience replay
Noise-driven enhancement for exploration: Deep reinforcement learning for UAV autonomous navigation in complex environments
5
Authors: Haotian ZHANG, Yiyang LI, Lingquan CHENG, Jianliang AI. Source: Chinese Journal of Aeronautics, 2026, Issue 1, pp. 454-471 (18 pages)
Unmanned Aerial Vehicles (UAVs) play a prominent role in various fields, and autonomous navigation is a crucial component of UAV intelligence. Deep Reinforcement Learning (DRL) has expanded the research avenues for addressing challenges in autonomous navigation. Nonetheless, challenges persist, including getting stuck in local optima, consuming excessive computations during action space exploration, and neglecting deterministic experience. This paper proposes a noise-driven enhancement strategy. In accordance with the overall learning phases, a global noise control method is designed, while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAVs during navigation. Both methods are integrated into a dual-model for noise control to regulate action space exploration. Furthermore, noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience. In uncertain environments, based on the Twin Delay Deep Deterministic Policy Gradient (TD3) algorithm with Long Short-Term Memory (LSTM) network and Priority Experience Replay (PER), a Noise-Driven Enhancement Priority Memory TD3 (NDE-PMTD3) is developed. We established a simulation environment to compare different algorithms, and the performance of the algorithms is analyzed in various scenarios. The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability. In test experiments, the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments, demonstrating superior generalization results.
Keywords: action space exploration; autonomous navigation; deep reinforcement learning; twin delay deep deterministic policy gradient; unmanned aerial vehicle
Research on UAV-MEC Cooperative Scheduling Algorithms Based on Multi-Agent Deep Reinforcement Learning
6
Authors: Yonghua Huo, Ying Liu, Anni Jiang, Yang Yang. Source: Computers, Materials & Continua, 2026, Issue 3, pp. 1823-1850 (28 pages)
With the advent of sixth-generation mobile communications (6G), space-air-ground integrated networks have become mainstream. This paper focuses on collaborative scheduling for mobile edge computing (MEC) under a three-tier heterogeneous architecture composed of mobile devices, unmanned aerial vehicles (UAVs), and macro base stations (BSs). This scenario typically faces fast channel fading, dynamic computational loads, and energy constraints, whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings. To address this issue, we formulate a multi-agent Markov decision process (MDP) for an air-ground-fused MEC system, unify link selection, bandwidth/power allocation, and task offloading into a continuous action space, and propose a joint scheduling strategy that is based on an improved MATD3 algorithm. The improvements include Alternating Layer Normalization (ALN) in the actor to suppress gradient variance, Residual Orthogonalization (RO) in the critic to reduce the correlation between the twin Q-value estimates, and a dynamic-temperature reward to enable adaptive trade-offs during training. On a multi-user, dual-link simulation platform, we conduct ablation and baseline comparisons. The results reveal that the proposed method has better convergence and stability. Compared with MADDPG, TD3, and DSAC, our algorithm achieves more robust performance across key metrics.
Keywords: UAV-MEC networks; multi-agent deep reinforcement learning; MATD3; task offloading
AquaTree: Deep Reinforcement Learning-Driven Monte Carlo Tree Search for Underwater Image Enhancement
7
Authors: Chao Li, Jianing Wang, Caichang Ding, Zhiwei Ye. Source: Computers, Materials & Continua, 2026, Issue 3, pp. 1444-1464 (21 pages)
Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov Decision Process (MDP) through the integration of Monte Carlo Tree Search (MCTS) and deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance, correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. Our core contributions include: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features, (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency, and (3) an alternating training strategy co-optimizing enhancement sequences and network parameters. We further propose two inference schemes: an MCTS-based approach prioritizing accuracy at higher computational cost, and an efficient network policy enabling real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB Dataset and color correction and haze removal comparisons on the U45 Dataset demonstrate AquaTree’s superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
Keywords: underwater image enhancement (UIE); Monte Carlo tree search (MCTS); deep reinforcement learning (DRL); Markov decision process (MDP)
Pathfinder: Deep Reinforcement Learning-Based Scheduling for Multi-Robot Systems in Smart Factories with Mass Customization (cited by: 1)
8
Authors: Chenxi Lyu, Chen Dong, Qiancheng Xiong, Yuzhong Chen, Qian Weng, Zhenyi Chen. Source: Computers, Materials & Continua, 2025, Issue 8, pp. 3371-3391 (21 pages)
The rapid advancement of Industry 4.0 has revolutionized manufacturing, shifting production from centralized control to decentralized, intelligent systems. Smart factories are now expected to achieve high adaptability and resource efficiency, particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands. To address the challenges of dynamic task allocation, uncertainty, and real-time decision-making, this paper proposes Pathfinder, a deep reinforcement learning-based scheduling framework. Pathfinder models scheduling data through three key matrices: execution time (the time required for a job to complete), completion time (the actual time at which a job is finished), and efficiency (the performance of executing a single job). By leveraging neural networks, Pathfinder extracts essential features from these matrices, enabling intelligent decision-making in dynamic production environments. Unlike traditional approaches with fixed scheduling rules, Pathfinder dynamically selects from ten diverse scheduling rules, optimizing decisions based on real-time environmental conditions. To further enhance scheduling efficiency, a specialized reward function is designed to support dynamic task allocation and real-time adjustments. This function helps Pathfinder continuously refine its scheduling strategy, improving machine utilization and minimizing job completion times. Through reinforcement learning, Pathfinder adapts to evolving production demands, ensuring robust performance in real-world applications. Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches, offering improved coordination and efficiency in smart factories. By integrating deep reinforcement learning, adaptable scheduling strategies, and an innovative reward function, Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
Keywords: smart factory; customization; deep reinforcement learning; production scheduling; multi-robot system; task allocation
Deep reinforcement learning based integrated evasion and impact hierarchical intelligent policy of exo-atmospheric vehicles (cited by: 1)
9
Authors: Leliang REN, Weilin GUO, Yong XIAN, Zhenyu LIU, Daqiao ZHANG, Shaopeng LI. Source: Chinese Journal of Aeronautics, 2025, Issue 1, pp. 409-426 (18 pages)
Exo-atmospheric vehicles are constrained by limited maneuverability, which leads to the contradiction between evasive maneuver and precision strike. To address the problem of Integrated Evasion and Impact (IEI) decision under multi-constraint conditions, a hierarchical intelligent decision-making method based on Deep Reinforcement Learning (DRL) was proposed. First, an intelligent decision-making framework of “DRL evasion decision” + “impact prediction guidance decision” was established: it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective, and effectively solves the problem of poor decision-making effect caused by the large IEI decision space. Second, to solve the sparse reward problem faced by evasion decision-making, a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed, and the corresponding Markov Decision Process (MDP) was designed. A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method. Simulation results show that the proposed model has good performance and low computational resource requirement. The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy, and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer, which has engineering application value.
Keywords: exo-atmospheric vehicle; integrated evasion and impact; deep reinforcement learning; hierarchical intelligent policy; single-chip microcomputer; miss distance
Multi-station multi-robot task assignment method based on deep reinforcement learning (cited by: 1)
10
Authors: Junnan Zhang, Ke Wang, Chaoxu Mu. Source: CAAI Transactions on Intelligence Technology, 2025, Issue 1, pp. 134-146 (13 pages)
This paper focuses on the problem of multi-station multi-robot spot welding task assignment, and proposes a deep reinforcement learning (DRL) framework, which is made up of a public graph attention network and independent policy networks. The graph of welding spots distribution is encoded using the graph attention network. Independent policy networks with attention mechanism as a decoder can handle the encoded graph and decide to assign robots to different tasks. The policy network is used to convert the large-scale welding spots allocation problem to multiple small-scale single-robot welding path planning problems, and the path planning problem is quickly solved through existing methods. Then, the model is trained through reinforcement learning. In addition, the task balancing method is used to allocate tasks to multiple stations. The proposed algorithm is compared with classical algorithms, and the results show that the algorithm based on DRL can produce higher quality solutions.
Keywords: attention mechanism; deep reinforcement learning; graph neural network; industrial robot; task allocation
Optimization of Robotic Arm Grasping Strategy Based on Deep Reinforcement Learning
11
Author: Dongjun He. Source: 《计算机科学与技术汇刊(中英文版)》, 2025, Issue 2, pp. 1-7 (7 pages)
In recent years, robotic arm grasping has become a pivotal task in the field of robotics, with applications spanning from industrial automation to healthcare. The optimization of grasping strategies plays a crucial role in enhancing the effectiveness, efficiency, and reliability of robotic systems. This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning (DRL). Through the utilization of advanced DRL algorithms, such as Q-Learning, Deep Q-Networks (DQN), Policy Gradient Methods, and Proximal Policy Optimization (PPO), the study aims to improve the performance of robotic arms in grasping objects with varying shapes, sizes, and environmental conditions. The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization, emphasizing the strengths and weaknesses of each algorithm. It also presents a comprehensive framework for training the DRL models, including simulation environment setup, the optimization process, and the evaluation metrics for grasping success. The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks. The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.
Keywords: robotic arm; grasping strategy; deep reinforcement learning; Q-learning; DQN; policy gradient; PPO; optimization; simulation; robotics
Priority-Based Scheduling and Orchestration in Edge-Cloud Computing: A Deep Reinforcement Learning-Enhanced Concurrency Control Approach
12
Authors: Mohammad A Al Khaldy, Ahmad Nabot, Ahmad Al-Qerem, Mohammad Alauthman, Amina Salhi, Suhaila Abuowaida, Naceur Chihaoui. Source: Computer Modeling in Engineering & Sciences, 2025, Issue 10, pp. 673-697 (25 pages)
The exponential growth of Internet of Things (IoT) devices has created unprecedented challenges in data processing and resource management for time-critical applications. Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems, while pure edge computing faces resource constraints that limit processing capabilities. This paper addresses these challenges by proposing a novel Deep Reinforcement Learning (DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments. Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency. The framework introduces three key innovations: (1) a DRL-based dynamic priority assignment mechanism that learns from system behavior, (2) a hybrid concurrency control protocol combining local edge validation with global cloud coordination, and (3) an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures. Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements: 40% latency reduction, 25% throughput increase, 85% resource utilization (compared to 60% for heuristic methods), 40% reduction in energy consumption (300 vs. 500 J per task), and 50% improvement in scalability factor (1.8 vs. 1.2 for EDF) compared to state-of-the-art heuristic and meta-heuristic approaches. These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
Keywords: edge computing; cloud computing; scheduling algorithms; orchestration strategies; deep reinforcement learning; concurrency control; real-time systems; IoT
Bearings-Only Target Motion Analysis via Deep Reinforcement Learning
13
Authors: Chengyi Zhou, Meiqin Liu, Senlin Zhang, Ronghao Zheng, Shanling Dong. Source: IEEE/CAA Journal of Automatica Sinica, 2025, Issue 6, pp. 1298-1300 (3 pages)
Dear Editor, This letter introduces a novel approach to address the bearings-only target motion analysis (BO-TMA) problem by incorporating deep reinforcement learning (DRL) techniques. Conventional methods often exhibit biases and struggle to achieve accurate results, especially when confronted with high levels of noise. In this letter, we formulate the BO-TMA problem as a Markov decision process (MDP) and process it within a DRL framework. Simulation results demonstrate that the proposed DRL-based estimator achieves reduced bias and lower errors compared to existing estimators.
Keywords: deep reinforcement; estimator; Markov decision process; errors; bias; deep reinforcement learning; Markov decision process (MDP); bearings-only target motion analysis
Deep Reinforcement Learning for Zero-Shot Coverage Path Planning With Mobile Robots
14
Authors: José Pedro Carvalho, A. Pedro Aguiar. Source: IEEE/CAA Journal of Automatica Sinica, 2025, Issue 8, pp. 1594-1609 (16 pages)
The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges, particularly Coverage Path Planning. While this task has been typically tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. On the other hand, recent advances in Reinforcement Learning offer promising approaches, yet a significant gap in the literature remains when it comes to generalization over a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework comes from the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a unique reward function that yields value functions that are size-invariant. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm can perform zero-shot learning scenarios at a near-optimal level in environments that follow a similar distribution as during training, outperforming a greedy heuristic by sixfold. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.
Keywords: autonomous robots; coverage path planning; deep reinforcement learning; mobile robot; partially observable Markov decision processes; path planning; zero-shot generalization
A deep reinforcement learning framework and its implementation for UAV-aided covert communication
15
Authors: Shu FU, Yi SU, Zhi ZHANG, Liuguo YIN. Source: Chinese Journal of Aeronautics, 2025, Issue 2, pp. 403-417 (15 pages)
In this work, we consider an Unmanned Aerial Vehicle (UAV)-aided covert transmission network, which adopts the uplink transmission of Communication Nodes (CNs) as a cover to facilitate covert transmission to a Primary Communication Node (PCN). Specifically, all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access (NOMA), while the UAV performs covert transmission to the PCN at the same frequency. To minimize the average age of covert information, we formulate a joint optimization problem of UAV trajectory and power allocation design subject to multi-dimensional constraints including covertness demand, communication quality requirement, maximum flying speed, and the maximum available resources. To address this problem, we embed Signomial Programming (SP) into Deep Reinforcement Learning (DRL) and propose a DRL framework capable of handling the constrained Markov decision processes, named SP embedded Soft Actor-Critic (SSAC). By adopting SSAC, we achieve the joint optimization of UAV trajectory and power allocation. Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes. The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs, one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.
Keywords: covert communication; unmanned aerial vehicle; deep reinforcement learning; trajectory planning; power allocation; communication systems
Deep reinforcement learning based latency-energy minimization in smart healthcare network
16
作者 Xin Su Xin Fang +2 位作者 Zhen Cheng Ziyang Gong Chang Choi 《Digital Communications and Networks》 2025年第3期795-805,共11页
Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mob... Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mobile Edge Computing(MEC)is considered as an efficient solution to provide powerful computing capabilities to latency or energy sensitive nodes.The low-latency and high-reliability requirements of healthcare application services can be met through optimal offloading and resource allocation for the computational tasks of the nodes.In this study,we established a system model consisting of two types of nodes by considering nondivisible and trade-off computational tasks between latency and energy consumption.To minimize processing cost of the system tasks,a Mixed-Integer Nonlinear Programming(MINLP)task offloading problem is proposed.Furthermore,this problem is decomposed into task offloading decisions and resource allocation problems.The resource allocation problem is solved using traditional optimization algorithms,and the offloading decision problem is solved using a deep reinforcement learning algorithm.We propose an Online Offloading based on the Deep Reinforcement Learning(OO-DRL)algorithm with parallel deep neural networks and a weightsensitive experience replay mechanism.Simulation results show that,compared with several existing methods,our proposed algorithm can perform real-time task offloading in a smart healthcare network in dynamically varying environments and reduce the system task processing cost. 展开更多
Keywords: Smart healthcare network; Mobile edge computing; Resource allocation; Computation offloading; Deep reinforcement learning
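The latency-energy trade-off behind a binary (non-divisible) offloading decision can be illustrated with a minimal sketch. The cost model below is a commonly used textbook MEC formulation, not the model from the paper: the `Task` fields, the energy coefficient `kappa`, the weight `w_latency`, and all function names are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    data_bits: float    # input size that must be uploaded when offloading (bits)
    cpu_cycles: float   # CPU cycles required to complete the task


def local_cost(task, f_local_hz, kappa, w_latency):
    """Weighted latency-energy cost of running the task on the node itself."""
    latency = task.cpu_cycles / f_local_hz                # seconds
    energy = kappa * f_local_hz ** 2 * task.cpu_cycles    # joules (assumed CMOS model)
    return w_latency * latency + (1.0 - w_latency) * energy


def offload_cost(task, rate_bps, tx_power_w, f_edge_hz, w_latency):
    """Weighted cost of uploading the task and running it on the edge server."""
    upload_time = task.data_bits / rate_bps
    latency = upload_time + task.cpu_cycles / f_edge_hz
    energy = tx_power_w * upload_time                     # node only pays for transmission
    return w_latency * latency + (1.0 - w_latency) * energy


def decide(task, f_local_hz, kappa, rate_bps, tx_power_w, f_edge_hz, w_latency):
    """Return ('offload' or 'local', cost) for whichever option is cheaper."""
    c_local = local_cost(task, f_local_hz, kappa, w_latency)
    c_edge = offload_cost(task, rate_bps, tx_power_w, f_edge_hz, w_latency)
    return ("offload", c_edge) if c_edge < c_local else ("local", c_local)
```

A latency-sensitive node would use a `w_latency` close to 1 and an energy-sensitive node a value close to 0. In the paper the decision is learned by a DRL agent rather than computed by direct comparison as here.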
Intelligent Scheduling of Virtual Power Plants Based on Deep Reinforcement Learning
17
Authors: Shaowei He, Wenchao Cui, Gang Li, Hairun Xu, Xiang Chen, Yu Tai. Computers, Materials & Continua, 2025, No. 7, pp. 861-886 (26 pages)
The Virtual Power Plant (VPP), an innovative power management architecture, achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources. However, because generation resources differ significantly in operational cost and flexibility, renewable energy sources (such as wind and solar power) are volatile and uncertain, and load demand varies in complex ways, the scheduling optimization of virtual power plants has become a critical issue. To address it, this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning (DRL), using Deep Q-Networks (DQN) for real-time optimized scheduling of the dynamic peaking units (DPUs) and stable baseload units (SBUs) in the virtual power plant. By modeling the scheduling problem as a Markov Decision Process (MDP) and designing an objective function that integrates both performance and cost, the scheduling efficiency and economic performance of the virtual power plant are significantly improved. Simulation results show that, compared with traditional scheduling methods and other deep reinforcement learning algorithms, the proposed method has significant advantages in key performance indicators: response time is shortened by up to 34%, task success rate is increased by up to 46%, and costs are reduced by approximately 26%. Experimental results verify the efficiency and scalability of the method under complex load environments and volatile renewable generation, providing strong technical support for the intelligent scheduling of virtual power plants.
Keywords: deep reinforcement learning; deep Q-network; virtual power plant; intelligent scheduling; Markov decision process
Combining deep reinforcement learning with heuristics to solve the traveling salesman problem
18
Authors: Li Hong, Yu Liu, Mengqiao Xu, Wenhui Deng. Chinese Physics B, 2025, No. 1, pp. 96-106 (11 pages)
Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions, but still depend on additional programs. However, methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient. Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies for heuristics to iteratively solve the TSP. Our framework first designs a new architecture based on a transformer model to parameterize the policy network, which introduces an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pairwise selection policy, aiming to improve the 2-opt algorithm's performance. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning. The experimental results show that the proposed approach is significantly superior to state-of-the-art learning-based methods, and further reduces the gap between learning-based methods and highly optimized solvers on the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization performance on a real-world dataset of global liner shipping routes, with distance reductions ranging from 3.52% to 17.99%.
Keywords: traveling salesman problem; deep reinforcement learning; simulated annealing algorithm; transformer model; whale optimization algorithm
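For orientation, the classical baseline that the abstract builds on, 2-opt moves accepted under a simulated annealing rule, can be sketched as follows. This is not the paper's RL-SA or M-SA: the node pair is drawn uniformly at random instead of by a learned policy, the initial tour is a random shuffle instead of a whale-optimization result, and the geometric cooling schedule and all parameter names are assumptions.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    n = len(tour)
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt_sa(pts, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """2-opt local search with a simulated annealing acceptance rule."""
    rng = random.Random(seed)
    n = len(pts)
    tour = list(range(n))
    rng.shuffle(tour)
    cur_len = tour_length(tour, pts)
    best, best_len = tour[:], cur_len
    for k in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        # A 2-opt move reverses the segment between two positions i <= j.
        i, j = sorted(rng.sample(range(n), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(cand, pts)
        delta = cand_len - cur_len
        # Always accept improvements; accept worse tours with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, cur_len = cand, cand_len
            if cur_len < best_len:
                best, best_len = tour[:], cur_len
    return best, best_len
```

Recomputing the full tour length for every candidate keeps the sketch short; a practical implementation would evaluate only the two edges that a 2-opt move changes.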
A Two-Layer UAV Cooperative Computing Offloading Strategy Based on Deep Reinforcement Learning
19
Authors: Zhang Jianfei, Wang Zhen, Hu Yun, Chang Zheng. China Communications, 2025, No. 10, pp. 251-268 (18 pages)
In the wake of major natural or human-made disasters, the communication infrastructure within disaster-stricken areas is frequently damaged. Unmanned aerial vehicles (UAVs), thanks to merits such as rapid deployment and high mobility, are commonly regarded as an ideal option for constructing temporary communication networks. Considering the limited computing capability and battery power of UAVs, this paper proposes a two-layer UAV cooperative computing offloading strategy for emergency disaster relief scenarios. The multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm integrated with prioritized experience replay (PER) is used to jointly optimize the scheduling strategies of the UAVs, the task offloading ratios, and UAV mobility, aiming to minimize the energy consumption and delay of the system. To address this non-convex optimization problem, it is formulated as a Markov decision process (MDP). Simulation results demonstrate that, compared with four baseline algorithms, the proposed algorithm exhibits better convergence performance, verifying its feasibility and efficacy.
Keywords: cooperative computational offloading; deep reinforcement learning; mobile edge computing; prioritized experience replay; two-layer unmanned aerial vehicles
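The prioritized experience replay mentioned in the abstract can be illustrated with a minimal sketch of the standard proportional scheme: transition i is sampled with probability proportional to its priority raised to the power `alpha`, and an importance-sampling weight corrects the resulting bias. The function names and the default values of `alpha` and `beta` are assumptions for illustration; the listing does not say how the paper configures PER inside MATD3.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha (priorities must be > 0)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights (N * P(i))^(-beta), normalized by their maximum."""
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    top = max(raw)
    return [w / top for w in raw]


def sample_indices(probs, batch_size, seed=0):
    """Draw a minibatch of buffer indices according to the PER probabilities."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)
```

In a typical setup the priority of a transition is the magnitude of its temporal-difference error plus a small constant, and priorities are refreshed after each update.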
Simultaneous Depth and Heading Control for Autonomous Underwater Vehicle Docking Maneuvers Using Deep Reinforcement Learning within a Digital Twin System
20
Authors: Yu-Hsien Lin, Po-Cheng Chuang, Joyce Yi-Tzu Huang. Computers, Materials & Continua, 2025, No. 9, pp. 4907-4948 (42 pages)
This study proposes an automatic control system for Autonomous Underwater Vehicle (AUV) docking, utilizing a digital twin (DT) environment based on the HoloOcean platform, which integrates six-degree-of-freedom (6-DOF) motion equations and hydrodynamic coefficients to create a realistic simulation. Conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements; deep reinforcement learning (DRL) offers a promising alternative. In the positioning stage, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed for synchronized depth and heading control, which offers stable training, reduced overestimation bias, and superior handling of continuous control compared to other DRL methods. During the searching stage, zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization. For the docking stage, this study proposes an innovative Image-based DDPG (I-DDPG), enhanced and trained in a Unity-MATLAB simulation environment, to achieve visual target tracking. Furthermore, integrating a DT environment enables efficient and safe policy training, reduces dependence on costly real-world tests, and improves sim-to-real transfer performance. Both simulation and real-world experiments were conducted, demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments. The results highlight the scalability and robustness of the proposed system, as evidenced by the TD3 controller achieving 25% less oscillation than the adaptive fuzzy controller when reaching the target depth, thereby demonstrating superior stability, accuracy, and potential for broader and more complex autonomous underwater tasks.
Keywords: Autonomous underwater vehicle; docking maneuver; digital twin; deep reinforcement learning; twin delayed deep deterministic policy gradient