Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrain...Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.展开更多
At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown ...At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.展开更多
Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision...Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.展开更多
As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,t...As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.展开更多
Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressin...Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressing challenges in autonomous navigation.Nonetheless,challenges persist,including getting stuck in local optima,consuming excessive computations during action space exploration,and neglecting deterministic experience.This paper proposes a noise-driven enhancement strategy.In accordance with the overall learning phases,a global noise control method is designed,while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAV during navigation.Both methods are integrated into a dual-model for noise control to regulate action space exploration.Furthermore,noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience.In uncertain environments,based on the Twin Delay Deep Deterministic Policy Gradient(TD3)algorithm with Long Short-Term Memory(LSTM)network and Priority Experience Replay(PER),a Noise-Driven Enhancement Priority Memory TD3(NDE-PMTD3)is developed.We established a simulation environment to compare different algorithms,and the performance of the algorithms is analyzed in various scenarios.The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability.In test experiments,the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments,demonstrating superior generalization results.展开更多
With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier...With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.展开更多
Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)meth...Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)method that reformulates the task as a Markov Decision Process(MDP)through the integration of Monte Carlo Tree Search(MCTS)and deep reinforcement learning(DRL).The framework employs an action space of 25 enhancement operators,strategically grouped for basic attribute adjustment,color component balance,correction,and deblurring.Exploration within MCTS is guided by a dual-branch convolutional network,enabling intelligent sequential operator selection.Our core contributions include:(1)a multimodal state representation combining CIELab color histograms with deep perceptual features,(2)a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency,and(3)an alternating training strategy co-optimizing enhancement sequences and network parameters.We further propose two inference schemes:an MCTS-based approach prioritizing accuracy at higher computational cost,and an efficient network policy enabling real-time processing with minimal quality loss.Comprehensive evaluations on the UIEB Dataset and Color correction and haze removal comparisons on the U45 Dataset demonstrate AquaTree’s superiority,significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.展开更多
The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability an...The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.展开更多
Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision u...Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision under multi-constraint conditions,a hierarchical intelligent decision-making method based on Deep Reinforcement Learning(DRL)was proposed.First,an intelligent decision-making framework of“DRL evasion decision”+“impact prediction guidance decision”was established:it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective,and effectively solves the problem of poor decisionmaking effect caused by the large IEI decision space.Second,to solve the sparse reward problem faced by evasion decision-making,a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed,and the corresponding Markov Decision Process(MDP)was designed.A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method.Simulation results show that the proposed model has good performance and low computational resource requirement.The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy,and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer,which has engineering application value.展开更多
This paper focuses on the problem of multi-station multi-robot spot welding task assignment,and proposes a deep reinforcement learning(DRL)framework,which is made up of a public graph attention network and independent...This paper focuses on the problem of multi-station multi-robot spot welding task assignment,and proposes a deep reinforcement learning(DRL)framework,which is made up of a public graph attention network and independent policy networks.The graph of welding spots distribution is encoded using the graph attention network.Independent policy networks with attention mechanism as a decoder can handle the encoded graph and decide to assign robots to different tasks.The policy network is used to convert the large scale welding spots allocation problem to multiple small scale singlerobot welding path planning problems,and the path planning problem is quickly solved through existing methods.Then,the model is trained through reinforcement learning.In addition,the task balancing method is used to allocate tasks to multiple stations.The proposed algorithm is compared with classical algorithms,and the results show that the algorithm based on DRL can produce higher quality solutions.展开更多
In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role ...In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role in enhancing the effectiveness,efficiency,and reliability of robotic systems.This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning(DRL).Through the utilization of advanced DRL algorithms,such as Q-Learning,Deep Q-Networks(DQN),Policy Gradient Methods,and Proximal Policy Optimization(PPO),the study aims to improve the performance of robotic arms in grasping objects with varying shapes,sizes,and environmental conditions.The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization,emphasizing the strengths and weaknesses of each algorithm.It also presents a comprehensive framework for training the DRL models,including simulation environment setup,the optimization process,and the evaluation metrics for grasping success.The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks.The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.展开更多
The exponential growth of Internet of Things(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet...The exponential growth of Internet of Things(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems,while pure edge computing faces resource constraints that limit processing capabilities.This paper addresses these challenges by proposing a novel Deep Reinforcement Learning(DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments.Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency.The framework introduces three key innovations:(1)a DRL-based dynamic priority assignmentmechanism that learns fromsystem behavior,(2)a hybrid concurrency control protocol combining local edge validation with global cloud coordination,and(3)an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures.Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements:40%latency reduction,25%throughput increase,85%resource utilization(compared to 60%for heuristicmethods),40%reduction in energy consumption(300 vs.500 J per task),and 50%improvement in scalability factor(1.8 vs.1.2 for EDF)compared to state-of-the-art heuristic and meta-heuristic approaches.These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.展开更多
Dear Editor,This letter introduces a novel approach to address the bearings-only target motion analysis(BO-TMA)problem by incorporating deep reinforcement learning(DRL)techniques.Conventional methods often exhibit bia...Dear Editor,This letter introduces a novel approach to address the bearings-only target motion analysis(BO-TMA)problem by incorporating deep reinforcement learning(DRL)techniques.Conventional methods often exhibit biases and struggle to achieve accurate results,especially when confronted with high levels of noise.In this letter,we formulate the BO-TMA problem as a Markov decision process(MDP)and process it within a DRL framework.Simulation results demonstrate that the proposed DRL-based estimator achieves reduced bias and lower errors compared to existing estimators.展开更多
The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges,particularly Coverage Path Planning.While this task has been typically tackled with classical algorithms,thes...The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges,particularly Coverage Path Planning.While this task has been typically tackled with classical algorithms,these often struggle with flexibility and adaptability in unknown environments.On the other hand,recent advances in Reinforcement Learning offer promising approaches,yet a significant gap in the literature remains when it comes to generalization over a large number of parameters.This paper presents a unified,generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques.The novelty of the framework comes from the design of an observation space that accommodates different map sizes,an action masking scheme that guarantees safety and robustness while also serving as a learning-fromdemonstration technique during training,and a unique reward function that yields value functions that are size-invariant.These are coupled with a curriculum learning-based training strategy and parametric environment randomization,enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes,configurations,sensor payloads,and sub-tasks.Our empirical results show that the algorithm can perform zero-shot learning scenarios at a near-optimal level in environments that follow a similar distribution as during training,outperforming a greedy heuristic by sixfold.Furthermore,in out-of-distribution environments,our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios,paving the way for generalizable and adaptable path-planning algorithms.展开更多
In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Comm...In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Communication Node(PCN).Specifically,all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access(NOMA),while the UAV performs covert transmission to the PCN at the same frequency.To minimize the average age of covert information,we formulate a joint optimization problem of UAV trajectory and power allocation designing subject to multi-dimensional constraints including covertness demand,communication quality requirement,maximum flying speed,and the maximum available resources.To address this problem,we embed Signomial Programming(SP)into Deep Reinforcement Learning(DRL)and propose a DRL framework capable of handling the constrained Markov decision processes,named SP embedded Soft Actor-Critic(SSAC).By adopting SSAC,we achieve the joint optimization of UAV trajectory and power allocation.Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes.The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs,one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.展开更多
Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mob...Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mobile Edge Computing(MEC)is considered as an efficient solution to provide powerful computing capabilities to latency or energy sensitive nodes.The low-latency and high-reliability requirements of healthcare application services can be met through optimal offloading and resource allocation for the computational tasks of the nodes.In this study,we established a system model consisting of two types of nodes by considering nondivisible and trade-off computational tasks between latency and energy consumption.To minimize processing cost of the system tasks,a Mixed-Integer Nonlinear Programming(MINLP)task offloading problem is proposed.Furthermore,this problem is decomposed into task offloading decisions and resource allocation problems.The resource allocation problem is solved using traditional optimization algorithms,and the offloading decision problem is solved using a deep reinforcement learning algorithm.We propose an Online Offloading based on the Deep Reinforcement Learning(OO-DRL)algorithm with parallel deep neural networks and a weightsensitive experience replay mechanism.Simulation results show that,compared with several existing methods,our proposed algorithm can perform real-time task offloading in a smart healthcare network in dynamically varying environments and reduce the system task processing cost.展开更多
The Virtual Power Plant(VPP),as an innovative power management architecture,achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources.However,due to significant ...The Virtual Power Plant(VPP),as an innovative power management architecture,achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources.However,due to significant differences in operational costs and flexibility of various types of generation resources,as well as the volatility and uncertainty of renewable energy sources(such as wind and solar power)and the complex variability of load demand,the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed.To solve this,this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning(DRL),utilizing Deep Q-Networks(DQN)for real-time optimization scheduling of dynamic peaking unit(DPU)and stable baseload unit(SBU)in the virtual power plant.By modeling the scheduling problem as a Markov Decision Process(MDP)and designing an optimization objective function that integrates both performance and cost,the scheduling efficiency and economic performance of the virtual power plant are significantly improved.Simulation results show that,compared with traditional scheduling methods and other deep reinforcement learning algorithms,the proposed method demonstrates significant advantages in key performance indicators:response time is shortened by up to 34%,task success rate is increased by up to 46%,and costs are reduced by approximately 26%.Experimental results verify the efficiency and scalability of the method under complex load environments and the volatility of renewable energy,providing strong technical support for the intelligent scheduling of virtual power plants.展开更多
Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs...Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs.However,methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient.Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements.This paper proposes a novel framework for learning improvement heuristics,which automatically discovers better improvement policies for heuristics to iteratively solve the TSP.Our framework first designs a new architecture based on a transformer model to make the policy network parameterized,which introduces an action-dropout layer to prevent action selection from overfitting.It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism(named RL-SA)to learn the pairwise selected policy,aiming to improve the 2-opt algorithm's performance.The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning.The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods,and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets.Moreover,our pre-trained model M can be applied to guide the SA algorithm(named M-SA(ours)),which performs better than existing deep models in small-,medium-,and large-scale TSPLIB datasets.Additionally,the M-SA(ours)achieves excellent generalization performance in a real-world dataset on global liner shipping routes,with the optimization percentages in distance reduction ranging from3.52%to 17.99%.展开更多
In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapi...In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapid deployment and high mobil-ity,are commonly regarded as an ideal option for con-structing temporary communication networks.Con-sidering the limited computing capability and battery power of UAVs,this paper proposes a two-layer UAV cooperative computing offloading strategy for emer-gency disaster relief scenarios.The multi-agent twin delayed deep deterministic policy gradient(MATD3)algorithm integrated with prioritized experience replay(PER)is utilized to jointly optimize the scheduling strategies of UAVs,task offloading ratios,and their mobility,aiming to diminish the energy consumption and delay of the system to the minimum.In order to address the aforementioned non-convex optimiza-tion issue,a Markov decision process(MDP)has been established.The results of simulation experiments demonstrate that,compared with the other four base-line algorithms,the algorithm introduced in this paper exhibits better convergence performance,verifying its feasibility and efficacy.展开更多
This study proposes an automatic control system for Autonomous Underwater Vehicle(AUV)docking,utilizing a digital twin(DT)environment based on the HoloOcean platform,which integrates six-degree-of-freedom(6-DOF)motion...This study proposes an automatic control system for Autonomous Underwater Vehicle(AUV)docking,utilizing a digital twin(DT)environment based on the HoloOcean platform,which integrates six-degree-of-freedom(6-DOF)motion equations and hydrodynamic coefficients to create a realistic simulation.Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements,deep reinforcement learning(DRL)offers a promising alternative.In the positioning stage,the Twin Delayed Deep Deterministic Policy Gradient(TD3)algorithm is employed for synchronized depth and heading control,which offers stable training,reduced overestimation bias,and superior handling of continuous control compared to other DRL methods.During the searching stage,zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization.For the docking stage,this study proposes an innovative Image-based DDPG(I-DDPG),enhanced and trained in a Unity-MATLAB simulation environment,to achieve visual target tracking.Furthermore,integrating a DT environment enables efficient and safe policy training,reduces dependence on costly real-world tests,and improves sim-to-real transfer performance.Both simulation and real-world experiments were conducted,demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments.The results highlight the scalability and robustness of the proposed system,as evidenced by the TD3 controller achieving 25%less oscillation than the adaptive fuzzy controller when reaching the target depth,thereby demonstrating superior stability,accuracy,and potential for broader and more complex autonomous underwater tasks.展开更多
基金supported by Key Science and Technology Program of Henan Province,China(Grant Nos.242102210147,242102210027)Fujian Province Young and Middle aged Teacher Education Research Project(Science and Technology Category)(No.JZ240101)(Corresponding author:Dong Yuan).
文摘Vehicle Edge Computing(VEC)and Cloud Computing(CC)significantly enhance the processing efficiency of delay-sensitive and computation-intensive applications by offloading compute-intensive tasks from resource-constrained onboard devices to nearby Roadside Unit(RSU),thereby achieving lower delay and energy consumption.However,due to the limited storage capacity and energy budget of RSUs,it is challenging to meet the demands of the highly dynamic Internet of Vehicles(IoV)environment.Therefore,determining reasonable service caching and computation offloading strategies is crucial.To address this,this paper proposes a joint service caching scheme for cloud-edge collaborative IoV computation offloading.By modeling the dynamic optimization problem using Markov Decision Processes(MDP),the scheme jointly optimizes task delay,energy consumption,load balancing,and privacy entropy to achieve better quality of service.Additionally,a dynamic adaptive multi-objective deep reinforcement learning algorithm is proposed.Each Double Deep Q-Network(DDQN)agent obtains rewards for different objectives based on distinct reward functions and dynamically updates the objective weights by learning the value changes between objectives using Radial Basis Function Networks(RBFN),thereby efficiently approximating the Pareto-optimal decisions for multiple objectives.Extensive experiments demonstrate that the proposed algorithm can better coordinate the three-tier computing resources of cloud,edge,and vehicles.Compared to existing algorithms,the proposed method reduces task delay and energy consumption by 10.64%and 5.1%,respectively.
文摘At present,energy consumption is one of the main bottlenecks in autonomous mobile robot development.To address the challenge of high energy consumption in path planning for autonomous mobile robots navigating unknown and complex environments,this paper proposes an Attention-Enhanced Dueling Deep Q-Network(ADDueling DQN),which integrates a multi-head attention mechanism and a prioritized experience replay strategy into a Dueling-DQN reinforcement learning framework.A multi-objective reward function,centered on energy efficiency,is designed to comprehensively consider path length,terrain slope,motion smoothness,and obstacle avoidance,enabling optimal low-energy trajectory generation in 3D space from the source.The incorporation of a multihead attention mechanism allows the model to dynamically focus on energy-critical state features—such as slope gradients and obstacle density—thereby significantly improving its ability to recognize and avoid energy-intensive paths.Additionally,the prioritized experience replay mechanism accelerates learning from key decision-making experiences,suppressing inefficient exploration and guiding the policy toward low-energy solutions more rapidly.The effectiveness of the proposed path planning algorithm is validated through simulation experiments conducted in multiple off-road scenarios.Results demonstrate that AD-Dueling DQN consistently achieves the lowest average energy consumption across all tested environments.Moreover,the proposed method exhibits faster convergence and greater training stability compared to baseline algorithms,highlighting its global optimization capability under energy-aware objectives in complex terrains.This study offers an efficient and scalable intelligent control strategy for the development of energy-conscious autonomous navigation systems.
基金funded by the Beijing Engineering Research Center of Electric Rail Transportation.
文摘Effective partitioning is crucial for enabling parallel restoration of power systems after blackouts.This paper proposes a novel partitioning method based on deep reinforcement learning.First,the partitioning decision process is formulated as a Markov decision process(MDP)model to maximize the modularity.Corresponding key partitioning constraints on parallel restoration are considered.Second,based on the partitioning objective and constraints,the reward function of the partitioning MDP model is set by adopting a relative deviation normalization scheme to reduce mutual interference between the reward and penalty in the reward function.The soft bonus scaling mechanism is introduced to mitigate overestimation caused by abrupt jumps in the reward.Then,the deep Q network method is applied to solve the partitioning MDP model and generate partitioning schemes.Two experience replay buffers are employed to speed up the training process of the method.Finally,case studies on the IEEE 39-bus test system demonstrate that the proposed method can generate a high-modularity partitioning result that meets all key partitioning constraints,thereby improving the parallelism and reliability of the restoration process.Moreover,simulation results demonstrate that an appropriate discount factor is crucial for ensuring both the convergence speed and the stability of the partitioning training.
基金supported by the National Key Research and Development Program of China(No.2022YFB4300902).
文摘As joint operations have become a key trend in modern military development,unmanned aerial vehicles(UAVs)play an increasingly important role in enhancing the intelligence and responsiveness of combat systems.However,the heterogeneity of aircraft,partial observability,and dynamic uncertainty in operational airspace pose significant challenges to autonomous collision avoidance using traditional methods.To address these issues,this paper proposes an adaptive collision avoidance approach for UAVs based on deep reinforcement learning.First,a unified uncertainty model incorporating dynamic wind fields is constructed to capture the complexity of joint operational environments.Then,to effectively handle the heterogeneity between manned and unmanned aircraft and the limitations of dynamic observations,a sector-based partial observation mechanism is designed.A Dynamic Threat Prioritization Assessment algorithm is also proposed to evaluate potential collision threats from multiple dimensions,including time to closest approach,minimum separation distance,and aircraft type.Furthermore,a Hierarchical Prioritized Experience Replay(HPER)mechanism is introduced,which classifies experience samples into high,medium,and low priority levels to preferentially sample critical experiences,thereby improving learning efficiency and accelerating policy convergence.Simulation results show that the proposed HPER-D3QN algorithm outperforms existing methods in terms of learning speed,environmental adaptability,and robustness,significantly enhancing collision avoidance performance and convergence rate.Finally,transfer experiments on a high-fidelity battlefield airspace simulation platform validate the proposed method's deployment potential and practical applicability in complex,real-world joint operational scenarios.
基金the Collaborative Innovation Project of Shanghai,China for the financial support。
文摘Unmanned Aerial Vehicle(UAV)plays a prominent role in various fields,and autonomous navigation is a crucial component of UAV intelligence.Deep Reinforcement Learning(DRL)has expanded the research avenues for addressing challenges in autonomous navigation.Nonetheless,challenges persist,including getting stuck in local optima,consuming excessive computations during action space exploration,and neglecting deterministic experience.This paper proposes a noise-driven enhancement strategy.In accordance with the overall learning phases,a global noise control method is designed,while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by UAV during navigation.Both methods are integrated into a dual-model for noise control to regulate action space exploration.Furthermore,noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience.In uncertain environments,based on the Twin Delay Deep Deterministic Policy Gradient(TD3)algorithm with Long Short-Term Memory(LSTM)network and Priority Experience Replay(PER),a Noise-Driven Enhancement Priority Memory TD3(NDE-PMTD3)is developed.We established a simulation environment to compare different algorithms,and the performance of the algorithms is analyzed in various scenarios.The training results indicate that the proposed algorithm accelerates the convergence speed and enhances the convergence stability.In test experiments,the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments,demonstrating superior generalization results.
文摘With the advent of sixth-generation mobile communications(6G),space-air-ground integrated networks have become mainstream.This paper focuses on collaborative scheduling for mobile edge computing(MEC)under a three-tier heterogeneous architecture composed of mobile devices,unmanned aerial vehicles(UAVs),and macro base stations(BSs).This scenario typically faces fast channel fading,dynamic computational loads,and energy constraints,whereas classical queuing-theoretic or convex-optimization approaches struggle to yield robust solutions in highly dynamic settings.To address this issue,we formulate a multi-agent Markov decision process(MDP)for an air-ground-fused MEC system,unify link selection,bandwidth/power allocation,and task offloading into a continuous action space and propose a joint scheduling strategy that is based on an improved MATD3 algorithm.The improvements include Alternating Layer Normalization(ALN)in the actor to suppress gradient variance,Residual Orthogonalization(RO)in the critic to reduce the correlation between the twin Q-value estimates,and a dynamic-temperature reward to enable adaptive trade-offs during training.On a multi-user,dual-link simulation platform,we conduct ablation and baseline comparisons.The results reveal that the proposed method has better convergence and stability.Compared with MADDPG,TD3,and DSAC,our algorithm achieves more robust performance across key metrics.
基金supported by theHubei Provincial Technology Innovation Special Project and the Natural Science Foundation of Hubei Province under Grants 2023BEB024,2024AFC066,respectively.
文摘Underwater images frequently suffer from chromatic distortion,blurred details,and low contrast,posing significant challenges for enhancement.This paper introduces AquaTree,a novel underwater image enhancement(UIE)method that reformulates the task as a Markov Decision Process(MDP)through the integration of Monte Carlo Tree Search(MCTS)and deep reinforcement learning(DRL).The framework employs an action space of 25 enhancement operators,strategically grouped for basic attribute adjustment,color component balance,correction,and deblurring.Exploration within MCTS is guided by a dual-branch convolutional network,enabling intelligent sequential operator selection.Our core contributions include:(1)a multimodal state representation combining CIELab color histograms with deep perceptual features,(2)a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency,and(3)an alternating training strategy co-optimizing enhancement sequences and network parameters.We further propose two inference schemes:an MCTS-based approach prioritizing accuracy at higher computational cost,and an efficient network policy enabling real-time processing with minimal quality loss.Comprehensive evaluations on the UIEB Dataset and Color correction and haze removal comparisons on the U45 Dataset demonstrate AquaTree’s superiority,significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
基金supported by National Natural Science Foundation of China under Grant No.62372110Fujian Provincial Natural Science of Foundation under Grants 2023J02008,2024H0009.
文摘The rapid advancement of Industry 4.0 has revolutionized manufacturing,shifting production from centralized control to decentralized,intelligent systems.Smart factories are now expected to achieve high adaptability and resource efficiency,particularly in mass customization scenarios where production schedules must accommodate dynamic and personalized demands.To address the challenges of dynamic task allocation,uncertainty,and realtime decision-making,this paper proposes Pathfinder,a deep reinforcement learning-based scheduling framework.Pathfinder models scheduling data through three key matrices:execution time(the time required for a job to complete),completion time(the actual time at which a job is finished),and efficiency(the performance of executing a single job).By leveraging neural networks,Pathfinder extracts essential features from these matrices,enabling intelligent decision-making in dynamic production environments.Unlike traditional approaches with fixed scheduling rules,Pathfinder dynamically selects from ten diverse scheduling rules,optimizing decisions based on real-time environmental conditions.To further enhance scheduling efficiency,a specialized reward function is designed to support dynamic task allocation and real-time adjustments.This function helps Pathfinder continuously refine its scheduling strategy,improving machine utilization and minimizing job completion times.Through reinforcement learning,Pathfinder adapts to evolving production demands,ensuring robust performance in real-world applications.Experimental results demonstrate that Pathfinder outperforms traditional scheduling approaches,offering improved coordination and efficiency in smart factories.By integrating deep reinforcement learning,adaptable scheduling strategies,and an innovative reward function,Pathfinder provides an effective solution to the growing challenges of multi-robot job scheduling in mass customization environments.
基金co-supported by the National Natural Science Foundation of China(No.62103432)the China Postdoctoral Science Foundation(No.284881)the Young Talent fund of University Association for Science and Technology in Shaanxi,China(No.20210108)。
文摘Exo-atmospheric vehicles are constrained by limited maneuverability,which leads to the contradiction between evasive maneuver and precision strike.To address the problem of Integrated Evasion and Impact(IEI)decision under multi-constraint conditions,a hierarchical intelligent decision-making method based on Deep Reinforcement Learning(DRL)was proposed.First,an intelligent decision-making framework of“DRL evasion decision”+“impact prediction guidance decision”was established:it takes the impact point deviation correction ability as the constraint and the maximum miss distance as the objective,and effectively solves the problem of poor decisionmaking effect caused by the large IEI decision space.Second,to solve the sparse reward problem faced by evasion decision-making,a hierarchical decision-making method consisting of maneuver timing decision and maneuver duration decision was proposed,and the corresponding Markov Decision Process(MDP)was designed.A detailed simulation experiment was designed to analyze the advantages and computational complexity of the proposed method.Simulation results show that the proposed model has good performance and low computational resource requirement.The minimum miss distance is 21.3 m under the condition of guaranteeing the impact point accuracy,and the single decision-making time is 4.086 ms on an STM32F407 single-chip microcomputer,which has engineering application value.
基金National Key Research and Development Program of China,Grant/Award Number:2021YFB1714700Postdoctoral Research Foundation of China,Grant/Award Number:2024M752364Postdoctoral Fellowship Program of CPSF,Grant/Award Number:GZB20240525。
文摘This paper focuses on the problem of multi-station multi-robot spot welding task assignment,and proposes a deep reinforcement learning(DRL)framework,which is made up of a public graph attention network and independent policy networks.The graph of welding spots distribution is encoded using the graph attention network.Independent policy networks with attention mechanism as a decoder can handle the encoded graph and decide to assign robots to different tasks.The policy network is used to convert the large scale welding spots allocation problem to multiple small scale singlerobot welding path planning problems,and the path planning problem is quickly solved through existing methods.Then,the model is trained through reinforcement learning.In addition,the task balancing method is used to allocate tasks to multiple stations.The proposed algorithm is compared with classical algorithms,and the results show that the algorithm based on DRL can produce higher quality solutions.
文摘In recent years,robotic arm grasping has become a pivotal task in the field of robotics,with applications spanning from industrial automation to healthcare.The optimization of grasping strategies plays a crucial role in enhancing the effectiveness,efficiency,and reliability of robotic systems.This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning(DRL).Through the utilization of advanced DRL algorithms,such as Q-Learning,Deep Q-Networks(DQN),Policy Gradient Methods,and Proximal Policy Optimization(PPO),the study aims to improve the performance of robotic arms in grasping objects with varying shapes,sizes,and environmental conditions.The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization,emphasizing the strengths and weaknesses of each algorithm.It also presents a comprehensive framework for training the DRL models,including simulation environment setup,the optimization process,and the evaluation metrics for grasping success.The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks.The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.
基金supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R909),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘The exponential growth of Internet of Things(IoT)devices has created unprecedented challenges in data processing and resource management for time-critical applications.Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems,while pure edge computing faces resource constraints that limit processing capabilities.This paper addresses these challenges by proposing a novel Deep Reinforcement Learning(DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments.Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency.The framework introduces three key innovations:(1)a DRL-based dynamic priority assignmentmechanism that learns fromsystem behavior,(2)a hybrid concurrency control protocol combining local edge validation with global cloud coordination,and(3)an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures.Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements:40%latency reduction,25%throughput increase,85%resource utilization(compared to 60%for heuristicmethods),40%reduction in energy consumption(300 vs.500 J per task),and 50%improvement in scalability factor(1.8 vs.1.2 for EDF)compared to state-of-the-art heuristic and meta-heuristic approaches.These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
基金supported by the Zhejiang Provincial Natural Science Foundation of China(LZ23F030006)the National Natural Science Foundation of China(62173299,U23B2060)+1 种基金the Joint Fund of Ministry of Education for Pre-Research of Equipment(8091B022147,8091B032234,8091B042220)the Fundamental Research Funds for Xi’an Jiaotong University(xtr072022001).
文摘Dear Editor,This letter introduces a novel approach to address the bearings-only target motion analysis(BO-TMA)problem by incorporating deep reinforcement learning(DRL)techniques.Conventional methods often exhibit biases and struggle to achieve accurate results,especially when confronted with high levels of noise.In this letter,we formulate the BO-TMA problem as a Markov decision process(MDP)and process it within a DRL framework.Simulation results demonstrate that the proposed DRL-based estimator achieves reduced bias and lower errors compared to existing estimators.
基金supported by project RELIABLE(PTDC/EEI-AUT/3522/2020)R&D Unit SYSTEC-Base(UIDB001472020)+1 种基金Programmatic(UIDP001472020)funds-and Associate Laboratory Advanced Production and Intelligent Systems ARISE-LAP01122020funded by national funds through the FCT/MCTES(PIDDAC).
文摘The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges,particularly Coverage Path Planning.While this task has been typically tackled with classical algorithms,these often struggle with flexibility and adaptability in unknown environments.On the other hand,recent advances in Reinforcement Learning offer promising approaches,yet a significant gap in the literature remains when it comes to generalization over a large number of parameters.This paper presents a unified,generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques.The novelty of the framework comes from the design of an observation space that accommodates different map sizes,an action masking scheme that guarantees safety and robustness while also serving as a learning-fromdemonstration technique during training,and a unique reward function that yields value functions that are size-invariant.These are coupled with a curriculum learning-based training strategy and parametric environment randomization,enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes,configurations,sensor payloads,and sub-tasks.Our empirical results show that the algorithm can perform zero-shot learning scenarios at a near-optimal level in environments that follow a similar distribution as during training,outperforming a greedy heuristic by sixfold.Furthermore,in out-of-distribution environments,our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios,paving the way for generalizable and adaptable path-planning algorithms.
基金This study was co-supported by the National Natural Science Foundation of China(No.62025110&62271093)the Natural Science Foundation of Chongqing,China(No.CSTB2023NSCQ-LZX0108).
文摘In this work,we consider an Unmanned Aerial Vehicle(UAV)-aided covert transmission network,which adopts the uplink transmission of Communication Nodes(CNs)as a cover to facilitate covert transmission to a Primary Communication Node(PCN).Specifically,all nodes transmit to the UAV exploiting uplink non-Orthogonal Multiple Access(NOMA),while the UAV performs covert transmission to the PCN at the same frequency.To minimize the average age of covert information,we formulate a joint optimization problem of UAV trajectory and power allocation designing subject to multi-dimensional constraints including covertness demand,communication quality requirement,maximum flying speed,and the maximum available resources.To address this problem,we embed Signomial Programming(SP)into Deep Reinforcement Learning(DRL)and propose a DRL framework capable of handling the constrained Markov decision processes,named SP embedded Soft Actor-Critic(SSAC).By adopting SSAC,we achieve the joint optimization of UAV trajectory and power allocation.Our simulations show the optimized UAV trajectory and verify the superiority of SSAC compared with various existing baseline schemes.The results of this study suggest that by maintaining appropriate distances from both the PCN and CNs,one can effectively enhance the performance of covert communication by reducing the detection probability of the CNs.
基金supported in part by the National Natural Science Foundation of China under Grant 62371181in part by the Changzhou Science and Technology International Cooperation Program under Grant CZ20230029+1 种基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government.(MSIT)(2021R1A2B5B02087169)supported by the MSIT(Ministry of Science and ICT),Korea,under the ITRC(Information Technology Research Center)support program(RS-202300259004)supervised by the IITP(Institute for Information&Communications Technology Planning&Evaluation)。
文摘Significant breakthroughs in the Internet of Things(IoT)and 5G technologies have driven several smart healthcare activities,leading to a flood of computationally intensive applications in smart healthcare networks.Mobile Edge Computing(MEC)is considered as an efficient solution to provide powerful computing capabilities to latency or energy sensitive nodes.The low-latency and high-reliability requirements of healthcare application services can be met through optimal offloading and resource allocation for the computational tasks of the nodes.In this study,we established a system model consisting of two types of nodes by considering nondivisible and trade-off computational tasks between latency and energy consumption.To minimize processing cost of the system tasks,a Mixed-Integer Nonlinear Programming(MINLP)task offloading problem is proposed.Furthermore,this problem is decomposed into task offloading decisions and resource allocation problems.The resource allocation problem is solved using traditional optimization algorithms,and the offloading decision problem is solved using a deep reinforcement learning algorithm.We propose an Online Offloading based on the Deep Reinforcement Learning(OO-DRL)algorithm with parallel deep neural networks and a weightsensitive experience replay mechanism.Simulation results show that,compared with several existing methods,our proposed algorithm can perform real-time task offloading in a smart healthcare network in dynamically varying environments and reduce the system task processing cost.
基金supported by the National Key Research and Development Program of China,Grant No.2020YFB0905900.
文摘The Virtual Power Plant(VPP),as an innovative power management architecture,achieves flexible dispatch and resource optimization of power systems by integrating distributed energy resources.However,due to significant differences in operational costs and flexibility of various types of generation resources,as well as the volatility and uncertainty of renewable energy sources(such as wind and solar power)and the complex variability of load demand,the scheduling optimization of virtual power plants has become a critical issue that needs to be addressed.To solve this,this paper proposes an intelligent scheduling method for virtual power plants based on Deep Reinforcement Learning(DRL),utilizing Deep Q-Networks(DQN)for real-time optimization scheduling of dynamic peaking unit(DPU)and stable baseload unit(SBU)in the virtual power plant.By modeling the scheduling problem as a Markov Decision Process(MDP)and designing an optimization objective function that integrates both performance and cost,the scheduling efficiency and economic performance of the virtual power plant are significantly improved.Simulation results show that,compared with traditional scheduling methods and other deep reinforcement learning algorithms,the proposed method demonstrates significant advantages in key performance indicators:response time is shortened by up to 34%,task success rate is increased by up to 46%,and costs are reduced by approximately 26%.Experimental results verify the efficiency and scalability of the method under complex load environments and the volatility of renewable energy,providing strong technical support for the intelligent scheduling of virtual power plants.
基金Project supported by the National Natural Science Foundation of China(Grant Nos.72101046 and 61672128)。
文摘Recent studies employing deep learning to solve the traveling salesman problem(TSP)have mainly focused on learning construction heuristics.Such methods can improve TSP solutions,but still depend on additional programs.However,methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient.Traditional improvement heuristics are guided by a manually designed search strategy and may only achieve limited improvements.This paper proposes a novel framework for learning improvement heuristics,which automatically discovers better improvement policies for heuristics to iteratively solve the TSP.Our framework first designs a new architecture based on a transformer model to make the policy network parameterized,which introduces an action-dropout layer to prevent action selection from overfitting.It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism(named RL-SA)to learn the pairwise selected policy,aiming to improve the 2-opt algorithm's performance.The RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses the Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning.The experiment results show that the proposed approach is significantly superior to the state-of-the-art learning-based methods,and further reduces the gap between learning-based methods and highly optimized solvers in the benchmark datasets.Moreover,our pre-trained model M can be applied to guide the SA algorithm(named M-SA(ours)),which performs better than existing deep models in small-,medium-,and large-scale TSPLIB datasets.Additionally,the M-SA(ours)achieves excellent generalization performance in a real-world dataset on global liner shipping routes,with the optimization percentages in distance reduction ranging from3.52%to 17.99%.
基金supported by the Basic Scientific Research Business Fund Project of Higher Education Institutions in Heilongjiang Province(145409601)the First Batch of Experimental Teaching and Teaching Laboratory Construction Research Projects in Heilongjiang Province(SJGZ20240038).
文摘In the wake of major natural disasters or human-made disasters,the communication infrastruc-ture within disaster-stricken areas is frequently dam-aged.Unmanned aerial vehicles(UAVs),thanks to their merits such as rapid deployment and high mobil-ity,are commonly regarded as an ideal option for con-structing temporary communication networks.Con-sidering the limited computing capability and battery power of UAVs,this paper proposes a two-layer UAV cooperative computing offloading strategy for emer-gency disaster relief scenarios.The multi-agent twin delayed deep deterministic policy gradient(MATD3)algorithm integrated with prioritized experience replay(PER)is utilized to jointly optimize the scheduling strategies of UAVs,task offloading ratios,and their mobility,aiming to diminish the energy consumption and delay of the system to the minimum.In order to address the aforementioned non-convex optimiza-tion issue,a Markov decision process(MDP)has been established.The results of simulation experiments demonstrate that,compared with the other four base-line algorithms,the algorithm introduced in this paper exhibits better convergence performance,verifying its feasibility and efficacy.
基金supported by the National Science and Technology Council,Taiwan[Grant NSTC 111-2628-E-006-005-MY3]supported by the Ocean Affairs Council,Taiwansponsored in part by Higher Education Sprout Project,Ministry of Education to the Headquarters of University Advancement at National Cheng Kung University(NCKU).
文摘This study proposes an automatic control system for Autonomous Underwater Vehicle(AUV)docking,utilizing a digital twin(DT)environment based on the HoloOcean platform,which integrates six-degree-of-freedom(6-DOF)motion equations and hydrodynamic coefficients to create a realistic simulation.Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements,deep reinforcement learning(DRL)offers a promising alternative.In the positioning stage,the Twin Delayed Deep Deterministic Policy Gradient(TD3)algorithm is employed for synchronized depth and heading control,which offers stable training,reduced overestimation bias,and superior handling of continuous control compared to other DRL methods.During the searching stage,zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization.For the docking stage,this study proposes an innovative Image-based DDPG(I-DDPG),enhanced and trained in a Unity-MATLAB simulation environment,to achieve visual target tracking.Furthermore,integrating a DT environment enables efficient and safe policy training,reduces dependence on costly real-world tests,and improves sim-to-real transfer performance.Both simulation and real-world experiments were conducted,demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments.The results highlight the scalability and robustness of the proposed system,as evidenced by the TD3 controller achieving 25%less oscillation than the adaptive fuzzy controller when reaching the target depth,thereby demonstrating superior stability,accuracy,and potential for broader and more complex autonomous underwater tasks.