Journal Articles
640 articles found
1. AquaTree: Deep Reinforcement Learning-Driven Monte Carlo Tree Search for Underwater Image Enhancement
Authors: Chao Li, Jianing Wang, Caichang Ding, Zhiwei Ye. Computers, Materials & Continua, 2026, Issue 3, pp. 1444-1464 (21 pages).
Abstract: Underwater images frequently suffer from chromatic distortion, blurred details, and low contrast, posing significant challenges for enhancement. This paper introduces AquaTree, a novel underwater image enhancement (UIE) method that reformulates the task as a Markov Decision Process (MDP) through the integration of Monte Carlo Tree Search (MCTS) and deep reinforcement learning (DRL). The framework employs an action space of 25 enhancement operators, strategically grouped for basic attribute adjustment, color component balance, correction, and deblurring. Exploration within MCTS is guided by a dual-branch convolutional network, enabling intelligent sequential operator selection. Our core contributions include: (1) a multimodal state representation combining CIELab color histograms with deep perceptual features, (2) a dual-objective reward mechanism optimizing chromatic fidelity and perceptual consistency, and (3) an alternating training strategy co-optimizing enhancement sequences and network parameters. We further propose two inference schemes: an MCTS-based approach prioritizing accuracy at higher computational cost, and an efficient network policy enabling real-time processing with minimal quality loss. Comprehensive evaluations on the UIEB dataset, together with color-correction and haze-removal comparisons on the U45 dataset, demonstrate AquaTree's superiority, significantly outperforming nine state-of-the-art methods across five established underwater image quality metrics.
Keywords: underwater image enhancement (UIE); Monte Carlo tree search (MCTS); deep reinforcement learning (DRL); Markov decision process (MDP)
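AquaTree's tree search is guided by a learned network, which the abstract does not specify in reproducible detail. For orientation, the generic UCB1 rule that vanilla MCTS uses to balance exploring and exploiting a discrete operator set can be sketched as follows; the three-operator bandit and its reward values are illustrative assumptions, not data from the paper:

```python
import math
import random

def ucb1_select(visits, values, total_visits, c=1.4):
    """Pick the child index maximizing the UCB1 score.

    visits[i] -- visit count of child i; values[i] -- cumulative reward.
    Unvisited children are tried first (treated as an infinite bonus).
    """
    best_i, best_score = -1, float("-inf")
    for i, (n, v) in enumerate(zip(visits, values)):
        if n == 0:
            return i  # always expand unvisited actions first
        score = v / n + c * math.sqrt(math.log(total_visits) / n)
        if score > best_score:
            best_i, best_score = i, score
    return best_i

# Toy rollout loop over 3 hypothetical "enhancement operators".
random.seed(0)
visits, values = [0, 0, 0], [0.0, 0.0, 0.0]
true_means = [0.2, 0.8, 0.5]          # made-up per-operator rewards
for t in range(1, 201):
    a = ucb1_select(visits, values, t)
    r = true_means[a] + random.uniform(-0.1, 0.1)
    visits[a] += 1
    values[a] += r

best = max(range(3), key=lambda i: visits[i])  # most-visited operator
```

After 200 simulated rollouts, the search concentrates its visits on the operator with the highest expected reward, which is the behavior the learned policy in AquaTree is meant to accelerate.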
2. Deep Synchronization Control of Grid-Forming Converters: A Reinforcement Learning Approach
Authors: Zhuorui Wu, Meng Zhang, Bo Fan, Yang Shi, Xiaohong Guan. IEEE/CAA Journal of Automatica Sinica, 2025, Issue 1, pp. 273-275 (3 pages).
Abstract: Dear Editor, this letter proposes a deep synchronization control (DSC) method to synchronize grid-forming converters with power grids. The method involves constructing a novel controller for grid-forming converters based on a stable deep dynamics model. To enhance the performance of the controller, the dynamics model is optimized within the deep reinforcement learning (DRL) framework. Simulation results verify that the proposed method can reduce frequency deviation and improve active power responses.
Keywords: deep synchronization control (DSC); deep reinforcement learning (DRL); stable deep dynamics model; frequency deviation; active power response
3. Combining deep reinforcement learning with heuristics to solve the traveling salesman problem
Authors: Li Hong, Yu Liu, Mengqiao Xu, Wenhui Deng. Chinese Physics B, 2025, Issue 1, pp. 96-106 (11 pages).
Abstract: Recent studies employing deep learning to solve the traveling salesman problem (TSP) have mainly focused on learning construction heuristics. Such methods can improve TSP solutions but still depend on additional programs. However, methods that focus on learning improvement heuristics to iteratively refine solutions remain insufficient. Traditional improvement heuristics are guided by a manually designed search strategy and may achieve only limited improvements. This paper proposes a novel framework for learning improvement heuristics, which automatically discovers better improvement policies for heuristics to iteratively solve the TSP. Our framework first designs a new architecture based on a transformer model to parameterize the policy network, introducing an action-dropout layer to prevent action selection from overfitting. It then proposes a deep reinforcement learning approach integrating a simulated annealing mechanism (named RL-SA) to learn the pairwise selection policy, aiming to improve the performance of the 2-opt algorithm. RL-SA leverages the whale optimization algorithm to generate initial solutions for better sampling efficiency and uses a Gaussian perturbation strategy to tackle the sparse reward problem of reinforcement learning. The experimental results show that the proposed approach is significantly superior to state-of-the-art learning-based methods and further reduces the gap between learning-based methods and highly optimized solvers on the benchmark datasets. Moreover, our pre-trained model M can be applied to guide the SA algorithm (named M-SA (ours)), which performs better than existing deep models on small-, medium-, and large-scale TSPLIB datasets. Additionally, M-SA (ours) achieves excellent generalization performance on a real-world dataset of global liner shipping routes, with optimization percentages in distance reduction ranging from 3.52% to 17.99%.
Keywords: traveling salesman problem; deep reinforcement learning; simulated annealing algorithm; transformer model; whale optimization algorithm
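The RL-SA framework above learns which 2-opt moves to apply. For readers unfamiliar with the underlying move, a plain (non-learned) greedy 2-opt pass is sketched below; the four-city square instance is a made-up toy, and the paper's learned pairwise selection policy is not reproduced:

```python
import math

def tour_length(tour, pts):
    """Total length of a closed tour over 2-D points."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Greedy 2-opt: reverse segments while any reversal shortens the tour."""
    tour = tour[:]
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = cand, True
    return tour

# Four corners of a unit square with a deliberately self-crossing start tour.
pts = [(0, 0), (1, 1), (1, 0), (0, 1)]
start = [0, 1, 2, 3]            # crossing order, length 2 + 2*sqrt(2)
best = two_opt(start, pts)      # uncrossed square, length 4
```

A learned improvement policy replaces the exhaustive (i, j) scan with a network that proposes the most promising segment reversal at each step.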
4. Priority-Based Scheduling and Orchestration in Edge-Cloud Computing: A Deep Reinforcement Learning-Enhanced Concurrency Control Approach
Authors: Mohammad A. Al Khaldy, Ahmad Nabot, Ahmad Al-Qerem, Mohammad Alauthman, Amina Salhi, Suhaila Abuowaida, Naceur Chihaoui. Computer Modeling in Engineering & Sciences, 2025, Issue 10, pp. 673-697 (25 pages).
Abstract: The exponential growth of Internet of Things (IoT) devices has created unprecedented challenges in data processing and resource management for time-critical applications. Traditional cloud computing paradigms cannot meet the stringent latency requirements of modern IoT systems, while pure edge computing faces resource constraints that limit processing capabilities. This paper addresses these challenges by proposing a novel Deep Reinforcement Learning (DRL)-enhanced priority-based scheduling framework for hybrid edge-cloud computing environments. Our approach integrates adaptive priority assignment with a two-level concurrency control protocol that ensures both optimal performance and data consistency. The framework introduces three key innovations: (1) a DRL-based dynamic priority assignment mechanism that learns from system behavior, (2) a hybrid concurrency control protocol combining local edge validation with global cloud coordination, and (3) an integrated mathematical model that formalizes sensor-driven transactions across edge-cloud architectures. Extensive simulations across diverse workload scenarios demonstrate significant quantitative improvements over state-of-the-art heuristic and meta-heuristic approaches: 40% latency reduction, 25% throughput increase, 85% resource utilization (compared to 60% for heuristic methods), 40% reduction in energy consumption (300 vs. 500 J per task), and 50% improvement in scalability factor (1.8 vs. 1.2 for EDF). These results establish the framework as a robust solution for large-scale IoT and autonomous applications requiring real-time processing with consistency guarantees.
Keywords: edge computing; cloud computing; scheduling algorithms; orchestration strategies; deep reinforcement learning; concurrency control; real-time systems; IoT
5. Deep reinforcement learning based communication resource allocation driven by radar point cloud for urban air mobility
Authors: Leyan Chen, Kai Liu, Qiang Gao, Zhibo Zhang. Chinese Journal of Aeronautics, 2025, Issue 12, pp. 404-414 (11 pages).
Abstract: In future smart cities, unmanned aerial vehicles (UAVs) and electric vertical take-off and landing aircraft (eVTOL) will be widely employed for urban air mobility (UAM). Considering such real-world scenarios, a deep reinforcement learning based communication resource allocation method is proposed for UAVs to provide communication services for eVTOL swarms, ensuring their reliable communication and safe operation. To save energy, UAVs can ride on a moving interaction station (MIS), such as an urban bus. Using UAV trajectory control and communication power allocation, a joint fair optimization problem is formulated to maximize channel capacity while optimizing radar sensing performance. To address this optimization problem, a point cloud based deep Q-network (PCDQN) algorithm is proposed. It contains a point neural network that determines the action space of the UAV directly from three-dimensional (3D) radar point clouds, and a deep reinforcement learning based decision model that selects actions from the action space. Simulation results demonstrate that the proposed method exhibits competitive performance compared to the benchmarks.
Keywords: urban air mobility (UAM); unmanned aerial vehicles (UAVs); electric vertical take-off and landing aircraft (eVTOL); radar point cloud; trajectory control; resource allocation; deep reinforcement learning (DRL)
6. Multi-Robot Task Allocation Using Multimodal Multi-Objective Evolutionary Algorithm Based on Deep Reinforcement Learning (Cited by 6)
Authors: 苗镇华, 黄文焘, 张依恋, 范勤勤. Journal of Shanghai Jiaotong University (Science), EI, 2024, Issue 3, pp. 377-387 (11 pages).
Abstract: The overall performance of multi-robot collaborative systems is significantly affected by multi-robot task allocation. To improve the effectiveness, robustness, and safety of multi-robot collaborative systems, a multimodal multi-objective evolutionary algorithm based on deep reinforcement learning is proposed in this paper. The improved multimodal multi-objective evolutionary algorithm is used to solve multi-robot task allocation problems. Moreover, a deep reinforcement learning strategy is used in the last generation to provide a high-quality path for each assigned robot in an end-to-end manner. Comparisons with three popular multimodal multi-objective evolutionary algorithms on three different scenarios of multi-robot task allocation problems are carried out to verify the performance of the proposed algorithm. The experimental results show that the proposed algorithm can generate sufficient equivalent schemes to improve the availability and robustness of multi-robot collaborative systems in uncertain environments, and also produce the best scheme to improve the overall task execution efficiency of multi-robot collaborative systems.
Keywords: multi-robot task allocation; multi-robot cooperation; path planning; multimodal multi-objective evolutionary algorithm; deep reinforcement learning
7. A dynamic fusion path planning algorithm for mobile robots incorporating improved IB-RRT∗ and deep reinforcement learning (Cited by 1)
Authors: 刘安东, ZHANG Baixin, CUI Qi, ZHANG Dan, NI Hongjie. High Technology Letters, EI, CAS, 2023, Issue 4, pp. 365-376 (12 pages).
Abstract: Dynamic path planning is crucial for mobile robots to navigate successfully in unstructured environments. To achieve a globally optimal path and real-time dynamic obstacle avoidance during movement, a dynamic path planning algorithm incorporating improved IB-RRT∗ and deep reinforcement learning (DRL) is proposed. First, an improved IB-RRT∗ algorithm is proposed for global path planning by combining double elliptic subset sampling and probabilistic central circle target bias. Then, to tackle the slow response to dynamic obstacles and inadequate obstacle avoidance of traditional local path planning algorithms, deep reinforcement learning is utilized to predict the movement trend of dynamic obstacles, leading to a dynamic fusion path planning. Finally, simulation and experiment results demonstrate that the proposed improved IB-RRT∗ algorithm has higher convergence speed and search efficiency compared with the traditional Bi-RRT∗, Informed-RRT∗, and IB-RRT∗ algorithms. Furthermore, the proposed fusion algorithm can effectively perform real-time obstacle avoidance and navigation tasks for mobile robots in unstructured environments.
Keywords: mobile robot; improved IB-RRT∗ algorithm; deep reinforcement learning (DRL); real-time dynamic obstacle avoidance
8. Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments (Cited by 24)
Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Yiwei ZHAI, Qianglong WANG. Chinese Journal of Aeronautics, SCIE, EI, CAS, CSCD, 2021, Issue 12, pp. 187-204 (18 pages).
Abstract: Unmanned aerial vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to fly safely to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV autonomous motion planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used deep reinforcement learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double screening, combines it with the deep deterministic policy gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a prioritized experience replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn from (following theory in human education), and expands the influence of the learning process on action selection at the current state. All experiments are conducted in a complex unknown simulation environment constructed from the parameters of a real UAV. The training experiments show that REL-DDPG improves both the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate performance under different parameter conditions.
Keywords: autonomous motion planning (AMP); deep deterministic policy gradient (DDPG); deep reinforcement learning (DRL); sampling method; UAV
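REL-DDPG builds on prioritized experience replay (PER). A minimal proportional PER sketch is shown below; it omits the sum-tree used in efficient implementations as well as the paper's similarity-based experience matching, and the transitions and TD errors are illustrative:

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay (O(n) sampling, no sum-tree).

    Priorities follow p_i = (|td_error| + eps) ** alpha, the standard PER
    formulation; high-error transitions are replayed more often.
    """
    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha, self.eps = alpha, eps
        self.data, self.prios = [], []

    def add(self, transition, td_error):
        self.data.append(transition)
        self.prios.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k):
        # Sampling probability is proportional to stored priority.
        return random.choices(self.data, weights=self.prios, k=k)

random.seed(1)
buf = PrioritizedReplay()
buf.add("low-error", td_error=0.01)
buf.add("high-error", td_error=5.0)
batch = buf.sample(1000)
ratio = batch.count("high-error") / 1000   # heavily favors the large TD error
```

In a full agent, sampled transitions would also carry importance-sampling weights to correct the bias this non-uniform sampling introduces.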
9. Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations (Cited by 15)
Authors: Dimitri P. Bertsekas. IEEE/CAA Journal of Automatica Sinica, EI, CSCD, 2019, Issue 1, pp. 1-31 (31 pages).
Abstract: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
Keywords: reinforcement learning; dynamic programming; Markovian decision problems; aggregation; feature-based architectures; policy iteration; deep neural networks; rollout algorithms
10. Airport gate assignment problem with deep reinforcement learning (Cited by 3)
Authors: Zhao Jiaming, Wu Wenjun, Liu Zhiming, Han Changhao, Zhang Xuanyi, Zhang Yanhua. High Technology Letters, EI, CAS, 2020, Issue 1, pp. 102-107 (6 pages).
Abstract: With the rapid development of air transportation in recent years, airport operations have attracted a lot of attention. Among them, the airport gate assignment problem (AGAP) has become a research hotspot. However, a real-time AGAP algorithm is still an open issue. In this study, a deep reinforcement learning based AGAP (DRL-AGAP) is proposed. The optimization objective is to maximize the rate of flights assigned to fixed gates. The real-time AGAP is modeled as a Markov decision process (MDP), with the state space, action space, value, and rewards defined. The DRL-AGAP algorithm is evaluated via simulation and compared with the flight pre-assignment results of the Gurobi optimization software and a greedy method. Simulation results show that the performance of the proposed DRL-AGAP algorithm is close to that of the pre-assignment obtained by the Gurobi optimization solver. Meanwhile, real-time assignment ability is ensured by the proposed DRL-AGAP algorithm thanks to its dynamic modeling and lower complexity.
Keywords: airport gate assignment problem (AGAP); deep reinforcement learning (DRL); Markov decision process (MDP)
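The greedy baseline mentioned above can take many forms; the paper does not specify its variant. One common first-fit sketch that maximizes the same objective (the rate of flights assigned to fixed gates) is shown below, with made-up flight times and a simple arrival-order tie-break:

```python
def greedy_gate_assignment(flights, n_gates):
    """First-fit greedy: assign each flight (sorted by arrival time) to the
    first gate whose previous occupant has already departed; flights that fit
    no gate stay unassigned (e.g., sent to a virtual apron).

    flights: list of (arrival, departure, flight_id) tuples.
    Returns (assignment dict flight_id -> gate, assigned rate)."""
    gate_free_at = [0.0] * n_gates        # earliest time each gate is free
    assignment = {}
    for arr, dep, fid in sorted(flights):
        for g in range(n_gates):
            if gate_free_at[g] <= arr:
                assignment[fid] = g
                gate_free_at[g] = dep
                break
    return assignment, len(assignment) / len(flights)

# Hypothetical schedule: times in hours, two fixed gates.
flights = [(0, 2, "F1"), (1, 3, "F2"), (2, 4, "F3"), (2.5, 5, "F4")]
assign, rate = greedy_gate_assignment(flights, n_gates=2)   # F4 is left out
```

A DRL agent replaces the fixed first-fit rule with a learned state-dependent choice, which is what lets DRL-AGAP approach the solver's assignment rate while staying fast enough for real-time use.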
11. Active control of flow past an elliptic cylinder using an artificial neural network trained by deep reinforcement learning (Cited by 2)
Authors: Bofu WANG, Qiang WANG, Quan ZHOU, Yulu LIU. Applied Mathematics and Mechanics (English Edition), SCIE, EI, CSCD, 2022, Issue 12, pp. 1921-1934 (14 pages).
Abstract: The active control of flow past an elliptical cylinder using the deep reinforcement learning (DRL) method is conducted. The axis ratio of the elliptical cylinder Γ varies from 1.2 to 2.0, and four angles of attack α = 0°, 15°, 30°, and 45° are considered at a fixed Reynolds number Re = 100. The mass flow rates of two synthetic jets imposed at different positions on the cylinder, θ1 and θ2, are trained to control the flow. The optimal jet placement that achieves the highest drag reduction is determined for each case. For a low-axis-ratio ellipse, i.e., Γ = 1.2, the controlled results at α = 0° are similar to those for a circular cylinder with control jets applied at θ1 = 90° and θ2 = 270°. It is found that either applying the jets asymmetrically or increasing the angle of attack can achieve a higher drag reduction rate, which, however, is accompanied by increased fluctuation. The control jets elongate the vortex shedding and reduce the pressure drop. Meanwhile, the flow topology is modified at a high angle of attack. For an ellipse with a relatively higher axis ratio, i.e., Γ ≥ 1.6, drag reduction is achieved for all the angles of attack studied: the larger the angle of attack, the higher the drag reduction ratio. Increased fluctuation in the drag coefficient under control is encountered regardless of the position of the control jets. The control jets modify the flow topology by inducing an external vortex near the wall, causing the drag reduction. The results suggest that DRL can learn an active control strategy for the present configuration.
Keywords: drag reduction; deep reinforcement learning (DRL); elliptical cylinder; active control
12. Deep reinforcement learning for UAV swarm rendezvous behavior (Cited by 2)
Authors: ZHANG Yaozhong, LI Yike, WU Zhuoran, XU Jialin. Journal of Systems Engineering and Electronics, SCIE, EI, CSCD, 2023, Issue 2, pp. 360-373 (14 pages).
Abstract: Unmanned aerial vehicle (UAV) swarm technology has been one of the research hotspots in recent years. With the continuous improvement of the autonomous intelligence of UAVs, swarm technology will become one of the main trends of UAV development in the future. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q-network (DDQN) algorithm. We design a guided reward function to effectively solve the convergence problem caused by sparse returns in deep reinforcement learning (DRL) for long-horizon tasks. We also propose the concept of a temporary storage area, optimizing the memory replay unit of the traditional DDQN algorithm, improving the convergence speed of the algorithm, and speeding up the training process. Unlike a traditional task environment, this paper establishes a continuous state-space task environment model to improve the verification process of the UAV task environment. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm in different task scenarios are trained. The experimental results validate that the DDQN algorithm is efficient in training the UAV swarm to complete the given collaborative tasks, while meeting the requirements of the UAV swarm for centralization and autonomy and improving the intelligence of the swarm's collaborative task execution. The simulation results show that, after training, the proposed UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.
Keywords: double deep Q-network (DDQN) algorithms; unmanned aerial vehicle (UAV) swarm; task decision; deep reinforcement learning (DRL); sparse returns
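The DDQN update that this paper builds on decouples action selection (online network) from action evaluation (target network). A one-transition sketch is given below, with hand-made Q-value lists standing in for real networks:

```python
def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN bootstrap target for one transition.

    q_online_next / q_target_next: per-action Q-values at the next state from
    the online and target networks (plain lists here). The online net chooses
    the action, the target net evaluates it; this decoupling is what reduces
    the overestimation bias of vanilla DQN.
    """
    if done:
        return reward                       # terminal: no bootstrapping
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[a_star]

# The online net overrates action 1; the target net gives a sober value.
y = ddqn_target(reward=1.0, done=False,
                q_online_next=[0.2, 0.9, 0.5],
                q_target_next=[0.3, 0.4, 0.6])
```

The network is then trained to regress its Q-value for the taken action toward `y`; the paper's guided reward function and temporary storage area modify what enters this target, not the target formula itself.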
13. Constrained Multi-Objective Optimization With Deep Reinforcement Learning Assisted Operator Selection (Cited by 1)
Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin. IEEE/CAA Journal of Automatica Sinica, SCIE, EI, CSCD, 2024, Issue 4, pp. 919-931 (13 pages).
Abstract: Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed using different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may depend heavily on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by deep reinforcement learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-network to learn a policy that estimates the Q-values of all actions, the proposed approach can adaptively select the operator that maximizes the improvement of the population according to the current state, thereby improving algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed deep reinforcement learning assisted operator selection significantly improves the performance of these CMOEAs, and the resulting algorithms achieve better versatility compared to nine state-of-the-art CMOEAs.
Keywords: constrained multi-objective optimization; deep Q-learning; deep reinforcement learning (DRL); evolutionary algorithms; evolutionary operator selection
14. Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning (Cited by 1)
Authors: JI Xiukun, HAI Jintao, LUO Wenguang, LIN Cuixia, XIONG Yu, OU Zengkai, WEN Jiayan. Journal of Shanghai Jiaotong University (Science), EI, 2021, Issue 5, pp. 680-685 (6 pages).
Abstract: To solve the problems of difficult control law design, poor portability, and poor stability in traditional multi-agent formation obstacle avoidance algorithms, a multi-agent formation obstacle avoidance method based on deep reinforcement learning (DRL) is proposed. This method combines the perception ability of convolutional neural networks (CNNs) with the decision-making ability of reinforcement learning in a general form, realizing direct control output from the visual perception input of the environment to the action through end-to-end learning. The multi-agent system (MAS) model of the leader-follower formation method was designed with the wheelbarrow as the control object. An improved deep Q-network (DQN) algorithm was designed to achieve obstacle avoidance and collision avoidance while the multi-agent formation converges to the desired formation; the improvements cover the discount factor, the learning efficiency, and a reward function that accounts for the distance between each agent and the obstacle as well as a coordination factor among the agents. The simulation results show that the proposed method achieves the expected goal of multi-agent formation obstacle avoidance and has stronger portability than the traditional algorithm.
Keywords: wheelbarrow; multi-agent; deep reinforcement learning (DRL); formation; obstacle avoidance
15. Navigation Method Based on Improved Rapid Exploration Random Tree Star-Smart (RRT*-Smart) and Deep Reinforcement Learning (Cited by 2)
Authors: ZHANG Jue, LI Xiangjian, LIU Xiaoyan, LI Nan, YANG Kaiqiang, ZHU Heng. Journal of Donghua University (English Edition), CAS, 2022, Issue 5, pp. 490-495 (6 pages).
Abstract: A large number of logistics operations are needed to transport fabric rolls and dye barrels to different positions in printing and dyeing plants, and rising labor costs are making it difficult for plants to recruit workers for manual operations. Artificial intelligence and robotics, which are rapidly evolving, offer potential solutions to this problem. This paper presents a navigation method dedicated to solving the issues of the inability to pass smoothly at corners in practice and of local obstacle avoidance. In the system, a Gaussian fitting smoothing rapid exploration random tree star-smart (GFS RRT*-Smart) algorithm is proposed for global path planning, enhancing performance when the robot makes a sharp turn around corners. For local obstacle avoidance, a deep reinforcement learning determiner mixed actor critic (MAC) algorithm is used for obstacle avoidance decisions. The navigation system is implemented in a scaled-down simulated factory.
Keywords: rapid exploration random tree star-smart (RRT*-Smart); Gaussian fitting; deep reinforcement learning (DRL); mixed actor critic (MAC)
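The abstract does not detail the Gaussian fitting step. One standard way to round a sharp RRT*-style corner is Gaussian-kernel smoothing of the waypoints, sketched below; the window size, σ, and the fixed-endpoint handling are assumptions, not the paper's exact procedure:

```python
import math

def gaussian_smooth(path, sigma=1.0, radius=2):
    """Smooth a 2-D waypoint list with a normalized Gaussian window.

    Indices beyond the ends are clamped, and the first/last waypoints are
    kept fixed so the start and goal do not drift."""
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    out = [path[0]]
    for i in range(1, len(path) - 1):
        wsum = x = y = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), len(path) - 1)   # clamp at the ends
            w = kernel[k + radius]
            x += w * path[j][0]
            y += w * path[j][1]
            wsum += w
        out.append((x / wsum, y / wsum))
    out.append(path[-1])
    return out

# A sharp right-angle corner, as an RRT*-style planner might produce.
corner = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
smooth = gaussian_smooth(corner, sigma=1.0)   # corner point pulled inward
```

Smoothing pulls the corner waypoint toward the inside of the turn, giving the robot a feasible curvature instead of an instantaneous 90° heading change.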
16. Real-time dispatch strategy for microgrid considering source-load uncertainty: a tailored TD3 reinforcement learning approach
Authors: Shenpeng Xiang, Mohan Lin, Zhe Chen, Pingliang Zeng, Xiangjin Wang, Diyang Gong. Global Energy Interconnection, 2025, Issue 6, pp. 905-917 (13 pages).
Abstract: The integration of large-scale distributed new energy resources has led to heightened source-load uncertainty. As energy prosumers, microgrids urgently require enhanced real-time regulation capabilities over controllable resources in uncertain environments, making real-time, rapid decision-making a critical issue. This paper proposes a tailored twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm that explicitly accounts for source-load uncertainty. First, following an expert-experience-based methodology, Gaussian process regression was implemented using the radial basis function covariance with historical source and load data. The parameters were adaptively adjusted by maximum likelihood estimation to generate the expected curves of demand and wind-solar power generation, along with their 95% confidence regions, which were treated as representative uncertainty scenarios. Second, the traditional scheduling model was transformed into a deep reinforcement learning (DRL) environment through a Markov process. To minimize the total operational cost of the microgrid, the tailored TD3 algorithm was applied to formulate rapid intraday scheduling decisions. Finally, simulations were conducted using real historical data from a region in Zhejiang Province, China, to verify the efficacy of the proposed method. The results demonstrate the potential of the algorithm for achieving economic scheduling for microgrids.
Keywords: microgrid; deep reinforcement learning; tailored TD3 algorithm; intraday real-time scheduling; Gaussian process regression
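The Gaussian process regression step described above (RBF covariance over historical source and load data) can be sketched in a few lines. The tiny linear solver and the hourly load numbers below are illustrative, and the maximum-likelihood hyperparameter fitting used in the paper is omitted:

```python
import math

def rbf(a, b, length=1.0):
    """Radial basis function (squared-exponential) covariance for scalars."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def solve(A, b):
    """Naive Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gpr_mean(x_train, y_train, x_star, length=1.0, noise=1e-6):
    """GP posterior mean  k_*^T (K + noise*I)^{-1} y  with RBF covariance."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, y_train)
    return sum(alpha[i] * rbf(x_train[i], x_star, length) for i in range(n))

# Hypothetical hourly load observations (MW); the fit passes near the data.
xs, ys = [0.0, 1.0, 2.0], [10.0, 12.0, 11.0]
pred = gpr_mean(xs, ys, 1.0)
```

The same posterior machinery also yields a variance term, which is what produces the 95% confidence regions the paper uses as uncertainty scenarios; only the mean is sketched here.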
DRL-based federated self-supervised learning for task offloading and resource allocation in ISAC-enabled vehicle edge computing
17
作者 Xueying Gu Qiong Wu +3 位作者 Pingyi Fan Nan Cheng Wen Chen Khaled B.Letaief 《Digital Communications and Networks》 2025年第5期1614-1627,共14页
Intelligent Transportation Systems(ITS)leverage Integrated Sensing and Communications(ISAC)to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles(IoV).This integration inevitably incr... Intelligent Transportation Systems(ITS)leverage Integrated Sensing and Communications(ISAC)to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles(IoV).This integration inevitably increases computing demands,risking real-time system stability.Vehicle Edge Computing(VEC)addresses this by offloading tasks to Road Side Units(RSUs),ensuring timely services.Our previous work,the FLSimCo algorithm,which uses local resources for federated Self-Supervised Learning(SSL),has a limitation:vehicles often can’t complete all iteration tasks.Our improved algorithm offloads partial tasks to RSUs and optimizes energy consumption by adjusting transmission power,CPU frequency,and task assignment ratios,balancing local and RSU-based training.Meanwhile,setting an offloading threshold further prevents inefficiencies.Simulation results show that the enhanced algorithm reduces energy consumption and improves offloading efficiency and accuracy of federated SSL. 展开更多
Keywords: Integrated sensing and communications (ISAC); Federated self-supervised learning; Resource allocation and offloading; Deep reinforcement learning (DRL); Vehicle edge computing (VEC)
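The joint tuning of transmission power, CPU frequency, and task assignment ratio described above can be sketched as a toy single-vehicle energy minimization under a round deadline. All constants and the grid-search formulation below are illustrative assumptions; the paper uses a DRL policy rather than exhaustive search.

```python
import numpy as np
from itertools import product

# Illustrative single-vehicle model for partially offloading SSL training
# iterations to an RSU; every constant below is an assumed placeholder value.
N_ITERS    = 200     # training iterations per round
CYCLES     = 1e8     # CPU cycles per locally executed iteration
DATA_BITS  = 4e6     # bits uploaded per offloaded iteration
KAPPA      = 1e-27   # effective switched capacitance of the vehicle CPU
BANDWIDTH  = 1e7     # uplink bandwidth, Hz
GAIN_NOISE = 1e5     # channel gain over noise power (dimensionless)
DEADLINE   = 5.0     # seconds allowed per round

def round_cost(rho, p_tx, f_cpu):
    """Energy and latency when a fraction rho of iterations is offloaded."""
    local_iters = (1 - rho) * N_ITERS
    t_local = local_iters * CYCLES / f_cpu
    e_local = KAPPA * f_cpu**2 * local_iters * CYCLES
    rate = BANDWIDTH * np.log2(1 + p_tx * GAIN_NOISE)    # Shannon-style uplink rate
    t_tx = rho * N_ITERS * DATA_BITS / rate
    return e_local + p_tx * t_tx, max(t_local, t_tx)     # (energy, round latency)

# Grid search over the three decision variables the paper jointly adjusts.
best = min(
    (v for v in product(
        np.linspace(0, 1, 11),            # task assignment ratio rho
        np.linspace(0.05, 1.0, 20),       # transmission power (W)
        np.linspace(0.5e9, 2.0e9, 16))    # CPU frequency (Hz)
     if round_cost(*v)[1] <= DEADLINE),
    key=lambda v: round_cost(*v)[0],
)
```

Under these assumed constants, the minimizer offloads most iterations: the cubic growth of local compute energy in CPU frequency makes RSU-based training cheaper once the uplink can meet the deadline.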
Probabilistic Automata-Based Method for Enhancing Performance of Deep Reinforcement Learning Systems
18
Authors: Min Yang, Guanjun Liu, Ziyuan Zhou, Jiacun Wang 《IEEE/CAA Journal of Automatica Sinica》 SCIE EI CSCD 2024, No. 11, pp. 2327-2339 (13 pages)
Deep reinforcement learning (DRL) has demonstrated significant potential in industrial manufacturing domains such as workshop scheduling and energy system management. However, due to the model's inherent uncertainty, rigorous validation is requisite for its application in real-world tasks. Specific tests may reveal inadequacies in the performance of pre-trained DRL models, while the "black-box" nature of DRL poses a challenge for testing model behavior. We propose a novel performance improvement framework based on probabilistic automata, which aims to proactively identify and correct critical vulnerabilities of DRL systems, so that the performance of DRL models in real tasks can be improved with minimal model modifications. First, a probabilistic automaton is constructed from the historical trajectories of the DRL system by abstracting states into probabilistic decision-making units (PDMUs), and a reverse breadth-first search (BFS) method is used to identify the key PDMU-action pairs that have the greatest impact on adverse outcomes. This process relies only on the state-action sequence and final result of each trajectory. Then, under each key PDMU, we search for the new action that has the greatest impact on favorable results. Finally, the key PDMU, undesirable action, and new action are encapsulated as monitors that guide the DRL system toward more favorable results through real-time monitoring and correction mechanisms. Evaluations in two standard reinforcement learning environments and three actual job scheduling scenarios confirmed the effectiveness of the method, providing certain guarantees for the deployment of DRL models in real-world applications.
Keywords: Deep reinforcement learning (DRL); Performance improvement framework; Probabilistic automata; Real-time monitoring; Key probabilistic decision-making unit (PDMU)-action pairs
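The counting step behind the abstract's automaton construction, estimating how strongly each abstracted state-action (PDMU-action) pair correlates with adverse outcomes using only trajectories and their final results, can be sketched as below. The toy trajectories, state labels, and the simple failure-rate ranking (in place of the paper's reverse-BFS analysis) are all illustrative assumptions.

```python
from collections import defaultdict

# Toy trajectories: each is a list of (abstract_state, action) pairs plus a
# final outcome; the states, actions, and outcomes are invented for illustration.
trajectories = [
    ([("low", "a1"), ("mid", "a2")], "fail"),
    ([("low", "a1"), ("mid", "a3")], "ok"),
    ([("low", "a2"), ("mid", "a2")], "ok"),
    ([("high", "a3")], "ok"),
    ([("mid", "a2")], "fail"),
    ([("mid", "a3")], "ok"),
]

# Estimate P(adverse outcome | PDMU, action) by counting -- the frequency
# information a probabilistic automaton abstracts from the trajectory set.
counts = defaultdict(lambda: [0, 0])          # (pdmu, action) -> [fails, total]
for steps, outcome in trajectories:
    for pair in set(steps):                   # count each pair once per trajectory
        counts[pair][0] += outcome == "fail"
        counts[pair][1] += 1

fail_rate = {pair: f / n for pair, (f, n) in counts.items()}
key_pair = max(fail_rate, key=fail_rate.get)  # most failure-prone PDMU-action pair

# A deployed monitor would watch for key_pair at run time and substitute the
# alternative action found to have the greatest impact on favorable results.
```

In this toy data the pair `("mid", "a2")` fails in two of its three trajectories and is flagged as the key pair, mirroring how the framework selects where a minimal correction yields the largest improvement.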
Optimizing MDS-coded cache-enable wireless network:a blockchain-based cooperative deep reinforcement learning approach
19
Authors: Zhang Zheng, Yang Ruizhe, Yu Fei Richard, Zhang Yanhua, Li Meng 《High Technology Letters》 EI CAS 2021, No. 2, pp. 129-138 (10 pages)
Mobile distributed caching (MDC) has drawn attention as an emerging technology for its ability to shorten the distance between users and data in wireless networks. However, the MDC network state in existing work is always assumed to be either static or updated in real time. To be more realistic, this paper studies a periodically updated wireless network using maximum distance separable (MDS)-coded MDC, in each period of which devices may arrive and leave. For efficient optimization of the system at large scale, this work proposes a blockchain-based cooperative deep reinforcement learning (DRL) approach, which enhances learning efficiency through cooperation and guarantees security in cooperation via a practical Byzantine fault tolerance (PBFT)-based blockchain mechanism. Numerical results illustrate that the proposed scheme can dramatically reduce the total file download delay in the MDC network while guaranteeing security and efficiency.
Keywords: Caching technology; Blockchain; Deep reinforcement learning (DRL)
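The download-delay quantity this abstract optimizes follows from the MDS property: a file encoded into n coded fragments can be reconstructed from any k of them, so the recovery delay of parallel fetches is bounded by the k-th fastest cached fragment. The sketch below illustrates that property with assumed per-device delays; it is not the paper's system model.

```python
# Toy delay model for MDS-coded distributed caching: a file is encoded into
# n coded fragments cached on nearby devices; any k fragments reconstruct it.

def mds_download_delay(fragment_delays, k):
    """Delay to recover a file = delay of the k-th fastest cached fragment,
    since fragments are fetched in parallel and any k suffice."""
    if len(fragment_delays) < k:
        raise ValueError("fewer than k fragments reachable: fetch from origin")
    return sorted(fragment_delays)[k - 1]

# An (n=5, k=3) code: five cached fragments, assumed delays in ms to each device.
delays = [12.0, 40.0, 7.0, 25.0, 18.0]
print(mds_download_delay(delays, k=3))   # the 3rd-fastest fragment bounds the delay
```

Device arrivals and departures each period change which fragments are reachable, which is why the cooperative DRL agents must re-optimize placement between periods.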
Reliable Scheduling Method for Sensitive Power Business Based on Deep Reinforcement Learning
20
Authors: Shen Guo, Jiaying Lin, Shuaitao Bai, Jichuan Zhang, Peng Wang 《Intelligent Automation & Soft Computing》 SCIE 2023, No. 7, pp. 1053-1066 (14 pages)
The main function of the power communication business is to monitor, control, and manage the power communication network to ensure its normal and stable operation. Communication services related to dispatching data networks and the transmission of fault information or feeder automation have stringent delay requirements; if processing time is prolonged, a cascading power business reaction may be triggered. To solve these problems, this paper establishes an edge object-linked agent business deployment model for the power communication network to unify the management of data collection, resource allocation, and task scheduling within the system. It realizes the virtualization of object-linked agent computing resources through Docker container technology, designs a target model of network latency and energy consumption, introduces the A3C algorithm from deep reinforcement learning, improves it according to scene characteristics, and sets corresponding optimization strategies to minimize network delay and energy consumption. At the same time, to ensure that delay-sensitive power business is handled in time, this paper designs a business dispatch model and a task migration model to address the problem of server failure. Finally, a simulation program is designed to verify the feasibility and validity of the method and to compare it with other existing mechanisms.
Keywords: Power communication network; Dispatching data networks; Resource allocation; A3C algorithm; Deep reinforcement learning
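The latency-energy target model the A3C agent would minimize can be sketched as a weighted cost for placing a delay-sensitive task on one of several edge containers. Everything here, the cost form, the weight alpha, the server parameters, is an assumed illustration of the kind of objective described, not the paper's actual model.

```python
# Toy version of a joint latency-energy objective for deploying a power-business
# task on an edge (object-linked agent) container; all constants are assumptions.

def task_cost(cycles, data_bits, f_server, rate, p_tx, alpha=0.5):
    """Weighted sum of (transmission + processing) delay and energy."""
    t_tx = data_bits / rate                 # upload delay
    t_cpu = cycles / f_server               # processing delay
    latency = t_tx + t_cpu
    energy = p_tx * t_tx + 1e-27 * f_server**2 * cycles  # transmit + compute energy
    return alpha * latency + (1 - alpha) * energy

# Delay-sensitive dispatching traffic: pick the server minimizing the joint cost.
servers = {"edge-A": (2e9, 5e7), "edge-B": (1e9, 2e8)}   # (CPU Hz, link bps)
best = min(servers, key=lambda s: task_cost(5e8, 1e6, *servers[s], p_tx=0.5))
```

An A3C agent would learn this placement policy from the reward signal rather than evaluate the cost exhaustively, but the scalarized objective is the same shape.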