516 articles found
A Dynamic Deceptive Defense Framework for Zero-Day Attacks in IIoT: Integrating Stackelberg Game and Multi-Agent Distributed Deep Deterministic Policy Gradient
1
Authors: Shigen Shen, Xiaojun Ji, Yimeng Liu. Source: Computers, Materials & Continua, 2025, No. 11, pp. 3997-4021, 25 pages.
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet sampling strategies, while attackers adjust their tactics to reach rapid equilibrium. Furthermore, dynamic deception techniques reduce the time required for the concealment of attacks and the overall system burden. A lightweight behavioral fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) compared to existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
Keywords: Industrial internet of things; zero-day attacks; Stackelberg game; distributed deep deterministic policy gradient; defensive spoofing; dynamic defense
Noise-driven enhancement for exploration: Deep reinforcement learning for UAV autonomous navigation in complex environments
2
Authors: Haotian ZHANG, Yiyang LI, Lingquan CHENG, Jianliang AI. Source: Chinese Journal of Aeronautics, 2026, No. 1, pp. 454-471, 18 pages.
Unmanned Aerial Vehicles (UAVs) play a prominent role in various fields, and autonomous navigation is a crucial component of UAV intelligence. Deep Reinforcement Learning (DRL) has expanded the research avenues for addressing challenges in autonomous navigation. Nonetheless, challenges persist, including getting stuck in local optima, consuming excessive computation during action space exploration, and neglecting deterministic experience. This paper proposes a noise-driven enhancement strategy. In accordance with the overall learning phases, a global noise control method is designed, while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by the UAV during navigation. Both methods are integrated into a dual-model for noise control to regulate action space exploration. Furthermore, noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience. In uncertain environments, based on the Twin Delay Deep Deterministic Policy Gradient (TD3) algorithm with a Long Short-Term Memory (LSTM) network and Priority Experience Replay (PER), a Noise-Driven Enhancement Priority Memory TD3 (NDE-PMTD3) is developed. We established a simulation environment to compare different algorithms, and the performance of the algorithms is analyzed in various scenarios. The training results indicate that the proposed algorithm accelerates convergence and enhances convergence stability. In test experiments, the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments, demonstrating superior generalization results.
Keywords: Action space exploration; Autonomous navigation; Deep reinforcement learning; Twin delay deep deterministic policy gradient; Unmanned aerial vehicle
Optimization of Robotic Arm Grasping Strategy Based on Deep Reinforcement Learning
3
Author: Dongjun He. Source: 《计算机科学与技术汇刊(中英文版)》, 2025, No. 2, pp. 1-7, 7 pages.
In recent years, robotic arm grasping has become a pivotal task in the field of robotics, with applications spanning from industrial automation to healthcare. The optimization of grasping strategies plays a crucial role in enhancing the effectiveness, efficiency, and reliability of robotic systems. This paper presents a novel approach to optimizing robotic arm grasping strategies based on deep reinforcement learning (DRL). Through the utilization of advanced DRL algorithms, such as Q-Learning, Deep Q-Networks (DQN), Policy Gradient Methods, and Proximal Policy Optimization (PPO), the study aims to improve the performance of robotic arms in grasping objects with varying shapes, sizes, and environmental conditions. The paper provides a detailed analysis of the various deep reinforcement learning methods used for grasping strategy optimization, emphasizing the strengths and weaknesses of each algorithm. It also presents a comprehensive framework for training the DRL models, including simulation environment setup, the optimization process, and the evaluation metrics for grasping success. The results demonstrate that the proposed approach significantly enhances the accuracy and stability of the robotic arm in performing grasping tasks. The study further explores the challenges in training deep reinforcement learning models for real-time robotic applications and offers solutions for improving the efficiency and reliability of grasping strategies.
Keywords: Robotic arm; Grasping strategy; Deep reinforcement learning; Q-learning; DQN; Policy gradient; PPO; Optimization; Simulation; Robotics
Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios
4
Authors: Lyuchao Liao, Hankun Xiao, Pengqi Xing, Zhenhua Gan, Youpeng He, Jiajun Wang. Source: Computer Modeling in Engineering & Sciences (indexed in SCIE, EI), 2024, No. 7, pp. 557-576, 20 pages.
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts bring a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, rendering them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle’s capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model combines a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and augmenting throughput efficiency. Extensive experiments were conducted with the collaborative simulation platform of CARLA and SUMO, and the experimental results show that the proposed PE-DDPG outperforms the baseline methods in terms of the convergence capacity of the training process, the smoothness of driving, and the traffic efficiency with diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). Generally, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Keywords: Autonomous driving; traffic roundabouts; deep deterministic policy gradient; spatial attention mechanisms
Optimizing the Multi-Objective Discrete Particle Swarm Optimization Algorithm by Deep Deterministic Policy Gradient Algorithm
5
Authors: Sun Yang-Yang, Yao Jun-Ping, Li Xiao-Jun, Fan Shou-Xiang, Wang Zi-Wei. Source: Journal on Artificial Intelligence, 2022, No. 1, pp. 27-35, 9 pages.
Deep deterministic policy gradient (DDPG) has been proved to be effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work aims to probe into this topic. Experiments showed that DDPG can not only quickly improve the convergence speed of MODPSO, but also overcome the problem of local optimal solutions that MODPSO may suffer from. The research findings are of great significance for the theoretical research and application of MODPSO.
Keywords: deep deterministic policy gradient; multi-objective discrete particle swarm optimization; deep reinforcement learning; machine learning
Simultaneous Depth and Heading Control for Autonomous Underwater Vehicle Docking Maneuvers Using Deep Reinforcement Learning within a Digital Twin System
6
Authors: Yu-Hsien Lin, Po-Cheng Chuang, Joyce Yi-Tzu Huang. Source: Computers, Materials & Continua, 2025, No. 9, pp. 4907-4948, 42 pages.
This study proposes an automatic control system for Autonomous Underwater Vehicle (AUV) docking, utilizing a digital twin (DT) environment based on the HoloOcean platform, which integrates six-degree-of-freedom (6-DOF) motion equations and hydrodynamic coefficients to create a realistic simulation. Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements, deep reinforcement learning (DRL) offers a promising alternative. In the positioning stage, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed for synchronized depth and heading control, which offers stable training, reduced overestimation bias, and superior handling of continuous control compared to other DRL methods. During the searching stage, zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization. For the docking stage, this study proposes an innovative Image-based DDPG (I-DDPG), enhanced and trained in a Unity-MATLAB simulation environment, to achieve visual target tracking. Furthermore, integrating a DT environment enables efficient and safe policy training, reduces dependence on costly real-world tests, and improves sim-to-real transfer performance. Both simulation and real-world experiments were conducted, demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments. The results highlight the scalability and robustness of the proposed system, as evidenced by the TD3 controller achieving 25% less oscillation than the adaptive fuzzy controller when reaching the target depth, thereby demonstrating superior stability, accuracy, and potential for broader and more complex autonomous underwater tasks.
Keywords: Autonomous underwater vehicle; docking maneuver; digital twin; deep reinforcement learning; twin delayed deep deterministic policy gradient
Enhanced Deep Reinforcement Learning Strategy for Energy Management in Plug-in Hybrid Electric Vehicles with Entropy Regularization and Prioritized Experience Replay
7
Authors: Li Wang, Xiaoyong Wang. Source: Energy Engineering (indexed in EI), 2024, No. 12, pp. 3953-3979, 27 pages.
Plug-in Hybrid Electric Vehicles (PHEVs) represent an innovative breed of transportation, harnessing diverse power sources for enhanced performance. Energy management strategies (EMSs) that coordinate and control different energy sources are a critical component of PHEV control technology, directly impacting overall vehicle performance. This study proposes an improved deep reinforcement learning (DRL)-based EMS that optimizes real-time energy allocation and coordinates the operation of multiple power sources. Conventional DRL algorithms struggle to effectively explore all possible state-action combinations within high-dimensional state and action spaces. They often fail to strike an optimal balance between exploration and exploitation, and their assumption of a static environment limits their ability to adapt to changing conditions. Moreover, these algorithms suffer from low sample efficiency. Collectively, these factors contribute to convergence difficulties, low learning efficiency, and instability. To address these challenges, the Deep Deterministic Policy Gradient (DDPG) algorithm is enhanced using entropy regularization and a summation tree-based Prioritized Experience Replay (PER) method, aiming to improve exploration performance and learning efficiency from experience samples. Additionally, the corresponding Markov Decision Process (MDP) is established. Finally, an EMS based on the improved DRL model is presented. Comparative simulation experiments are conducted against rule-based, optimization-based, and DRL-based EMSs. The proposed strategy exhibits minimal deviation from the optimal solution obtained by the dynamic programming (DP) strategy that requires global information. In the typical driving scenarios based on World Light Vehicle Test Cycle (WLTC) and New European Driving Cycle (NEDC), the proposed method achieved a fuel consumption of 2698.65 g and an Equivalent Fuel Consumption (EFC) of 2696.77 g. Compared to the DP strategy baseline, the proposed method improved the fuel efficiency variances (FEV) by 18.13%, 15.1%, and 8.37% over the Deep Q-Network (DQN), Double DRL (DDRL), and original DDPG methods, respectively. The observational outcomes demonstrate that the proposed EMS based on the improved DRL framework possesses good real-time performance, stability, and reliability, effectively optimizing vehicle economy and fuel consumption.
Keywords: Plug-in hybrid electric vehicles; deep reinforcement learning; energy management strategy; deep deterministic policy gradient; entropy regularization; prioritized experience replay
Relevant experience learning: A deep reinforcement learning method for UAV autonomous motion planning in complex unknown environments (cited by: 24)
8
Authors: Zijian HU, Xiaoguang GAO, Kaifang WAN, Yiwei ZHAI, Qianglong WANG. Source: Chinese Journal of Aeronautics (indexed in SCIE, EI, CAS, CSCD), 2021, No. 12, pp. 187-204, 18 pages.
Unmanned Aerial Vehicles (UAVs) play a vital role in military warfare. In a variety of battlefield mission scenarios, UAVs are required to safely fly to designated locations without human intervention. Therefore, finding a suitable method to solve the UAV Autonomous Motion Planning (AMP) problem can improve the success rate of UAV missions to a certain extent. In recent years, many studies have used Deep Reinforcement Learning (DRL) methods to address the AMP problem and have achieved good results. From the perspective of sampling, this paper designs a sampling method with double-screening, combines it with the Deep Deterministic Policy Gradient (DDPG) algorithm, and proposes the Relevant Experience Learning-DDPG (REL-DDPG) algorithm. The REL-DDPG algorithm uses a Prioritized Experience Replay (PER) mechanism to break the correlation of continuous experiences in the experience pool, finds the experiences most similar to the current state to learn from according to the theory in human education, and expands the influence of the learning process on action selection at the current state. All experiments are conducted in a complex unknown simulation environment constructed based on the parameters of a real UAV. The training experiments show that REL-DDPG improves the convergence speed and the convergence result compared to the state-of-the-art DDPG algorithm, while the testing experiments show the applicability of the algorithm and investigate the performance under different parameter conditions.
Keywords: Autonomous Motion Planning (AMP); Deep deterministic policy gradient (DDPG); Deep Reinforcement Learning (DRL); Sampling method; UAV
Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs (cited by: 15)
9
Authors: LI Yue, QIU Xiaohui, LIU Xiaodong, XIA Qunli. Source: Journal of Systems Engineering and Electronics (indexed in SCIE, EI, CSCD), 2020, No. 4, pp. 734-742, 9 pages.
The ever-changing battlefield environment requires the use of robust and adaptive technologies integrated into a reliable platform. Unmanned combat aerial vehicles (UCAVs) aim to integrate such advanced technologies while increasing the tactical capabilities of combat aircraft. As a research object, a common UCAV uses the neural network fitting strategy to obtain values of attack areas. However, this simple strategy cannot cope with complex environmental changes or autonomously optimize decision-making problems. To solve the problem, this paper proposes a new deep deterministic policy gradient (DDPG) strategy based on deep reinforcement learning for the attack area fitting of UCAVs in the future battlefield. Simulation results show that the autonomy and environmental adaptability of UCAVs in the future battlefield will be improved based on the new DDPG algorithm and that the training process converges quickly. We can obtain the optimal values of attack areas in real time during the whole flight with the well-trained deep network.
Keywords: attack area; neural network; deep deterministic policy gradient (DDPG); unmanned combat aerial vehicle (UCAV)
Moving target defense of routing randomization with deep reinforcement learning against eavesdropping attack (cited by: 5)
10
Authors: Xiaoyu Xu, Hao Hu, Yuling Liu, Jinglei Tan, Hongqi Zhang, Haotian Song. Source: Digital Communications and Networks (indexed in SCIE, CSCD), 2022, No. 3, pp. 373-387, 15 pages.
Eavesdropping attacks have become one of the most common attacks on networks because of their easy implementation. Eavesdropping attacks not only lead to transmission data leakage but also develop into other more harmful attacks. Routing randomization is a relevant research direction for moving target defense, which has been proven to be an effective method to resist eavesdropping attacks. To counter eavesdropping attacks, in this study, we analyzed the existing routing randomization methods and found that their security and usability need to be further improved. According to the characteristics of eavesdropping attacks, which are “latent and transferable”, a routing randomization defense method based on deep reinforcement learning is proposed. The proposed method realizes routing randomization on packet-level granularity using programmable switches. To improve the security and quality of service of legitimate services in networks, we use the deep deterministic policy gradient to generate random routing schemes with support from powerful network state awareness. In-band network telemetry provides real-time, accurate, and comprehensive network state awareness for the proposed method. Various experiments show that compared with other typical routing randomization defense methods, the proposed method has obvious advantages in security and usability against eavesdropping attacks.
Keywords: Routing randomization; Moving target defense; Deep reinforcement learning; Deep deterministic policy gradient
Distributed optimization of electricity-gas-heat integrated energy system with multi-agent deep reinforcement learning (cited by: 5)
11
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Source: Global Energy Interconnection (indexed in EI, CAS, CSCD), 2022, No. 6, pp. 604-617, 14 pages.
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) has the characteristics of strong coupling, non-convexity, and nonlinearity. The centralized optimization method has a high cost of communication and complex modeling. Meanwhile, the traditional numerical iterative solution cannot deal with uncertainty and has low solution efficiency, which makes it difficult to apply online. For the coordinated optimization problem of the electricity-gas-heat IES in this study, we constructed a model for the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in the multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of changes in real-time supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving the system economy. Compared with centralized optimization, the distributed model with multiple decision centers can achieve similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load in the training. Compared with the traditional iterative solution method, it can better cope with uncertainty and realize real-time decision making of the system, which is conducive to online application. Finally, we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
Keywords: Integrated energy system; Multi-agent system; Distributed optimization; Multi-agent deep deterministic policy gradient; Real-time optimization decision
Deep reinforcement learning guidance with impact time control (cited by: 1)
12
Authors: LI Guofei, LI Shituo, LI Bohao, WU Yunjie. Source: Journal of Systems Engineering and Electronics (indexed in CSCD), 2024, No. 6, pp. 1594-1603, 10 pages.
In consideration of the field-of-view (FOV) angle constraint, this study focuses on the guidance problem with impact time control. A deep reinforcement learning guidance method is given for the missile to obtain the desired impact time and meet the demand of the FOV angle constraint. On the basis of the framework of proportional navigation guidance, an auxiliary control term is supplemented by the distributed deep deterministic policy gradient algorithm, in which the reward functions are developed to decrease the time-to-go error and improve the terminal guidance accuracy. The numerical simulation demonstrates that the missile governed by the presented deep reinforcement learning guidance law can hit the target successfully at the appointed arrival time.
Keywords: impact time; deep reinforcement learning; guidance law; field-of-view (FOV) angle; deep deterministic policy gradient
RIS-Assisted UAV-D2D Communications Exploiting Deep Reinforcement Learning (cited by: 1)
13
Authors: YOU Qian, XU Qian, YANG Xin, ZHANG Tao, CHEN Ming. Source: ZTE Communications, 2023, No. 2, pp. 61-69, 9 pages.
Device-to-device (D2D) communications underlying cellular networks enabled by unmanned aerial vehicles (UAV) have been regarded as promising techniques for next-generation communications. To mitigate the strong interference caused by the line-of-sight (LoS) air-to-ground channels, we deploy a reconfigurable intelligent surface (RIS) to rebuild the wireless channels. A joint optimization problem of the transmit power of the UAV, the transmit power of D2D users, and the RIS phase configuration is investigated to maximize the achievable rate of D2D users while satisfying the quality of service (QoS) requirement of cellular users. Due to the high channel dynamics and the coupling among cellular users, the RIS, and the D2D users, it is challenging to find a proper solution. Thus, a RIS softmax deep double deterministic (RIS-SD3) policy gradient method is proposed, which can smooth the optimization space as well as reduce the number of local optimizations. Specifically, the SD3 algorithm maximizes the reward of the agent by training the agent to maximize the value function after the softmax operator is introduced. Simulation results show that the proposed RIS-SD3 algorithm can significantly improve the rate of the D2D users while controlling the interference to the cellular user. Moreover, the proposed RIS-SD3 algorithm has better robustness than the twin delayed deep deterministic (TD3) policy gradient algorithm in a dynamic environment.
Keywords: device-to-device communications; reconfigurable intelligent surface; deep reinforcement learning; softmax deep double deterministic policy gradient
Real-Time Implementation of Quadrotor UAV Control System Based on a Deep Reinforcement Learning Approach
14
Authors: Taha Yacine Trad, Kheireddine Choutri, Mohand Lagha, Souham Meshoul, Fouad Khenfri, Raouf Fareh, Hadil Shaiba. Source: Computers, Materials & Continua (indexed in SCIE, EI), 2024, No. 12, pp. 4757-4786, 30 pages.
The popularity of quadrotor Unmanned Aerial Vehicles (UAVs) stems from their simple propulsion systems and structural design. However, their complex and nonlinear dynamic behavior presents a significant challenge for control, necessitating sophisticated algorithms to ensure stability and accuracy in flight. Various strategies have been explored by researchers and control engineers, with learning-based methods like reinforcement learning, deep learning, and neural networks showing promise in enhancing the robustness and adaptability of quadrotor control systems. This paper investigates a Reinforcement Learning (RL) approach for both high and low-level quadrotor control systems, focusing on attitude stabilization and position tracking tasks. A novel reward function and actor-critic network structures are designed to stimulate high-order observable states, improving the agent’s understanding of the quadrotor’s dynamics and environmental constraints. To address the challenge of RL hyper-parameter tuning, a new framework is introduced that combines Simulated Annealing (SA) with a reinforcement learning algorithm, specifically Simulated Annealing-Twin Delayed Deep Deterministic Policy Gradient (SA-TD3). This approach is evaluated for path-following and stabilization tasks through comparative assessments with two commonly used control methods: Backstepping and Sliding Mode Control (SMC). While the implementation of the well-trained agents exhibited unexpected behavior during real-world testing, a reduced neural network used for altitude control was successfully implemented on a Parrot Mambo mini drone. The results showcase the potential of the proposed SA-TD3 framework for real-world applications, demonstrating improved stability and precision across various test scenarios and highlighting its feasibility for practical deployment.
Keywords: deep reinforcement learning; hyper-parameters optimization; path following; quadrotor; twin delayed deep deterministic policy gradient and simulated annealing
Optimization of plunger lift working systems using reinforcement learning for coupled wellbore/reservoir
15
作者 Zhi-Sheng Xing Guo-Qing Han +5 位作者 You-Liang Jia Wei Tian Hang-Fei Gong Wen-Bo Jiang Pei-Dong Mai Xing-Yuan Liang 《Petroleum Science》 2025年第5期2154-2168,共15页
In the mid-to-late stages of gas reservoir development,liquid loading in gas wells becomes a common challenge.Plunger lift,as an intermittent production technique,is widely used for deliquification in gas wells.With t... In the mid-to-late stages of gas reservoir development,liquid loading in gas wells becomes a common challenge.Plunger lift,as an intermittent production technique,is widely used for deliquification in gas wells.With the advancement of big data and artificial intelligence,the future of oil and gas field development is trending towards intelligent,unmanned,and automated operations.Currently,the optimization of plunger lift working systems is primarily based on expert experience and manual control,focusing mainly on the success of the plunger lift without adequately considering the impact of different working systems on gas production.Additionally,liquid loading in gas wells is a dynamic process,and the intermittent nature of plunger lift requires accurate modeling;using constant inflow dynamics to describe reservoir flow introduces significant errors.To address these challenges,this study establishes a coupled wellbore-reservoir model for plunger lift wells and validates the computational wellhead pressure results against field measurements.Building on this model,a novel optimization control algorithm based on the deep deterministic policy gradient(DDPG)framework is proposed.The algorithm aims to optimize plunger lift working systems to balance overall reservoir pressure,stabilize gas-water ratios,and maximize gas production.Through simulation experiments in three different production optimization scenarios,the effectiveness of reinforcement learning algorithms(including RL,PPO,DQN,and the proposed DDPG)and traditional optimization algorithms(including GA,PSO,and Bayesian optimization)in enhancing production efficiency is compared.The results demonstrate that the coupled model provides highly accurate calculations and can precisely describe the 
transient production of wellbore and gas reservoir systems.The proposed DDPG algorithm achieves the highest reward value during training with minimal error,leading to a potential increase in cumulative gas production by up to 5%and cumulative liquid production by 252%.The DDPG algorithm exhibits robustness across different optimization scenarios,showcasing excellent adaptability and generalization capabilities. 展开更多
Keywords: plunger lift; liquid loading; deliquification; reinforcement learning; deep deterministic policy gradient (DDPG); artificial intelligence
Optimum scheduling of truck-based mobile energy couriers(MEC)using deep deterministic policy gradient
16
Authors: Yaze Li, Jingxian Wu, Yanjun Pan. 《Intelligent and Converged Networks》, 2025, No. 3, pp. 195-208 (14 pages)
We propose a new architecture of truck-based mobile energy couriers (MEC) for power distribution networks with high penetration of renewable energy sources (RES). Each MEC is a truck equipped with high-density inverters, converters, capacitor banks, and energy storage devices. The MEC platform can improve the flexibility, resilience, and RES hosting capability of a distribution grid through spatial-temporal energy reallocation based on the stochastic behaviors of RES and loads. Employing MECs requires complex scheduling and control schemes that can adaptively cope with the dynamic nature of both the power grid and the transportation network. The problem is formulated as a non-convex optimization problem that minimizes the total generation cost, subject to the constraints imposed by conventional and renewable energy sources, energy storage, and transportation networks. The problem is solved by combining optimal power flow (OPF) with deep reinforcement learning (DRL) under the framework of deep deterministic policy gradient (DDPG). Simulation results demonstrate that the proposed MEC platform with DDPG can achieve significant cost reduction compared to conventional systems with static energy storage.
Keywords: transportation network; renewable energy integration; mobile energy couriers (MECs); Markov decision process (MDP); deep deterministic policy gradient (DDPG)
Improved active disturbance rejection pitch angle control of wind turbines optimized by the DDPG algorithm
17
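Authors: 徐晓宁, 范召强, 周雪松, 陶珑, 问虎龙, 杨风霞. 《太阳能学报》 (PKU Core), 2026, No. 1, pp. 575-584 (10 pages)
To address the poor dynamic response of conventional wind turbine pitch angle control strategies under changing wind speed, and the large output power fluctuations caused by insufficient adaptability of controller parameters, an improved linear active disturbance rejection pitch angle control strategy based on the deep deterministic policy gradient (DDPG) algorithm is proposed. Building on the linear extended state observer (LESO), the strategy introduces state variables with freely extended dimensions and reworks the parameters of the order-augmented observer in proportional-derivative form to strengthen feedforward correction of disturbances. A reward function is then designed from the generator speed error, and the DDPG algorithm is used so that the parameters of the improved linear active disturbance rejection control (LADRC) adjust adaptively to achieve optimal control performance. Simulation results show that the proposed strategy copes effectively with severe wind speed fluctuations and lets the pitch angle adapt quickly to wind speed changes, thereby maintaining stable operation of the wind turbine and efficient power output.
Keywords: wind turbine; pitch angle; linear active disturbance rejection control; deep deterministic policy gradient; reward function; parameter tuning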
TD3 control strategy for VSG low-frequency oscillation with adaptive and multi-objective optimization
18
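Authors: 李永刚, 周鹤然, 周一辰, 魏凡超. 《辽宁工程技术大学学报(自然科学版)》 (PKU Core), 2026, No. 1, pp. 98-106 (9 pages)
To address the low-frequency oscillations that frequently occur when a virtual synchronous generator (VSG) is connected to a weak grid, an intelligent VSG control method is proposed that combines coordinated dynamic inertia-damping regulation with a multi-modal twin delayed deep deterministic policy gradient algorithm. An enhanced VSG model with a dynamic inertia-damping regulation mechanism is built. Based on real-time monitoring of the standard deviation and rate of change of frequency fluctuations, a continuous parameter adaptation algorithm is designed to achieve dynamic coordinated optimization of the inertia constant H and the damping coefficient D. An oscillation-aware twin delayed deep deterministic policy gradient (TD3) algorithm built on a deep feedforward neural network is designed; it uses a dual-state experience replay buffer structure, embeds low-frequency oscillation feature vectors in the training samples, and constructs a multi-objective reward function comprising a frequency deviation penalty, voltage offset suppression, and an oscillation energy constraint. Simulation and practical case results show that the strategy enables fast and accurate online assessment of VSG low-frequency oscillations, strengthens system damping and inertia, reduces the risk of low-frequency oscillation, and improves system stability.
Keywords: virtual synchronous generator; low-frequency oscillation suppression; damping coefficient; dynamic inertia regulation; twin delayed deep deterministic policy gradient algorithm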
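The abstract above names three reward terms (frequency deviation penalty, voltage offset suppression, oscillation energy constraint) but not their form. As a rough illustration only, one possible weighted-sum reward could look like the sketch below. The function name `vsg_reward`, the quadratic penalties, the variance-based oscillation energy, and all weights are assumptions, not the authors' formulation.

```python
import numpy as np

def vsg_reward(freq_dev_hz, volt_dev_pu, freq_window_hz,
               w_freq=1.0, w_volt=0.5, w_osc=0.2):
    """Hypothetical multi-objective reward; forms and weights are illustrative."""
    # Penalize the instantaneous frequency deviation (quadratic form assumed).
    freq_penalty = w_freq * freq_dev_hz ** 2
    # Penalize the voltage offset from its nominal value (per unit, assumed).
    volt_penalty = w_volt * volt_dev_pu ** 2
    # Use the variance of recent frequency samples as a stand-in for
    # oscillation energy (an assumption; the abstract does not define it).
    window = np.asarray(freq_window_hz, dtype=float)
    osc_energy = w_osc * float(np.mean((window - window.mean()) ** 2))
    # Higher reward means smaller deviations and less oscillation.
    return -(freq_penalty + volt_penalty + osc_energy)
```

With this form, a trajectory with zero deviation and a flat frequency window scores 0, and any deviation or oscillation lowers the reward.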
Energy efficiency optimization algorithm based on twin delayed deep deterministic policy gradient in MEC networks
19
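Author: 吴名星. 《空天预警研究学报》, 2026, No. 1, pp. 52-56 (5 pages)
To solve the energy efficiency optimization problem of task offloading and resource allocation in dynamic mobile edge computing (MEC) networks, and to address the poor adaptability of traditional algorithms and the insufficient stability of reinforcement learning algorithms, an energy efficiency optimization algorithm (TD3-EE) based on the twin delayed deep deterministic policy gradient (twin delayed DDPG, TD3) is proposed. First, a system model is built that accounts for task heterogeneity and dynamic resource states, and an energy efficiency maximization objective under delay constraints is formulated. The problem is then cast as a Markov decision process (MDP), and the twin critic networks and delayed update mechanism of TD3 are used to improve decision stability. Simulation results show that the algorithm outperforms the DDPG-EE and TPBA algorithms in task completion rate, energy consumption control, and convergence stability.
Keywords: mobile edge computing; twin delayed deep deterministic policy gradient; task offloading; resource allocation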
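The twin critic networks and delayed update mentioned in the abstract above are the standard TD3 ingredients. A minimal NumPy sketch of the generic TD3 target computation follows; it is not the TD3-EE implementation. The callables `target_actor`, `target_q1`, and `target_q2` and all hyperparameter values are placeholders chosen for illustration.

```python
import numpy as np

def td3_target(reward, done, next_state, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Generic TD3 critic target; the network callables are placeholders."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action = target_actor(next_state)
    noise = np.clip(rng.normal(0.0, noise_std, size=np.shape(next_action)),
                    -noise_clip, noise_clip)
    next_action = np.clip(next_action + noise, -act_limit, act_limit)
    # Clipped double-Q: take the smaller of the two target critic estimates
    # to reduce overestimation bias.
    q_min = np.minimum(target_q1(next_state, next_action),
                       target_q2(next_state, next_action))
    # Bootstrapped target; terminal transitions (done = 1) keep only the reward.
    return reward + gamma * (1.0 - done) * q_min

def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor every `policy_delay` critic steps."""
    return step % policy_delay == 0
```

Taking the minimum of the two critics and updating the actor less often than the critics are the two mechanisms usually credited with TD3's steadier training compared with DDPG.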
Cooperative path planning method for multiple unmanned ground vehicles based on deep reinforcement learning
20
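Authors: 戴晟潭, 王寅, 尚晨晨. 《北京航空航天大学学报》 (PKU Core), 2026, No. 2, pp. 541-550 (10 pages)
To solve the cooperative path planning problem in multi-unmanned-vehicle systems, an efficient path planning framework is designed using deep reinforcement learning. A kinematic model of a two-wheel differential-drive unmanned vehicle and a mathematical model of the cooperative obstacle avoidance scenario are built. On this basis, the mechanisms behind the slow training, low sampling efficiency, and poor adaptability of deep reinforcement learning in complex dynamic scenarios with high-dimensional state spaces and continuous action spaces are analyzed, providing a theoretical basis for research on multi-vehicle cooperative path planning. For the problem of generating obstacle avoidance and encirclement strategies in multi-vehicle cooperative path planning under full observability, an improved twin delayed deep deterministic policy gradient (AE-TD3) algorithm is proposed. It adds random noise drawn from a Gaussian distribution to the actions output by the pursuing vehicles and weighs exploration against exploitation of the output actions, so that the pursuing vehicles explore unknown environments more effectively and achieve efficient, stable cooperative obstacle avoidance and encirclement. Simulation experiments show that, compared with the twin delayed deep deterministic policy gradient (TD3) algorithm, the improved algorithm converges faster in average reward and shortens encirclement time by 16.7%, verifying its feasibility.
Keywords: path planning; cooperative obstacle avoidance and encirclement; deep reinforcement learning; twin delayed deep deterministic policy gradient algorithm; action-enhanced exploration strategy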
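The abstract above describes adding Gaussian noise to the pursuing vehicles' actions and trading off exploration against exploitation, without giving the schedule. One common way to realize that idea is a noise scale that decays over training, sketched below. The function name `noisy_action`, the linear decay, and all parameter values are assumptions and do not reproduce the AE-TD3 exploration strategy.

```python
import numpy as np

def noisy_action(policy_action, step, rng, sigma_start=0.3, sigma_end=0.05,
                 decay_steps=10_000, act_limit=1.0):
    """Hypothetical exploration rule: Gaussian noise with a decaying scale."""
    # Linearly anneal the noise scale: explore more early, exploit more later.
    frac = min(step / decay_steps, 1.0)
    sigma = sigma_start + frac * (sigma_end - sigma_start)
    # Perturb the deterministic policy output with zero-mean Gaussian noise.
    noise = rng.normal(0.0, sigma, size=np.shape(policy_action))
    # Keep the action inside the actuator limits (symmetric bound assumed).
    return np.clip(np.asarray(policy_action, dtype=float) + noise,
                   -act_limit, act_limit)
```

For example, `noisy_action(np.array([0.9, -0.9]), 0, np.random.default_rng(0))` returns a perturbed action that always stays within `[-1, 1]`.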