432 articles found
A Dynamic Deceptive Defense Framework for Zero-Day Attacks in IIoT:Integrating Stackelberg Game and Multi-Agent Distributed Deep Deterministic Policy Gradient
1
Authors: Shigen Shen, Xiaojun Ji, Yimeng Liu. 《Computers, Materials & Continua》, 2025, No. 11, pp. 3997-4021 (25 pages)
The Industrial Internet of Things (IIoT) is increasingly vulnerable to sophisticated cyber threats, particularly zero-day attacks that exploit unknown vulnerabilities and evade traditional security measures. To address this critical challenge, this paper proposes a dynamic defense framework named Zero-day-aware Stackelberg Game-based Multi-Agent Distributed Deep Deterministic Policy Gradient (ZSG-MAD3PG). The framework integrates Stackelberg game modeling with the Multi-Agent Distributed Deep Deterministic Policy Gradient (MAD3PG) algorithm and incorporates defensive deception (DD) strategies to achieve adaptive and efficient protection. While conventional methods typically incur considerable resource overhead and exhibit higher latency due to static or rigid defensive mechanisms, the proposed ZSG-MAD3PG framework mitigates these limitations through multi-stage game modeling and adaptive learning, enabling more efficient resource utilization and faster response times. The Stackelberg-based architecture allows defenders to dynamically optimize packet sampling strategies, while attackers adjust their tactics to reach rapid equilibrium. Furthermore, dynamic deception techniques reduce the time required for the concealment of attacks and the overall system burden. A lightweight behavioral fingerprinting detection mechanism further enhances real-time zero-day attack identification within industrial device clusters. ZSG-MAD3PG demonstrates higher true positive rates (TPR) and lower false alarm rates (FAR) compared to existing methods, while also achieving improved latency, resource efficiency, and stealth adaptability in IIoT zero-day defense scenarios.
Keywords: Industrial internet of things; zero-day attacks; Stackelberg game; distributed deep deterministic policy gradient; defensive spoofing; dynamic defense
Perception Enhanced Deep Deterministic Policy Gradient for Autonomous Driving in Complex Scenarios
2
Authors: Lyuchao Liao, Hankun Xiao, Pengqi Xing, Zhenhua Gan, Youpeng He, Jiajun Wang. 《Computer Modeling in Engineering & Sciences》 (SCIE, EI), 2024, No. 7, pp. 557-576 (20 pages)
Autonomous driving has witnessed rapid advancement; however, ensuring safe and efficient driving in intricate scenarios remains a critical challenge. In particular, traffic roundabouts pose a set of challenges to autonomous driving due to the unpredictable entry and exit of vehicles, susceptibility to traffic flow bottlenecks, and imperfect data in perceiving environmental information, making them a vital issue in the practical application of autonomous driving. To address these traffic challenges, this work focuses on complex multi-lane roundabouts and proposes a Perception Enhanced Deep Deterministic Policy Gradient (PE-DDPG) for autonomous driving in roundabouts. Specifically, the model incorporates an enhanced variational autoencoder featuring an integrated spatial attention mechanism alongside the Deep Deterministic Policy Gradient framework, enhancing the vehicle's capability to comprehend complex roundabout environments and make decisions. Furthermore, the PE-DDPG model incorporates a dynamic path optimization strategy for roundabout scenarios, effectively mitigating traffic bottlenecks and increasing throughput efficiency. Extensive experiments were conducted on a collaborative simulation platform combining CARLA and SUMO, and the results show that the proposed PE-DDPG outperforms the baseline methods in training convergence, driving smoothness, and traffic efficiency under diverse traffic flow patterns and penetration rates of autonomous vehicles (AVs). In general, the proposed PE-DDPG model could be employed for autonomous driving in complex scenarios with imperfect data.
Keywords: Autonomous driving; traffic roundabouts; deep deterministic policy gradient; spatial attention mechanisms
Optimizing the Multi-Objective Discrete Particle Swarm Optimization Algorithm by Deep Deterministic Policy Gradient Algorithm
3
Authors: Sun Yang-Yang, Yao Jun-Ping, Li Xiao-Jun, Fan Shou-Xiang, Wang Zi-Wei. 《Journal on Artificial Intelligence》, 2022, No. 1, pp. 27-35 (9 pages)
Deep deterministic policy gradient (DDPG) has been proven effective in optimizing particle swarm optimization (PSO), but whether DDPG can optimize multi-objective discrete particle swarm optimization (MODPSO) remains to be determined. The present work investigates this question. Experiments showed that DDPG not only speeds up the convergence of MODPSO but also overcomes the local-optimum problem that MODPSO may suffer from. The findings are of great significance for the theoretical research and application of MODPSO.
Keywords: deep deterministic policy gradient; multi-objective discrete particle swarm optimization; deep reinforcement learning; machine learning
Distributed optimization of electricity-Gas-Heat integrated energy system with multi-agent deep reinforcement learning (Cited by: 5)
4
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. 《Global Energy Interconnection》 (EI, CAS, CSCD), 2022, No. 6, pp. 604-617 (14 pages)
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) is strongly coupled, non-convex, and nonlinear. Centralized optimization has a high communication cost and requires complex modeling. Meanwhile, traditional numerical iterative solutions handle uncertainty poorly and solve inefficiently, which makes them difficult to apply online. For the coordinated optimization problem of the electricity-gas-heat IES in this study, we constructed a model of the distributed IES with a dynamic distribution factor and transformed the centralized optimization problem into a distributed optimization problem in a multi-agent reinforcement learning environment using multi-agent deep deterministic policy gradient. Introducing the dynamic distribution factor allows the system to consider the impact of real-time changes in supply and demand on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving system economy. Compared with centralized optimization, the distributed model with multiple decision centers achieves similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load during training. Compared with the traditional iterative solution method, it copes better with uncertainty and enables real-time decision making, which is conducive to online application. Finally, we verify the effectiveness of the proposed method using an example of an IES coupled with three energy hub agents.
Keywords: Integrated energy system; multi-agent system; Distributed optimization; multi-agent deep deterministic policy gradient; Real-time optimization decision
Noise-driven enhancement for exploration:Deep reinforcement learning for UAV autonomous navigation in complex environments
5
Authors: Haotian ZHANG, Yiyang LI, Lingquan CHENG, Jianliang AI. 《Chinese Journal of Aeronautics》, 2026, No. 1, pp. 454-471 (18 pages)
Unmanned Aerial Vehicles (UAVs) play a prominent role in various fields, and autonomous navigation is a crucial component of UAV intelligence. Deep Reinforcement Learning (DRL) has expanded the research avenues for addressing challenges in autonomous navigation. Nonetheless, challenges persist, including getting stuck in local optima, consuming excessive computation during action space exploration, and neglecting deterministic experience. This paper proposes a noise-driven enhancement strategy. In accordance with the overall learning phases, a global noise control method is designed, while a differentiated local noise control method is developed by analyzing the exploration demands of four typical situations encountered by the UAV during navigation. Both methods are integrated into a dual model for noise control to regulate action space exploration. Furthermore, noise dual experience replay buffers are designed to optimize the rational utilization of both deterministic and noisy experience. For uncertain environments, based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm with a Long Short-Term Memory (LSTM) network and Prioritized Experience Replay (PER), a Noise-Driven Enhancement Priority Memory TD3 (NDE-PMTD3) is developed. We established a simulation environment to compare different algorithms and analyzed their performance in various scenarios. The training results indicate that the proposed algorithm accelerates convergence and enhances convergence stability. In test experiments, the proposed algorithm successfully and efficiently performs autonomous navigation tasks in diverse environments, demonstrating superior generalization.
Keywords: Action space exploration; Autonomous navigation; deep reinforcement learning; Twin delay deep deterministic policy gradient; Unmanned aerial vehicle
Deep Reinforcement Learning for Competitive DER Pricing Problem of Virtual Power Plants
6
Authors: Zheng Xu, Ye Guo, Hongbin Sun, Wenjun Tang, Wenqi Huang. 《CSEE Journal of Power and Energy Systems》, 2026, No. 1, pp. 150-161 (12 pages)
Pricing competition between virtual power plants (VPPs) for distributed energy resources (DERs) is considered in this paper. Due to the limited amount of DERs in one distributed area, VPPs have to compete for the right to work with DERs and then sell electricity from internal DERs in the wholesale market. To address this pricing problem, a Markov decision process (MDP) with continuous state and action spaces is formulated for the VPP to consider future rewards brought by the contract statuses of DERs. The deep deterministic policy gradient (DDPG) algorithm is applied to solve the pricing problem in MDP form. To deal with the non-stationary training environment created by the competing VPP, a fictitious adversary method is put forward and combined with the DDPG algorithm for the first time. The proposed fictitious adversary method helps the VPP find competitive and robust pricing strategies under competition. Numerical results demonstrate the effectiveness of the proposed methodology in finding satisfactory pricing strategies that consider competitor behavior and the long-term value of DERs.
Keywords: deep deterministic policy gradient; distributed energy resources; electricity markets; reinforcement learning; virtual power plants
Optimum scheduling of truck-based mobile energy couriers(MEC)using deep deterministic policy gradient
7
Authors: Yaze Li, Jingxian Wu, Yanjun Pan. 《Intelligent and Converged Networks》, 2025, No. 3, pp. 195-208 (14 pages)
We propose a new architecture of truck-based mobile energy couriers (MEC) for power distribution networks with high penetration of renewable energy sources (RES). Each MEC is a truck equipped with high-density inverters, converters, capacitor banks, and energy storage devices. The MEC platform can improve the flexibility, resilience, and RES hosting capability of a distribution grid through spatial-temporal energy reallocation based on the stochastic behaviors of RES and loads. The employment of MEC necessitates the development of complex scheduling and control schemes that can adaptively cope with the dynamic nature of both the power grid and the transportation network. The problem is formulated as a non-convex optimization problem that minimizes the total generation cost, subject to the constraints imposed by conventional and renewable energy sources, energy storage, transportation networks, and so on. The problem is solved by combining optimal power flow (OPF) with deep reinforcement learning (DRL) under the framework of deep deterministic policy gradient (DDPG). Simulation results demonstrate that the proposed MEC platform with DDPG can achieve significant cost reduction compared to conventional systems with static energy storage.
Keywords: transportation network; renewable energy integration; mobile energy couriers (MECs); Markov decision process (MDP); deep deterministic policy gradient (DDPG)
Simultaneous Depth and Heading Control for Autonomous Underwater Vehicle Docking Maneuvers Using Deep Reinforcement Learning within a Digital Twin System
8
Authors: Yu-Hsien Lin, Po-Cheng Chuang, Joyce Yi-Tzu Huang. 《Computers, Materials & Continua》, 2025, No. 9, pp. 4907-4948 (42 pages)
This study proposes an automatic control system for Autonomous Underwater Vehicle (AUV) docking, utilizing a digital twin (DT) environment based on the HoloOcean platform, which integrates six-degree-of-freedom (6-DOF) motion equations and hydrodynamic coefficients to create a realistic simulation. Although conventional model-based and visual servoing approaches often struggle in dynamic underwater environments due to limited adaptability and extensive parameter tuning requirements, deep reinforcement learning (DRL) offers a promising alternative. In the positioning stage, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is employed for synchronized depth and heading control; it offers stable training, reduced overestimation bias, and superior handling of continuous control compared to other DRL methods. During the searching stage, zig-zag heading motion combined with a state-of-the-art object detection algorithm facilitates docking station localization. For the docking stage, this study proposes an innovative Image-based DDPG (I-DDPG), enhanced and trained in a Unity-MATLAB simulation environment, to achieve visual target tracking. Furthermore, integrating a DT environment enables efficient and safe policy training, reduces dependence on costly real-world tests, and improves sim-to-real transfer performance. Both simulation and real-world experiments were conducted, demonstrating the effectiveness of the system in improving AUV control strategies and supporting the transition from simulation to real-world operations in underwater environments. The results highlight the scalability and robustness of the proposed system, as evidenced by the TD3 controller achieving 25% less oscillation than the adaptive fuzzy controller when reaching the target depth, thereby demonstrating superior stability, accuracy, and potential for broader and more complex autonomous underwater tasks.
Keywords: Autonomous underwater vehicle; docking maneuver; digital twin; deep reinforcement learning; twin delayed deep deterministic policy gradient
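Blockchain-based secure computation offloading strategy for vehicular ad hoc networks
9
Authors: 陈雷, 牛芳琳. 《武汉大学学报(工学版)》 (PKU Core), 2026, No. 2, pp. 319-333 (15 pages)
For blockchain-based vehicular ad hoc networks (VANETs), this work studies how to combine server vehicles with mobile edge computing (MEC) and cloud computing to solve the secure computation offloading problem for multiple vehicles. A blockchain-based VANET framework is proposed, and a blockchain-based access control method is used to secure computation offloading between vehicles and server vehicles or edge cloud servers. An optimization problem for offloading computation-intensive tasks in VANETs is then formulated with the goal of minimizing the system delay, energy consumption, and computation loss of all vehicles. The problem jointly optimizes the computation offloading strategy, the consensus mechanism strategy, and the allocation of power resources, computing resources, and channel bandwidth. To solve it, a new deep reinforcement learning (DRL) algorithm is proposed, namely an improved multi-agent deep deterministic policy gradient (MADDPG) algorithm. Simulation results show that the proposed algorithm has significant advantages over other existing algorithms.
Keywords: vehicular ad hoc networks; blockchain; computation offloading; edge cloud computing; multi-agent deep deterministic policy gradient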
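Improved active disturbance rejection pitch angle control of wind turbines optimized by the DDPG algorithm
10
Authors: 徐晓宁, 范召强, 周雪松, 陶珑, 问虎龙, 杨风霞. 《太阳能学报》 (PKU Core), 2026, No. 1, pp. 575-584 (10 pages)
To address the poor dynamic response of conventional wind turbine pitch angle control strategies under changing wind speed, and the large output power fluctuations caused by insufficient adaptability of controller parameters, an improved linear active disturbance rejection pitch angle control strategy based on the deep deterministic policy gradient (DDPG) algorithm is proposed. The strategy introduces a state variable of freely extended dimension into the linear extended state observer (LESO) and modifies the parameters of the order-augmented observer in proportional-derivative form to improve feedforward correction of disturbances. A suitable reward function is then designed from the generator speed error, and the DDPG algorithm is used to adapt the parameters of the improved linear active disturbance rejection control (LADRC) so as to achieve the best control performance. Simulation results show that the proposed strategy copes effectively with severe wind speed fluctuations and lets the pitch angle adapt quickly to wind speed changes, thereby maintaining stable operation of the wind turbine and efficient power output.
Keywords: wind turbines; pitch angle; linear active disturbance rejection control; deep deterministic policy gradient; reward function; parameter tuning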
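TD3 control strategy for VSG low-frequency oscillation with adaptive and multi-objective optimization
11
Authors: 李永刚, 周鹤然, 周一辰, 魏凡超. 《辽宁工程技术大学学报(自然科学版)》 (PKU Core), 2026, No. 1, pp. 98-106 (9 pages)
To address the low-frequency oscillations that frequently occur when a virtual synchronous generator (VSG) is connected to a weak grid, an intelligent VSG control method is proposed that combines coordinated dynamic inertia-damping regulation with a multi-modal twin delayed deep deterministic policy gradient algorithm. An enhanced VSG model with a dynamic inertia-damping regulation mechanism is constructed, and a continuous parameter adaptation algorithm based on real-time monitoring of the standard deviation and rate of change of frequency fluctuations is designed to coordinate the dynamic optimization of the inertia constant H and the damping coefficient D. An oscillation-aware twin delayed deep deterministic policy gradient (TD3) algorithm built on a deep feedforward neural network is designed; it uses a dual-state experience replay buffer, embeds low-frequency oscillation feature vectors into the training samples, and constructs a multi-objective reward function comprising a frequency deviation penalty, voltage offset suppression, and an oscillation energy constraint. Simulation results and practical case studies show that the strategy enables fast and accurate online assessment of VSG low-frequency oscillation, strengthens system damping and inertia, reduces the risk of low-frequency oscillation, and improves system stability.
Keywords: virtual synchronous generator; low-frequency oscillation suppression; damping coefficient; dynamic inertia regulation; twin delayed deep deterministic policy gradient algorithm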
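Closed-loop autonomous driving through cooperative LLM-DDPG decision-making and control
12
Authors: 郭彤颖, 樊烁, 郑岩. 《汽车技术》 (PKU Core), 2026, No. 4, pp. 10-16 (7 pages)
To address the limited generalization of rule-driven methods in long-tail scenarios, and the overfitting and lack of decision interpretability of data-driven models in autonomous driving, an autonomous driving framework driven by a large language model (LLM) and deep deterministic policy gradient (DDPG) is proposed. The framework integrates common sense with environmental data to generate interpretable decisions and uses the DDPG algorithm for low-level control. In the memory module, a retrieval mechanism based on the similarity of numerical and semantic features is designed to match similar historical cases to the current decision task in real time and assist dynamic decision-making. Experimental results show that, compared with Rule-Driving and DQN-Driving, the proposed method raises the success rate by about 33% and 14%, respectively, in heterogeneous test scenario A and by about 70% and 42%, respectively, in heterogeneous test scenario B, showing stronger cross-scenario generalization and environmental adaptability.
Keywords: autonomous driving; large language model; chain of thought; deep deterministic policy gradient algorithm; case-based reasoning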
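Energy efficiency optimization algorithm based on twin delayed deep deterministic policy gradient in MEC networks
13
Authors: 吴名星. 《空天预警研究学报》, 2026, No. 1, pp. 52-56 (5 pages)
To solve the energy efficiency optimization problem of task offloading and resource allocation in dynamic mobile edge computing (MEC) networks, and to overcome the poor adaptability of traditional algorithms and the insufficient stability of reinforcement learning algorithms, an energy efficiency optimization algorithm (TD3-EE) based on twin delayed deep deterministic policy gradient (twin delayed DDPG, TD3) is proposed. First, a system model is built that accounts for task heterogeneity and dynamic resource states, and an energy efficiency maximization objective under delay constraints is established. The problem is then cast as a Markov decision process (MDP), and the twin critic networks and delayed update mechanism of TD3 are used to improve decision stability. Simulation results show that the algorithm outperforms the DDPG-EE and TPBA algorithms in task completion rate, energy consumption control, and convergence stability.
Keywords: mobile edge computing; twin delayed deep deterministic policy gradient; task offloading; resource allocation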
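The twin critic and delayed update mechanisms named in the abstract above can be sketched as follows. This is a minimal illustration of the generic TD3 target computation, not the TD3-EE implementation from the paper; the function names, the scalar action, and all default hyperparameters are assumptions made for the example.

```python
import random


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=random):
    """Clipped double-Q learning target for one transition (scalar action).

    target_actor, target_q1 and target_q2 are hypothetical callables standing
    in for the target networks.
    """
    # Target policy smoothing: perturb the target action with clipped noise.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    next_action = target_actor(next_state) + noise
    next_action = max(-act_limit, min(act_limit, next_action))
    # Twin critics: use the smaller estimate to curb overestimation.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    # No bootstrapping from terminal states (done == 1.0).
    return reward + gamma * (1.0 - done) * q_min


def should_update_actor(step, policy_delay=2):
    """Delayed update: refresh the actor only every policy_delay critic steps."""
    return step % policy_delay == 0
```

Both critics are regressed toward the same target, while the actor and the target networks are refreshed only on the steps where `should_update_actor` returns true.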
Full-model-free Adaptive Graph Deep Deterministic Policy Gradient Model for Multi-terminal Soft Open Point Voltage Control in Distribution Systems (Cited by: 2)
14
Authors: Huayi Wu, Zhao Xu, Minghao Wang, Youwei Jia. 《Journal of Modern Power Systems and Clean Energy》 (CSCD), 2024, No. 6, pp. 1893-1904 (12 pages)
High penetration of renewable energy sources (RESs) induces sharply fluctuating feeder power, leading to voltage deviation in active distribution systems. To prevent voltage violations, multi-terminal soft open points (M-SOPs) have been integrated into distribution systems to enhance voltage control flexibility. However, M-SOP voltage control recalculated in real time cannot adapt to the rapid fluctuations of photovoltaic (PV) power, fundamentally limiting the voltage controllability of M-SOPs. To address this issue, a full-model-free adaptive graph deep deterministic policy gradient (FAG-DDPG) model is proposed for M-SOP voltage control. Specifically, an attention-based adaptive graph convolutional network (AGCN) is leveraged to extract the complex correlation features of nodal information and improve policy learning. Then, an AGCN-based surrogate model is trained to replace the power flow calculation and achieve model-free control. Furthermore, the deep deterministic policy gradient (DDPG) algorithm allows the FAG-DDPG model to learn an optimal M-SOP control strategy through continuous interaction with the AGCN-based surrogate model. Numerical tests have been performed on the modified IEEE 33-node and 123-node systems and a real 76-node distribution system, demonstrating the effectiveness and generalization ability of the proposed FAG-DDPG model.
Keywords: Soft open point; graph attention; graph convolutional network; reinforcement learning; voltage control; distribution system; deep deterministic policy gradient
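Control parameter optimization method for VSC-HVDC systems based on Markov transition field and deep deterministic policy gradient algorithm
15
Authors: 朱介北, 黄闽杰, 俞露杰, 欧开健, 刘晓龙, 贾宏杰. 《中国电机工程学报》 (PKU Core), 2026, No. 5, pp. 1821-1832, I0008 (13 pages)
To address the poor robustness, dependence on known circuit parameters, and reliance on engineering experience in the control parameter design of voltage source converter based high voltage direct current (VSC-HVDC) transmission systems, a control parameter optimization method is proposed that combines the Markov transition field (MTF) with the deep deterministic policy gradient (DDPG) algorithm. The method is robust, does not depend on circuit parameter characteristics, and supports visualization. First, the MTF converts one-dimensional time-series waveforms such as circuit power and voltage into two-dimensional MTF images, and a Markov transition field loss (MTFL) function is used to assess the data volatility of these images. Second, the MTFL function is combined with the DDPG algorithm, exploiting the stronger ability of MTFL to evaluate the dynamic characteristics of the system's output time series and the good generalization of DDPG, to optimize the VSC-HVDC control parameters. Finally, MATLAB simulations and experimental results verify the effectiveness of the method.
Keywords: flexible DC transmission (VSC-HVDC); control parameter optimization; Markov transition field loss function; Markov transition field; deep deterministic policy gradient algorithm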
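The conversion of a one-dimensional waveform into a two-dimensional Markov transition field, as described in the abstract above, might look like the sketch below. It follows the commonly used MTF definition (quantile binning, a first-order transition matrix, then a lookup for every pair of time steps). The function name and the bin count are assumptions, and the paper's MTFL loss is not reproduced because the abstract does not define it.

```python
def markov_transition_field(series, n_bins=4):
    """Return the Markov transition field of a 1-D series as a nested list."""
    n = len(series)
    # Quantile binning by rank; ties are split by position (stable sort).
    order = sorted(range(n), key=lambda i: series[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // n, n_bins - 1)
    # Count first-order transitions between the bins of consecutive samples.
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1.0
    # Row-normalise the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])
    # Entry (i, j): probability of moving from the bin at time i to the bin at time j.
    return [[trans[bins[i]][bins[j]] for j in range(n)] for i in range(n)]
```

For a series of length n the result is an n-by-n matrix that can be rendered as an image; a smooth waveform concentrates probability near the diagonal of the transition matrix, while an oscillating one spreads it across bins.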
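Cooperative path planning method for multiple unmanned vehicles based on deep reinforcement learning
16
Authors: 戴晟潭, 王寅, 尚晨晨. 《北京航空航天大学学报》 (PKU Core), 2026, No. 2, pp. 541-550 (10 pages)
To solve the cooperative path planning problem in multi-unmanned-vehicle systems, an efficient path planning framework is designed using deep reinforcement learning. A kinematic model of a two-wheel differential-drive unmanned vehicle and a mathematical model of the cooperative obstacle avoidance scenario are constructed. On this basis, the mechanisms behind the slow training, low sampling efficiency, and poor adaptability of deep reinforcement learning in complex dynamic scenarios with high-dimensional state spaces and continuous action spaces are analyzed, providing a theoretical basis for research on multi-vehicle cooperative path planning. For strategy generation in cooperative path planning, obstacle avoidance, and encirclement under full observability, an improved twin delayed deep deterministic policy gradient (AE-TD3) algorithm is proposed. It adds random noise drawn from a Gaussian distribution to the actions output by the pursuing vehicles and balances exploration against exploitation of the output actions, so that the pursuing vehicles explore unknown environments more effectively and achieve efficient, stable cooperative obstacle avoidance and encirclement. Simulation experiments show that, compared with the twin delayed deep deterministic policy gradient (TD3) algorithm, the improved algorithm converges faster in average reward and shortens the encirclement time by 16.7%, verifying its feasibility.
Keywords: path planning; cooperative obstacle avoidance and encirclement; deep reinforcement learning; twin delayed deep deterministic policy gradient algorithm; action-enhanced exploration strategy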
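Adding Gaussian noise to a deterministic policy's output, as the abstract above describes, can be illustrated with the short sketch below. It shows only the generic idea. The linear annealing schedule, the action bounds, and every name and default value are assumptions, not details of AE-TD3.

```python
import random


def explore_action(action, noise_std, low=-1.0, high=1.0, rng=random):
    """Add zero-mean Gaussian noise to each action component, then clip."""
    return [max(low, min(high, a + rng.gauss(0.0, noise_std))) for a in action]


def decayed_std(step, start=0.3, end=0.05, decay_steps=10000):
    """Hypothetical schedule: linearly shrink the noise scale over training,
    shifting the agent from exploration toward exploitation."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

A training loop under these assumptions would call `explore_action(policy_output, decayed_std(step))` when collecting experience and use the raw policy output at evaluation time.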
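Deep reinforcement learning based on CEEMD-Attention for ultra-short-term photovoltaic power forecasting
17
Authors: 张颖, 丁正凯, 陈建平, 陆悠, 王蕴哲, 傅启明. 《计算机应用与软件》 (PKU Core), 2026, No. 3, pp. 112-118, 138 (8 pages)
To handle the randomness and intermittency of photovoltaic (PV) plant output power, a deep deterministic policy gradient (DDPG) model based on complementary ensemble empirical mode decomposition (CEEMD) and an attention mechanism (CA-DDPG) is proposed. CEEMD decomposes the PV power into stable subsequences; the attention mechanism assigns different weights to the input states to capture key information; and the DDPG algorithm is used for training and optimization to obtain the optimal policy. Data from the 1B DKASC, Alice Springs PV system are used to evaluate different forecasting models, and the experimental results show that the proposed model is more accurate than the baseline models.
Keywords: photovoltaic power forecasting; deep reinforcement learning; complementary ensemble empirical mode decomposition; attention mechanism; deep deterministic policy gradient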
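Short-term multi-energy load forecasting for integrated energy systems based on an adaptive dynamic STL-MTL-DDPG combination
18
Authors: 张玉敏, 孙猛, 吉兴全, 杨明, 叶平峰, 孟祥剑. 《高电压技术》 (PKU Core), 2026, No. 3, pp. 1178-1187, I0033-I0036 (14 pages)
Peak-valley variation over time in the coupling coefficients among multi-energy loads, and differences in how the loads respond to meteorological factors, degrade the accuracy of multi-task learning (MTL) forecasting models. To address this, a short-term multi-energy load forecasting method for integrated energy systems is proposed based on an adaptive dynamic combination of single-task learning (STL), MTL, and the deep deterministic policy gradient (DDPG) algorithm. First, peak-valley characteristic indicators of the electric, cooling, and heating loads are defined, and the coupling relationships between meteorological data and the electric, cooling, and heating loads are extracted at each time step using the Pearson correlation coefficient, providing data support for the subsequent forecasting models. Second, to address the peak-valley differences in coupling features in the MTL model, STL models are built to extract the temporal features of each load series relatively independently, which also mitigates the interference caused by the loads' differing sensitivity to environmental factors. Then, the MTL and STL forecasts are reconstructed, and DDPG is used to build an adaptive dynamic weight allocation model that fits the sub-model forecasts, perceives changes in the external environment, and achieves the optimal combination of forecasting models at each time step. Finally, validation on the integrated energy system of the Tempe campus of Arizona State University shows mean absolute percentage errors of 0.63%, 0.98%, and 1.08% for the electric, cooling, and heating load forecasts, respectively, an improvement in accuracy over other forecasting models.
Keywords: integrated energy system; multi-task learning; single-task learning; deep deterministic policy gradient algorithm; load forecasting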
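The two statistics named in the abstract above, the Pearson correlation coefficient and the mean absolute percentage error, have standard definitions that can be sketched as below. The function names and the sample numbers in the usage note are illustrative assumptions and are unrelated to the paper's data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For instance, `mape([100.0, 200.0], [99.0, 202.0])` averages two 1% errors, and `pearson` returns a value near 1 for loads that rise and fall together.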
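Optimal scheduling of multi-region integrated energy systems based on distributed federated reinforcement learning
19
Authors: 朱新文, 王家奇, 李生炜, 林文杰, 吴祥, 郭方洪. 《电力自动化设备》 (PKU Core), 2026, No. 4, pp. 94-102 (9 pages)
To address the limitations of traditional optimization methods and centralized federated reinforcement learning in privacy protection and computational efficiency, an optimal scheduling method for multi-region integrated energy systems based on distributed federated reinforcement learning is proposed. Each regional integrated energy system is managed by its own agent. Each agent optimizes the parameters of its local critic network with the twin delayed deterministic policy gradient algorithm and exchanges parameter information with neighboring agents, so energy scheduling is managed efficiently without an additional central server. To ensure global optimality, the weighting coefficients for parameter exchange are taken from the elements of a doubly stochastic matrix. Case study results show that the proposed method strengthens privacy protection for the integrated energy systems while showing good convergence and effectively reducing operating costs.
Keywords: integrated energy system; optimal scheduling; distributed federated reinforcement learning; agent; twin delayed deterministic policy gradient algorithm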
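Server-free parameter exchange weighted by a doubly stochastic matrix, as mentioned in the abstract above, can be illustrated with the sketch below. The abstract does not say how the matrix is built, so the Metropolis weighting rule used here is an assumption, as are the function names and the three-agent line topology in the usage note.

```python
def metropolis_weights(neighbors):
    """Build a symmetric doubly stochastic matrix from an undirected graph.

    neighbors[i] lists the agents adjacent to agent i (assumed symmetric).
    """
    n = len(neighbors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            w[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The self-weight absorbs the remainder so that each row sums to one.
        w[i][i] = 1.0 - sum(w[i])
    return w


def mix(params, w):
    """One exchange round: agent i receives sum_j w[i][j] * params[j]."""
    n, dim = len(params), len(params[0])
    return [[sum(w[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]
```

With `neighbors = [[1], [0, 2], [1]]`, repeated calls to `mix` pull every agent's parameter vector toward the network-wide average. Because the columns also sum to one, that average is preserved at each round, which is the property that makes doubly stochastic weights a natural fit for consensus without a central server.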
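Emergency frequency control method for power systems based on knowledge-embedded deep reinforcement learning
20
Authors: 李佳旭, 吴俊勇, 史法顺, 张振远, 李栌苏. 《电力系统自动化》 (PKU Core), 2026, No. 1, pp. 97-107 (11 pages)
With the rapid development of new-type power systems, frequency security faces increasingly severe challenges, and when a fault causes frequency instability, emergency control to restore frequency stability is essential. This paper proposes an emergency frequency control method for power systems based on knowledge-embedded deep reinforcement learning (DRL). First, the emergency frequency control problem is formulated as a Markov model, with a simulated system serving as the reinforcement learning environment, and a DRL agent is built on the deep deterministic policy gradient (DDPG) algorithm. In addition, theoretical knowledge guides the optimization of the action space, covering both over-frequency generator tripping and under-frequency load shedding scenarios. Finally, control performance is tested on the IEEE 39-bus system. The results show that the DRL agent produces effective emergency frequency control strategies that maintain system frequency security, and that knowledge embedding improves training stability and significantly improves the agent's policy learning efficiency and decision quality.
Keywords: artificial intelligence; new-type power system; frequency security; emergency frequency control; deep reinforcement learning; deep deterministic policy gradient; over-frequency generator tripping; under-frequency load shedding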