Journal Articles
267 articles found
1. Dynamic hedging of 50ETF options using Proximal Policy Optimization
Authors: Lei Liu, Mengmeng Hao, Jinde Cao. Journal of Automation and Intelligence, 2025, No. 3, pp. 198-206.
This paper employs the PPO (Proximal Policy Optimization) algorithm to study the risk hedging problem of Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces were designed based on the characteristics of the hedging task, and a reward function was developed according to the cost function of the options. Second, drawing on curriculum learning, the agent was guided through a simulated-to-real learning approach for dynamic hedging tasks, reducing the learning difficulty and addressing the issue of insufficient option data. A dynamic hedging strategy for 50ETF options was thus constructed. Finally, numerical experiments demonstrate the superiority of the designed algorithm over traditional hedging strategies in terms of hedging effectiveness.
Keywords: B-S model; option hedging; reinforcement learning; 50ETF; proximal policy optimization (PPO)
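The reward described above is built from the option's cost function. A minimal sketch of such a one-step hedging reward, assuming a discrete-time delta-hedging setup with proportional transaction costs and a quadratic risk penalty (the function name, cost rate kappa, and risk weight lam are illustrative, not taken from the paper):

```python
def hedging_reward(option_pnl, hedge_pnl, d_hedge, spot, kappa=0.01, lam=0.1):
    """One-step reward for a hedging agent (sketch).

    option_pnl : change in the (short) option's value over the step
    hedge_pnl  : change in the hedge portfolio's value over the step
    d_hedge    : shares traded when rebalancing this step
    kappa      : proportional transaction-cost rate (assumed)
    lam        : risk-aversion weight on squared hedging error (assumed)
    """
    cost = kappa * spot * abs(d_hedge)   # trading cost of rebalancing
    error = hedge_pnl - option_pnl       # replication (hedging) error
    return -cost - lam * error ** 2      # penalize both cost and risk
```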
2. Gait Learning Reproduction for Quadruped Robots Based on Experience Evolution Proximal Policy Optimization
Authors: LI Chunyang, ZHU Xiaoqing, RUAN Xiaogang, LIU Xinyuan, ZHANG Siyuan. Journal of Shanghai Jiaotong University (Science), 2025, No. 6, pp. 1125-1133.
Bionic gait learning for quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To solve this, we propose an experience evolution proximal policy optimization (EEPPO) algorithm that integrates PPO with prior knowledge highlighted by an evolutionary strategy. Successfully trained samples are used as prior knowledge to guide the learning direction and increase the success probability of the learning algorithm. To verify the effectiveness of EEPPO, we conducted simulation experiments on the quadruped robot gait-learning task in PyBullet. The central pattern generator based radial basis function (CPG-RBF) network and the policy network are updated simultaneously, using key information such as the robot's speed, posture, and joint states, to accomplish the robot's bionic diagonal trot gait learning task. Experimental comparison with the traditional soft actor-critic (SAC) algorithm validates the superiority of EEPPO, which learns a more stable diagonal trot gait on flat terrain.
Keywords: quadruped robot; proximal policy optimization (PPO); prior knowledge; evolutionary strategy; bionic gait learning
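Beyond "successfully trained samples as prior knowledge," the abstract does not spell out EEPPO's internals. One minimal way to realize that idea is an elite buffer of best-returning episodes blended into each PPO batch; everything below (class name, capacity, blending fraction) is an assumption:

```python
import heapq
import random

class EliteBuffer:
    """Keep the top-k episodes by return as prior knowledge (assumed design)."""
    def __init__(self, capacity=32):
        self.capacity = capacity
        self.heap = []        # min-heap of (return, id, transitions)
        self._next_id = 0     # unique id breaks comparison ties

    def add(self, episode_return, transitions):
        item = (episode_return, self._next_id, transitions)
        self._next_id += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif episode_return > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)   # evict the worst elite

    def sample(self, n):
        pool = [t for _, _, ep in self.heap for t in ep]
        return random.sample(pool, min(n, len(pool)))

def mixed_batch(fresh_transitions, elites, elite_frac=0.25):
    """Blend fresh PPO rollouts with elite transitions to guide learning."""
    n_elite = int(len(fresh_transitions) * elite_frac)
    return fresh_transitions + elites.sample(n_elite)
```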
3. Distributed optimization of electricity-gas-heat integrated energy system with multi-agent deep reinforcement learning (Cited by 5)
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Global Energy Interconnection (EI, CAS, CSCD), 2022, No. 6, pp. 604-617.
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) is strongly coupled, non-convex, and nonlinear. Centralized optimization incurs high communication costs and complex modeling, while traditional numerical iterative solutions handle uncertainty and solution efficiency poorly and are therefore hard to apply online. For this problem, we construct a distributed IES model with a dynamic distribution factor and transform the centralized optimization problem into a distributed one in a multi-agent reinforcement learning environment using the multi-agent deep deterministic policy gradient. The dynamic distribution factor lets the system account for the impact of real-time supply and demand changes, dynamically coordinating complementary use of different energy sources and effectively improving system economics. Compared with centralized optimization, the distributed model with multiple decision centers achieves similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load during training; compared with traditional iterative solutions, it copes better with uncertainty and enables real-time decision-making, which favors online application. Finally, we verify the effectiveness of the method on an IES coupled with three energy hub agents.
Keywords: integrated energy system; multi-agent system; distributed optimization; multi-agent deep deterministic policy gradient; real-time optimization decision
4. Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning (Cited by 3)
Authors: Xiaoxiao Zhao, Peng Yi, Li Li. Control Theory and Technology (EI, CSCD), 2020, No. 4, pp. 362-378.
This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint, and a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step is used within ADMM to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.
Keywords: multi-agent system; reinforcement learning; distributed optimization; policy evaluation
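A minimal sketch of consensus ADMM with an inexact primal step, on a toy least-squares objective standing in for the paper's MSPBE problem. For simplicity the agents agree via a global average rather than neighbor-only exchange, and all step sizes are assumptions:

```python
import numpy as np

def consensus_admm(A_list, b_list, rho=1.0, lr=0.1, iters=300):
    """Consensus-ADMM sketch: agent i holds a private term
    0.5*||A_i x - b_i||^2 and all agents must agree on x. The x-update
    is inexact -- a single gradient step on the local augmented
    Lagrangian -- mirroring the paper's inexact-step idea."""
    n, d = len(A_list), A_list[0].shape[1]
    x = np.zeros((n, d))      # local primal variables
    lam = np.zeros((n, d))    # dual variables for the consensus constraint
    for _ in range(iters):
        z = x.mean(axis=0)    # consensus step (complete graph for simplicity)
        for i in range(n):
            g = (A_list[i].T @ (A_list[i] @ x[i] - b_list[i])
                 + lam[i] + rho * (x[i] - z))
            x[i] -= lr * g    # inexact primal update: one gradient step
        lam += rho * (x - x.mean(axis=0))   # dual ascent
    return x.mean(axis=0)

# tiny usage example: two agents sharing one least-squares problem
rng = np.random.default_rng(0)
A = [rng.normal(size=(6, 3)) for _ in range(2)]
b = [rng.normal(size=6) for _ in range(2)]
print(consensus_admm(A, b))
```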
5. Optimization Scheduling of Hydrogen-Coupled Electro-Heat-Gas Integrated Energy System Based on Generative Adversarial Imitation Learning
Authors: Baiyue Song, Chenxi Zhang, Wei Zhang, Leiyu Wan. Energy Engineering, 2025, No. 12, pp. 4919-4945.
Hydrogen energy is a crucial support for China's low-carbon energy transition. With the large-scale integration of renewable energy, combining hydrogen with integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of the HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms, including long optimization times, slow convergence, and subjective reward design, this study augments the PPO algorithm with a discriminator network and expert data; the resulting GAIL-based algorithm enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind exploration, shortens offline training time, and improves convergence. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
Keywords: hydrogen energy; optimization dispatch; generative adversarial imitation learning; proximal policy optimization; imitation learning; renewable energy
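The key GAIL ingredient is replacing a hand-designed reward with a discriminator-based one. A minimal sketch using the standard GAIL surrogate reward -log(1 - D(s, a)); the network size and the state/action dimensions are placeholders, not the paper's:

```python
import torch
import torch.nn as nn

# minimal discriminator D(s, a) -> probability the pair came from expert data
# (state dim 6 and action dim 2 are placeholder assumptions)
disc = nn.Sequential(nn.Linear(6 + 2, 64), nn.Tanh(),
                     nn.Linear(64, 1), nn.Sigmoid())

def gail_reward(state, action, eps=1e-8):
    """GAIL-style surrogate reward (sketch): the agent is paid for being
    indistinguishable from the expert, so no hand-crafted reward is
    needed. The form -log(1 - D) is the standard GAIL choice, assumed
    here rather than taken from the paper."""
    with torch.no_grad():
        d = disc(torch.cat([state, action], dim=-1))
    return -torch.log(1.0 - d + eps)   # high when D thinks "expert"

r = gail_reward(torch.zeros(4, 6), torch.zeros(4, 2))  # batch of 4 transitions
```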
6. Multi-agent reinforcement learning for edge information sharing in vehicular networks (Cited by 3)
Authors: Ruyan Wang, Xue Jiang, Yujie Zhou, Zhidu Li, Dapeng Wu, Tong Tang, Alexander Fedotov, Vladimir Badenko. Digital Communications and Networks (SCIE, CSCD), 2022, No. 3, pp. 267-277.
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates reducing the delay of edge information sharing over V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean-delay minimization problem and a maximum-individual-delay minimization problem are formulated to improve global network performance and to ensure the fairness of a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, with a new reward function that evaluates the utilities of the two optimization objectives in a unified framework. A proximal policy optimization approach is then proposed to let each V2V user learn its policy from the shared global network reward. The effectiveness of the proposed approach is validated against baseline approaches through extensive simulation experiments.
Keywords: vehicular networks; edge information sharing; delay guarantee; multi-agent reinforcement learning; proximal policy optimization
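A minimal sketch of a reward that evaluates the two objectives, mean V2V delay and maximum individual V2V delay, in one function, with a penalty when any V2I link violates its delay requirement; the weights and penalty size are assumptions:

```python
def v2x_reward(v2v_delays, v2i_delays, v2i_delay_limit,
               w_mean=0.5, w_max=0.5, penalty=10.0):
    """Unified reward sketch for the two objectives in the abstract:
    mean V2V delay (global performance) and max individual V2V delay
    (fairness), penalizing V2I delay-requirement violations."""
    mean_term = sum(v2v_delays) / len(v2v_delays)     # global performance
    max_term = max(v2v_delays)                        # single-user fairness
    violations = sum(d > v2i_delay_limit for d in v2i_delays)
    return -(w_mean * mean_term + w_max * max_term) - penalty * violations

# example: three V2V links, two V2I links with a 10 ms requirement
print(v2x_reward([4.0, 6.0, 12.0], [8.0, 9.5], v2i_delay_limit=10.0))
```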
7. Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning (Cited by 2)
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 11, pp. 80-94.
To solve the multi-target hunting problem for an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. First, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. The cooperative hunting problem is then modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting Proximal Policy Optimization (DPOMH-PPO) algorithm applicable to USVs is proposed, along with an observation model, a reward function, and an action space suited to multi-target hunting tasks. To handle the dynamically changing dimension of observational features in partially observable systems, a feature embedding block is proposed: combining the two feature compression methods of column-wise max pooling (CMP) and column-wise average pooling (CAP) establishes the observational feature encoding. Finally, the centralized-training decentralized-execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and acts independently. Simulation experiments verify the effectiveness of DPOMH-PPO in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are analyzed in terms of algorithm performance, transfer across task scenarios, and self-organization capability after damage, verifying the potential deployment and application of DPOMH-PPO in real environments.
Keywords: unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
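The feature embedding block can be illustrated directly: a variable number of observed targets is compressed to a fixed-length vector by concatenating column-wise max pooling (CMP) and column-wise average pooling (CAP). A minimal sketch, with array shapes assumed:

```python
import numpy as np

def embed_observations(target_feats):
    """Feature-embedding sketch: compress a variable number of observed
    targets (n_targets x feat_dim) into a fixed-size vector by
    concatenating column-wise max pooling (CMP) and column-wise average
    pooling (CAP), as the abstract describes."""
    feats = np.asarray(target_feats, dtype=float)   # shape (n, d), n varies
    cmp_vec = feats.max(axis=0)                     # column-wise max
    cap_vec = feats.mean(axis=0)                    # column-wise average
    return np.concatenate([cmp_vec, cap_vec])       # fixed length 2*d

# the embedding length is 2*d no matter how many targets are visible
print(embed_observations([[1.0, 0.2], [0.4, 0.9], [0.7, 0.1]]).shape)  # (4,)
```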
8. A novel trajectories optimizing method for dynamic soaring based on deep reinforcement learning
Authors: Wanyong Zou, Ni Li, Fengcheng An, Kaibo Wang, Changyin Dong. Defence Technology, 2025, No. 4, pp. 99-108.
Dynamic soaring, inspired by the wind-riding flight of birds such as albatrosses, is a biomimetic technique that leverages wind fields to enhance the endurance of unmanned aerial vehicles (UAVs). Achieving a precise soaring trajectory is crucial for maximizing energy efficiency during flight. Existing nonlinear programming methods depend heavily on the choice of initial values, which is hard to determine. This paper therefore introduces a deep reinforcement learning method based on a differentially flat model for dynamic soaring trajectory planning and optimization. First, the gliding trajectory is parameterized using Fourier basis functions, achieving a flexible trajectory representation with a minimal number of hyperparameters. The trajectory optimization problem is then formulated as a dynamic, interactive Markov decision process, and the trajectory hyperparameters are optimized with the Proximal Policy Optimization (PPO2) algorithm from deep reinforcement learning (DRL), reducing the strong reliance on initial values. Finally, comparison with the nonlinear programming method shows that the trajectory generated by the proposed approach is smoother while meeting the same performance requirements: a 34% reduction in maximum thrust, a 39.4% decrease in maximum thrust difference, and a 33% reduction in maximum airspeed difference.
Keywords: dynamic soaring; differential flatness; trajectory optimization; proximal policy optimization
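A minimal sketch of the Fourier-basis trajectory parameterization: one coordinate of the soaring trajectory is a truncated Fourier series, so PPO optimizes a handful of coefficients instead of a dense waypoint sequence. The number of harmonics and the coefficient values below are illustrative:

```python
import numpy as np

def fourier_trajectory(t, period, a0, a, b):
    """Trajectory coordinate as a truncated Fourier series,
    z(t) = a0 + sum_k [a_k cos(k*w*t) + b_k sin(k*w*t)], w = 2*pi/period.
    The RL agent would tune the few coefficients (a0, a, b) rather than
    a long sequence of waypoints."""
    w = 2.0 * np.pi / period
    k = np.arange(1, len(a) + 1)
    return a0 + np.sum(a * np.cos(np.outer(t, k * w)) +
                       b * np.sin(np.outer(t, k * w)), axis=1)

# one soaring cycle described by 3 harmonics -> only 7 hyperparameters
t = np.linspace(0.0, 10.0, 101)
z = fourier_trajectory(t, period=10.0, a0=50.0,
                       a=np.array([10.0, 3.0, 1.0]),
                       b=np.array([5.0, 2.0, 0.5]))
```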
9. Proximal policy optimization with an integral compensator for quadrotor control (Cited by 6)
Authors: Huan HU, Qing-ling WANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, No. 5, pp. 777-795.
We use the proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for speed control of a "model-free" quadrotor. The vehicle is controlled by four learned neural networks, which directly map system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme comprising both offline and online learning is developed for practical use: a model with strong generalization ability is learned in the offline phase, and the flight policy is then continuously optimized in the online phase. Finally, the performance of the proposed algorithm is compared with that of the traditional PID algorithm.
Keywords: reinforcement learning; proximal policy optimization; quadrotor control; neural network
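A minimal sketch of the integral-compensator idea: the policy's input is augmented with the accumulated velocity-tracking error, which drives steady-state error toward zero. The time step, anti-windup clamp, and state layout are assumptions:

```python
import numpy as np

class IntegralCompensator:
    """Augment the actor-critic state with the integral of the tracking
    error (a sketch of the paper's idea; gains and limits are assumed)."""
    def __init__(self, dt=0.02, limit=5.0):
        self.dt = dt
        self.limit = limit            # anti-windup clamp (assumed)
        self.integral = np.zeros(3)   # integral of velocity error, xyz

    def augment(self, state, v_ref, v_actual):
        err = np.asarray(v_ref) - np.asarray(v_actual)
        self.integral = np.clip(self.integral + err * self.dt,
                                -self.limit, self.limit)
        # the policy sees [original state, error, integral of error]
        return np.concatenate([state, err, self.integral])

comp = IntegralCompensator()
aug = comp.augment(np.zeros(9), v_ref=[1.0, 0.0, 0.0], v_actual=[0.8, 0.0, 0.0])
```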
10. Distributed Policy Gradient with Variance Reduction in Multi-Agent Reinforcement Learning (Cited by 1)
Authors: ZHAO Xiaoxiao, LEI Jinlong, LI Li, BUSONIU Lucian, XU Jia. Journal of Systems Science & Complexity, 2025, No. 5, pp. 1853-1886.
This paper studies distributed policy gradients in collaborative multi-agent reinforcement learning (MARL), where agents communicating over a network aim to find an optimal policy that maximizes the average of all the agents' local returns. To address the high variance and bias of stochastic policy gradients in MARL, the paper proposes a distributed policy gradient method with variance reduction, combined with gradient tracking to correct the bias resulting from the difference between local and global gradients. Importance sampling is used to handle the distribution shift in the sampling process. The authors show that the proposed algorithm finds an ε-approximate stationary point, with convergence depending on the number of iterations, the mini-batch size, the epoch size, the problem parameters, and the network topology; they further establish the sample and communication complexity required to reach an ε-approximate stationary point. Finally, numerical experiments validate the effectiveness of the proposed algorithm.
Keywords: distributed optimization; gradient tracking; multi-agent systems; policy gradient; variance reduction
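A minimal sketch of the gradient-tracking correction: an auxiliary variable y tracks the network-average gradient, so each agent ascends an estimate of the global gradient rather than its biased local one. The mixing matrix and step size are assumptions:

```python
import numpy as np

def gradient_tracking_step(W, x, y, grads_new, grads_old, alpha=0.05):
    """One distributed gradient-tracking update (sketch). W is a doubly
    stochastic mixing matrix over the communication graph; x holds each
    agent's parameters (one row per agent); y tracks the average gradient:
        x_{k+1} = W x_k + alpha * y_k      (ascent: maximizing returns)
        y_{k+1} = W y_k + g_{k+1} - g_k
    """
    x_next = W @ x + alpha * y
    y_next = W @ y + grads_new - grads_old
    return x_next, y_next

# 3 agents on a small graph, 4-dimensional policy parameters
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
x = np.zeros((3, 4)); y = np.zeros((3, 4))
g_old = np.zeros((3, 4))
g_new = np.random.default_rng(1).normal(size=(3, 4))
x, y = gradient_tracking_step(W, x, y, g_new, g_old)
```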
11. Coordinated Optimal Dispatch of Source-Grid-Load-Storage Based on Generative Adversarial Network Correction (Cited by 3)
Authors: 谢桦, 李凯, 郄靖彪, 张沛, 王珍意, 路学刚. 中国电机工程学报 (Proceedings of the CSEE), 2025, No. 5, pp. 1668-1679, I0003.
Grid integration of large-scale wind and solar generation brings strong uncertainty to the power system, challenging global optimization decision-making. This paper proposes a design method for coordinated source-grid-load-storage optimal dispatch strategies corrected by generative adversarial networks (GAN). First, considering the operating characteristics of the various adjustable resources in new-type power systems, a coordinated source-grid-load-storage dispatch model based on the proximal policy optimization (PPO) algorithm is constructed. Second, a GAN is introduced to correct the advantage function of the PPO algorithm, reducing the variance of the value function and improving the agent's exploration efficiency. The discriminator in the GAN, combined with expert policies, then guides the generator to produce dispatch strategies; finally, the discriminator and generator compete iteratively toward a Nash equilibrium, yielding the optimized dispatch strategy. Case studies show that the designed intraday coordinated dispatch strategy with the GAN-corrected PPO algorithm shortens training convergence time compared with the traditional PPO algorithm and improves renewable-energy accommodation in online control.
Keywords: source-grid-load-storage coordination; generative adversarial network; proximal policy optimization algorithm; optimal dispatch; renewable energy accommodation
12. Research on a Path Planning Algorithm for Amphibious Unmanned Platforms Based on Proximal Policy Optimization (Cited by 1)
Authors: 左哲, 覃卫, 徐梓洋, 李寓安, 陈泰然. 北京理工大学学报 (Transactions of Beijing Institute of Technology), 2025, No. 1, pp. 19-25.
To solve the path planning problem of amphibious unmanned platforms in complex environments, and to address the limitations of traditional methods in handling dynamic obstacles and changing environments, a path planning algorithm based on proximal policy optimization (PPO) is proposed, featuring four perception-input schemes and a speed-reinforced reward function suited to both dynamic and static environments. The algorithm significantly improves convergence speed and stability through batch-wise function regularization, the introduction of policy entropy, and an adaptive clipping factor. The study uses the ROS simulation platform with the Flatland physics engine and the PedSim plugin to simulate various complex scenarios containing dynamic obstacles. Experimental results show that the amphibious platform with a BEV+V state-space input structure and a discrete action space achieves a high success rate and low timeout rate in path planning, outperforming traditional methods and alternative schemes. Simulation and comparison experiments show that the bird's-eye-view-plus-velocity state representation combined with the speed-reinforced reward function improves performance: convergence speed increases by 25.58%, the path-planning success rate rises by 25.54%, and the timeout rate drops by 13.73%.
Keywords: path planning; amphibious; unmanned platform; proximal policy optimization (PPO)
13. Multi-Time-Scale Optimal Dispatch of Regional Integrated Energy Systems Based on Distributed Bi-Level Reinforcement Learning (Cited by 1)
Authors: 张薇, 王浚宇, 杨茂, 严干贵. 电工技术学报 (Transactions of China Electrotechnical Society), 2025, No. 11, pp. 3529-3544.
Accounting for the differing flow times of heterogeneous energy carriers in the network and improving the flexibility of equipment control at different time scales are key to multi-time-scale optimal dispatch of regional integrated energy systems (RIES). This paper proposes a distributed bi-level proximal policy optimization (DBLPPO) dispatch model for a cooling-heat-electricity RIES. First, the output, storage, and conversion of energy within the RIES are modeled as a Markov decision process in a high-dimensional space. Next, sequential decision-making is described with an improved distributed proximal policy optimization algorithm, building an internal bi-level PPO control model in which the local networks solve the cooling-heating and electric power subsystems with a "couple first, then decouple" approach, ultimately achieving multi-time-scale, coordinated dispatch of the RIES cooling-heating and electric power systems. Simulation results show that the proposed model not only overcomes the "curse of dimensionality" that deep reinforcement learning faces in complex stochastic scenarios and achieves coordinated management of the energy networks at different time scales, but also speeds up optimal decision-making and improves the economics of system operation.
Keywords: regional integrated energy system; multi-time-scale; distributed bi-level proximal policy optimization; deep reinforcement learning; coordinated optimal management; economic benefit
14. A Deep Reinforcement Learning Method for Frequency Regulation of Wind-Solar-Hydro-Thermal-Storage Systems Based on Virtual Synchronous Generators (Cited by 2)
Authors: 刘晓明, 刘俊, 姚宏伟, 赵誉, 聂永欣, 任柯政. 电力系统自动化 (Automation of Electric Power Systems), 2025, No. 9, pp. 114-124.
Owing to the inherent uncertainty and low inertia of renewable generation, the rapid growth of distributed energy resources (DER) is noticeably degrading power-system frequency dynamics. In response, virtual synchronous generator (VSG) technology, which lets DERs emulate conventional synchronous machines, has been developed and widely studied. Existing work, however, mostly operates VSGs with fixed parameters to provide inertia support, and rarely adjusts VSGs dynamically to exploit their fast response for better frequency performance. This paper therefore proposes a deep-reinforcement-learning-based frequency regulation (DRL-FR) method that adaptively tunes primary and secondary frequency-regulation parameters along with the VSG's dynamic parameters. First, a frequency-regulation model is built in which renewable plants are modeled as adjustable VSGs, and the optimal frequency-regulation problem is cast as a Markov decision process. A DRL-FR controller is then constructed whose action space comprises the primary and secondary frequency-regulation dynamic parameters, covering droop control, proportional-integral-derivative control, unit participation factors, and the adjustable VSG parameters. Finally, a proximal policy optimization algorithm combined with monotone advantage re-weighted imitation learning is developed, accelerating training with historical operating data and expert experience. Tests on a modified IEEE 39-bus system verify the effectiveness of the proposed DRL-FR method.
Keywords: deep reinforcement learning; virtual synchronous generator; frequency regulation; imitation learning; proximal policy optimization
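The imitation term can be sketched with the standard monotone advantage re-weighted imitation learning (MARWIL) loss, where expert actions are imitated in proportion to exp(β·A); how the paper weights this term against the PPO loss is an assumption:

```python
import torch

def marwil_imitation_loss(logp_expert, advantages, beta=1.0):
    """MARWIL-style imitation term (sketch): expert actions are imitated
    in proportion to exp(beta * A), so only demonstrations that beat the
    critic's baseline are strongly reinforced. Combining it with PPO lets
    historical data and expert experience speed up training."""
    weights = torch.exp(beta * advantages).clamp(max=20.0)  # stability clamp
    return -(weights.detach() * logp_expert).mean()

# total loss sketch (weighting assumed):
#   loss = ppo_loss + lambda_il * marwil_imitation_loss(logp_expert, adv)
loss = marwil_imitation_loss(torch.randn(8), torch.randn(8))
```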
15. Flexible Job-Shop Scheduling Based on an Improved Proximal Policy Optimization Algorithm (Cited by 2)
Authors: 王艳红, 付威通, 张俊, 谭园园, 田中大. 控制与决策 (Control and Decision), 2025, No. 6, pp. 1883-1891.
Flexible job-shop scheduling is a classic and complex combinatorial optimization problem of substantial theoretical and practical importance for production optimization in discrete manufacturing systems. A deep reinforcement learning algorithm for the flexible job-shop scheduling problem is designed based on a multi-pointer graph network framework and the proximal policy optimization algorithm. First, the operation-machine assignment process is represented as a Markov decision process with two action types: selecting an operation and assigning a machine. Next, a decoupling strategy removes the coupling between the two actions, and a new loss function and a greedy sampling strategy are designed to improve validation and inference; the state space is then expanded so the critic network can perceive and evaluate states more comprehensively, further strengthening the algorithm's learning and decision-making ability. Simulations and comparisons on randomly generated and benchmark instances verify the algorithm's strong performance and generalization ability.
Keywords: flexible job-shop scheduling; proximal policy optimization algorithm; dual-action coupled network; loss function optimization; greedy sampling; deep reinforcement learning
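A minimal sketch of the decoupled two-action decision: the policy first samples an operation, then samples a machine conditioned on it, and the joint log-probability feeds the PPO loss. The conditioning function is an assumption about how the decoupling is realized:

```python
import torch
import torch.distributions as D

def sample_dispatch(op_logits, machine_logits_fn):
    """Decoupled two-action sampling sketch for the FJSP: first choose an
    operation, then choose a machine conditioned on that operation, so
    each head can be trained with its own loss term."""
    op_dist = D.Categorical(logits=op_logits)
    op = op_dist.sample()
    m_dist = D.Categorical(logits=machine_logits_fn(op))
    machine = m_dist.sample()
    logp = op_dist.log_prob(op) + m_dist.log_prob(machine)  # joint log-prob
    return op.item(), machine.item(), logp

# toy usage: 5 eligible operations, 3 candidate machines per operation
op, m, logp = sample_dispatch(torch.zeros(5), lambda op: torch.zeros(3))
```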
16. Intelligent Dispatch of a Distributed Photovoltaic-EV Complementary System Based on Deep Reinforcement Learning (Cited by 1)
Authors: 陈宁, 李法社, 王霜, 张慧聪, 唐存靖, 倪梓皓. 高电压技术 (High Voltage Engineering), 2025, No. 3, pp. 1454-1463.
To address the impact that large-scale grid integration of distributed photovoltaics (PV) and electric vehicles (EV) will have on the power system, a complementary distributed PV-EV dispatch model is built with the goals of smoothing PV grid-injection fluctuations and improving EV-user economics, accounting for the randomness of PV output, load-power fluctuations, the randomness of EV arrival times and states of charge, real-time electricity prices, and battery aging costs. An improved proximal policy optimization algorithm with gradient random perturbation (GRP-PPO) is proposed for the solution, and adjusting the model's objective function yields two real-time operating strategies with different optimization priorities. A case study shows that real-time dispatch effectively smooths power fluctuations at the point of common coupling, improving on the traditional PPO algorithm by 3.48%. Strategy 1 takes users' travel needs and fluctuation smoothing as its primary goals, guaranteeing 24-hour vehicle availability with a grid-connection power stability rate of 91.84%; Strategy 2 takes user economics as its primary goal, with EVs participating in dispatch all day earning up to 82.6 yuan, encouraging user participation.
Keywords: distributed photovoltaics; electric vehicle; V2G; deep reinforcement learning; real-time dispatch; proximal policy optimization
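The gradient random perturbation idea can be sketched as zero-mean Gaussian noise injected into the gradients between backprop and the optimizer step; the noise scale, and whether it decays over training, are assumptions rather than the paper's settings:

```python
import torch

def perturb_gradients(model, sigma=1e-3):
    """Gradient random perturbation sketch: after backprop and before the
    optimizer step, add small zero-mean Gaussian noise to every gradient
    to help the policy escape poor local optima."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is not None:
                p.grad += sigma * torch.randn_like(p.grad)

# usage inside a PPO update:
#   loss.backward(); perturb_gradients(policy); optimizer.step()
```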
17. A Ramping Power Allocation Strategy for Multi-Type Energy Storage in Power Systems Based on the Proximal Policy Optimization Algorithm (Cited by 1)
Authors: 王杰, 苗世洪, 王廷涛, 姚福星, 励刚, 汤伟. 高电压技术 (High Voltage Engineering), 2025, No. 9, pp. 4796-4806, I0020-I0025.
As the share of renewable generation keeps rising, short-term large-scale power ramping events are becoming more frequent, so studying ramping power allocation among multiple types of energy storage is important for guarding against extreme ramping risk and keeping the system stable. This paper proposes a power allocation strategy for multiple storage types oriented to urgent ramping demand, introducing deep reinforcement learning (DRL) to balance the accuracy and timeliness of power allocation. First, taking adiabatic compressed air energy storage (A-CAES), wind power with storage, and thermal units with flywheel storage as representatives, the complementary ramping characteristics of the storage types are analyzed, focusing on the nonlinear thermodynamic-aerodynamic coupling of A-CAES and the transient response of wind-turbine rotor kinetic energy, and a ramping power response model for the storage types is built accordingly. Next, the power allocation problem is transformed into a Markov decision process suited to DRL, and training mechanisms including learning-rate decay, policy entropy, and state normalization are introduced, yielding a proximal-policy-optimization-based ramping power allocation strategy. Finally, case studies under multiple ramping scenarios show that the proposed strategy fully exploits the control strengths of each storage type and improves the flexibility, accuracy, and timeliness of ramping power allocation.
Keywords: proximal policy optimization algorithm; multi-type energy storage; optimal power allocation; ramping scenarios; deep reinforcement learning; adiabatic compressed air energy storage
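Two of the training mechanisms named above, state normalization and learning-rate decay, admit compact sketches; the running-statistics update and the linear schedule are common choices assumed here, not taken from the paper:

```python
import numpy as np

class RunningStateNorm:
    """State normalization with running statistics (sketch of one of the
    training tricks listed in the abstract)."""
    def __init__(self, dim, eps=1e-8):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = eps

    def __call__(self, s):
        self.count += 1
        delta = s - self.mean
        self.mean += delta / self.count                       # running mean
        self.var += (delta * (s - self.mean) - self.var) / self.count
        return (s - self.mean) / np.sqrt(self.var + 1e-8)

def decayed_lr(lr0, step, total_steps, lr_min=1e-5):
    """Linearly decaying learning rate (schedule assumed)."""
    frac = max(0.0, 1.0 - step / total_steps)
    return max(lr_min, lr0 * frac)
```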
18. An Intelligent Game-Guidance Algorithm Based on Deep Reinforcement Learning (Cited by 2)
Authors: 白天, 吕璐瑶, 李储, 何加亮. 吉林大学学报(理学版) (Journal of Jilin University, Science Edition), 2025, No. 1, pp. 91-98.
To address the large model input dimensionality and long training times of traditional game-agent algorithms, a new deep reinforcement learning game-guidance algorithm combining state-information transformation and reward-function shaping is proposed. First, game backend information is read directly through interfaces provided by the Unity engine, effectively compressing the state-space dimensionality and reducing the input data volume. Second, a carefully designed reward mechanism accelerates model convergence. Finally, the algorithm is compared with existing methods both subjectively (qualitatively) and objectively (quantitatively); experimental results show that it not only markedly improves training efficiency but also substantially improves agent performance.
Keywords: deep reinforcement learning; game agent; reward function shaping; proximal policy optimization algorithm
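A minimal sketch of a shaped reward built from backend game events plus a dense progress term, in the spirit of the refined reward mechanism described above; the event names and magnitudes are illustrative, not the paper's design:

```python
def shaped_reward(event, progress_delta, step_penalty=0.01):
    """Reward-shaping sketch: sparse game outcomes are densified with a
    progress term read directly from the game backend (the Unity-side
    state extraction keeps the policy input low-dimensional)."""
    table = {"goal_reached": 10.0, "item_collected": 1.0, "death": -10.0}
    r = table.get(event, 0.0)       # sparse outcome reward
    r += 0.5 * progress_delta       # dense term: movement toward the target
    r -= step_penalty               # small per-step cost to encourage speed
    return r

print(shaped_reward("item_collected", progress_delta=0.3))  # 1.14
```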
19. An Improved Adaptive Proximal Policy Optimization Algorithm (Cited by 1)
Authors: 王慧, 李虹, 何秋生, 李占龙. 计算机仿真 (Computer Simulation), 2025, No. 3, pp. 404-409, 436.
To address the poor convergence of the traditional proximal policy optimization (PPO) penalty algorithm during training, an improved PPO penalty algorithm is proposed. Replacing the constant-based adaptive update of the penalty coefficient with a function-based update ties the coefficient to the KL divergence, so that it changes with the divergence in a consistent trend, improving the algorithm's convergence and learning reliability and making it more flexible and adaptable. Simulations verify that the improved PPO penalty algorithm outperforms the traditional version in convergence and learning reliability, and a distributed PPO algorithm further validates the effectiveness of the improvement, offering a new direction for subsequent reinforcement learning research.
Keywords: reinforcement learning; proximal policy optimization; adaptive penalty coefficient
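The contrast can be made concrete: classic PPO-penalty updates the coefficient by fixed constants, while a function-based update ties it smoothly to the KL divergence. The exponential form below is a stand-in assumption for the paper's specific function:

```python
import math

def beta_constant(beta, kl, kl_target):
    """Classic PPO-penalty update: the coefficient jumps by fixed constants."""
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta

def beta_functional(beta, kl, kl_target, gain=1.0):
    """Function-based update in the spirit of the abstract: the penalty
    coefficient varies smoothly with the KL divergence, growing when the
    policy drifts past the target and shrinking when it stays well
    inside (form and gain are assumptions)."""
    return beta * math.exp(gain * (kl - kl_target) / kl_target)

# the penalized PPO objective then uses: loss = -surrogate + beta * kl
print(beta_constant(1.0, 0.03, 0.01), beta_functional(1.0, 0.03, 0.01))
```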
20. Optimization of SDN Routing Algorithms Based on Graph Neural Networks (Cited by 1)
Authors: 张晓莉, 汤颖琪, 宋婉莹. 电讯技术 (Telecommunication Engineering), 2025, No. 1, pp. 18-24.
To address the fact that existing routing schemes are ill-suited to learning graph-structured information and adapt poorly to unfamiliar topologies, a graph-neural-network-based software-defined network (SDN) routing algorithm, G-PPO, is proposed. The proximal policy optimization (PPO) reinforcement learning algorithm is introduced to train the model, a message passing neural network (MPNN) learns the network topology, and routing paths are adjusted by tuning link weights. G-PPO effectively combines the topology-awareness of graph neural networks with the self-learning ability of deep reinforcement learning, improving routing performance. Experimental results show that the proposed algorithm achieves the best mean delay, packet loss rate, link utilization, and throughput among the compared algorithms. Across three different topologies it improves throughput by at least 10.5% and packet loss rate by up to 95.6% over the other algorithms, demonstrating better adaptability to different network topologies.
Keywords: software-defined network; routing optimization; graph neural network; deep reinforcement learning; proximal policy optimization
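A minimal sketch of one MPNN message-passing step over link states, the kind of layer G-PPO would stack before a readout that maps each link state to a routing weight; the hidden size and the GRU-style update are assumptions about the internals:

```python
import torch
import torch.nn as nn

class MPNNLayer(nn.Module):
    """One message-passing step over the network topology (sketch).
    Each link's hidden state is updated from messages aggregated over
    the links it shares a node with."""
    def __init__(self, hidden=32):
        super().__init__()
        self.message = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.update = nn.GRUCell(hidden, hidden)

    def forward(self, h, edges):
        """h: (n_links, hidden) link states; edges: (src, dst) pairs of
        links adjacent in the topology."""
        agg = torch.zeros_like(h)
        for src, dst in edges:   # message from link src to link dst
            agg[dst] += self.message(torch.cat([h[src], h[dst]], dim=-1))
        return self.update(agg, h)   # GRU-style update of every link state

# 4 links with two shared nodes; after a few layers, a readout head maps
# each link state to a weight and PPO adjusts routing via those weights
layer = MPNNLayer()
h = torch.randn(4, 32)
h = layer(h, edges=[(0, 1), (1, 0), (2, 3), (3, 2)])
```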