Journal Articles
296 articles found
Dynamic hedging of 50ETF options using Proximal Policy Optimization
1
Authors: Lei Liu, Mengmeng Hao, Jinde Cao. Journal of Automation and Intelligence, 2025, No. 3, pp. 198-206 (9 pages)
This paper employs the PPO (Proximal Policy Optimization) algorithm to study the risk hedging problem of Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces are designed around the characteristics of the hedging task, and a reward function is developed from the cost function of the options. Second, drawing on curriculum learning, the agent follows a simulated-to-real learning approach for the dynamic hedging task, which reduces learning difficulty and addresses the shortage of option data; on this basis, a dynamic hedging strategy for 50ETF options is constructed. Finally, numerical experiments demonstrate that the designed algorithm outperforms traditional hedging strategies in hedging effectiveness.
Keywords: B-S model; option hedging; reinforcement learning; 50ETF; proximal policy optimization (PPO)
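Nearly every result in this listing builds on PPO's clipped surrogate objective. For reference (a generic sketch, not code from any of the papers above), the objective fits in a few lines of NumPy:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio     -- pi_new(a|s) / pi_old(a|s) for sampled actions
    advantage -- advantage estimates for those actions
    eps       -- clipping range (0.2 in the original PPO paper)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # the elementwise min keeps the objective pessimistic,
    # discouraging large policy updates in a single step
    return -np.mean(np.minimum(unclipped, clipped))
```

For example, `ppo_clip_loss(np.array([1.5]), np.array([1.0]))` evaluates to -1.2: the ratio 1.5 is clipped at 1.2 before multiplying the advantage.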
Gait Learning Reproduction for Quadruped Robots Based on Experience Evolution Proximal Policy Optimization
2
Authors: LI Chunyang, ZHU Xiaoqing, RUAN Xiaogang, LIU Xinyuan, ZHANG Siyuan. Journal of Shanghai Jiaotong University (Science), 2025, No. 6, pp. 1125-1133 (9 pages)
Bionic gait learning for quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To solve this problem, we propose an experience evolution proximal policy optimization (EEPPO) algorithm, which integrates PPO with prior knowledge highlighted by an evolutionary strategy. We use successfully trained samples as prior knowledge to guide the learning direction and thereby increase the success probability of the learning algorithm. To verify the effectiveness of the proposed EEPPO algorithm, we conducted simulation experiments on a quadruped robot gait-learning task in PyBullet. Experimental results show that the central-pattern-generator-based radial basis function (CPG-RBF) network and the policy network are updated simultaneously to accomplish the robot's bionic diagonal trot gait-learning task, using key information such as the robot's speed, posture, and joint states. Comparison with the traditional soft actor-critic (SAC) algorithm validates the superiority of the proposed EEPPO algorithm, which learns a more stable diagonal trot gait on flat terrain.
Keywords: quadruped robot; proximal policy optimization (PPO); prior knowledge; evolutionary strategy; bionic gait learning
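The abstract does not detail how the evolutionary strategy highlights prior knowledge in EEPPO; as a generic illustration of the evolutionary-strategy ingredient only (function and parameter names are my own), a single OpenAI-style perturbation update looks like:

```python
import numpy as np

def es_step(theta, fitness_fn, pop=50, sigma=0.1, lr=0.01, rng=None):
    """One evolution-strategy step: perturb parameters with Gaussian
    noise and move theta toward high-fitness perturbations.
    Pass a persistent rng across calls for fresh noise each step."""
    rng = np.random.default_rng(0) if rng is None else rng
    noise = rng.standard_normal((pop, theta.size))
    fitness = np.array([fitness_fn(theta + sigma * n) for n in noise])
    # standardize so the update is invariant to reward scale
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    return theta + (lr / (pop * sigma)) * noise.T @ fitness

# maximizing f(x) = -||x||^2 drives theta toward the origin
rng = np.random.default_rng(42)
theta = np.array([1.0, -1.0])
for _ in range(300):
    theta = es_step(theta, lambda x: -np.sum(x ** 2), rng=rng)
```

In an EEPPO-like scheme, `fitness_fn` would score rollouts, and elite perturbations would be kept as the "prior knowledge" seeding further PPO training.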
Distributed optimization of an electricity-gas-heat integrated energy system with multi-agent deep reinforcement learning [Cited by 5]
3
Authors: Lei Dong, Jing Wei, Hao Lin, Xinying Wang. Global Energy Interconnection (EI, CAS, CSCD), 2022, No. 6, pp. 604-617 (14 pages)
The coordinated optimization problem of the electricity-gas-heat integrated energy system (IES) is characterized by strong coupling, non-convexity, and nonlinearity. Centralized optimization carries a high communication cost and complex modeling, while traditional numerical iterative solvers cope poorly with uncertainty and solve too slowly to be applied online. For this problem, we constructed a model of the distributed IES with a dynamic distribution factor and, using multi-agent deep deterministic policy gradient, transformed the centralized optimization problem into a distributed one in a multi-agent reinforcement learning environment. Introducing the dynamic distribution factor lets the system account for the impact of real-time supply-and-demand changes on system optimization, dynamically coordinating different energy sources for complementary utilization and effectively improving system economy. Compared with centralized optimization, the distributed model with multiple decision centers achieves similar results while easing the pressure on system communication. The proposed method considers the dual uncertainty of renewable energy and load during training; compared with traditional iterative solution methods, it copes better with uncertainty and enables real-time decision-making, which is conducive to online application. Finally, we verify the effectiveness of the proposed method on an example IES coupling three energy hub agents.
Keywords: integrated energy system; multi-agent system; distributed optimization; multi-agent deep deterministic policy gradient; real-time optimization decision
Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning [Cited by 3]
4
Authors: Xiaoxiao Zhao, Peng Yi, Li Li. Control Theory and Technology (EI, CSCD), 2020, No. 4, pp. 362-378 (17 pages)
This paper studies distributed policy evaluation in multi-agent reinforcement learning. Under cooperative settings, each agent obtains only a local reward, while all agents share a common environmental state. To optimize the global return as the sum of local returns, the agents exchange information with their neighbors through a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint; then, a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step for ADMM is used to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.
Keywords: multi-agent system; reinforcement learning; distributed optimization; policy evaluation
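The paper's projected-Bellman-error problem is more elaborate, but the consensus-ADMM pattern it relies on can be shown on a toy problem of my own choosing: each agent i holds a private value a_i, and the network must agree on the minimizer of the sum of (x - a_i)^2, i.e. the mean.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM for min_x sum_i (x - a_i)^2 with constraints x_i = z.

    Each agent keeps a local copy x_i and a scaled dual variable u_i;
    z is the consensus variable. The optimum is the mean of a.
    """
    a = np.asarray(a, dtype=float)
    x = np.zeros_like(a)   # local primal variables
    u = np.zeros_like(a)   # scaled dual variables
    z = 0.0                # consensus variable
    for _ in range(iters):
        # local step: argmin_x (x - a_i)^2 + (rho/2)(x - z + u_i)^2
        x = (2 * a + rho * (z - u)) / (2 + rho)
        # consensus step: average of (x_i + u_i)
        z = np.mean(x + u)
        # dual ascent on the consensus constraint
        u = u + x - z
    return z
```

An "inexact" variant in the paper's spirit would replace the exact local minimization with a cheap approximate step; here the local problem is quadratic, so the exact step is already closed-form.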
Optimization Scheduling of Hydrogen-Coupled Electro-Heat-Gas Integrated Energy System Based on Generative Adversarial Imitation Learning
5
Authors: Baiyue Song, Chenxi Zhang, Wei Zhang, Leiyu Wan. Energy Engineering, 2025, No. 12, pp. 4919-4945 (27 pages)
Hydrogen energy is a crucial support for China's low-carbon energy transition. With the large-scale integration of renewable energy, combining hydrogen with integrated energy systems has become one of the most promising directions of development. This paper proposes an optimized scheduling model for a hydrogen-coupled electro-heat-gas integrated energy system (HCEHG-IES) using generative adversarial imitation learning (GAIL). The model aims to enhance renewable-energy absorption, reduce carbon emissions, and improve grid-regulation flexibility. First, the optimal scheduling problem of the HCEHG-IES under uncertainty is modeled as a Markov decision process (MDP). To overcome the limitations of conventional deep reinforcement learning algorithms, including long optimization time, slow convergence, and subjective reward design, this study augments the PPO algorithm with a discriminator network and expert data. The resulting algorithm, termed GAIL, enables the agent to perform imitation learning from expert data. Based on this model, dynamic scheduling decisions are made in continuous state and action spaces, generating optimal energy-allocation and management schemes. Simulation results indicate that, compared with traditional reinforcement learning algorithms, the proposed algorithm offers better economic performance. Guided by expert data, the agent avoids blind optimization, shortens offline training time, and improves convergence. In the online phase, the algorithm enables flexible energy utilization, thereby promoting renewable-energy absorption and reducing carbon emissions.
Keywords: hydrogen energy; optimization dispatch; generative adversarial imitation learning; proximal policy optimization; imitation learning; renewable energy
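The GAIL ingredient mentioned here, a discriminator whose output is recycled as a reward, can be sketched with a plain logistic discriminator. This is a minimal stand-in on 1-D data, not the paper's network, and reward conventions for GAIL vary between implementations:

```python
import numpy as np

def train_discriminator(expert, policy, lr=0.5, epochs=500):
    """Logistic discriminator D(x) = sigmoid(w*x + b): the probability a
    sample is expert data. Trained on expert (label 1) vs. policy
    (label 0) samples by gradient descent on binary cross-entropy."""
    x = np.concatenate([expert, policy])
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(policy))])
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(w * x + b)))
        w -= lr * np.mean((p - y) * x)   # gradient w.r.t. weight
        b -= lr * np.mean(p - y)         # gradient w.r.t. bias
    return w, b

def imitation_reward(x, w, b):
    """GAIL-style surrogate reward (one common convention): larger when
    the discriminator thinks the sample looks like expert data."""
    d = 1.0 / (1.0 + np.exp(-(w * x + b)))
    return -np.log(1.0 - d + 1e-8)
```

The RL agent (PPO in the paper) then maximizes this surrogate reward, pushing its state-action distribution toward the expert's.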
Evaluating end-to-end autonomous driving architectures: a proximal policy optimization approach in simulated environments
6
Authors: Angelo Morgado, Kaoru Ota, Mianxiong Dong, Nuno Pombo. Autonomous Intelligent Systems, 2025, No. 1, pp. 191-205 (15 pages)
Autonomous driving systems (ADS) are at the forefront of technological innovation, promising enhanced safety, efficiency, and convenience in transportation. This study investigates the potential of end-to-end reinforcement learning (RL) architectures for ADS, specifically focusing on a go-to-point task involving lane-keeping and navigation through basic urban environments, using the Proximal Policy Optimization (PPO) algorithm within the CARLA simulation environment. Traditional modular systems, which separate driving tasks into perception, decision-making, and control, provide interpretability and reliability in controlled scenarios but struggle to adapt to dynamic, real-world conditions. In contrast, end-to-end systems offer a more integrated approach, potentially enhancing flexibility and decision-making cohesion. This research introduces CARLA-GymDrive, a novel framework integrating the CARLA simulator with the Gymnasium API, enabling seamless RL experimentation with both discrete and continuous action spaces. Through a two-phase training regimen, the study evaluates the efficacy of PPO in an end-to-end ADS on basic tasks like lane-keeping and waypoint navigation; a comparative analysis with modular architectures is also provided. The findings highlight the strengths of PPO in managing continuous control tasks, achieving smoother and more adaptable driving behaviors than value-based algorithms like Deep Q-Networks. However, challenges remain in generalization and computational demands, with end-to-end systems requiring extensive training time. While the study underscores the potential of end-to-end architectures, it also identifies limitations in scalability and real-world applicability, suggesting that modular systems may currently be more feasible for practical ADS deployment. Nonetheless, the CARLA-GymDrive framework and the insights gained from PPO-based ADS contribute significantly to the field, laying a foundation for future advancements in autonomous driving.
Keywords: autonomous driving systems (ADS); end-to-end architecture; software system architecture; proximal policy optimization (PPO); real-time embedded systems; simulation framework
Multi-agent reinforcement learning for edge information sharing in vehicular networks [Cited by 3]
7
Authors: Ruyan Wang, Xue Jiang, Yujie Zhou, Zhidu Li, Dapeng Wu, Tong Tang, Alexander Fedotov, Vladimir Badenko. Digital Communications and Networks (SCIE, CSCD), 2022, No. 3, pp. 267-277 (11 pages)
To guarantee the heterogeneous delay requirements of diverse vehicular services, it is necessary to design a fully cooperative policy for both Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) links. This paper investigates reducing the delay of edge information sharing over V2V links while satisfying the delay requirements of the V2I links. Specifically, a mean delay minimization problem and a maximum individual delay minimization problem are formulated to improve the global network performance and ensure the fairness of a single user, respectively. A multi-agent reinforcement learning framework is designed to solve these two problems, where a new reward function is proposed to evaluate the utilities of the two optimization objectives in a unified framework. Thereafter, a proximal policy optimization approach is proposed to enable each V2V user to learn its policy using the shared global network reward. The effectiveness of the proposed approach is finally validated by comparing the obtained results with those of other baseline approaches through extensive simulation experiments.
Keywords: vehicular networks; edge information sharing; delay guarantee; multi-agent reinforcement learning; proximal policy optimization
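The unified reward function itself is not given in the abstract; a purely illustrative stand-in (all names and weights hypothetical) that scores mean V2V delay against worst-case individual delay while penalizing V2I deadline violations might look like:

```python
def unified_delay_reward(v2v_delays, v2i_delays, v2i_deadline,
                         w=0.5, penalty=10.0):
    """Illustrative unified reward: trades off mean V2V delay (global
    performance) against the worst individual V2V delay (fairness),
    and penalizes each V2I link that misses its delay requirement."""
    mean_term = sum(v2v_delays) / len(v2v_delays)
    worst_term = max(v2v_delays)
    violations = sum(1 for d in v2i_delays if d > v2i_deadline)
    return -(w * mean_term + (1 - w) * worst_term) - penalty * violations
```

Sweeping `w` between 0 and 1 moves the single reward between the paper's two objectives, which is what lets one learning framework serve both problems.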
Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning [Cited by 2]
8
Authors: Jiawei Xia, Yasong Luo, Zhikun Liu, Yalun Zhang, Haoran Shi, Zhong Liu. Defence Technology (防务技术) (SCIE, EI, CAS, CSCD), 2023, No. 11, pp. 80-94 (15 pages)
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed. Firstly, the hunting environment and a kinematic model without boundary constraints are built, and the criteria for successful target capture are given. Then, the cooperative hunting problem of a USV fleet is modeled as a decentralized partially observable Markov decision process (Dec-POMDP), and a distributed partially observable multi-target hunting proximal policy optimization (DPOMH-PPO) algorithm applicable to USVs is proposed. In addition, an observation model, a reward function, and an action space suited to multi-target hunting tasks are designed. To handle the dynamically changing dimension of the observational features produced by partially observable systems, a feature embedding block is proposed: by combining two feature-compression methods, column-wise max pooling (CMP) and column-wise average pooling (CAP), an observational feature encoding is established. Finally, the centralized-training, decentralized-execution framework is adopted to train the hunting strategy; each USV in the fleet shares the same policy and performs actions independently. Simulation experiments verify the effectiveness of the DPOMH-PPO algorithm in test scenarios with different numbers of USVs. Moreover, the advantages of the proposed model are comprehensively analyzed in terms of algorithm performance, transferability across task scenarios, and self-organization capability after damage, verifying the potential for deployment and application of DPOMH-PPO in real environments.
Keywords: unmanned surface vehicles; multi-agent deep reinforcement learning; cooperative hunting; feature embedding; proximal policy optimization
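The CMP/CAP feature-embedding idea is easy to state concretely: pool a variable number of observation rows column-wise so the encoding has a fixed size regardless of how many targets or teammates are currently observed. A minimal sketch, assuming observations arrive as a 2-D array:

```python
import numpy as np

def embed_observations(obs_rows):
    """Fixed-size embedding of a variable-size set of observation vectors,
    combining column-wise max pooling (CMP) and column-wise average
    pooling (CAP) as described in the abstract.

    obs_rows -- (n_entities, feat_dim) array; n_entities may vary per step.
    Returns a (2 * feat_dim,) vector independent of n_entities.
    """
    obs = np.asarray(obs_rows, dtype=float)
    return np.concatenate([obs.max(axis=0), obs.mean(axis=0)])
```

Because both pools are permutation-invariant over rows, the encoding is also insensitive to the ordering of observed entities, which suits a homogeneous shared-policy fleet.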
A novel trajectories optimizing method for dynamic soaring based on deep reinforcement learning
9
Authors: Wanyong Zou, Ni Li, Fengcheng An, Kaibo Wang, Changyin Dong. Defence Technology (防务技术), 2025, No. 4, pp. 99-108 (10 pages)
Dynamic soaring, inspired by the wind-riding flight of birds such as albatrosses, is a biomimetic technique that leverages wind fields to enhance the endurance of unmanned aerial vehicles (UAVs). Achieving a precise soaring trajectory is crucial for maximizing energy efficiency during flight. Existing nonlinear programming methods depend heavily on the choice of initial values, which is hard to determine. This paper therefore introduces a deep reinforcement learning method based on a differentially flat model for dynamic soaring trajectory planning and optimization. Initially, the gliding trajectory is parameterized using Fourier basis functions, achieving a flexible trajectory representation with a minimal number of hyperparameters. Subsequently, the trajectory optimization problem is formulated as a dynamic, interactive Markov decision process. The hyperparameters of the trajectory are optimized using the Proximal Policy Optimization (PPO2) algorithm from deep reinforcement learning (DRL), reducing the strong reliance on initial values in the optimization process. Finally, a comparison with the nonlinear programming method reveals that the trajectory generated by the proposed approach is smoother while meeting the same performance requirements: the proposed method achieves a 34% reduction in maximum thrust, a 39.4% decrease in maximum thrust difference, and a 33% reduction in maximum airspeed difference.
Keywords: dynamic soaring; differential flatness; trajectory optimization; proximal policy optimization
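A truncated Fourier series, as used here to parameterize the gliding trajectory, reduces a whole smooth periodic trajectory to a handful of coefficients the agent can tune. A minimal sketch for one coordinate (the coefficient layout is assumed, not taken from the paper):

```python
import numpy as np

def fourier_trajectory(t, a0, a, b, period):
    """Trajectory coordinate as a truncated Fourier series:
    x(t) = a0 + sum_k [ a_k cos(2*pi*k*t/T) + b_k sin(2*pi*k*t/T) ].

    A few coefficients (the hyperparameters the RL agent optimizes in
    this framing) describe a smooth loop with period T."""
    t = np.asarray(t, dtype=float)
    x = np.full_like(t, a0)
    for k, (ak, bk) in enumerate(zip(a, b), start=1):
        w = 2.0 * np.pi * k / period
        x = x + ak * np.cos(w * t) + bk * np.sin(w * t)
    return x
```

Smoothness comes for free: every basis function is infinitely differentiable, so bounds on thrust and airspeed rates translate into constraints on the coefficients rather than on thousands of waypoints.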
Proximal policy optimization with an integral compensator for quadrotor control [Cited by 7]
10
Authors: Huan HU, Qing-ling WANG. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2020, No. 5, pp. 777-795 (19 pages)
We use the advanced proximal policy optimization (PPO) reinforcement learning algorithm to optimize a stochastic control strategy for speed control of a "model-free" quadrotor. The model is controlled by four learned neural networks, which directly map the system states to control commands in an end-to-end style. By introducing an integral compensator into the actor-critic framework, speed-tracking accuracy and robustness are greatly enhanced. In addition, a two-phase learning scheme comprising offline and online learning is developed for practical use: a model with strong generalization ability is learned in the offline phase, and the flight policy is then continuously optimized in the online phase. Finally, the performance of our proposed algorithm is compared with that of the traditional PID algorithm.
Keywords: reinforcement learning; proximal policy optimization; quadrotor control; neural network
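The integral-compensator idea, accumulating the tracking error and feeding it to the actor alongside the raw state, can be sketched as follows (the sample time, anti-windup clamp, and state layout are my assumptions, not the paper's):

```python
import numpy as np

class IntegralCompensator:
    """Augments the RL state with the accumulated tracking error, the idea
    behind an integral compensator: a persistent offset between reference
    and measured speed grows the integral term until the policy acts to
    remove it (discrete-time integration with sample time dt)."""

    def __init__(self, dt=0.01, limit=5.0):
        self.dt = dt
        self.limit = limit      # anti-windup clamp on the integral term
        self.integral = 0.0

    def augment(self, state, reference, measured):
        error = reference - measured
        self.integral = float(np.clip(self.integral + error * self.dt,
                                      -self.limit, self.limit))
        return np.concatenate([state, [error, self.integral]])
```

As with the integral term of a PID loop, this lets a learned policy cancel steady-state tracking error that a purely memoryless state mapping would leave in place.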
Safety assessment of airborne intelligent collision-avoidance systems based on Bayesian optimization
11
Authors: 马赞, 白杰, 闫励勤, 陈勇, 孙淑光. 《航空学报》 (PKU Core), 2026, No. 1, pp. 232-248 (17 pages)
To address the airworthiness-safety challenges posed by applying reinforcement learning in UAV intelligent collision-avoidance systems, a safety assessment method for such systems is proposed within the SAE ARP4761 framework, based on Bayesian optimization theory. First, an intelligent collision-avoidance system model is built from a UAV kinematic model and the proximal policy optimization algorithm. Next, the model-verification task is combined with Bayesian optimization: three acquisition functions (uncertainty exploration, boundary refinement, and failure-region sampling) iteratively train a Gaussian-process surrogate model, enabling safety verification, safety-boundary determination, and functional-failure-probability analysis of the collision-avoidance system from a small number of samples, in support of quantitative aircraft- and system-level safety assessment. Finally, a case study on a typical intelligent sense-and-avoid system architecture demonstrates that the method effectively supports airworthiness safety assessment and can provide the compliance methods and technical assurance needed to certify intelligent collision-avoidance systems for installation. Experiments further show that, with few samples, the Bayesian-optimization approach delivers finer failure-boundary prediction, more accurate failure-probability estimates, and higher confidence than uniform sampling and Monte Carlo methods.
Keywords: reinforcement learning; airborne intelligent collision-avoidance system; proximal policy optimization; Bayesian optimization; airworthiness safety
Locomotion control of quadruped robots based on deep reinforcement learning and nonlinear model predictive control
12
Authors: 陈尧伟, 吕明, 张捷. 《兵工学报》 (PKU Core), 2026, No. 1, pp. 307-317 (11 pages)
In quadruped robot locomotion control, traditional nonlinear model predictive control (NMPC) trajectory-tracking controllers are highly sensitive to the design of their weight parameters. To address this, a control method combining deep reinforcement learning (DRL) with NMPC is proposed. Based on a single-rigid-body dynamics model of the quadruped robot, a parameterized NMPC trajectory-tracking controller is designed to generate optimal ground reaction forces; a policy network is trained to output the controller's decision variables, dynamically adjusting the NMPC trajectory-tracking controller. To improve learning efficiency, an improved proximal policy optimization algorithm is proposed. Simulation results verify the learning efficiency of the new algorithm and show that, compared with traditional control methods, the proposed DRL-NMPC method improves the quadruped robot's trajectory-tracking capability and disturbance rejection.
Keywords: quadruped robot; nonlinear model predictive control; deep reinforcement learning; proximal policy optimization
A deep symbolic regression algorithm based on a dual decision mechanism
13
Authors: 郭泽一, 李凤莲, 徐利春. 《计算机应用》 (PKU Core), 2026, No. 2, pp. 406-415 (10 pages)
Deep symbolic regression (DSR) generates expression trees automatically with a recurrent neural network (RNN) and thereby achieves strong model performance; however, it cannot balance the accuracy of the expression tree against the simplicity of its structure. A dual-decision deep symbolic regression (DDSR) algorithm is therefore proposed. First, on top of the RNN's preliminary decisions, a dual scoring mechanism jointly evaluates the structural simplicity and the accuracy of the expression tree. Second, expression-tree generation is trained with reinforcement learning: generation is treated as a sequential decision process, and the risk-seeking proximal policy optimization (RPPO) algorithm feeds rewards back to update the model parameters for the next batch. Experiments on public datasets show that, compared with DSR, DDSR improves the fit correlation coefficient by up to 0.396 and at least 0.001, with an overall performance gain of 0.116, demonstrating its effectiveness.
Keywords: symbolic regression; deep learning; scoring mechanism; proximal policy optimization; risk-seeking policy gradient
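The risk-seeking ingredient of RPPO follows the risk-seeking policy gradient used in deep symbolic regression: only the top-epsilon fraction of sampled expressions contributes to the update, with the (1 - epsilon) reward quantile as baseline, so training chases best-case rather than average-case performance. A minimal sketch of that weighting (function name my own):

```python
import numpy as np

def risk_seeking_weights(rewards, epsilon=0.05):
    """Risk-seeking policy-gradient weighting: samples below the
    (1 - epsilon) reward quantile get zero weight; samples above it are
    weighted by their advantage over that quantile baseline."""
    rewards = np.asarray(rewards, dtype=float)
    baseline = np.quantile(rewards, 1.0 - epsilon)
    weights = np.where(rewards >= baseline, rewards - baseline, 0.0)
    return weights, baseline
```

In symbolic regression this matters because the final output is the single best expression found, so the tail of the reward distribution is what the gradient should improve.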
Active jet flow control of tandem wings via transfer deep reinforcement learning
14
Authors: 钱志龙, 漆培龙, 黄振贵, 何贤军. 《力学学报》 (PKU Core), 2026, No. 1, pp. 29-42 (14 pages)
Tandem wings share lift between the fore and aft wings and reduce induced drag, but complex wake interference destabilizes the flow field and limits further aerodynamic improvement. To overcome these problems, this paper proposes an active jet control scheme based on transfer deep reinforcement learning. An agent is trained with the proximal policy optimization (PPO) algorithm to modulate jet intensity on the wing surfaces so as to stabilize lift and reduce drag. Training is carried out at a Reynolds number of 1000 in a typical configuration with vertical spacing h = 0.5c (c is the airfoil chord length) and streamwise spacing d = 2c; the learned policy is then transferred to four other layouts, (a) h = 0.5c, d = 3c; (b) h = 0.5c, d = 4c; (c) h = -0.5c, d = 2c; (d) h = 0c, d = 2c, to examine its generalization and robustness across spatial configurations. Results show that in the training configuration the lift-to-drag ratios of the fore and aft wings rise by 22.89% and 5.37%, respectively; in the transfer configurations the fore wing gains 17.27%, 18.03%, 19.35%, and 31.64%, and the aft wing gains 4.86%, 3.97%, 23.68%, and 18.07%. Power-spectrum analysis of the lift coefficient further shows that the control strategy effectively suppresses periodic vortex shedding and oscillations in the aerodynamic response. This study confirms the applicability and efficiency of reinforcement-learning-based transfer control in complex unsteady flows, offering new ideas and theoretical support for fast, efficient active flow control of tandem-wing aircraft.
Keywords: tandem wings; active flow control; jet actuation; transfer deep reinforcement learning; proximal policy optimization (PPO)
An end-to-end deep reinforcement learning method for multi-order dynamic flexible job-shop scheduling
15
Authors: 王旭, 李寰, 韩玉艳, 王玉亭, 王雅坤. 《聊城大学学报(自然科学版)》, 2026, No. 2, pp. 192-204, 273, I0001-I0007 (21 pages)
For the dynamic flexible job-shop scheduling problem with random order arrivals (DFJSP_ORA), a modeling and solution framework oriented to real production environments is proposed. A mathematical model of DFJSP_ORA is first built with makespan minimization as the optimization objective. A fluid model is introduced to approximate system behavior continuously and thereby extract key state features. The scheduling process is modeled as a Markov decision process (MDP) and solved in an end-to-end deep reinforcement learning framework built on the proximal policy optimization (PPO) algorithm. The method combines a composite-rule-driven discrete action space with an advantage-function-driven policy-optimization mechanism, achieving efficient decision-making in dynamic environments. Finally, comparisons across 81 instances of different scales against six priority dispatching rules and three reinforcement learning methods verify its superiority, providing an efficient and flexible solution to DFJSP_ORA.
Keywords: flexible job-shop scheduling; deep reinforcement learning; proximal policy optimization; fluid model; makespan
Optimal dispatch of multi-stakeholder hydrogen-containing integrated energy systems via multi-agent optimal-compromise reinforcement learning
16
Authors: 彭春华, 钟沂辰, 孙惠娟, 张大权. 《电网技术》 (PKU Core), 2026, No. 1, pp. 71-80, I0048-I0051 (14 pages)
To fully account for the dynamic coupling of heterogeneous energy carriers and the interaction among multiple stakeholders in hydrogen-containing integrated energy systems, and to achieve coordinated operation across energy carriers and stakeholders, an optimal dispatch method based on multi-agent optimal-compromise reinforcement learning is proposed. First, an optimal dispatch model of the multi-stakeholder hydrogen-containing integrated energy system is built with minimum system operating cost as the objective; a multi-agent model is then constructed with each stakeholder as an agent, and the dispatch model is solved within a multi-agent reinforcement learning framework. To address the insufficient cooperation, action conflicts, and low optimization efficiency of conventional multi-agent proximal policy optimization, which treats states and actions as mutually independent, this paper injects neighboring agents' action information into the state space to strengthen cooperation, and builds a multi-scheme evaluation mechanism based on the technique for order preference by similarity to ideal solution (TOPSIS) to select the action combination of the optimal compromise solution, yielding a multi-agent optimal-compromise reinforcement learning algorithm with improved inter-agent cooperation and solution efficiency. Case-study simulations verify the advantages of the proposed model and algorithm in solution accuracy and optimization performance.
Keywords: integrated energy system; multi-agent reinforcement learning; optimal compromise; proximal policy optimization; optimal dispatch
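The TOPSIS ranking used here as the multi-scheme evaluation mechanism is standard and compact enough to sketch; the weights and criterion directions below are illustrative, not from the paper:

```python
import numpy as np

def topsis(scores, weights, benefit):
    """Rank candidate schemes by relative closeness to the ideal solution.

    scores  -- (n_schemes, n_criteria) decision matrix
    weights -- criterion weights, summing to 1
    benefit -- True for larger-is-better criteria, False for smaller-is-better
    Returns closeness in [0, 1]; higher means closer to the ideal scheme.
    Assumes the schemes are not all identical (otherwise 0/0 occurs).
    """
    m = np.asarray(scores, dtype=float)
    v = (m / np.linalg.norm(m, axis=0)) * np.asarray(weights, dtype=float)
    benefit = np.asarray(benefit)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.linalg.norm(v - ideal, axis=1)
    d_worst = np.linalg.norm(v - worst, axis=1)
    return d_worst / (d_best + d_worst)
```

In the paper's setting, each row would be one candidate joint-action scheme scored on several objectives, and the highest-closeness row is the "optimal compromise" chosen for execution.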
UAV route planning based on an improved PPO algorithm
17
Authors: 姜智中, 贺建良. 《电光与控制》 (PKU Core), 2026, No. 1, pp. 24-29, 77 (7 pages)
To improve the reliability of UAV mission execution, smooth and continuous routes must be planned around terrain and enemy threats while ensuring that the UAV satisfies its flight-performance constraints along the route. For this constrained continuous route-planning problem, a planning model is built with deep reinforcement learning, and the standard proximal policy optimization (PPO) algorithm is improved by introducing a gated recurrent unit (GRU), achieving UAV route planning that satisfies the constraints and smoothness requirements. Simulations verify the effectiveness of the proposed algorithm.
Keywords: deep reinforcement learning; route planning; proximal policy optimization; gated recurrent unit
Research on end-to-end autonomous driving based on the ACVAE-MPPO algorithm
18
Authors: 于康鸿, 张军, 刘元盛. 《计算机工程与应用》 (PKU Core), 2026, No. 4, pp. 210-223 (14 pages)
Because urban environments feature diverse road types, many interacting entities, and high complexity, achieving efficient autonomous driving in cities remains a key focus and challenge of current autonomous-driving research. In end-to-end reinforcement learning for autonomous driving, the representation model often lacks feature-extraction capacity and the decision model struggles to learn the historical relationships among features, limitations that degrade decision performance in complex urban environments. To address these problems, the ACVAE-MPPO algorithm is proposed. To improve feature-extraction precision, a coordinate convolution layer is added to the variational auto-encoder (VAE) and a discriminator is used for auxiliary training, forming the auxiliary-training coordinate-convolutional variational auto-encoder (ACVAE). To strengthen the decision model's ability to extract historical features, a long short-term memory network is introduced into the proximal policy optimization (PPO) algorithm, forming memory proximal policy optimization (MPPO), which allows PPO to retain and effectively exploit temporal information for more accurate decisions. The two models are combined into the ACVAE-MPPO algorithm. Experiments in the Carla simulator show that ACVAE-MPPO exhibits stronger decision-making, achieving more stable driving decisions with a higher success rate.
Keywords: variational auto-encoder; proximal policy optimization; deep reinforcement learning; autonomous driving
Research on multi-UAV cooperative algorithms for target coverage
19
Authors: 王珩冰, 周亚同, 杨帆. 《计算机工程与应用》 (PKU Core), 2026, No. 2, pp. 335-346 (12 pages)
Multi-agent proximal policy optimization (MAPPO) faces stability and efficiency challenges in dynamic, complex environments. To address them, a joint gravitation mechanism is designed to speed up and improve task execution, and on this basis an improved MAPPO algorithm (DEA-MAPPO) is proposed. The algorithm uses an adaptive exploration mechanism that dynamically adjusts the exploration magnitude according to each UAV's real-time performance and environmental changes, improving adaptability and stability. A Gaussian adaptive attention mechanism improves UAV decision quality and cooperation efficiency while enhancing the system's interpretability and robustness, and a delayed policy-update mechanism effectively avoids local-optimum traps, further improving performance and stability. Experiments in a simulation environment show that the joint gravitation mechanism significantly improves coverage efficiency and that DEA-MAPPO outperforms the traditional MAPPO algorithm on target-coverage tasks, achieving substantially higher total reward and coverage rate under the same number of training episodes, offering a new solution and theoretical support for efficient cooperative target coverage by multiple UAVs in complex environments.
Keywords: multi-UAV; reinforcement learning; target coverage; multi-agent proximal policy optimization; joint gravitation mechanism
Attitude control of hypersonic vehicles based on a dual-dynamic PPO algorithm
20
Authors: 王旭, 蔡光斌, 余晓亚, 叶子绮, 单斌. 《系统工程与电子技术》 (PKU Core), 2026, No. 2, pp. 694-704 (11 pages)
To handle the strong nonlinearity and large uncertainty of hypersonic-vehicle attitude control, as well as the insufficient training convergence and control precision of traditional reinforcement learning under multiple control requirements, a dual-dynamic adaptive proximal policy optimization (PPO) algorithm is proposed. Through a soft dynamic clipping mechanism and a policy-driven entropy-adjustment mechanism, the algorithm balances control precision against actuator protection; on this basis, a comprehensive simulation and verification environment integrating aerodynamic and actuator characteristics is built. Drawing on proportional-integral-derivative control, the state-observation space is also redesigned. Simulation results show that, compared with the baseline PPO algorithm, the proposed algorithm converges 22% faster and markedly improves control precision and action smoothness; across different flight conditions it exhibits excellent policy adaptability and robustness, effectively improving the vehicle's attitude-control performance.
Keywords: hypersonic vehicle; dynamic adaptive mechanism; intelligent control; deep reinforcement learning; proximal policy optimization
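The abstract does not specify the soft dynamic clipping mechanism. One plausible reading, shown purely as an assumption-laden sketch, anneals the clipping range as training progresses and replaces PPO's hard clip with a smooth tanh squashing:

```python
import numpy as np

def soft_clip_loss(ratio, advantage, progress, eps0=0.2, eps_min=0.05):
    """Hypothetical 'soft dynamic clipping' variant of the PPO loss (not
    the paper's stated mechanism): the clipping range shrinks linearly
    with training progress in [0, 1], and the hard clip is replaced by a
    smooth tanh squashing so the gradient never vanishes abruptly."""
    eps = eps0 + (eps_min - eps0) * progress        # annealed range
    # squash the ratio smoothly into (1 - eps, 1 + eps)
    soft_ratio = 1.0 + eps * np.tanh((ratio - 1.0) / eps)
    return -np.mean(np.minimum(ratio * advantage, soft_ratio * advantage))
```

Late in training the tighter range enforces smaller policy steps, which is one way to trade exploration for the control precision and action smoothness the paper targets.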