Research on an Air Pollutant Data Correction Method Based on Bayesian Optimization Support Vector Machine
1
Authors: Xingfu Ou, Miao Zhang, Wenfeng Chen. Journal of Electronic Research and Application, 2025, Issue 4, pp. 190-203 (14 pages)
Miniature air quality sensors are widely used in urban grid-based monitoring due to their flexibility in deployment and low cost. However, the raw data collected by these devices often suffer from low accuracy caused by environmental interference and sensor drift, highlighting the need for effective calibration methods to improve data reliability. This study proposes a data correction method based on Bayesian Optimization Support Vector Regression (BO-SVR), which combines the nonlinear modeling capability of Support Vector Regression (SVR) with the efficient global hyperparameter search of Bayesian Optimization. By introducing cross-validation loss as the optimization objective and using Gaussian process modeling with an Expected Improvement acquisition strategy, the approach automatically determines optimal hyperparameters for accurate pollutant concentration prediction. Experiments on real-world micro-sensor datasets demonstrate that BO-SVR outperforms traditional SVR, grid search SVR, and random forest (RF) models across multiple pollutants, including PM2.5, PM10, CO, NO2, SO2, and O3. The proposed method achieves lower prediction residuals, higher fitting accuracy, and better generalization, offering an efficient and practical solution for enhancing the quality of micro-sensor air monitoring data.
Keywords: Air quality monitoring; Data calibration; Support vector regression; Bayesian optimization; Machine learning
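The Expected Improvement acquisition strategy named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian posterior with mean `mu` and standard deviation `sigma` for the cross-validation loss at a candidate hyperparameter setting, and the function name and the exploration margin `xi` are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best_loss, xi=0.0):
    """Expected Improvement for minimisation (illustrative sketch).

    mu, sigma  -- assumed Gaussian-process posterior mean and std of the
                  cross-validation loss at a candidate point
    best_loss  -- lowest loss observed so far
    xi         -- hypothetical exploration margin (0 gives plain EI)
    """
    gap = best_loss - mu - xi
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(gap, 0.0)
    z = gap / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return gap * cdf + sigma * pdf
```

A Bayesian optimization loop would evaluate this score over candidate SVR hyperparameters and try the candidate with the largest value next; which hyperparameters the paper tunes is not stated in the abstract.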
Dynamic failure analysis and support optimization for web pillars under static and dynamic loading using catastrophe theory
2
Authors: Juyu Jiang, Yulong Zhang, Laigui Wang, Changbo Du, Jun Xu. International Journal of Mining Science and Technology, 2025, Issue 9, pp. 1591-1602 (12 pages)
Web pillars enduring complex coupled loads are critical for stability in highwall mining. This study develops a dynamic failure criterion for web pillars under non-uniform loading using catastrophe theory. Through analysis of the dynamic stress and deformation of the web pillar-overburden system, a total potential energy function and a dynamic failure criterion were established for web pillars. An optimization method for web pillar parameters in highwall mining was developed. The dynamic criterion was used to evaluate the dynamic failure and stability of web pillars under static and dynamic loading. Key findings reveal that vertical displacements exhibit exponential-trigonometric variation under static loads and multi-variable power-law behavior under dynamic blasting. Instability risks arise when the roof's tensile strength-to-stress ratio drops below 1. In the catastrophe-theory analysis, the bifurcation set condition Δ < 0 signals sudden instability. The criterion defines failure as the case in which the unstable web pillar section length l1 exceeds the roof's critical collapse distance l2. Case studies and simulations determine an optimal web pillar width of 4.6 m. This research enhances safety and resource recovery, providing a theoretical framework for advancing highwall mining technology.
Keywords: Non-uniform loading; Highwall mining; Web pillar; Dynamic failure criterion; Parameter optimization design
Urban Vertical Greening Optimization Supported by Deep Learning and Remote Sensing Technology and Its Application in Smart Ecological Cities
3
Authors: Jian Sun, Peng Li. Journal of Environmental & Earth Sciences, 2025, Issue 7, pp. 144-170 (27 pages)
This research systematically investigates urban three-dimensional greening layout optimization and smart eco-city construction using deep learning and remote sensing technology. An improved U-Net++ architecture combined with multi-source remote sensing data achieved high-precision recognition of urban three-dimensional greening with 92.8% overall accuracy. Analysis of spatiotemporal evolution patterns in Shanghai, Hangzhou, and Nanjing revealed that three-dimensional greening shows a development trend from demonstration to popularization, with a 16.5% annual growth rate. The study quantitatively assessed the ecological benefits of various three-dimensional greening types. Results indicate that modular vertical greening and intensive roof gardens yield the highest ecological benefits, while climbing-type vertical greening and extensive roof gardens offer optimal benefit-cost ratios. Integration of multiple forms generates a 15-22% synergistic enhancement. Compared with traditional planning, the multi-objective optimization-based layout achieved a 27.5% increase in carbon sequestration, a 32.6% improvement in temperature regulation, a 35.8% enhancement in stormwater management, and a 42.3% rise in biodiversity index. Three pilot projects validated that actual ecological benefits reached 90.3-102.3% of predicted values. Multi-scenario simulations indicate optimized layouts can reduce urban heat island intensity by 15.2-18.7%, increase carbon neutrality contribution to 8.6-10.2%, and decrease stormwater runoff peaks by 25.3-32.6%. The findings provide technical methods for urban three-dimensional greening optimization and smart eco-city construction, promoting sustainable urban development.
Keywords: Deep Learning; Remote Sensing Image Processing; Three-Dimensional Greening; Layout Optimization; Smart Eco-City
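Dynamic Path Planning for Robotic Arms Based on an Improved PPO Algorithm (Cited by: 1)
4
Authors: 万宇航, 朱子璐, 钟春富, 刘永奎, 林廷宇, 张霖. 《系统仿真学报》, PKU Core, 2025, Issue 6, pp. 1462-1473 (12 pages)
To address the growing environmental uncertainty and modeling difficulty that robotic-arm path planning faces in unstructured environments, a dynamic path planning method for robotic arms based on an improved proximal policy optimization (PPO) algorithm is proposed. Because the number of obstacles in a dynamic environment varies, the length of the state-space input is not fixed; to handle this, an LSTM-based method for processing the environment state input is proposed, and the network structure of the PPO algorithm is modified accordingly. A reward function is designed based on the artificial potential field method, and a collision detection model for the robotic arm is built. Experimental results show that the improved algorithm adapts to changes in the number and positions of obstacles in the scene and achieves faster convergence and better stability.
Keywords: dynamic path planning; improved PPO algorithm; LSTM network; artificial potential field method; ML-Agents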
An Improved Artificial Rabbits Optimization Algorithm with Chaotic Local Search and Opposition-Based Learning for Engineering Problems and Its Applications in Breast Cancer Problem (Cited by: 1)
5
Authors: Feyza Altunbey Özbay, Erdal Özbay, Farhad Soleimanian Gharehchopogh. Computer Modeling in Engineering & Sciences, SCIE, EI, 2024, Issue 11, pp. 1067-1110 (44 pages)
Artificial rabbits optimization (ARO) is a recently proposed biology-based optimization algorithm inspired by the detour foraging and random hiding behavior of rabbits in nature. However, when solving optimization problems, the ARO algorithm converges slowly and can fall into local minima. To overcome these drawbacks, this paper proposes chaotic opposition-based learning ARO (COARO), an improved version of the ARO algorithm that incorporates opposition-based learning (OBL) and chaotic local search (CLS) techniques. Adding OBL to ARO increases the convergence speed of the algorithm and improves its exploration of the search space. Chaotic maps in CLS provide rapid convergence by scanning the search space efficiently, owing to their ergodicity and non-repetitive properties. The proposed COARO algorithm has been tested using thirty-three distinct benchmark functions, and the outcomes have been compared with the most recent optimization algorithms. Additionally, the COARO algorithm's problem-solving capabilities have been evaluated using six different engineering design problems and compared with various other algorithms. This study also introduces a binary variant of the continuous COARO algorithm, named BCOARO. The performance of BCOARO was evaluated on the breast cancer dataset, and its effectiveness has been compared with different feature selection algorithms. According to the findings obtained for real applications, the proposed BCOARO outperforms alternative algorithms in terms of accuracy and fitness value. Extensive experiments show that the COARO and BCOARO algorithms achieve promising results compared to other metaheuristic algorithms.
Keywords: Artificial rabbit optimization; binary optimization; breast cancer; chaotic local search; engineering design problem; opposition-based learning
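The two building blocks named in the abstract above, opposition-based learning and a chaotic map, can be sketched in a few lines. This is an illustration under assumptions, not the COARO implementation: the abstract does not say which chaotic map is used, so the logistic map here is a stand-in, and both function names are hypothetical.

```python
def opposite_point(x, lower, upper):
    """Opposition-based learning: reflect each coordinate within its bounds.

    For a coordinate x_i in [lower_i, upper_i] the opposite is
    lower_i + upper_i - x_i.
    """
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]

def logistic_map(c, steps=1, r=4.0):
    """Iterate the logistic map c <- r * c * (1 - c).

    Assumption: the logistic map with r = 4 stands in for the unspecified
    chaotic map; c should start in (0, 1).
    """
    for _ in range(steps):
        c = r * c * (1.0 - c)
    return c
```

In an OBL step, a metaheuristic typically evaluates both a candidate and its opposite and keeps the fitter one; a chaotic sequence can then perturb the best solution during local search.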
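A PPO-Dueling DQN Policy Optimization Method for Game Scenarios
6
Authors: 刘鹏程, 汪永伟, 余欣鋆, 刘小虎, 胡浩. 《小型微型计算机系统》, PKU Core, 2025, Issue 11, pp. 2594-2599 (6 pages)
Improvements to traditional deep Q-learning training algorithms usually focus on optimizing the reward function and pay comparatively little attention to policy self-optimization and dynamic adjustment of the convergence gradient. To address this, this paper proposes a hybrid algorithm, PPO-Dueling DQN, built on the Dueling-DQN algorithm. On the one hand, the algorithm uses policy gradient descent and an adaptive KL-divergence penalty mechanism to update the objective-function loss and the value-function loss synchronously, thereby optimizing the model's loss function and policy selection. On the other hand, it extracts state values and action advantages during the game in a more timely manner, avoiding reliance on a single indicator for policy updates and effectiveness evaluation. Comparative experiments verify the improvements of PPO-Dueling DQN for network game models in learning ability, convergence speed, and adaptive effectiveness, and a parameter analysis of the discount factor is carried out to better evaluate model effectiveness. The experimental results show that the proposed algorithm has a certain performance advantage over the baseline models.
Keywords: reinforcement learning; deep Q-network; PPO algorithm; network attack-defense game; effectiveness evaluation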
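PPO Algorithm Based on a Convolutional Pyramid Network for Solving the Job-Shop Scheduling Problem (Cited by: 1)
7
Authors: 徐帅, 李艳武, 谢辉, 牛晓伟. 《现代制造工程》, PKU Core, 2025, Issue 3, pp. 19-30 (12 pages)
The job-shop scheduling problem is a classic NP-hard combinatorial optimization problem, and the quality of its schedule directly affects the operating efficiency of a manufacturing system. To obtain better scheduling policies, a deep reinforcement learning (DRL) scheduling method based on proximal policy optimization (PPO) and a convolutional neural network (CNN) is proposed, with minimizing the makespan as the optimization objective. A three-channel state representation is designed, 16 heuristic dispatching rules are selected as the action space, and the reward function is made equivalent to minimizing total machine idle time. So that the trained scheduling policy can handle instances of different sizes, spatial pyramid pooling (SPP) is used in the CNN to convert feature matrices of different dimensions into fixed-length feature vectors. Computational experiments were carried out on 42 job-shop scheduling problem (JSSP) instances from the public OR-Library. The simulation results show that the algorithm outperforms single heuristic dispatching rules and a genetic algorithm, achieves better results than existing DRL algorithms on most instances, and attains the smallest average makespan.
Keywords: deep reinforcement learning; job-shop scheduling; convolutional neural network; proximal policy optimization; spatial pyramid pooling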
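How spatial pyramid pooling turns feature matrices of different sizes into a vector of fixed length can be sketched in plain Python. This is a single-channel illustration only: the pyramid levels (1, 2, 4) and the function name are assumptions, since the abstract does not give the configuration used in the paper.

```python
def spp_max_pool(feature, levels=(1, 2, 4)):
    """Max-pool a 2D feature matrix into a fixed-length vector.

    For each level n the matrix is split into an n x n grid of bins that
    together cover every cell (bins may overlap when a side is not
    divisible by n). The output length is sum(n * n for n in levels),
    whatever the input height and width.
    """
    h, w = len(feature), len(feature[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n                # floor of the bin's start row
            r1 = -(-((i + 1) * h) // n)      # ceiling of the bin's end row
            for j in range(n):
                c0 = (j * w) // n
                c1 = -(-((j + 1) * w) // n)
                pooled.append(max(feature[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled
```

With the assumed levels, a 2 x 3 matrix and a 7 x 5 matrix both map to 1 + 4 + 16 = 21 values, which is what lets one policy network accept scheduling instances of different sizes.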
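AVW-PPO: An Indoor Odor Source Localization Algorithm Inspired by Local Wind Information
8
Authors: 李世钰, 袁杰, 谢霖伟, 郭旭, 张宁宁. 《哈尔滨工业大学学报》, PKU Core, 2025, Issue 8, pp. 57-68 (12 pages)
To address the low efficiency and insufficient success rate of odor source localization (OSL) in complex, dynamic indoor plume environments, and in particular the difficulty robots have in perceiving the environment accurately and navigating effectively under turbulent conditions, an auxiliary-value and wind-guided proximal policy optimization (AVW-PPO) algorithm based on deep reinforcement learning is proposed. First, an auxiliary value network is added to the original PPO algorithm to reduce the estimation bias of a single value network, improving the stability of policy updates and prediction accuracy. Second, a wind-guided strategy is designed that incorporates local wind-field information into the state space and reward function of the reinforcement learning framework, so that the robot perceives dynamic changes in the plume environment more acutely and optimizes its decision path, effectively improving localization efficiency. Finally, a gas diffusion model in a two-dimensional environment is built, and the proposed algorithm is tested under three different turbulence conditions. The results show that, under the same environmental conditions, AVW-PPO outperforms comparable algorithms in both average number of search steps and success rate, with a localization success rate above 99%. The wind-guided strategy is especially effective at improving search efficiency and helps reduce the time the robot needs to complete the task. This study offers a new approach to odor source localization in complex indoor turbulent environments.
Keywords: odor source localization; deep reinforcement learning; proximal policy optimization (PPO); auxiliary value network; wind-guided strategy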
Control effect and optimization scheme of combined rockbolt-cable support for a tunnel in horizontally layered limestone: A case study (Cited by: 1)
9
Authors: Jiachen Wang, Dingli Zhang, Zhenyu Sun, Feng Peng. Journal of Rock Mechanics and Geotechnical Engineering, SCIE, CSCD, 2024, Issue 11, pp. 4586-4604 (19 pages)
This study focused on the mechanical behavior of a deep-buried tunnel constructed in horizontally layered limestone, and investigated the effect of a new combined rockbolt-cable support system on the tunnel response. The Yujingshan Tunnel, excavated through a giant karst cave, was used as a case study. Firstly, a multi-objective optimization model for the rockbolt-cable support was proposed by using fuzzy mathematics and multi-objective comprehensive decision-making principles. Subsequently, the parameters of the surrounding rock were calibrated by comparing the simulation results obtained by the discrete element method (DEM) with the field monitoring data to obtain an optimized support scheme based on the optimization model. Finally, the optimization scheme was applied to the karst cave section, which was divided into the B- and C-shaped sections. The distribution range of the rockbolt-cable support in the C-shaped section was larger than that in the B-shaped section. The field monitoring results, including tunnel crown settlement, horizontal convergence, and axial force of the rockbolt-cable system, were analyzed to assess the effectiveness of the optimization scheme. The maximum crown settlement and horizontal convergence were measured to be 25.9 mm and 35 mm, accounting for 0.1% and 0.2% of the tunnel height and span, respectively. Although the C-shaped section had poorer rock properties than the B-shaped section, the crown settlement and horizontal convergence in the C-shaped section ranged from 46% to 97% of those observed in the B-shaped section. The cable axial force in the B-shaped section was approximately 60% of that in the C-shaped section. The axial force in the crown rockbolt was much smaller than that in the sidewall rockbolt. Field monitoring results demonstrated that the optimized scheme effectively controlled the deformation of the layered surrounding rock, ensuring that it remained within a safe range. These results provide valuable references for the design of support systems in deep-buried tunnels situated in layered rock masses.
Keywords: Giant karst cave; Multi-objective optimization model; Numerical simulation; Combined rockbolt-cable support; Field monitoring
Comparison of debris flow susceptibility assessment methods: support vector machine, particle swarm optimization, and feature selection techniques (Cited by: 1)
10
Authors: ZHAO Haijun, WEI Aihua, MA Fengshan, DAI Fenggang, JIANG Yongbing, LI Hui. Journal of Mountain Science, SCIE, CSCD, 2024, Issue 2, pp. 397-412 (16 pages)
The selection of important factors in machine learning-based susceptibility assessments is crucial to obtain reliable susceptibility results. In this study, metaheuristic optimization and feature selection techniques were applied to identify the most important input parameters for mapping debris flow susceptibility in the southern mountain area of Chengde City in Hebei Province, China, by using machine learning algorithms. In total, 133 historical debris flow records and 16 related factors were selected. The support vector machine (SVM) was first used as the base classifier, and then a hybrid model was introduced by a two-step process. First, the particle swarm optimization (PSO) algorithm was employed to select the SVM model hyperparameters. Second, two feature selection algorithms, namely principal component analysis (PCA) and PSO, were integrated into the PSO-based SVM model, which generated the PCA-PSO-SVM and FS-PSO-SVM models, respectively. Three statistical metrics (accuracy, recall, and specificity) and the area under the receiver operating characteristic curve (AUC) were employed to evaluate and validate the performance of the models. The results indicated that the feature selection-based models exhibited the best performance, followed by the PSO-based SVM and SVM models. Moreover, the performance of the FS-PSO-SVM model was better than that of the PCA-PSO-SVM model, showing the highest AUC, accuracy, recall, and specificity values in both the training and testing processes. It was found that the selection of optimal features is crucial to improving the reliability of debris flow susceptibility assessment results. Moreover, the PSO algorithm was found to be not only an effective tool for hyperparameter optimization, but also a useful feature selection algorithm to improve prediction accuracies of debris flow susceptibility by using machine learning algorithms. The high and very high debris flow susceptibility zones cover approximately 38.01% of the study area, where debris flow may occur under intensive human activities and heavy rainfall events.
Keywords: Chengde; Feature selection; Support vector machine; Particle swarm optimization; Principal component analysis; Debris flow susceptibility
Prediction and optimization of flue pressure in sintering process based on SHAP (Cited by: 2)
11
Authors: Mingyu Wang, Jue Tang, Mansheng Chu, Quan Shi, Zhen Zhang. International Journal of Minerals, Metallurgy and Materials, SCIE, EI, CAS, 2025, Issue 2, pp. 346-359 (14 pages)
Sinter is the core raw material for blast furnaces. Flue pressure, which is an important state parameter, affects sinter quality. In this paper, flue pressure prediction and optimization were studied based on the Shapley additive explanation (SHAP) to predict the flue pressure and take targeted adjustment measures. First, the sintering process data were collected and processed. A flue pressure prediction model was then constructed after comparing different feature selection methods and model algorithms using SHAP + extremely randomized trees (ET). The prediction accuracy of the model within the error range of ±0.25 kPa was 92.63%. SHAP analysis was employed to improve the interpretability of the prediction model. The effects of various sintering operation parameters on flue pressure, the relationship between the numerical range of key operation parameters and flue pressure, the effect of operation parameter combinations on flue pressure, and the prediction process of the flue pressure prediction model on a single sample were analyzed. A flue pressure optimization module was also constructed and analyzed when the prediction satisfied the judgment conditions. The operating parameter combination was then pushed. The flue pressure was increased by 5.87% during the verification process, achieving a good optimization effect.
Keywords: sintering process; flue pressure; Shapley additive explanation; prediction; optimization
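Intelligent Vehicle Control Method Based on Deep Reinforcement Learning PPO
12
Authors: 叶宝林, 王欣, 李灵犀, 吴维敏. 《计算机工程》, PKU Core, 2025, Issue 7, pp. 385-396 (12 pages)
To improve the driving efficiency of vehicles in mixed traffic on highways and reduce traffic accidents, an intelligent vehicle control method based on proximal policy optimization (PPO) is proposed. First, a hierarchical control framework combining deep reinforcement learning with conventional proportional-integral-derivative (PID) control is built: the upper-level deep reinforcement learning agent determines the control policy, and the lower-level PID controller executes it. Second, to improve driving efficiency, an advantage distance is defined and used to filter the observed environment state matrix, helping the autonomous vehicle change to the lane with the longer advantage distance. Based on this advantage distance, a new state acquisition method is proposed that reduces the amount of data to process and speeds up convergence of the deep reinforcement learning model. In addition, a multi-objective reward function is designed to balance vehicle safety, driving efficiency, and stability. Finally, the method is tested in Highway_env, a Gym-based simulation environment for vehicle reinforcement learning tasks, and its performance at different target speeds is analyzed and discussed. The simulation results show that, compared with the deep Q-network (DQN) method, the proposed method converges faster and enables the vehicle to complete the driving task safely and smoothly at both target speeds tested.
Keywords: proximal policy optimization; vehicle control; hierarchical control framework; multi-objective reward function; deep Q-network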
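PPO Surface Coverage Method with an Adaptive Reward Function
13
Authors: 李淑怡, 阳波, 陈灵, 沈玲, 唐文胜. 《计算机工程》, PKU Core, 2025, Issue 3, pp. 86-94 (9 pages)
To address the problem that existing surface coverage methods adapt poorly to surface variation and cover inefficiently during robotic cleaning, a proximal policy optimization (PPO) surface coverage method with an adaptive reward function (SC-SRPPO) is proposed. First, the target surface is discretized, covariance matrices are obtained by ball query, and the point-cloud normal vectors are solved to build a 3D surface model. Second, the coverage-state features and curvature-variation features of the local surface point cloud are used as observations of the surface model to construct the state model, which helps the robot's trajectory fit the surface and improves its adaptability to surface variation. Next, an adaptive reward function is built from the global coverage rate of the surface and a time-dependent exponential model, guiding the robot toward uncovered regions and improving coverage efficiency. Finally, the local surface state model, the reward function, and the PPO reinforcement learning algorithm are combined to train the robot on the surface coverage path planning task. Experiments on three surface models (spherical, saddle-shaped, and three-dimensional heart-shaped) use point-cloud coverage rate and coverage completion time as the main metrics. The results show that SC-SRPPO reaches an average coverage rate of 90.72%; compared with NSGA-II, PPO, and SAC, coverage rate improves by 4.98%, 14.56%, and 27.11%, and coverage completion time is shortened by 15.20%, 67.18%, and 62.64%, respectively. SC-SRPPO enables the robot to complete surface coverage more efficiently while adapting to surface variation.
Keywords: cleaning robot; curved surface; coverage path planning; reinforcement learning; proximal policy optimization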
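SAG-MAPPO: A Multi-Agent Reinforcement Learning Method for Safe Cooperative Decision-Making in Ramp Merging Scenarios
14
Authors: 张树培, 庞莹, 孙朋举, 张玮, 王玲德. 《重庆理工大学学报(自然科学)》, PKU Core, 2025, Issue 9, pp. 45-52 (8 pages)
In ramp merging scenarios, cooperative multi-vehicle decision-making between connected and automated vehicles (CAVs) and human-driven vehicles (HDVs) suffers from safety and efficiency problems caused by partial observability and the uncertainty of the dynamic environment. To address these problems, a multi-agent proximal policy optimization algorithm with temporal memory and a safety constraint mechanism (SAG-MAPPO) is proposed. A decentralized partially observable Markov decision process (Dec-POMDP) model of the cooperative ramp-merging scenario is established, and a gated recurrent unit (GRU) is introduced to process the history of vehicle states, resolving the policy instability caused by partial observability of the environment. On this basis, a two-layer safety mechanism comprising hard rule constraints and dynamic behavior prediction is designed to mask dangerous actions in real time and ensure that decision outputs are safe. Simulation results show that SAG-MAPPO achieves faster convergence, higher cumulative policy reward, and higher average speed in cooperative ramp-merging scenarios at different traffic densities, verifying its effectiveness in complex dynamic scenarios.
Keywords: ramp merging; autonomous driving; deep reinforcement learning; multi-agent proximal policy optimization; decision-making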
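Energy Management Strategy for Hybrid Electric Vehicles Based on an Improved PPO Algorithm
15
Authors: 马超, 孙统, 曹磊, 杨坤, 胡文静. 《河北科技大学学报》, PKU Core, 2025, Issue 3, pp. 237-247 (11 pages)
To improve the fuel economy of a power-split hybrid electric vehicle (HEV), a longitudinal dynamics model of the complete HEV is established, and an energy management strategy (EMS) based on an improved proximal policy optimization (PPO) algorithm with policy-entropy optimization is proposed. Building on the standard PPO algorithm, an experience-pool mechanism is adopted to simplify the algorithm framework, so that only one deep neural network is used for interaction, training, and updating, which reduces the complexity of synchronizing policy-network parameters. To explore the environment effectively and learn a more efficient policy, a policy-entropy term is added to the loss function, which helps the agent balance exploration and exploitation and keeps the policy from converging prematurely to a local optimum. The results show that, compared with an EMS based on dual-policy-network PPO, the EMS based on the improved single-policy-network PPO better maintains the battery state of charge (SOC) under both the UDDS and NEDC driving cycles, reduces equivalent fuel consumption by 8.5% and 1.4%, respectively, and achieves energy savings close to those of an EMS based on dynamic programming (DP). The proposed improved PPO algorithm effectively improves HEV fuel economy and can serve as a reference for the design and development of HEV energy management strategies.
Keywords: vehicle engineering; hybrid electric vehicle; energy management strategy; deep reinforcement learning; proximal policy optimization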
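The idea of adding a policy-entropy term to the PPO loss, as described in the abstract above, can be illustrated with the standard clipped surrogate objective. This sketch is not the paper's formulation: it assumes a discrete action distribution for the entropy calculation, and the function name and the default values of `clip_eps` and `entropy_coef` are hypothetical.

```python
import math

def ppo_loss_with_entropy(logp_new, logp_old, advantages, action_probs,
                          clip_eps=0.2, entropy_coef=0.01):
    """Clipped PPO policy loss minus an entropy bonus (illustrative sketch).

    logp_new, logp_old -- log-probabilities of the taken actions under the
                          current and the data-collecting policy
    advantages         -- advantage estimates, one per sample
    action_probs       -- per-sample action distributions of the current
                          policy (assumed discrete for this example)
    """
    surrogates = []
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                                  # importance ratio
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)  # clip to trust region
        surrogates.append(min(ratio * adv, clipped * adv))         # pessimistic bound
    policy_loss = -sum(surrogates) / len(surrogates)

    entropies = [-sum(p * math.log(p) for p in probs if p > 0.0)
                 for probs in action_probs]
    mean_entropy = sum(entropies) / len(entropies)

    # Subtracting the entropy rewards more exploratory (higher-entropy) policies.
    return policy_loss - entropy_coef * mean_entropy
```

A larger `entropy_coef` pushes the agent toward exploration, while a value of zero recovers the plain clipped objective; how the paper sets or schedules this weight is not stated in the abstract.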
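Optimal Charge-Discharge Control of Distributed Hybrid Energy Storage in HCSY-MG Grid-Connected Systems Based on Improved PPO
16
Authors: 李锦键, 王兴贵, 丁颖杰. 《电源学报》, PKU Core, 2025, Issue 4, pp. 255-264 (10 pages)
To smooth fluctuations in micro-source output in a grid-connected half-bridge converter series Y-connection micro-grid (HCSY-MG) system, and to keep the sums of the DC-side voltages of each phase equal and the three-phase grid-connected currents balanced, an optimal charge-discharge control strategy for a distributed hybrid energy storage system (HESS) based on improved proximal policy optimization (PPO) is proposed. Taking into account the grid-connected current of the HCSY-MG system and the characteristics of the distributed HESS, the main system variables affecting the grid-connected current and the best topology for connecting the HESS to the system are determined. Then, based on the characteristics of the series system, the charge-discharge problem of the distributed HESS is cast as a Markov decision process for deep reinforcement learning. Because the entropy-loss weight in PPO is difficult to determine, an improved PPO algorithm is proposed that balances the agent's convergence and exploration. Finally, typical operating data from a new-energy power generation base are used as a case study to verify the feasibility and effectiveness of the proposed control strategy.
Keywords: series micro-grid; distributed hybrid energy storage system; proximal policy optimization; charge-discharge power; deep reinforcement learning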
Recent Advancements in the Optimization Capacity Configuration and Coordination Operation Strategy of Wind-Solar Hybrid Storage System (Cited by: 1)
17
Authors: Hongliang Hao, Caifeng Wen, Feifei Xue, Hao Qiu, Ning Yang, Yuwen Zhang, Chaoyu Wang, Edwin E. Nyakilla. Energy Engineering, EI, 2025, Issue 1, pp. 285-306 (22 pages)
Wind power is intermittent and cannot serve as the sole base-load energy source. This paper proposes a wind-solar hybrid energy storage system (HESS) to ensure a stable supply to the grid over a longer period. A multi-objective genetic algorithm (MOGA) and state of charge (SOC) region division for the batteries are introduced to solve the objective function and to configure the system capacity, respectively. MATLAB/Simulink was used for the simulation tests. The optimization results show that for a 0.5 MW wind power and 0.5 MW photovoltaic system, with a combination of a 300 Ah lithium battery, a 200 Ah lead-acid battery, and a water storage tank, the proposed strategy reduces the system construction cost by approximately 18,000 yuan. Additionally, the cycle count of the electrochemical energy storage system increases from 4515 to 4660, while the depth of discharge decreases from 55.37% to 53.65%, achieving shallow charging and discharging, thereby extending battery life and significantly reducing grid voltage fluctuations. The proposed strategy is a guide for stabilizing the grid connection of wind and solar power generation, capacity allocation, and energy management of energy conservation systems.
Keywords: Electric-thermal hybrid storage; modal decomposition; multi-objective genetic algorithm; capacity optimization allocation; operation strategy
Dynamic hedging of 50ETF options using Proximal Policy Optimization
18
Authors: Lei Liu, Mengmeng Hao, Jinde Cao. Journal of Automation and Intelligence, 2025, Issue 3, pp. 198-206 (9 pages)
This paper employs the Proximal Policy Optimization (PPO) algorithm to study the risk hedging problem of the Shanghai Stock Exchange (SSE) 50ETF options. First, the action and state spaces were designed based on the characteristics of the hedging task, and a reward function was developed according to the cost function of the options. Second, combining the concept of curriculum learning, the agent was guided to adopt a simulated-to-real learning approach for dynamic hedging tasks, reducing the learning difficulty and addressing the issue of insufficient option data. A dynamic hedging strategy for 50ETF options was constructed. Finally, numerical experiments demonstrate the superiority of the designed algorithm over traditional hedging strategies in terms of hedging effectiveness.
Keywords: B-S model; Option hedging; Reinforcement learning; 50ETF; Proximal Policy Optimization (PPO)
Gait Learning Reproduction for Quadruped Robots Based on Experience Evolution Proximal Policy Optimization
19
Authors: LI Chunyang, ZHU Xiaoqing, RUAN Xiaogang, LIU Xinyuan, ZHANG Siyuan. Journal of Shanghai Jiaotong University (Science), 2025, Issue 6, pp. 1125-1133 (9 pages)
Bionic gait learning of quadruped robots based on reinforcement learning has become a hot research topic. The proximal policy optimization (PPO) algorithm has a low probability of learning a successful gait from scratch due to problems such as reward sparsity. To solve this problem, we propose an experience evolution proximal policy optimization (EEPPO) algorithm, which integrates PPO with a priori knowledge highlighted by an evolutionary strategy. We use successfully trained samples as a priori knowledge to guide the learning direction in order to increase the success probability of the learning algorithm. To verify the effectiveness of the proposed EEPPO algorithm, we have conducted simulation experiments of the quadruped robot gait learning task on Pybullet. Experimental results show that the central pattern generator based radial basis function (CPG-RBF) network and the policy network are simultaneously updated to achieve the quadruped robot's bionic diagonal trot gait learning task using key information such as the robot's speed, posture, and joint information. Experimental comparison with the traditional soft actor-critic (SAC) algorithm validates the superiority of the proposed EEPPO algorithm, which can learn a more stable diagonal trot gait on flat terrain.
Keywords: quadruped robot; proximal policy optimization (PPO); a priori knowledge; evolutionary strategy; bionic gait learning
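A Dou Dizhu Game Model Combining PPO and Monte Carlo Tree Search
20
Authors: 王世鹏, 王亚杰, 吴燕燕, 郭其龙, 赵甜宇. 《重庆理工大学学报(自然科学)》, PKU Core, 2025, Issue 8, pp. 126-133 (8 pages)
Dou Dizhu is a typical imperfect-information game. Because it involves multi-player play, a huge action space, and decisions that mix cooperation and competition, plain Monte Carlo tree search (MCTS) is inefficient when applied to it. To improve the policy quality and search efficiency of MCTS, a Dou Dizhu game model that combines the proximal policy optimization (PPO) algorithm with MCTS is proposed. PPO is used to learn the hand and strategy information in Dou Dizhu and to train a policy model that outputs action probabilities for the current position, which then guides the selection and simulation phases of MCTS. In the selection phase, the action probabilities output by the PPO policy model are used to refine the selection formula, steering the search toward high-quality action nodes. In the simulation phase, PPO replaces random rollouts, so that simulations follow the learned policy more closely and fewer inefficient paths are explored. Experimental results show that MCTS enhanced with PPO not only makes decisions more efficiently but also raises the model's win rate, demonstrating a strong decision-making advantage in Dou Dizhu.
Keywords: PPO algorithm; Monte Carlo tree search; Dou Dizhu; imperfect-information game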
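One common way to let policy probabilities guide the MCTS selection phase is a PUCT-style score, sketched below. The abstract does not give the paper's selection formula, so this is a swapped-in technique for illustration only; the node layout (`prior`, `visits`, `value_sum`), the function name, and the constant `c_puct` are all hypothetical.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the child action with the highest PUCT-style score.

    children maps an action to a dict with:
      prior     -- action probability from a policy model (assumed input)
      visits    -- number of times the child has been visited
      value_sum -- sum of simulation returns backed up through the child
    """
    total_visits = sum(child["visits"] for child in children.values())

    def score(child):
        # Exploitation: mean return so far (0 for unvisited children).
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        # Exploration: scaled by the prior, decays as the child is visited.
        u = c_puct * child["prior"] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        return q + u

    return max(children, key=lambda action: score(children[action]))
```

With no visits recorded, the choice follows the policy prior; as visit counts grow, the measured mean return takes over, which is the general mechanism by which a learned policy can steer search toward promising nodes.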