Journal Articles
119,589 articles found
1. Research on an SSD-SMR Write Cache Strategy for Long-Tail Latency Optimization Based on Q-Learning
Authors: 刘健, 章步镐, 方匡弛, 刘宣锋, 孙国道, 梁荣华, 梁浩然. 《计算机工程》 (Computer Engineering), PKU Core Journal, 2026, No. 3, pp. 287-298 (12 pages).
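As global data volumes keep growing, improving data access performance at low cost has become a major challenge for storage systems. One effective solution is a cache system built from low-latency, high-bandwidth solid-state drives (SSDs) and low-cost, high-density shingled magnetic recording (SMR) disks. However, the inherent mechanical motion of SMR disks and their overlapping multi-track layout give them poor write performance. Frequent write-back of dirty data from the SSD to the SMR disk triggers many read-modify-write (RMW) operations, which can cause severe long-tail latency. To address this, a cache replacement optimization strategy that incorporates the reinforcement learning algorithm Q-Learning is proposed for the SSD-SMR hybrid storage architecture. The strategy learns empirical knowledge about the relationship between the SMR device's I/O load and its latency, and uses it to control writes to the SMR disk. When the SMR load is high, it restrains the eviction of dirty data from the cache, reducing the RMW operations caused by write-back and thereby lowering tail latency under different loads. Q-Learning was combined with the popularity-based cache algorithm LRU and the SMR-aware cache algorithm SAC. These combinations were tested with real enterprise traces and with synthetic traces generated by YCSB. Experimental results show that the proposed method effectively improves the performance of existing cache algorithms, reducing average latency by 57.06% and tail latency by 87.49%.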
Keywords: Q-learning algorithm; I/O load; long-tail latency; cache replacement algorithm; hybrid storage
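The abstract describes learning, from SMR load and observed latency, when to hold back dirty-data eviction. The sketch below shows that idea with plain tabular Q-learning. Everything in it is a hypothetical illustration, not the authors' design: the state (a discretised SMR load level), the two actions (hold or flush dirty data), the toy latency model, the cyclic load pattern, and all parameter values.

```python
import random

# Hypothetical action encoding (not from the paper).
HOLD, FLUSH = 0, 1  # keep dirty data in the SSD cache / write it back to SMR


def toy_latency(load: float, action: int) -> float:
    """Made-up latency model: flushing under high SMR load triggers costly
    read-modify-write work, while holding dirty data has a fixed cost."""
    if action == FLUSH:
        return 1.0 + 4.0 * load
    return 2.0


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def train(n_levels=4, steps=5000, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_levels)]
    s = 0
    for _ in range(steps):
        # epsilon-greedy choice between HOLD and FLUSH
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max((HOLD, FLUSH), key=lambda act: q[s][act])
        load = s / (n_levels - 1)        # map the load level to [0, 1]
        reward = -toy_latency(load, a)   # lower latency -> higher reward
        s_next = (s + 1) % n_levels      # toy assumption: load cycles through levels
        q_update(q, s, a, reward, s_next)
        s = s_next
    return q


q = train()
policy = ["FLUSH" if row[FLUSH] > row[HOLD] else "HOLD" for row in q]
print(policy)  # ['FLUSH', 'HOLD', 'HOLD', 'HOLD']
```

Because the next load level does not depend on the action here, the learned preference between the two actions in each state is driven only by the latency difference. Under this toy model the agent writes back only at the lowest load level.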
2. Q-Learning-Based Multimodal Adaptive Optimized Combined Forecasting of Photovoltaic Power
Authors: 隗知初, 杨苹, 周钱雨凡, 陈文皓, 万思洋, 崔嘉雁. 《电力工程技术》 (Electric Power Engineering Technology), PKU Core Journal, 2026, No. 1, pp. 115-124, 163 (11 pages).
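Photovoltaic (PV) power series are highly volatile and random. To address this, this paper proposes a Q-Learning-based multimodal adaptive optimized combined forecasting model for PV power. First, variational mode decomposition optimized by the whale optimization algorithm decomposes the original PV power series into sub-modes. An ensemble feature-selection model then identifies the meteorological factors to which each sub-mode series is most sensitive. Next, four base forecasting models are built: a back-propagation neural network, a bidirectional long short-term memory network, a gated recurrent unit network, and a temporal convolutional network. Different models forecast sub-series with different frequency characteristics with different accuracy, so the Q-Learning algorithm adaptively selects the optimal combination of base models for each mode. Finally, the sub-mode forecasts are superimposed to reconstruct the final forecast. The model is validated on a high-resolution PV meteorological and power dataset. Compared with single models, the proposed model reduces the mean absolute error by 16.18% and the mean squared error by 17.00%.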
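Keywords: whale optimization algorithm; variational mode decomposition; Q-learning; power forecasting; combined model; photovoltaic power generation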
3. A Multi-Source Power Data Storage Optimization Decision Method Fusing Random Forest and Q-learning
Authors: 叶学顺, 贾东梨, 周俊, 唐英, 贾梓豪. 《科学技术与工程》 (Science Technology and Engineering), PKU Core Journal, 2026, No. 3, pp. 1065-1074 (10 pages).
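Storing large-scale, heterogeneous power data runs into two bottlenecks: low efficiency and insufficient memory capacity. Traditional storage optimization methods such as data indexing and data compression each have strengths and weaknesses, and applying them effectively to power data storage remains a research challenge. To solve this problem, a multi-source power data storage optimization decision method fusing random forest and Q-learning is proposed. Its key techniques are as follows. First, a decision model for storage optimization strategies is built on an improved random forest algorithm. It introduces an information gain method to jointly evaluate how data access frequency, query time, storage speed, and data redundancy rate affect the database. It then chooses among three strategies: direct storage, indexed storage, and compressed storage. Second, a decision model for storage algorithms is built on an improved Q-learning algorithm. It introduces a multi-scale learning mechanism, prioritized experience replay, and a positive/negative reward mechanism. This model selects the indexing algorithm to use for indexed storage and the compression algorithm to use for compressed storage. The method combines the technical advantages of data indexing and data compression, substantially improving storage efficiency and saving storage space, and offers a new solution for managing large-scale multi-source power data.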
Keywords: random forest algorithm; Q-learning algorithm; data storage optimization method; data indexing algorithm; data compression algorithm
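The improved Q-learning model in this entry uses prioritized experience replay. Below is a minimal proportional-prioritisation buffer that follows the widely used scheme. It samples transition i with probability p_i^α / Σ_k p_k^α and corrects the resulting bias with importance-sampling weights (N·P(i))^-β. The class name, the parameter defaults, and the use of absolute TD error as priority are assumptions; the paper's own mechanism may differ.

```python
import random


class ProportionalReplay:
    """Hypothetical minimal proportional prioritised replay buffer
    (O(N) sampling, no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.prios = [], []
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions get the current maximum priority so they are replayed at least once.
        p = max(self.prios, default=1.0)
        if len(self.data) >= self.capacity:
            # Simplification: indices shift on eviction; a real buffer would use a ring buffer.
            self.data.pop(0)
            self.prios.pop(0)
        self.data.append(transition)
        self.prios.append(p)

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.prios]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, k):
        probs = self.probabilities()
        n = len(self.data)
        idx = self.rng.choices(range(n), weights=probs, k=k)
        # Importance-sampling weights, normalised by the largest possible weight.
        max_w = (n * min(probs)) ** (-self.beta)
        weights = [((n * probs[i]) ** (-self.beta)) / max_w for i in idx]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # Priority = |TD error| + eps (assumed), so no transition gets zero probability.
        for i, err in zip(idx, td_errors):
            self.prios[i] = abs(err) + self.eps


buf = ProportionalReplay(capacity=3, alpha=1.0, beta=1.0)
for t in ("s0", "s1", "s2"):
    buf.add(t)
buf.update_priorities([0, 1, 2], [1.0, 1.0, 2.0])
print([round(p, 3) for p in buf.probabilities()])  # [0.25, 0.25, 0.5]
```

A sum-tree would make sampling O(log N). The flat list keeps the sketch short and easy to check.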
4. An Improved Reinforcement Learning-Based 6G UAV Communication for Smart Cities
Authors: Vi Hoai Nam, Chu Thi Minh Hue, Dang Van Anh. Computers, Materials & Continua, 2026, No. 1, pp. 2030-2044 (15 pages).
Unmanned Aerial Vehicles (UAVs) have become integral components of smart city infrastructures, supporting applications such as emergency response, surveillance, and data collection. However, the high mobility and dynamic topology of Flying Ad Hoc Networks (FANETs) make it hard to maintain reliable, low-latency communication. Conventional geographic routing protocols often struggle when link quality varies and mobility patterns are unpredictable. To overcome these limitations, this paper proposes an improved routing protocol based on reinforcement learning that integrates Q-learning with link-aware and mobility-aware mechanisms. The method optimizes relay node selection using an adaptive reward function that accounts for energy consumption, delay, and link quality. A Kalman filter is also integrated to predict UAV mobility, improving the stability of communication links under dynamic network conditions. Simulation experiments were conducted in realistic scenarios, varying the number of UAVs to assess scalability. Key performance metrics were analyzed, including packet delivery ratio, end-to-end delay, and total energy consumption. Compared with the conventional GEO and QGEO protocols, the proposed approach improves the packet delivery ratio by 12% to 15% and reduces delay by up to 25.5%. This improvement comes at the cost of higher energy consumption due to additional computation and control overhead. Despite this trade-off, the proposed solution provides reliable and efficient communication, making it well suited to large-scale UAV networks operating in complex urban environments.
Keywords: UAV; FANET; smart cities; reinforcement learning; Q-learning
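The protocol integrates a Kalman filter to predict UAV mobility, but the abstract does not give the motion model. The sketch below therefore assumes the common constant-velocity model, run independently for each coordinate axis. The class name, the simplified diagonal process noise `q`, and the measurement noise `r` are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CVKalman1D:
    """Hypothetical constant-velocity Kalman filter for one coordinate axis (use one per axis)."""
    pos: float
    vel: float
    dt: float = 1.0
    q: float = 0.1   # process-noise variance added to P's diagonal (assumed, simplified)
    r: float = 1.0   # position-measurement noise variance (assumed)
    P: list = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])

    def predict(self):
        dt, P = self.dt, self.P
        self.pos += dt * self.vel                      # x <- F x, F = [[1, dt], [0, 1]]
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]              # P <- F P F^T + Q
        return self.pos

    def update(self, z):
        P = self.P
        s = P[0][0] + self.r                           # innovation variance, H = [1, 0]
        k0, k1 = P[0][0] / s, P[1][0] / s              # Kalman gain
        y = z - self.pos                               # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]  # P <- (I - K H) P


kf = CVKalman1D(pos=0.0, vel=0.0)
for t in range(1, 51):
    kf.predict()
    kf.update(2.0 * t)          # noiseless positions of a UAV moving 2 units per step
print(round(kf.predict(), 2))   # 102.0: one-step-ahead position prediction
```

In a routing protocol like the one described, such a one-step-ahead position could feed a link-stability estimate for candidate relays. How the paper uses the prediction is not stated in the abstract.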
5. FAIR-DQL: Fairness-Aware Deep Q-Learning for Enhanced Resource Allocation and RIS Optimization in High-Altitude Platform Networks
Authors: Muhammad Ejaz, Muhammad Asim, Mudasir Ahmad Wani, Kashish Ara Shakil. Computers, Materials & Continua, 2026, No. 3, pp. 758-779 (22 pages).
The integration of High-Altitude Platform Stations (HAPS) with Reconfigurable Intelligent Surfaces (RIS) is a critical advance for next-generation wireless networks, offering new opportunities for ubiquitous connectivity. However, existing research leaves significant gaps in dynamic resource allocation, joint optimization, and equitable service provisioning under varying channel conditions, which limits practical deployment of these technologies. This paper addresses these challenges with a novel Fairness-Aware Deep Q-Learning (FAIR-DQL) framework for joint resource management and phase configuration in HAPS-RIS systems. The methodology uses a three-tier algorithmic architecture that integrates adaptive power control, priority-based user scheduling, and dynamic learning mechanisms. FAIR-DQL applies reinforcement learning with experience replay and fairness-aware reward functions to balance competing objectives while adapting to dynamic environments. Key findings demonstrate substantial improvements: a 9.15 dB SINR gain, 12.5 bps/Hz capacity, 78% power efficiency, and a fairness index of 0.82. The framework converges within 40 episodes with consistent delay performance. These contributions establish new benchmarks for fairness-aware resource allocation in aerial communications, enabling practical HAPS-RIS deployments for rural connectivity, emergency communications, and urban networks.
Keywords: wireless communication; high-altitude platform station; reconfigurable intelligent surfaces; deep Q-learning
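The abstract reports a fairness index of 0.82 without naming the index. Jain's fairness index is a common choice in wireless resource allocation, so the sketch below assumes that metric. It is a reading aid, not a confirmed detail of FAIR-DQL.

```python
def jain_fairness(rates):
    """Jain's index J = (sum x)^2 / (n * sum x^2).

    For non-negative rates, 1/n <= J <= 1, and J = 1 means all users get equal service.
    Assumption: this is the fairness index the paper reports.
    """
    n = len(rates)
    if n == 0:
        raise ValueError("need at least one user")
    total = sum(rates)
    sq = sum(x * x for x in rates)
    if sq == 0:
        raise ValueError("all rates are zero; the index is undefined")
    return total * total / (n * sq)


print(jain_fairness([1, 1, 1, 1]))  # 1.0  (perfectly equal)
print(jain_fairness([4, 0, 0, 0]))  # 0.25 (one user gets everything)
print(jain_fairness([3, 1]))        # 0.8
```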
6. Dynamic Integration of Q-Learning and A-APF for Efficient Path Planning in Complex Underground Mining Environments
Authors: Chang Su, Liangliang Zhao, Dongbing Xiang. Computers, Materials & Continua, 2026, No. 2, pp. 1017-1040 (24 pages).
Spraying-robot navigation in complex, obstacle-rich environments suffers from low learning efficiency and inadequate path safety, because dense, dynamic, and unpredictable obstacles challenge conventional methods. To address this, this paper proposes a hybrid algorithm that integrates Q-learning with an improved A*-Artificial Potential Field (A-APF) method. Centered on the Q-learning framework, the algorithm leverages safety-oriented guidance generated by A-APF. It also employs a dynamic coordination mechanism that adaptively balances exploration and exploitation. The proposed system comprises four core modules:
(1) an environment modeling module that constructs grid-based obstacle maps;
(2) an A-APF module that combines the heuristic search of the A* algorithm with the repulsive-force strategy of APF to generate guidance;
(3) a Q-learning module that learns optimal state-action values (Q-values) through interaction between the spraying robot and its environment, using a reward function that emphasizes path optimality and safety;
(4) a dynamic optimization module that ensures adaptive cooperation between Q-learning and A-APF through exploration-rate control and environment-aware constraints.
Simulation results show that the proposed method significantly enhances path safety in complex underground mining environments. Compared with the traditional Q-learning algorithm, it shortens training time by 42.95% and reduces training failures from 78 to 3. Compared with the static fusion algorithm, it further reduces training time by 10.78% and training failures by 50%, improving overall training efficiency.
Keywords: Q-learning; A* algorithm; artificial potential field; path planning; hybrid algorithm
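In this hybrid, the APF component supplies repulsive guidance near obstacles. The abstract does not give the improved A-APF formulas, so the sketch below shows only the classic (Khatib-style) attractive and repulsive forces on a 2-D plane. The gains `k_att` and `eta` and the influence radius `rho0` are illustrative parameters.

```python
import math


def attractive_force(pos, goal, k_att=1.0):
    """Negative gradient of the quadratic attractive potential 0.5 * k_att * |pos - goal|^2."""
    return (k_att * (goal[0] - pos[0]), k_att * (goal[1] - pos[1]))


def repulsive_force(pos, obstacle, eta=1.0, rho0=2.0):
    """Classic repulsive force of magnitude eta * (1/rho - 1/rho0) / rho^2,
    pointing away from the obstacle, and zero beyond the influence radius rho0."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    rho = math.hypot(dx, dy)
    if rho == 0.0:
        raise ValueError("robot is on the obstacle; repulsive force is unbounded")
    if rho > rho0:
        return (0.0, 0.0)
    mag = eta * (1.0 / rho - 1.0 / rho0) / rho ** 2
    return (mag * dx / rho, mag * dy / rho)


def total_force(pos, goal, obstacles, k_att=1.0, eta=1.0, rho0=2.0):
    """Sum of the attractive force and all repulsive forces (classic APF, not the paper's A-APF)."""
    fx, fy = attractive_force(pos, goal, k_att)
    for obs in obstacles:
        rx, ry = repulsive_force(pos, obs, eta, rho0)
        fx, fy = fx + rx, fy + ry
    return fx, fy


print(total_force((1.0, 0.0), (5.0, 0.0), [(0.0, 0.0)]))  # (4.5, 0.0)
```

In a Q-learning hybrid, a force direction like this could bias action selection or shape the reward near obstacles. The abstract does not say which of these the authors do.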
7. A Novel Improved Puma Optimizer to Boost Photovoltaic Array Production in Partially Shaded Environments
Authors: Nagwan Abdel Samee, Ahmed Fathy, Mohamed A. Mahdy, Maali Alabdulhafith, Essam H. Houssein. Computer Modeling in Engineering & Sciences, 2026, No. 2, pp. 737-771 (35 pages).
This research proposes an improved Puma optimization algorithm (IPuma) as a novel dynamic reconfiguration tool for a photovoltaic (PV) array connected in a total-cross-tied (TCT) configuration. The algorithm uses the Newton-Raphson search rule (NRSR) to strengthen exploration, especially in search spaces with many local optima. It strengthens exploitation through adaptive parameters that alternate with the random parameters of the original Puma. The effectiveness of IPuma is confirmed through comprehensive evaluations on the CEC'20 benchmark problems. There it outperforms both established and modern metaheuristic algorithms at navigating the search space and converging toward near-optimal regions. The findings indicate that IPuma is statistically promising and surpasses the competing algorithms. IPuma is then used to reconfigure a 9×9 PV array operating under three shading patterns: lower triangular (LT), long wide (LW), and short wide (SW). The proposed method is compared with the traditional TCT and Sudoku configurations. It is also compared with other programmed approaches: the whale optimization algorithm (WOA), grey wolf optimizer (GWO), Harris hawks optimization (HHO), particle swarm optimization (PSO), gravitational search algorithm (GSA), biogeography-based optimization (BBO), sine cosine algorithm (SCA), equilibrium optimizer (EO), and the original Puma. Mismatch power loss, maximum efficiency improvement, efficiency improvement ratio, and peak-to-mean ratio are calculated to assess the approach. Relative to the TCT configuration, IPuma increased the generated power by 36.72%, 28.03%, and 40.97% for SW, LW, and LT, respectively. It also achieved the best maximum efficiency improvement among the algorithms considered: 26.86%, 21.89%, and 29.07% for the examined patterns. The results highlight the proposed approach's advantages in convergence rate and stability. They also show that it can dynamically reconfigure PV systems to increase harvested energy.
Keywords: photovoltaic; partial shade; reconfiguration; improved Puma; metaheuristic
8. A Hybrid Approach to Software Testing Efficiency: Stacked Ensembles and Deep Q-Learning for Test Case Prioritization and Ranking
Authors: Anis Zarrad, Thomas Armstrong, Jaber Jemai. Computers, Materials & Continua, 2026, No. 3, pp. 1726-1746 (21 pages).
Test case prioritization and ranking play a crucial role in software testing by improving fault detection efficiency and ensuring software reliability. Prioritization selects the most relevant test cases for optimal coverage, and ranking further refines their execution order so that critical faults are detected earlier. This study investigates machine learning techniques that enhance both prioritization and ranking, making testing more effective and efficient. We first combine advanced feature engineering with ensemble models, including Gradient Boosted, Support Vector Machine, Random Forest, and Naive Bayes classifiers, to optimize test case prioritization. This achieves an accuracy score of 0.98847 and significantly improves the Average Percentage of Fault Detection (APFD). We then introduce a deep Q-learning framework combined with a Genetic Algorithm (GA) to refine test case ranking within priority levels. This approach achieves a rank accuracy of 0.9172, performing robustly despite the growing computational demands of specialized variation operators. Our findings highlight the effectiveness of stacked ensemble learning and reinforcement learning for test case prioritization and ranking. The integrated approach improves testing efficiency, reduces late-stage defects, and improves overall software stability. The study offers useful insights for AI-driven testing frameworks, paving the way for more intelligent and adaptive software quality assurance methodologies.
Keywords: software testing; test case prioritization; test case ranking; machine learning; reinforcement learning; deep Q-learning
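APFD, the metric cited in this abstract, has a standard closed form:

APFD = 1 - (TF_1 + ... + TF_m) / (n·m) + 1 / (2n)

Here n is the number of tests, m is the number of faults, and TF_i is the position of the first test that reveals fault i. The sketch below computes it for a given execution order. The fault matrix in the example is hypothetical.

```python
def apfd(order, detects):
    """APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n).

    order:   test IDs in execution order (n tests)
    detects: mapping test ID -> set of fault IDs that test reveals
    TF_i is the 1-based position of the first test in `order` that reveals fault i.
    """
    faults = set().union(*detects.values())
    n, m = len(order), len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one fault")
    first = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            first.setdefault(fault, pos)  # keep only the earliest detection
    if len(first) < m:
        raise ValueError("some faults are never revealed by this order")
    return 1 - sum(first.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix: test -> faults it reveals.
detects = {"A": {1}, "B": {2, 3}, "C": {1, 2, 3, 4}, "D": {5}, "E": set()}
print(apfd(["A", "B", "C", "D", "E"], detects))  # about 0.62
print(apfd(["C", "D", "A", "B", "E"], detects))  # about 0.86
```

Running the fault-rich test C first raises APFD. Prioritization and ranking models aim to produce orders of this kind.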
9. PEMFC Performance Degradation Prediction Based on CNN-BiLSTM with Data Augmentation by an Improved GAN
Authors: Xiaolu Wang, Haoyu Sun, Aiguo Wang, Xin Xia. Energy Engineering, 2026, No. 2, pp. 417-435 (19 pages).
Proton exchange membrane fuel cell (PEMFC) performance degradation prediction suffers from insufficient and imbalanced data samples. To address this, the study proposes a data augmentation-based model for predicting PEMFC performance degradation. First, an improved generative adversarial network (IGAN) with an adaptive gradient penalty coefficient is proposed. It addresses excessively fast gradient descent and insufficient diversity of generated samples. The IGAN then generates data whose distribution resembles that of the real data. This mitigates the insufficiency and imbalance of the original PEMFC samples and gives the prediction model training data rich in feature information. Finally, a convolutional neural network-bidirectional long short-term memory (CNN-BiLSTM) model predicts PEMFC performance degradation. Experimental results show that the data generated by the proposed IGAN are of higher quality than those generated by the original GAN and fully characterize and enrich the features of the original data. With the augmented data, the prediction accuracy of the CNN-BiLSTM model improves significantly, making it suitable for PEMFC performance degradation prediction tasks.
Keywords: PEMFC; performance degradation prediction; data augmentation; improved generative adversarial network
10. Research on a Smart Grid Management and Control Model Based on the Deep Q-learning Algorithm
Authors: 王筠, 李志鹏, 项旭, 张军堂, 石雷波. 《自动化技术与应用》 (Techniques of Automation and Applications), 2026, No. 2, pp. 54-57, 142 (5 pages).
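A smart grid management and control model based on the deep Q-learning algorithm is designed. Verifiable credentials (VCs) and decentralized identities (DIDs) serve as identity credentials for applications and software-defined networking (SDN) controllers. Combined with a dynamic trust evaluation algorithm and an attribute-based access control policy, they form a blockchain-based distributed SDN management and control model for the smart grid. The blockchain-based distributed SDN network is then optimized under changing resource allocation, dynamically changing network topology, and continuously evolving security threats. Experimental results show that the cumulative reward increases substantially once the model is optimized with deep Q-learning. The method performs well on a range of security metrics and can remove malicious domains, keeping the network environment secure.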
Keywords: SDN controller; distributed SDN network; deep Q-learning algorithm; blockchain; smart grid management and control model
11. An improved conditional denoising diffusion GAN for Mach number field reconstruction in a multi-tunnel combined inlet based on sparse parameter information
Authors: Ke MIN, Fan LEI, Jiale ZHANG, Chengxiang ZHU, Yancheng YOU. Chinese Journal of Aeronautics, 2026, No. 1, pp. 169-190 (22 pages).
The internal flow fields within a three-dimensional inward-turning combined inlet are extremely complex. This is especially true during engine mode transition, when changes in the tunnels can significantly affect the flow fields. To build an efficient flow field reconstruction model for this setting, we present an Improved Conditional Denoising Diffusion Generative Adversarial Network (ICDDGAN). It integrates Conditional Denoising Diffusion Probabilistic Models (CDDPMs) with StyleGAN and introduces a reconstruction discrimination mechanism and a dynamic loss-weight learning strategy. We build a Mach number flow field dataset by numerical simulation at various backpressures, covering the transition from turbine mode to ejector ramjet mode at Mach 2.5. Given only sparse parameter information, ICDDGAN rapidly generates high-quality Mach number flow fields without requiring many training samples. The results show that ICDDGAN outperforms CDDGAN in training convergence and stability. Interpolation and extrapolation tests under different backpressure conditions show that ICDDGAN accurately and quickly reconstructs Mach number fields for various tunnel slice shapes. It achieves a Structural Similarity Index Measure (SSIM) above 0.96 and a Mean Squared Error (MSE) of 0.035% relative to the actual flow fields. It also cuts time costs by 7-8 orders of magnitude compared with Computational Fluid Dynamics (CFD) calculations. This provides an efficient means of rapidly computing complex flow fields.
Keywords: flow field reconstruction; improved Conditional Denoising Diffusion Generative Adversarial Network (ICDDGAN); mode transition; sparse parameter information; three-dimensional inward-turning combined inlet
12. Improved Gain Shared Knowledge Optimizer Based Reactive Power Optimization for Various Renewable Penetrated Power Grids with Static Var Generator Participation
Authors: Xuan Ruan, Han Yan, Donglin Hu, Min Zhang, Ying Li, Di Hai, Bo Yang. Energy Engineering, 2026, No. 3, pp. 23-56 (34 pages).
An optimized volt-ampere reactive (VAR) control framework is proposed for transmission-level power systems. It simultaneously mitigates voltage deviations and active-power losses by coordinating large-scale wind/solar farms with shunt static var generators (SVGs). The model explicitly represents the reactive-power regulation characteristics of doubly-fed wind turbines and PV inverters under real-time meteorological conditions. It also quantifies the high-speed compensation capability of SVGs, enabling a seamless transition from localized VAR management to a globally coordinated strategy. An enhanced adaptive gaining-sharing knowledge optimizer (AGSK-SD) integrates simulated annealing and diversity maintenance. It autonomously tunes voltage-control actions, renewable-source reactive-power set-points, and SVG output. The algorithm adaptively modulates knowledge factors and ratios across search phases and performs fine-grained SA-based local exploitation. It also periodically re-injects population diversity to prevent premature convergence. Tests on the IEEE 9-bus and 39-bus systems show that AGSK-SD outperforms NSGA-II and MOPSO in hypervolume (HV), inverted generational distance (IGD), and spread metrics, at an acceptable computational cost. The method reduces network losses from 2.7191 to 2.15 MW (a 20.79% reduction) in the 9-bus system and from 15.1891 to 11.22 MW (a 26.16% reduction) in the 39-bus system. The cumulative voltage-deviation index also falls from 0.0277 to 3.42×10⁻⁴ p.u. (a 98.77% reduction) in the 9-bus system and from 0.0556 to 0.0107 p.u. (an 80.76% reduction) in the 39-bus system. These results show significant suppression of line losses and voltage fluctuations. Comparative analysis with traditional heuristic optimization algorithms confirms the superior performance of the proposed approach.
Keywords: gaining-sharing knowledge; improved algorithm; adaptive parameter adjustment; simulated annealing; local search algorithms; diversity enhancement mechanisms; wind and solar new energy; static var generator; reactive power optimization
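The comparison in this abstract relies on hypervolume (HV) and inverted generational distance (IGD). Minimal implementations for a two-objective minimisation problem are sketched below using the standard definitions. The fronts and reference point in the example are made up, and the paper's normalisation of objectives is not stated.

```python
import math


def igd(approx_front, reference_front):
    """Inverted generational distance: mean distance from each reference point
    to its nearest obtained point (lower is better)."""
    if not approx_front or not reference_front:
        raise ValueError("both fronts must be non-empty")
    total = sum(min(math.dist(r, a) for a in approx_front) for r in reference_front)
    return total / len(reference_front)


def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by reference point `ref`
    (both objectives minimised; higher is better)."""
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 < prev_f2:  # points with f2 >= prev_f2 are dominated; skip them
            hv += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return hv


# Made-up fronts for illustration.
reference = [(0, 1), (1, 0)]
print(igd([(0, 0)], reference))                        # 1.0
print(hypervolume_2d([(1, 3), (2, 2), (3, 1)], (4, 4)))  # 6.0
```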
13. Improved Q-learning algorithm for load balance in millimeter wave backhaul networks (cited by 1)
Authors: Meng Danfeng, Li Xiaohui, Pu Wenjuan. The Journal of China Universities of Posts and Telecommunications (EI, CSCD), 2018, No. 3, pp. 8-16 (9 pages).
With the dense deployment of users and the drastic increase in traffic load, millimeter wave (mmWave) backhaul networks have been widely investigated. A typical mmWave backhaul network consists of a macro base station (MBS) and small base stations (SBSs). A key issue in such a network is how to associate users efficiently with the MBS and the SBSs to balance load. Adding a virtual power bias to the SBSs lets more users access the SBSs and share the MBS's load. The bias values must be set carefully to guarantee backhaul efficiency and quality of service (QoS). An improved Q-learning algorithm is proposed to adjust the bias value of each SBS. In this algorithm, each SBS acts as an independently learning agent and reaches its best behavior, namely the optimal bias value, through a series of training steps. In addition, an improved behavior selection mechanism raises learning efficiency and accelerates convergence. Finally, simulations in the 60 GHz band demonstrate the algorithm's superior performance in backhaul efficiency and user outage probability.
Keywords: millimeter wave backhaul networks; load balance; user association; Q-learning
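The bias mechanism described above works as follows: each user joins the station with the highest received power plus bias, so raising an SBS's bias pulls users away from the MBS. The sketch below shows only that association rule, using made-up power values. It does not reproduce the per-SBS Q-learning that tunes the biases or the improved behavior selection mechanism.

```python
from collections import Counter


def associate(rx_power_dbm, bias_db):
    """Pick the station maximising received power (dBm) plus its bias (dB).
    Stations missing from bias_db (e.g. the MBS) get a 0 dB bias."""
    return max(rx_power_dbm, key=lambda bs: rx_power_dbm[bs] + bias_db.get(bs, 0.0))


def loads(users, bias_db):
    """Number of users served by each station under the given biases."""
    return Counter(associate(u, bias_db) for u in users)


# Hypothetical received powers (dBm) at three users.
users = [
    {"MBS": -70.0, "SBS1": -75.0},
    {"MBS": -68.0, "SBS1": -80.0},
    {"MBS": -72.0, "SBS1": -74.0},
]
print(dict(loads(users, {})))             # {'MBS': 3}
print(dict(loads(users, {"SBS1": 6.0})))  # {'SBS1': 2, 'MBS': 1}
```

In the paper's setting, a Q-learning agent at each SBS would search over bias values like the 6 dB used here, with a reward reflecting backhaul efficiency and QoS.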
14. Boltzmann-Optimized Q-learning Handover Control Algorithm for High-Speed Railways (cited by 4)
Authors: 陈永, 康婕. 《控制理论与应用》 (Control Theory & Applications), PKU Core Journal, 2025, No. 4, pp. 688-694 (7 pages).
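Handover in 5G-R high-speed railways uses a fixed handover threshold and ignores co-channel interference, ping-pong handovers, and similar effects, which lowers the handover success rate. To address this, a handover control algorithm based on Boltzmann-optimized Q-learning is proposed. First, a Q-table indexed by train position and action is designed. The Q-learning reward function is built by jointly considering ping-pong handovers, bit error rate, and other factors. Then, a Boltzmann search strategy is proposed to optimize action selection and improve the convergence of the handover algorithm. Finally, the Q-table is updated with the co-channel interference of base stations taken into account. This yields the handover decision parameters that control handover execution. Simulation results show that, across different running speeds and scenarios, the improved algorithm raises the handover success rate compared with the traditional algorithm while meeting the quality-of-service (QoS) requirements of wireless communication.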
Keywords: handover; 5G-R; Q-learning algorithm; Boltzmann optimization strategy
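The Boltzmann search strategy in this abstract is a form of softmax action selection. Each action a is chosen with probability exp(Q(a)/T) / Σ_b exp(Q(b)/T), where the temperature T trades exploration against exploitation. The sketch below shows only this generic rule. The Q-values are hypothetical, and the paper's temperature schedule and Q-table layout are not given in the abstract.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values: P(a) = exp(Q(a)/T) / sum_b exp(Q(b)/T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_select(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


# Hypothetical Q-values for three handover actions at one train position.
q = [0.2, 0.5, 0.1]
print([round(p, 3) for p in boltzmann_probs(q, 10.0)])  # high T: close to uniform
print([round(p, 3) for p in boltzmann_probs(q, 0.05)])  # low T: almost always action 1
```

Unlike ε-greedy, which explores uniformly at random, this rule still prefers better-valued actions while exploring. Lowering T over training gradually shifts the agent toward exploitation.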
15. Action Selection Strategies for the Multi-Agent Nash Q-Learning Model
Authors: 韩松, 李璨. 《中国管理科学》 (Chinese Journal of Management Science), PKU Core Journal, 2025, No. 12, pp. 110-120 (11 pages).
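Optimizing the action selection strategy of multi-agent Q-Learning models is a pressing problem when simulating games in complexity economics. This paper introduces a forced ε-greedy action selection strategy into the multi-agent Nash Q-Learning model. Game experiments compare it with the classic ε-greedy strategy to examine its effect on the algorithm's computation speed and convergence. Based on the experimental results, the validity of the algorithm is verified theoretically. A general corollary on forced ε-greedy is then derived from the properties of multi-agent models. Simulation results show that forced ε-greedy suits more complex games with more states, actions, and rounds, where it effectively improves the runtime performance of multi-agent Q-Learning. However, it essentially increases action exploration in the early stage, which consumes some rounds and lowers the equilibrium convergence rate. Users applying the strategy must therefore weigh the performance gain against the lost equilibrium convergence rate.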
Keywords: Nash Q-learning; forced ε-greedy; action selection
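For reference, the sketch below shows the classic ε-greedy rule that this study uses as its baseline. The abstract does not specify the forced ε-greedy variant, so it is not reproduced here, and the Q-values in the example are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Classic epsilon-greedy: with probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical Q-values for one agent in one state.
q = [0.1, 0.9, 0.3]
print(epsilon_greedy(q, 0.0, random.Random(42)))  # 1 (pure exploitation)
```

In a multi-agent Nash Q-Learning setting, each agent applies a rule like this to its own action choice. The forced variant reportedly front-loads exploration, which this baseline does not do.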
16. Intelligent Prediction of Shale Fracture Toughness with an XGBoost Model Based on an Improved Q-learning Algorithm
Authors: 张艳, 王宗勇, 张豪, 吴建成, 祝春波, 吴高平. 《长江大学学报(自然科学版)》 (Journal of Yangtze University, Natural Science Edition), 2025, No. 5, pp. 58-65 (8 pages).
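Rock fracture toughness strongly affects how fractures propagate and extend, and it is a key parameter for evaluating reservoir fracability. However, measuring fracture toughness directly is complicated. Existing prediction methods mostly rely on fitted relationships between fracture toughness and other physical parameters, which makes it hard to obtain a continuous profile over an entire well section. Laboratory fracture toughness experiments were used to analyze how shale fracture toughness relates to other physical and mechanical parameters and to establish a fitting formula for fracture toughness. Rock fracture toughness was then predicted by an XGBoost model driven by geophysical well-logging data, with its hyperparameters tuned by an improved Q-learning algorithm. Mode I fracture toughness correlates strongly with tensile strength and acoustic velocity and weakly with density. It is positively correlated with P-wave velocity, S-wave velocity, tensile strength, and rock density. The XGBoost model optimized by improved Q-learning predicts fracture toughness with high accuracy: the correlation between predicted and fitted fracture toughness reaches 0.981. The proposed model is reliable and can serve as a reference for fracturing engineering design.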
Keywords: fracture toughness; well-logging data; intelligent algorithm; Q-learning; XGBoost; fracturing design
17. Application of an Improved Q-learning Algorithm to Network Anomaly Diagnosis in Unsupervised Environments
Author: 梁西陈. 《六盘水师范学院学报》 (Journal of Liupanshui Normal University), 2025, No. 3, pp. 89-97 (9 pages).
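In unsupervised environments, traditional network anomaly diagnosis algorithms locate anomalies and classify anomalous data with low accuracy. To address this, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, ADU (Asynchronous Data Unit) units collect wireless network data flows, and packet features are extracted. Next, a Q-learning model is built to find the balance point between state values and reward values. The SA (Simulated Annealing) algorithm then identifies the next-step state accurately from a global perspective. Finally, the joint distribution probability of the training samples is determined. This improves the approximation of the output values, balancing exploration against cost. Test results show that the improved Q-learning algorithm locates network anomalies with a mean accuracy of 99.4%. It also outperforms three traditional network anomaly diagnosis methods in classification precision and efficiency across different types of network anomalies.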
Keywords: unsupervised; improved Q-learning; ADU unit; state value; joint distribution probability
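This method adds simulated annealing (SA) to the Q-learning loop. The abstract gives no details of how SA is coupled in, so the sketch below shows only the generic SA loop: the Metropolis acceptance rule with geometric cooling, applied to a toy integer minimisation. The function names, cooling rate, and toy cost are assumptions.

```python
import math
import random


def acceptance_probability(delta, temperature):
    """Metropolis rule for minimisation; delta = cost(candidate) - cost(current)."""
    if delta <= 0:
        return 1.0            # always accept improvements
    if temperature <= 0:
        return 0.0
    return math.exp(-delta / temperature)  # sometimes accept worse moves


def anneal(cost, start, neighbour, t0=1.0, cooling=0.95, steps=1000, seed=0):
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        if rng.random() < acceptance_probability(cost(candidate) - cost(current), t):
            current = candidate
            if cost(current) < cost(best):
                best = current
        t *= cooling  # geometric cooling schedule
    return best


# Toy problem: minimise (x - 7)^2 over the integers with +/-1 moves.
best = anneal(lambda x: (x - 7) ** 2, 0, lambda x, rng: x + rng.choice((-1, 1)))
print(best)  # 7
```

Accepting occasional worse moves early on is what lets SA escape local optima. This matches the "global perspective" the abstract attributes to it.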
18. A Genetic Algorithm Fused with Improved Q-learning for the Flexible Job Shop Scheduling Problem
Authors: 陈涛, 赵厚安. 《常州工学院学报》 (Journal of Changzhou Institute of Technology), 2025, No. 5, pp. 17-24, 82 (9 pages).
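When traditional genetic algorithms solve the flexible job shop scheduling problem, they are sensitive to parameter settings and prone to getting trapped in local optima. Reinforcement learning balances exploration and exploitation, which can improve the diversity and accuracy of solutions. Building on this, a genetic algorithm fused with improved Q-learning is used to solve a flexible job shop scheduling model that minimizes the maximum completion time (makespan). A hybrid initialization strategy improves population quality, and an elitist strategy preserves high-quality chromosomes during evolution. The reinforcement learning state space, action set, and reward mechanism are carefully designed. Together with an adaptive exploration-rate decay mechanism based on algorithm performance, they tune the genetic algorithm's key parameters quickly and adaptively. This achieves a finer balance between global search and local exploitation. Finally, simulation experiments on Brandimarte's 10 benchmark instances compare the method with three other algorithms. It shows better optimization ability, confirming its effectiveness.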
Keywords: flexible job shop scheduling; Q-learning; genetic algorithm; adaptive
19. A Multi-Channel Scheduling Method for Short-Range Wireless Communication Networks Based on a Q-Learning Feedback Mechanism
Authors: 李忠, 严莉. 《计算机与网络》 (Computer & Network), 2025, No. 5, pp. 470-479 (10 pages).
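Traditional channel scheduling methods follow fixed rules, which leads to low channel resource utilization and unstable data communication. To solve this problem, a multi-channel scheduling method for short-range wireless communication networks based on a Q-Learning feedback mechanism is proposed. First, the network architecture is analyzed by examining the topology and underlying logic of the core network and radio access network architectures. Next, Dijkstra's algorithm is applied to an undirected graph of the network's communication nodes to optimize the deployment of channel nodes. Then, multi-channel state feature parameters are computed, and a channel state estimation model is built to estimate the state of each channel. Finally, the reinforcement learning process is treated as a Markov decision process, and the Q-Learning feedback mechanism schedules the network's multiple channels. In experiments, the method's maximum average packet loss rate is 0.03 and its maximum network throughput is 4.5 Mb/s. It thus keeps throughput high while holding packet loss low, using channel resources efficiently. Communication delay stays below 0.4 s in the low-traffic region and ranges from 0.26 s to at most 0.4 s in the high-traffic region. This enables efficient, stable transmission of communication data.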
Keywords: Q-learning feedback mechanism; short range; wireless communication network; multi-channel scheduling; channel state; Markov decision
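This method runs Dijkstra's algorithm over an undirected graph of communication nodes. A standard heap-based Dijkstra is sketched below. The paper's edge weights (for example link cost or distance) are not stated, so the graph and weights in the example are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_undirected(edges):
    """Adjacency lists for an undirected weighted graph given (u, v, weight) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def dijkstra(adj, source):
    """Shortest-path costs from `source`; edge weights must be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical node graph; weights stand in for link costs.
adj = build_undirected([("A", "B", 1), ("B", "C", 2), ("A", "C", 4), ("C", "D", 1)])
print(dijkstra(adj, "A"))  # {'A': 0.0, 'B': 1.0, 'C': 3.0, 'D': 4.0}
```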
20. A* Pre-Guided Ant Colony Path Planning Algorithm Fused with Q-learning (cited by 1)
Authors: 殷笑天, 杨丽英, 刘干, 何玉庆. 《传感器与微系统》 (Transducer and Microsystem Technologies), PKU Core Journal, 2025, No. 8, pp. 143-147, 153 (6 pages).
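Traditional ant colony optimization (ACO) algorithms for path planning in complex environments tend to get trapped in local optima, converge slowly, and avoid obstacles poorly. To address these problems, this paper proposes QHACO: an A*-pre-guided ant colony path planning algorithm with a hierarchical pheromone mechanism that incorporates Q-learning. First, the A* algorithm pre-allocates global pheromones so that initial paths quickly approach the optimal solution. Second, a two-layer (global and local) cooperative pheromone model is built. Its global layer retains the experience of historical elite paths, and its local layer responds to environmental changes in real time. Finally, a directional Q-learning reward function is introduced to optimize decision-making, applying stronger guidance signals at path turning points and obstacle edges. On a 25×24 map of moderate complexity, QHACO shortens the optimal path by 22.7% and improves convergence speed by 98.7% compared with traditional ACO. In a 50×50 environment with dense obstacles, it shortens the optimal path by 16.9% and reduces the number of iterations by 95.1%. Overall, QHACO significantly improves optimality, convergence speed, and obstacle avoidance over traditional ACO and adapts well to its environment.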
Keywords: ant colony optimization algorithm; path planning; local optimum; convergence speed; Q-learning; hierarchical pheromone; A* algorithm