Journal Articles
137,956 articles found
1. Effective Controller Placement in Software-Defined Internet-of-Things Leveraging Deep Q-Learning (DQL)
Authors: Jehad Ali, Mohammed J. F. Alenazi — Computers, Materials & Continua (SCIE, EI), 2024, No. 12, pp. 4015-4032 (18 pages)
The controller is a main component in the Software-Defined Networking (SDN) framework, which plays a significant role in enabling programmability and orchestration for 5G and next-generation networks. In SDN, frequent communication occurs between network switches and the controller, which manages and directs traffic flows. If the controller is not strategically placed within the network, this communication can experience increased delays, negatively affecting network performance. Specifically, an improperly placed controller can lead to higher end-to-end (E2E) delay, as switches must traverse more hops or encounter greater propagation delays when communicating with the controller. This paper introduces a novel approach using Deep Q-Learning (DQL) to dynamically place controllers in Software-Defined Internet of Things (SD-IoT) environments, with the goal of minimizing E2E delay between switches and controllers. E2E delay, a crucial metric for network performance, is influenced by two key factors: hop count, which measures the number of network nodes data must traverse, and propagation delay, which accounts for the physical distance between nodes. Our approach models the controller placement problem as a Markov Decision Process (MDP). In this model, the network configuration at any given time is represented as a “state,” while “actions” correspond to potential decisions regarding the placement of controllers or the reassignment of switches to controllers. Using a Deep Q-Network (DQN) to approximate the Q-function, the system learns the optimal controller placement by maximizing the cumulative reward, which is defined as the negative of the E2E delay. Essentially, the lower the delay, the higher the reward the system receives, enabling it to continuously improve its controller placement strategy. The experimental results show that our DQL-based method significantly reduces E2E delay when compared to traditional benchmark placement strategies. By dynamically learning from the network's real-time conditions, the proposed method ensures that controller placement remains efficient and responsive, reducing communication delays and enhancing overall network performance.
Keywords: software-defined networking, deep Q-learning, controller placement, quality of service
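For illustration only, here is a minimal Python sketch of the reward described in this abstract, where the reward is the negative of an E2E delay built from hop count and propagation delay; the delay constants and function names are assumptions, not the authors' implementation.

```python
import numpy as np

def e2e_delay(hops, distance_km, per_hop_ms=0.5, km_per_ms=200.0):
    """Illustrative E2E delay: per-hop processing plus propagation over distance."""
    return hops * per_hop_ms + distance_km / km_per_ms

def placement_reward(switch_paths):
    """Reward = negative mean switch-to-controller E2E delay (lower delay, higher reward)."""
    delays = [e2e_delay(h, d) for h, d in switch_paths]
    return -float(np.mean(delays))

# Three switches, each given as (hop count, distance in km) to its assigned controller.
print(placement_reward([(2, 100.0), (4, 350.0), (1, 40.0)]))
```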
2. Path Planning for Intelligent Robots Based on Deep Q-learning With Experience Replay and Heuristic Knowledge (Cited 27)
Authors: Lan Jiang, Hongyun Huang, Zuohua Ding — IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2020, No. 4, pp. 1179-1189 (11 pages)
Path planning and obstacle avoidance are two challenging problems in the study of intelligent robots. In this paper, we develop a new method to alleviate these problems based on deep Q-learning with experience replay and heuristic knowledge. In this method, a neural network has been used to resolve the “curse of dimensionality” issue of the Q-table in reinforcement learning. When a robot is walking in an unknown environment, it collects experience data which is used for training a neural network; such a process is called experience replay. Heuristic knowledge helps the robot avoid blind exploration and provides more effective data for training the neural network. The simulation results show that, in comparison with the existing methods, our method can converge to an optimal action strategy in less time and can explore a path in an unknown environment with fewer steps and a larger average reward.
Keywords: deep Q-learning (DQL), experience replay (ER), heuristic knowledge (HK), path planning
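As a rough, hypothetical sketch of the experience-replay mechanism this abstract describes (store transitions, sample random mini-batches, compute TD targets), in Python with NumPy; the buffer size and the target formula shown here are standard DQN practice rather than the paper's exact code.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Minimal experience-replay buffer: store transitions, sample random mini-batches."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

def td_targets(q_next, rewards, dones, gamma=0.99):
    """Standard DQN targets: r + gamma * max_a' Q(s', a') for non-terminal transitions."""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
```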
3. Intelligent Fast Cell Association Scheme Based on Deep Q-Learning in Ultra-Dense Cellular Networks (Cited 1)
Authors: Jinhua Pan, Lusheng Wang, Hai Lin, Zhiheng Zha, Caihong Kai — China Communications (SCIE, CSCD), 2021, No. 2, pp. 259-270 (12 pages)
To support dramatically increased traffic loads, communication networks become ultra-dense. Traditional cell association (CA) schemes are time-consuming, forcing researchers to seek fast schemes. This paper proposes a deep Q-learning based scheme, whose main idea is to train a deep neural network (DNN) to calculate the Q values of all the state-action pairs, and the cell holding the maximum Q value is associated. In the training stage, the intelligent agent continuously generates samples through the trial-and-error method to train the DNN until convergence. In the application stage, the state vectors of all the users are input to the trained DNN to quickly obtain a satisfactory CA result for a scenario with the same BS locations and user distribution. Simulations demonstrate that the proposed scheme provides satisfactory CA results in a computational time several orders of magnitude shorter than traditional schemes. Meanwhile, performance metrics, such as capacity and fairness, can be guaranteed.
Keywords: ultra-dense cellular networks (UDCN), cell association (CA), deep Q-learning, proportional fairness, Q-learning
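A minimal sketch of the association step described above: the trained DNN outputs Q values for every user-cell pair and each user is associated with the cell holding the maximum Q value. The random placeholder Q matrix and function name are illustrative assumptions.

```python
import numpy as np

def associate_users(q_values):
    """For every user, pick the cell whose Q value (from the trained DNN) is largest.

    q_values: array of shape (num_users, num_cells); random placeholders are used
    here in place of the trained network's outputs.
    """
    return q_values.argmax(axis=1)

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))   # 5 users, 8 candidate cells (illustrative)
print(associate_users(q))     # one associated cell index per user
```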
4. Deep Q-Learning Based Optimal Query Routing Approach for Unstructured P2P Network (Cited 1)
Authors: Mohammad Shoab, Abdullah Shawan Alotaibi — Computers, Materials & Continua (SCIE, EI), 2022, No. 3, pp. 5765-5781 (17 pages)
Deep Reinforcement Learning (DRL) is a class of Machine Learning (ML) that combines Deep Learning with Reinforcement Learning and provides a framework by which a system can learn from its previous actions in an environment to select its future actions efficiently. DRL has been used in many application fields, including games, robots, and networks, for creating autonomous systems that improve themselves with experience. It is well acknowledged that DRL is well suited to solve optimization problems in distributed systems in general and network routing in particular. Therefore, a novel query routing approach called Deep Reinforcement Learning based Route Selection (DRLRS) is proposed for unstructured P2P networks based on a Deep Q-Learning algorithm. The main objective of this approach is to achieve better retrieval effectiveness with reduced search cost, i.e., fewer connected peers, fewer exchanged messages, and less time. The simulation results show a significant improvement in searching for a resource compared with k-Random Walker and Directed BFS: retrieval effectiveness, search cost in terms of connected peers, and average overhead are 1.28, 106, and 149, respectively.
Keywords: reinforcement learning, deep Q-learning, unstructured P2P network, query routing
5. Multi-User MmWave Beam Tracking via Multi-Agent Deep Q-Learning (Cited 2)
Authors: MENG Fan, HUANG Yongming, LU Zhaohua, XIAO Huahua — ZTE Communications, 2023, No. 2, pp. 53-60 (8 pages)
Beamforming is significant for millimeter wave multi-user massive multi-input multi-output systems. Meanwhile, the overhead cost of channel state information and beam training is considerable, especially in dynamic environments. To reduce the overhead cost, we propose a multi-user beam tracking algorithm using a distributed deep Q-learning method. With online learning of users' moving trajectories, the proposed algorithm learns to scan a beam subspace to maximize the average effective sum rate. Considering practical implementation, we model the continuous beam tracking problem as a non-Markov decision process and thus develop a simplified training scheme of deep Q-learning to reduce the training complexity. Furthermore, we propose a scalable state-action-reward design for scenarios with different numbers of users and antennas. Simulation results verify the effectiveness of the designed method.
Keywords: multi-agent deep Q-learning, centralized training and distributed execution, mmWave communication, beam tracking, scalability
6. Wireless Communication Node Coverage Optimization Based on Double Deep Q-learning (Cited 1)
Author: 李忠涛 — 《电子技术与软件工程》, 2021, No. 14, pp. 1-3 (3 pages)
This paper designs its algorithm around a Double Deep Q-learning model, a value-based reinforcement learning algorithm. A neural network model replaces the tabular Q-table, solving the problem of the Q-table becoming too large when the system has too many states.
Keywords: wireless communication node, optimal path, Double Deep Q-learning
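The abstract names Double Deep Q-learning as the core technique; below is a generic sketch of the Double DQN target computation (online network selects the action, target network evaluates it), assuming standard Double DQN rather than this paper's specific network or state design.

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target: the online network chooses the next action, the target
    network evaluates it, reducing the overestimation caused by the max operator."""
    best_actions = q_online_next.argmax(axis=1)
    evaluated = q_target_next[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

# Toy batch of two transitions.
q_on = np.array([[1.0, 2.0], [0.5, 0.1]])
q_tg = np.array([[0.8, 1.5], [0.4, 0.2]])
print(double_dqn_target(q_on, q_tg, rewards=np.array([1.0, 0.0]), dones=np.array([0.0, 1.0])))
```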
7. Deep Q-Learning Based Computation Offloading Strategy for Mobile Edge Computing (Cited 6)
Authors: Yifei Wei, Zhaoying Wang, Da Guo, F. Richard Yu — Computers, Materials & Continua (SCIE, EI), 2019, No. 4, pp. 89-104 (16 pages)
To reduce transmission latency and mitigate the backhaul burden of centralized cloud-based network services, mobile edge computing (MEC) has been drawing increased attention from both industry and academia recently. This paper focuses on mobile users' computation offloading problem in wireless cellular networks with mobile edge computing, with the purpose of optimizing the computation offloading decision-making policy. Since wireless network states and computing requests have stochastic properties and the environment's dynamics are unknown, we use the model-free reinforcement learning (RL) framework to formulate and tackle the computation offloading problem. Each mobile user learns through interactions with the environment, estimates its performance in the form of a value function, and then chooses the overhead-aware optimal computation offloading action (local computing or edge computing) based on its state. The state spaces are high-dimensional in our work, and the value function is unrealistic to estimate directly. Consequently, we use a deep reinforcement learning algorithm, which combines the RL method Q-learning with a deep neural network (DNN) to approximate the value functions for complicated control applications, and the optimal policy is obtained when the value function converges. Simulation results showed the effectiveness of the proposed method in comparison with baseline methods in terms of the total overhead of all mobile users.
Keywords: mobile edge computing, computation offloading, resource allocation, deep reinforcement learning
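A toy illustration of the per-user decision described above, choosing between local computing and edge offloading from estimated Q values with epsilon-greedy exploration; the action set, Q values, and epsilon used here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

ACTIONS = ("local", "edge")   # offloading decision space named in the abstract

def choose_offloading_action(q_values, epsilon=0.1, rng=np.random.default_rng(1)):
    """Epsilon-greedy choice between local computing and edge offloading, assuming
    q_values[i] approximates the (negative) expected overhead of ACTIONS[i]."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values))

print(ACTIONS[choose_offloading_action(np.array([-3.2, -1.7]))])   # likely "edge"
```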
8. An Extractive Summarization Method Based on Deep Q-Learning
Authors: 王灿宇, 孙晓海, 吴叶辉, 季荣彪, 李亚东, 张少如, 杨士豪 — 《吉林大学学报(信息科学版)》 (CAS), 2023, No. 2, pp. 306-314 (9 pages)
To remove the need for sentence-level labels during training, a label-free extractive summarization method based on deep reinforcement learning is proposed: text summarization is cast as a Q-learning problem, and a DQN (Deep Q-Network) is used to learn the Q-function. To represent documents effectively, BERT (Bidirectional Encoder Representations from Transformers) serves as the sentence encoder and a Transformer as the document encoder. The decoder accounts for each sentence's information richness, salience, positional importance, and redundancy with respect to the current summary. Because no sentence-level labels are required when extracting summaries, the annotation workload is greatly reduced. Experimental results show that the method achieves the highest Rouge-L (38.35) and comparable Rouge-1 (42.07) and Rouge-2 (18.32) scores on the CNN (Cable News Network)/DailyMail dataset.
Keywords: extractive text summarization, BERT model, encoder, deep reinforcement learning
9. A Boltzmann-Optimized Q-learning Handover Control Algorithm for High-Speed Railways (Cited 3)
Authors: 陈永, 康婕 — 《控制理论与应用》 (PKU Core), 2025, No. 4, pp. 688-694 (7 pages)
5G-R high-speed railway handover uses fixed handover thresholds and ignores co-channel interference, ping-pong handover, and related effects, leading to a low handover success rate. To address this, a Boltzmann-optimized Q-learning handover control algorithm is proposed. First, a Q-table indexed by train position and action is designed, and the Q-learning reward function is built with ping-pong handover, bit error rate, and related factors taken into account. Then, a Boltzmann search strategy is introduced to optimize action selection and improve the convergence of the handover algorithm. Finally, the Q-table is updated with base-station co-channel interference taken into account to obtain the handover decision parameters that control handover execution. Simulation results show that, under different train speeds and operating scenarios, the improved algorithm raises the handover success rate over traditional algorithms and satisfies the QoS requirements of wireless communication services.
Keywords: handover, 5G-R, Q-learning algorithm, Boltzmann optimization strategy
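The Boltzmann (softmax) action-selection step that the abstract says replaces plain greedy selection can be sketched as follows; the temperature parameter and example Q values are assumptions, not values from the paper.

```python
import numpy as np

def boltzmann_action(q_row, temperature=1.0, rng=np.random.default_rng(0)):
    """Boltzmann (softmax) exploration: higher-Q actions are chosen with higher
    probability; the temperature controls how close the policy is to greedy."""
    prefs = q_row / temperature
    prefs = prefs - prefs.max()                # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_row), p=probs))

print(boltzmann_action(np.array([0.2, 1.5, 0.9]), temperature=0.5))
```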
10. A deep Q-learning model for sequential task offloading in edge AI systems
Authors: Dong Liu, Shiheng Gu, Xinyu Fan, Xu Zheng — Intelligent and Converged Networks (EI), 2024, No. 3, pp. 207-221 (15 pages)
Currently, edge Artificial Intelligence (AI) systems have significantly facilitated the functionalities of intelligent devices such as smartphones and smart cars, and supported diverse applications and services. This fundamental support comes from continuous data analysis and computation over these devices. Considering the resource constraints of terminal devices, multi-layer edge artificial intelligence systems improve the overall computing power of the system by scheduling computing tasks to edge and cloud servers for execution. Previous efforts tend to ignore the strongly pipelined nature of processing tasks in edge AI systems, such as the encryption, decryption, and consensus algorithms supporting the implementation of Blockchain techniques. Therefore, this paper proposes a new pipelined task scheduling algorithm (referred to as PTS-RDQN), which utilizes the system representation ability of deep reinforcement learning and integrates multi-dimensional information to achieve global task scheduling. Specifically, a co-optimization strategy based on Rainbow Deep Q-Learning (RainbowDQN) is proposed to allocate computation tasks for mobile devices, edge, and cloud servers, which is able to comprehensively consider the balance of task turnaround time, link quality, and other factors, thus effectively improving system performance and user experience. In addition, a task scheduling strategy based on PTS-RDQN is proposed, which is capable of realizing dynamic task allocation according to device load. The results of many simulation experiments show that the proposed method can effectively improve resource utilization and provide an effective task scheduling strategy for edge computing systems with a cloud-edge-end architecture.
Keywords: edge computing, task scheduling, reinforcement learning, Rainbow deep Q-learning (RainbowDQN)
11. Application of an Improved Q-learning Algorithm to Network Anomaly Diagnosis in Unsupervised Environments
Author: 梁西陈 — 《六盘水师范学院学报》, 2025, No. 3, pp. 89-97 (9 pages)
Traditional network anomaly diagnosis algorithms in unsupervised environments suffer from low accuracy in anomaly localization and in anomaly data classification. To address this, a wireless network anomaly diagnosis method based on an improved Q-learning algorithm is designed. First, wireless-network data streams are collected through ADU (Asynchronous Data Unit) units and packet features are extracted. Then, a Q-learning model is built to explore the balance point between state values and reward values, and an SA (Simulated Annealing) algorithm is used to accurately identify the next state from a global perspective. Finally, the joint distribution probability of the training samples is determined to improve the approximation of the output values and balance exploration against cost. Test results show that the improved Q-learning algorithm achieves an average network anomaly localization accuracy of 99.4% and outperforms three traditional network anomaly diagnosis methods in classification accuracy and efficiency across different types of network anomalies.
Keywords: unsupervised, improved Q-learning, ADU unit, state value, joint distribution probability
12. Airport Flight Delay Prediction Based on the Q-learning Algorithm (Cited 1)
Authors: 刘琪, 乐美龙 — 《航空计算技术》, 2025, No. 1, pp. 28-32 (5 pages)
An improved deep belief network (DBN) is combined with the Q-learning algorithm to build a hybrid prediction model. The delay prediction problem is first modeled as a standard Markov decision process, and the improved deep belief network is used to select key features. Through the DBN analysis, 27 key feature categories are selected from 46 feature variables as the final explanatory variables of delay time and fed into the Q-learning algorithm, enabling real-time prediction of flight delays. Experiments on flight data from Beijing Capital International Airport show that the proposed model predicts flight delays effectively, with an average error of 4.05 min. Compared with four benchmark methods, the DBN-based Q-learning algorithm achieves higher delay prediction accuracy than the other four algorithms.
Keywords: air transport, flight delay prediction, deep belief network, Q-learning, flight delay
13. A Q-Learning QoS Routing Algorithm for Large-Scale LEO Constellations Based on Celestial-Sphere Grids
Authors: 马伟, 肖嵩, 周诠, 蔡宇茜 — 《空间电子技术》, 2025, No. S1, pp. 132-139 (8 pages)
Intelligent QoS routing is a research hotspot and a difficult problem for large-scale LEO constellations. Focusing on virtual-physical topology drift, multi-service QoS conflicts, and dynamic load imbalance in LEO constellations, this paper proposes a Q-Learning QoS routing algorithm based on celestial-sphere grids. By fusing a non-uniformly discretized celestial sphere with BeiDou grid coding, it resolves frequent link switching and virtual-physical topology synchronization. On this basis, a Q-Learning routing algorithm is designed in combination with traffic heat maps, taking bandwidth, load, heat level, and hop count as joint optimization objectives; a differentiated QoS reward mechanism is built, and congested links are dynamically avoided through real-time learning. Simulation results show that, compared with the HLLMR and Dijkstra algorithms, the proposed algorithm reduces packet loss by 4% and 11% and increases throughput by 7% and 15%, respectively, with latency comparable to HLLMR, realizing coordinated optimization of QoS assurance and load balancing for large-scale LEO constellations.
Keywords: celestial-sphere grid, heat map, Q-learning, QoS routing
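A hypothetical sketch of a differentiated QoS reward of the kind described above, combining bandwidth, load, heat level, and hop count for one candidate link; the weights and normalization are invented for illustration and do not come from the paper.

```python
def qos_reward(bandwidth, load, heat_level, hops, weights=(0.4, 0.3, 0.2, 0.1)):
    """Toy weighted reward for one candidate inter-satellite link: favour residual
    bandwidth, penalise load, congestion heat level, and hop count.
    All inputs are assumed to be normalised to [0, 1]; the weights are illustrative."""
    w_bw, w_load, w_heat, w_hop = weights
    return w_bw * bandwidth - w_load * load - w_heat * heat_level - w_hop * hops

print(qos_reward(bandwidth=0.8, load=0.3, heat_level=0.5, hops=0.2))
```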
14. An A*-Pre-Guided Ant Colony Path Planning Algorithm Incorporating Q-learning
Authors: 殷笑天, 杨丽英, 刘干, 何玉庆 — 《传感器与微系统》 (PKU Core), 2025, No. 8, pp. 143-147, 153 (6 pages)
The traditional ant colony optimization (ACO) algorithm easily falls into local optima, converges slowly, and avoids obstacles poorly in complex-environment path planning. To overcome this, an A*-pre-guided ant colony path planning algorithm with a hierarchical pheromone mechanism and Q-learning, QHACO, is proposed. First, global pheromone is pre-allocated with the A* algorithm so that initial paths quickly approach the optimal solution. Second, a global-local two-layer pheromone cooperation model is built: the global layer preserves historical elite-path experience while the local layer responds to environmental changes in real time. Finally, a Q-learning directional reward function is introduced to optimize decision making, applying reinforced guidance signals at path turning points and obstacle edges. Experiments show that on a 25×24 medium-complexity map, QHACO shortens the optimal path by 22.7% and speeds up convergence by 98.7% compared with traditional ACO; in a 50×50 high-density obstacle environment, the optimal path length improves by 16.9% and the number of iterations drops by 95.1%. Compared with traditional ACO, QHACO markedly improves optimality, convergence speed, and obstacle avoidance, and shows strong environmental adaptability.
Keywords: ant colony optimization algorithm, path planning, local optimum, convergence speed, Q-learning, hierarchical pheromone, A* algorithm
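The Q-learning directional reward mentioned in this abstract could, in a generic form, reward moves that reduce the distance to the goal; the sketch below shows such a generic form and is not the paper's exact formulation.

```python
import numpy as np

def directional_reward(pos, new_pos, goal, bonus=1.0):
    """Generic directional reward: a bonus proportional to how much a move reduces
    the straight-line distance to the goal (negative when moving away from it)."""
    pos, new_pos, goal = map(np.asarray, (pos, new_pos, goal))
    progress = np.linalg.norm(goal - pos) - np.linalg.norm(goal - new_pos)
    return bonus * progress

print(directional_reward((0, 0), (1, 1), (5, 5)))    # positive: moving toward the goal
print(directional_reward((0, 0), (-1, 0), (5, 5)))   # negative: moving away from it
```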
15. Improved Self-Correcting Q-learning Applied to Intelligent Robot Path Planning (Cited 1)
Authors: 任伟, 朱建鸿 — 《机械科学与技术》 (PKU Core), 2025, No. 1, pp. 126-132 (7 pages)
To address several problems in intelligent robot path planning, an improved self-correcting Q-learning algorithm is proposed. First, the greedy search factor is made dynamic, which better balances exploration and exploitation. Second, in the Q-value initialization stage, the reciprocal of the distance between the current position and the goal position replaces the all-zero or random initialization of traditional Q-learning, greatly accelerating convergence. Finally, a self-correcting estimator is introduced to correct the maximization bias of the Q-function in traditional Q-learning. Simulation experiments validate the proposed improvements and show that the improved algorithm substantially raises learning efficiency and outperforms the traditional algorithm in all respects.
Keywords: path planning, Q-learning, greedy search, initialization, self-correction
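Two of the improvements described above, initializing Q values with the reciprocal of the distance to the goal and using a dynamic (decaying) greedy search factor, can be sketched as follows; the grid layout, Manhattan distance, and decay schedule are assumptions rather than the paper's exact choices.

```python
import numpy as np

def init_q_table(grid_shape, goal, n_actions=4):
    """Initialise every Q(s, a) with the reciprocal of the distance from the cell to
    the goal (instead of zeros), so states near the goal start out more attractive."""
    rows, cols = grid_shape
    q = np.zeros((rows, cols, n_actions))
    for r in range(rows):
        for c in range(cols):
            dist = abs(r - goal[0]) + abs(c - goal[1])   # Manhattan distance
            q[r, c, :] = 1.0 / (dist + 1.0)              # +1 avoids division by zero
    return q

def dynamic_epsilon(episode, eps_start=0.9, eps_end=0.05, decay=0.01):
    """Greedy search factor that decays with training, shifting from exploration to exploitation."""
    return eps_end + (eps_start - eps_end) * np.exp(-decay * episode)

q = init_q_table((10, 10), goal=(9, 9))
print(q[0, 0, 0], q[9, 8, 0], dynamic_epsilon(100))
```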
16. Off-Policy Q-learning Based Optimal Tracking Control for Unknown Linear Discrete-Time Systems under Deception Attacks
Authors: 宋星星, 储昭碧 — 《控制与决策》 (PKU Core), 2025, No. 5, pp. 1641-1650 (10 pages)
For linear discrete-time systems with unknown dynamics under multiple deception attacks, an off-policy Q-learning algorithm is proposed to solve the optimal tracking control problem. First, a weight matrix is introduced to build an input model in which the controller's communication channel suffers multiple deception attacks, and an augmented tracking system is constructed together with a reference command generator. Within the linear quadratic tracking framework, optimal tracking control is expressed as a zero-sum game in which the deception attack and the control input participate simultaneously. Second, an off-policy Q-learning algorithm based on state data is designed to learn the optimal tracking control gain, solving the practical problem that the control gain cannot be updated as required, and it is proved that the solution is unbiased under probing noise satisfying the persistent excitation condition. For the case where the system states are unmeasurable, an off-policy Q-learning algorithm based on output data is also designed. Finally, tracking control simulations of an F-16 aircraft autopilot verify the effectiveness of the designed off-policy Q-learning algorithms and their unbiasedness with respect to probing noise.
Keywords: deception attack, optimal tracking, off-policy Q-learning, zero-sum game
17. An Improved NSGA-III Based on Q-learning for Many-Objective Flexible Job-Shop Scheduling
Authors: 张小培, 陈勇, 王宸, 袁春辉 — 《湖北汽车工业学院学报》, 2025, No. 3, pp. 56-63 (8 pages)
For the multi-variety, small-batch production mode of machining workshops, a many-objective flexible job-shop scheduling model is built that minimizes total energy consumption, makespan, machine load, and total tardiness, and is solved with an improved NSGA-III. A triple encoding of machine, operation, and batch is used; the population is initialized with a chaotic sequence generated by the Logistic map; a reinforcement learning state space is built from quality indicators of the objective solutions; and Q-learning is trained to adjust the neighborhood search strategy. Finally, comparisons on benchmark cases and a real instance verify the effectiveness and superiority of the model.
Keywords: flexible job shop, objective optimization, batch scheduling, Q-learning, neighborhood search
18. An Improved Grasshopper Optimization Algorithm Based on Double Q-Learning for Distributed Flexible Job-Shop Inverse Scheduling
Authors: 胡旭伦, 唐红涛 — 《机床与液压》 (PKU Core), 2025, No. 20, pp. 52-63 (12 pages)
To tackle uneven resource allocation and insufficient scheduling stability in distributed flexible job shops, an inverse scheduling model is built that minimizes makespan, total machine energy consumption, and deviation, and an improved multi-objective grasshopper optimization algorithm based on Double Q-Learning (DQIGOA) is proposed. A hybrid three-layer encoding is designed for the problem; a population initialization method based on the characteristics of inverse scheduling improves population quality; a weight-balancing factor increases the diversity of the non-dominated solution archive; and the Double Q-Learning mechanism from reinforcement learning is integrated into the selection of non-dominated solutions, optimizing the choice of target solutions through a dynamic action policy and improving the global search ability and local optimization efficiency of the schedules. Finally, 26 test cases are constructed; strategy-effectiveness analysis shows that the proposed strategies significantly improve DQIGOA's performance, and comparisons with NSGA-II, DE, and SPEA-II confirm DQIGOA's effectiveness. The results show that DQIGOA outperforms NSGA-II, DE, and SPEA-II on the HV, IGD, and SP indicators, demonstrating that it effectively improves the convergence speed and diversity of the solutions and is more robust under dynamic disturbances.
Keywords: distributed flexible job shop, inverse scheduling, grasshopper optimization algorithm, Double Q-learning mechanism
19. Q-learning Based MAC-Layer Mechanism Design for Hotspot Areas in Wide-Area IoT
Authors: 雷迪, 刘向, 孙文彬, 杨欣, 许茜, 陈丽丽, 易波 — 《移动通信》, 2025, No. 8, pp. 90-95 (6 pages)
In wide-area IoT applications, hotspot areas suffer frequent access collisions and low channel utilization because of dense terminal access and large fluctuations in traffic load. As a key part of shared wireless channel management, the MAC-layer protocol plays a central role in improving system throughput and access efficiency. This paper analyzes the features a MAC-layer protocol needs in hotspot areas and proposes a Q-learning based optimization scheme that introduces a reinforcement learning model on the node side for adaptive parameter adjustment. On top of the traditional CSMA/CA protocol, an access mechanism combining a dynamic RTS/CTS mechanism with a dynamic backoff window is designed. Simulation results show that the proposed scheme improves system throughput and better balances channel collision rate and utilization.
Keywords: Internet of Things, MAC-layer protocol, CSMA/CA, Q-learning
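A minimal, hypothetical sketch of node-side Q-learning over the contention window, in the spirit of the dynamic backoff mechanism described above; the load discretization, window sizes, and reward convention are illustrative assumptions, not the paper's design.

```python
import numpy as np

CW_SIZES = (16, 32, 64, 128, 256)   # candidate contention-window sizes (illustrative)

class BackoffAgent:
    """Tabular Q-learning agent that picks a contention window from a discretised
    load level; a reward of +1 for a successful send and -1 for a collision is assumed."""
    def __init__(self, n_load_levels=4, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = np.zeros((n_load_levels, len(CW_SIZES)))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng(0)

    def act(self, load_level):
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(len(CW_SIZES)))
        return int(self.q[load_level].argmax())

    def update(self, load, action, reward, next_load):
        target = reward + self.gamma * self.q[next_load].max()
        self.q[load, action] += self.alpha * (target - self.q[load, action])

agent = BackoffAgent()
a = agent.act(load_level=2)
agent.update(load=2, action=a, reward=-1.0, next_load=3)   # e.g. a collision occurred
print(CW_SIZES[agent.act(2)])
```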
20. Research on an Improved Q-Learning Dynamic Obstacle Avoidance Algorithm for Logistics Robots (Cited 1)
Authors: 王力, 赵全海, 黄石磊 — 《计算机测量与控制》, 2025, No. 3, pp. 267-274 (8 pages)
To improve the autonomous navigation and obstacle avoidance of autonomous mobile robots (AMRs) in complex environments and to remedy the slow convergence and sub-optimal path planning of traditional Q-Learning in dynamic environments, this study introduces a fuzzy annealing algorithm to optimize the path nodes and search paths of Q-Learning, removing redundant nodes and unnecessary turns. To balance exploration and exploitation in Q-Learning, the search strategy is optimized with a greedy method, and an improved dynamic window approach is used to refine path nodes and to smooth and accelerate motion, realizing local path planning and improving the search performance and efficiency of the improved Q-Learning algorithm in AMR dynamic obstacle avoidance. The results show that the improved Q-Learning algorithm effectively optimizes search paths and avoids dynamic and static obstacles well, with a path-distance difference from the other algorithms of at least 1 m. Its obstacle-avoidance trajectories on local paths are closer to the expected values, its maximum search time does not exceed 3 s (better than the other algorithms), its obstacle-avoidance path length and travel time decrease by more than 10% across different scenarios, and its avoidance success rate exceeds 90%. The method meets the demands of smart warehousing, intelligent manufacturing, and other engineering fields for efficient, safe operation of logistics robots.
Keywords: logistics robot, Q-learning algorithm, DWA, multi-objective planning, obstacles, obstacle avoidance