37 articles found
Decoding topological XYZ^(2) codes with reinforcement learning based on attention mechanisms
1
Authors: 陈庆辉, 姬宇欣, 王柯涵, 马鸿洋, 纪乃华. Chinese Physics B (SCIE, EI, CAS, CSCD), 2024, No. 6, pp. 262-270 (9 pages)
Quantum error correction, a technique that relies on the principle of redundancy to encode logical information into additional qubits to better protect the system from noise, is necessary to design a viable quantum computer. The XYZ^(2) code, a new topological stabilizer code defined on the cellular lattice, is implemented on a hexagonal lattice of qubits and encodes the logical qubits with the help of stabilizer measurements of weight six and weight two. However, topological stabilizer codes in cellular lattice quantum systems suffer from the detrimental effects of noise due to interaction with the environment. Several decoding approaches have been proposed to address this problem. Here, we propose a state-attention-based reinforcement learning decoder for XYZ^(2) codes, which enables the decoder to focus more accurately on the information related to the current decoding position. The error correction accuracy of our reinforcement learning decoder under the optimisation conditions reaches 83.27% under the depolarizing noise model, and we measure thresholds of 0.18856 and 0.19043 for XYZ^(2) codes at code distances of 3–7 and 7–11, respectively. Our study provides directions and ideas for applying decoding schemes that combine reinforcement learning and attention mechanisms to other topological quantum error-correcting codes.
Keywords: quantum error correction; topological quantum stabilizer code; reinforcement learning; attention mechanism
Approximate error correction scheme for three-dimensional surface codes based reinforcement learning
2
Authors: 曲英杰, 陈钊, 王伟杰, 马鸿洋. Chinese Physics B (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 229-240 (12 pages)
Quantum error correction technology is an important method to eliminate errors during the operation of quantum computers. To address the influence of errors on physical qubits, we propose an approximate error correction scheme that performs dimension mapping operations on surface codes. This error correction scheme utilizes the topological properties of error correction codes to map the surface code dimension to three dimensions. Compared to previous error correction schemes, the present three-dimensional surface code exhibits good scalability due to its higher redundancy and more efficient error correction capabilities. By reducing the number of ancilla qubits required for error correction, this approach achieves savings in measurement space and reduces resource consumption costs. To improve the decoding efficiency and solve the problem of the correlation between the surface code stabilizer and the 3D space after dimension mapping, we employ a reinforcement learning (RL) decoder based on deep Q-learning, which enables faster identification of the optimal syndrome and achieves better thresholds through conditional optimization. Compared to minimum weight perfect matching decoding, the threshold of the RL-trained model reaches 0.78%, which is 56% higher and enables large-scale fault-tolerant quantum computation.
Keywords: fault-tolerant quantum computing; surface code; approximate error correction; reinforcement learning
Linear Complementary Dual Codes Constructed from Reinforcement Learning
3
Authors: WU Yansheng, MA Jin, YANG Shangdong. Journal of Systems Science & Complexity, 2025, No. 3, pp. 1388-1403 (16 pages)
Recently, linear complementary dual (LCD) codes have garnered substantial interest within coding theory research due to their diverse applications and favorable attributes. This paper directs its attention to the construction of binary and ternary LCD codes leveraging curiosity-driven reinforcement learning (RL). By establishing rewards and devising well-reasoned mappings from actions to states, it aims to facilitate the successful synthesis of binary or ternary LCD codes. Experimental results indicate that LCD codes constructed using RL exhibit slightly superior error-correction performance compared to conventionally constructed LCD codes and those developed via standard RL methodologies. The paper introduces novel binary and ternary LCD codes with enhanced minimum distance bounds. Finally, it showcases how random network distillation aids agents in exploring beyond local optima, enhancing the overall performance of the models without compromising convergence.
Keywords: artificial intelligence; error correcting code; LCD code; reinforcement learning
A SPEECH RECOGNITION METHOD USING COMPETITIVE AND SELECTIVE LEARNING NEURAL NETWORKS
4
Authors: 徐雄, 胡光锐, 严永红. Journal of Shanghai Jiaotong University (Science) (EI), 2000, No. 2, pp. 10-13 (4 pages)
On the basis of the asymptotic theory of Gersho, the isodistortion principle of vector clustering is discussed, and a competitive and selective learning (CSL) method is proposed that can avoid local optima and gives excellent results when applied to the clustering of HMM models. Combined with parallel, self-organizational hierarchical neural networks (PSHNN) that reclassify the scores of every form output by the HMM, the CSL method clearly raises the speech recognition rate.
Keywords: speech recognition; competitive learning; classification; neural networks. Document code: A
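An intelligent visual servo control method for rotor UAVs based on Dyna-Q learning Cited by: 8
5
Authors: 史豪斌, 徐梦, 刘珈妤, 李继超. 《控制与决策》 (EI, CSCD, PKU Core), 2019, No. 12, pp. 2517-2526 (10 pages)
Image-based visual servo control acquires image information through the robot's vision and then forms closed-loop feedback on that information to control the robot's motion. In classical visual servoing the servo gain is in most cases assigned manually, which leads to poor robustness and slow convergence. To address this, an intelligent visual servo control method for rotor UAVs based on Dyna-Q is proposed that adjusts the servo gain to improve adaptivity. First, an image feature extraction algorithm based on the Freeman chain code extracts the target feature points. Then, image-based visual servoing forms closed-loop control of the feature error. Next, a decoupled visual servo control model is proposed for the strongly coupled, underactuated dynamics of rotor UAVs. Finally, a reinforcement learning model is built that uses Dyna-Q learning to adjust the servo gain, so that after training the rotor UAV can select the servo gain autonomously. Dyna-Q learning extends classical Q-learning by building an environment model to store experience; the virtual samples generated by the environment model can be used as learning samples for value-function iteration. Experimental results show that, compared with the traditional PID control method and the classical image-based visual servo method, the proposed method converges faster and is more stable.
Keywords: visual servoing; Dyna-Q learning; gain adjustment; rotor UAV; Freeman chain code; reinforcement learning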
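The Dyna-Q idea summarized in the abstract above (a learned environment model stores experience, and simulated samples drawn from it drive extra value-function updates) can be sketched in tabular form. This is a minimal illustration, not the authors' UAV controller: the toy chain environment `chain_step`, the function name `dyna_q`, and every hyperparameter value are assumptions made for the example.

```python
import random
from collections import defaultdict


def dyna_q(step, actions, start_state, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q. `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    q = defaultdict(float)   # action values, keyed by (state, action)
    model = {}               # learned model: (state, action) -> (next_state, reward, done)

    def greedy(state):
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = start_state, False
        while not done:
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            update(s, a, r, s2, done)        # learn from the real transition
            model[(s, a)] = (s2, r, done)    # store the experience in the model
            for _ in range(planning_steps):  # learn from simulated transitions
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return q


def chain_step(state, action):
    """Hypothetical 5-state chain: states 0..4, reward 1 on reaching state 4."""
    nxt = max(0, min(4, state + action))
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done


q_table = dyna_q(chain_step, actions=[-1, 1], start_state=0)
```

After training, `q_table[(3, 1)]` (stepping toward the goal from state 3) should be close to 1 and larger than `q_table[(3, -1)]`. In a gain-scheduling setting such as the one the abstract describes, the actions would presumably be candidate servo gains rather than chain moves.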
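Design of an anomalous-reflection coding metasurface based on reinforcement learning
6
Authors: 潘宇轩, 蒋伊琳, 闫成越, 李金鑫. 《哈尔滨商业大学学报(自然科学版)》, 2025, No. 1, pp. 38-42 (5 pages)
This work studies the anomalous-reflection control of electromagnetic beams by electromagnetic metasurfaces, using digital coding to control the additional phase of the metasurface so that the direction of the reflected beam can be set artificially. A coding metasurface controlled by a 32-bit binary code is designed and simulated, with the control code generated by a reinforcement learning algorithm. The algorithm searches for codes along two directions (the anomalous reflection direction of the beam and the power of the reflected signal) to ensure that the metasurface achieves anomalous reflection while reducing energy attenuation. The coded metasurface is simulated with a fixed angle of incidence. The results show that, within a certain range, the metasurface controlled by the reinforcement-learning-generated codes allows artificial control of the reflection direction of a beam incident at 45°, providing five different reflection angles, while the reflected signal power stays above 40% of the incident power.
Keywords: coding metasurface; coding sequence; reinforcement learning; reflection phase; anomalous reflection; beam control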
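An Actor-Critic algorithm based on Tile Coding and model learning Cited by: 3
7
Authors: 金玉净, 朱文文, 伏玉琛, 刘全. 《计算机科学》 (CSCD, PKU Core), 2014, No. 6, pp. 239-242, 249 (5 pages)
Actor-Critic is a class of reinforcement learning methods with good performance and convergence guarantees. However, the agent does not learn the dynamics of the environment while learning and improving its policy, which limits the performance of Actor-Critic methods. In addition, Actor-Critic methods must represent the policy and the value function approximately, and the encoding of states and actions, together with its parameters, has a significant influence on the method. Tile Coding is simple to use and has low computational time complexity, so Tile Coding is combined with a model-based Actor-Critic method and the resulting algorithm is applied to reinforcement learning simulation experiments. The experimental results show that the resulting algorithm performs well.
Keywords: reinforcement learning; Tile Coding; Actor-Critic; model learning; function approximation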
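To make the Tile Coding encoding mentioned in the abstract above concrete, the sketch below maps a continuous state to one active tile per tiling, so that a linear value estimate is simply a sum of a few weights. The function name `active_tiles`, the uniform offset scheme, and the sizes (4 tilings, 8 tiles per dimension) are assumptions for illustration and are not taken from the paper.

```python
def active_tiles(x, lows, highs, n_tilings=4, tiles_per_dim=8):
    """Return one active tile index per tiling for a point x inside [lows, highs]."""
    side = tiles_per_dim + 1             # one extra tile per dimension absorbs the offset
    tiles_per_tiling = side ** len(x)
    active = []
    for t in range(n_tilings):
        offset = t / n_tilings           # shift of tiling t, in units of one tile width
        index = 0
        for xi, lo, hi in zip(x, lows, highs):
            scaled = (xi - lo) / (hi - lo) * tiles_per_dim + offset
            coord = min(int(scaled), side - 1)
            index = index * side + coord
        active.append(t * tiles_per_tiling + index)
    return active


# A linear value estimate is the sum of the weights of the active tiles.
n_features = 4 * 9 ** 2
weights = [0.0] * n_features
tiles = active_tiles([0.3, -0.2], lows=[0.0, -1.0], highs=[1.0, 1.0])
value = sum(weights[i] for i in tiles)
```

Nearby states share most of their active tiles, which gives local generalization, while distant states share none. Each lookup costs one index computation per tiling, consistent with the low time complexity the abstract attributes to Tile Coding.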
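A Polar coding optimization method resistant to strong interference based on multi-objective reinforcement learning Cited by: 1
8
Authors: 梁豪, 叶淦华, 陆锐敏, 王恒, 魏鹏. 《电子与信息学报》 (EI, CSCD, PKU Core), 2023, No. 11, pp. 4092-4100 (9 pages)
To improve the reliability and anti-interference capability of information transmission in frequency-hopping (FH) communication systems, this paper proposes a Polar code construction optimization method suited to strong-interference environments, based on a slow-frequency-hopping anti-interference communication system model with a new type of Polar coding. First, a multi-objective reinforcement learning algorithm is designed for a hybrid channel comprising normal and interfered states. The algorithm then optimizes the sequence of information-bit channels used in encoding to improve the error-correction performance of the codewords, and reduces execution complexity through initialization preprocessing and theoretically computed reward values. Simulation results show that, under hybrid channel conditions with strong interference, the global bit-error performance of the proposed method is better than that of traditional code construction methods, with a global coding gain of 0.5 dB over the 3rd Generation Partnership Project (3GPP) standard scheme for 5th-generation (5G) mobile communication systems, effectively improving the highly reliable anti-interference transmission performance of Polar-coded frequency-hopping communication.
Keywords: channel coding; anti-interference; Polar code; reinforcement learning; reliability performance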
Cooperative Caching for Scalable Video Coding Using Value-Decomposed Dimensional Networks Cited by: 2
9
Authors: Youjia Chen, Yuekai Cai, Haifeng Zheng, Jinsong Hu, Jun Li. China Communications (SCIE, CSCD), 2022, No. 9, pp. 146-161 (16 pages)
Scalable video coding (SVC) has been widely used in video-on-demand (VOD) services to efficiently satisfy users' different video quality requirements and dynamically adjust the video stream to time-variant wireless channels. Under the 5G network structure, we consider a cooperative caching scheme inside each cluster with SVC to economically utilize the limited caching storage. A novel multi-agent deep reinforcement learning (MADRL) framework is proposed to jointly optimize the video access delay and users' satisfaction, where an aggregation node is introduced to help individual agents achieve global observations and overall system rewards. Moreover, to cope with the large action space caused by the large number of videos and users, a dimension decomposition method is embedded into the neural network in each agent, which greatly reduces the computational complexity and memory cost of the reinforcement learning. Experimental results show that: 1) the proposed value-decomposed dimensional network (VDDN) algorithm achieves an obvious performance gain over the traditional MADRL; 2) the proposed VDDN algorithm can handle an extremely large action space and converge quickly with low computational complexity.
Keywords: cooperative caching; multi-agent deep reinforcement learning; scalable video coding; value-decomposition network
Rich-text document styling restoration via reinforcement learning Cited by: 1
10
Authors: Hongwei LI, Yingpeng HU, Yixuan CAO, Ganbin ZHOU, Ping LUO. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, No. 4, pp. 93-103 (11 pages)
Richly formatted documents, such as financial disclosures, scientific articles, and government regulations, widely exist on the Web. However, since most of these documents are only for public reading, the styling information inside them is usually missing, making them improper or even burdensome to display and edit in different formats and platforms. In this study we formulate the task of document styling restoration as an optimization problem, which aims to identify the styling settings on the document elements (e.g., lines, table cells, text) so that rendering with the output styling settings results in a document in which each element holds (closely) the same position as in the original document. Considering that each styling setting is a decision, this problem can be transformed into a multi-step decision-making task over all the document elements and then solved by reinforcement learning. Specifically, Monte-Carlo Tree Search (MCTS) is leveraged to explore the different styling settings, and the policy function is learnt under the supervision of the delayed rewards. As a case study, we restore the styling information inside tables, where structural and functional data in the documents are usually presented. Experiments show that our best reinforcement method successfully restores the stylings in 87.65% of the tables, a 25.75% absolute improvement over the greedy method. We also discuss the tradeoff between the inference time and the restoration success rate, and argue that although the reinforcement methods cannot be used in real-time scenarios, they are suitable for offline tasks with high quality requirements. Finally, this model has been applied in a PDF parser to support cross-format display.
Keywords: styling restoration; Monte-Carlo tree search; reinforcement learning; richly formatted documents; tables
RSOFCPN: CONTROL SYSTEM STRUCTURE AND ALGORITHM DESIGN
11
Authors: 马勇, 杨煜普, 张卫东, 许晓鸣. Journal of Shanghai Jiaotong University (Science) (EI), 2000, No. 2, pp. 57-61 (5 pages)
A stable control scheme for a class of unknown nonlinear systems is presented. The control architecture is composed of two parts: a fuzzy sliding mode controller (FSMC) drives the state to a designed switching hyperplane, and a reinforcement self-organizing fuzzy CPN (RSOFCPN) acts as a feedforward compensator to reduce the influence of system uncertainties. The simulation results demonstrate the effectiveness of the proposed control scheme.
Keywords: nonlinear systems; fuzzy sliding mode control; self-organized CPN; reinforcement learning. Document code: A
Arc-length technique for nonlinear finite element analysis Cited by: 9
12
Authors: MEMON Bashir-Ahmed, 苏小卒 (SU Xiao-zu). Journal of Zhejiang University Science (EI, CSCD), 2004, No. 5, pp. 618-628 (11 pages)
Nonlinear solution of reinforced concrete structures, particularly complete load-deflection response, requires tracing of the equilibrium path and proper treatment of the limit and bifurcation points. In this regard, ordinary solution techniques lead to instability near the limit points and also have problems in case of snap-through and snap-back. Thus they fail to predict the complete load-displacement response. The arc-length method serves the purpose well in principle, has received wide acceptance in finite element analysis, and has been used extensively. However, modifications to the basic idea are vital to meet the particular needs of the analysis. This paper reviews some of the recent developments of the method in the last two decades, with particular emphasis on nonlinear finite element analysis of reinforced concrete structures.
Keywords: arc-length method; nonlinear analysis; finite element method; reinforced concrete; load-deflection path. Document code: A. CLC number: TU31
Affiliation: Department of Structural Engineering, Tongji University, Shanghai 200092, China. E-mail: bashirmemon@sohu.com, xiaozub@online.sh.cn. Received July 30, 2003; revision accepted Sept. 11, 2003.
Movement and behavior analysis using neural spike signals in CA1 of rat hippocampus
13
Authors: Hyejin An, Kyungjin You, Minwhan Jung, Hyunchool Shin. Journal of Measurement Science and Instrumentation (CAS), 2013, No. 4, pp. 392-396 (5 pages)
The hippocampus, which lies in the temporal lobe, plays an important role in spatial navigation, learning and memory. Several studies have addressed place cell activity, spatial memory, prediction of future locations and various learning paradigms. However, no study has focused on whether there exist neurons that contribute largely to both spatial memory and learning about reward. This paper proposes that there are neurons in the CA1 area of the rat hippocampus that engage simultaneously in forming place memory and in reward learning. With a trained rat, a reward experiment was conducted in a modified 8-shaped maze with five stages, and firing information was obtained from a CA1 neuron. The firing rate, which is the count of spikes per unit time, was calculated. Decoding was conducted with log-maximum likelihood estimation (Log-MLE) using a Gaussian distribution model. Our outcomes provide evidence of neurons that play a part in both spatial memory and learning regarding reward.
Keywords: hippocampus; CA1; place cell; reward learning; spatial memory; Gaussian distribution; maximum likelihood estimation (MLE). Document code: A. Article ID: 1674-8042(2013)04-0392-05
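The two computational steps named in the abstract above, a firing rate (spike count per unit time) and decoding by maximum log-likelihood under a Gaussian model, can be sketched as follows. This is an illustrative reading only: the per-condition Gaussian fit, the labels `stage_A` and `stage_B`, the sample rates, and all function names are hypothetical and are not taken from the paper.

```python
import math


def firing_rate(spike_times, t_start, t_end):
    """Spikes per unit time within the window [t_start, t_end)."""
    count = sum(1 for t in spike_times if t_start <= t < t_end)
    return count / (t_end - t_start)


def fit_gaussian(rates):
    """Maximum-likelihood mean and variance of the observed firing rates."""
    mean = sum(rates) / len(rates)
    var = sum((r - mean) ** 2 for r in rates) / len(rates)
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero


def log_likelihood(rate, mean, var):
    """Log of the Gaussian density N(rate; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (rate - mean) ** 2 / (2 * var)


def decode(rate, models):
    """Return the label whose Gaussian model gives the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(rate, *models[label]))


# Hypothetical training rates (spikes per second) for two maze stages.
models = {
    "stage_A": fit_gaussian([4.0, 5.0, 6.0]),
    "stage_B": fit_gaussian([14.0, 15.0, 16.0]),
}
rate = firing_rate([0.1, 0.2, 0.5, 0.7, 0.9], t_start=0.0, t_end=1.0)
decoded = decode(rate, models)
```

Working in log space turns the product of densities into a sum and avoids numerical underflow, which is the usual reason for preferring a log-likelihood decoder.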
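A code comment generation model based on type-assisted guidance Cited by: 1
14
Authors: 刘利, 吕韦岑, 汪洋. 《无线电通信技术》 (PKU Core), 2024, No. 4, pp. 807-814 (8 pages)
Code comment generation methods are usually based on the Structure-Sequence (Struct2Seq) framework but ignore the type information of code comments, such as operators and strings. Because the levels of type information depend on one another, introducing type information directly into the existing Struct2Seq framework is not suitable. To solve these problems, a Code Comment Generation based on Type-assisted Guidance (CCG-TG) model is proposed, which treats source code as an n-ary tree with type information. The model contains a type-associated encoder and a type-restricted decoder and can summarize source code adaptively. In addition, a Multi-level Reinforcement Learning (MRL) method is proposed to optimize the training of the model. Experiments on multiple datasets, with comparisons against several baseline models, show that the proposed CCG-TG model performs best on all evaluation metrics.
Keywords: code comment generation; type information; structure-sequence framework; type-assisted guidance; reinforcement learning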
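A reinforcement-learning-based adversarial attack method for vulnerability detection models Cited by: 1
15
Authors: 陈思然, 吴敬征, 凌祥, 罗天悦, 刘镓煜, 武延军. 《软件学报》 (EI, CSCD, PKU Core), 2024, No. 8, pp. 3647-3667 (21 pages)
Deep-learning-based code vulnerability detection models, thanks to their high detection efficiency and accuracy, have gradually become an important means of detecting software vulnerabilities and play an important role in the code auditing service of the code hosting platform GitHub. However, deep neural networks have been shown to be susceptible to adversarial attacks, so deep-learning-based vulnerability detection models risk being attacked and losing detection accuracy. Constructing adversarial attacks against vulnerability detection models can therefore not only expose the security flaws of such models but also help evaluate their robustness, and in turn improve model performance through corresponding methods. Existing adversarial attack methods for vulnerability detection models, however, rely on general-purpose code transformation tools and do not propose targeted code perturbation operations or decision algorithms, so they struggle to generate effective adversarial examples, and the validity of those examples depends on manual inspection. To address these problems, a reinforcement-learning-based adversarial attack method for vulnerability detection models is proposed. The method first designs a series of semantically constrained, vulnerability-preserving code perturbation operations as the perturbation set. Second, it takes vulnerable code samples as input and uses a reinforcement learning model to select a specific sequence of perturbation operations. Finally, it locates potential perturbation positions according to the node types of the code sample's syntax tree and performs the code transformation, thereby generating adversarial examples. Two experimental datasets totalling 14,278 code samples are built from SARD and NVD and used to train four vulnerability detection models with different characteristics as attack targets. For each target model, a reinforcement learning network is trained to carry out the adversarial attack. The results show that the attack reduces the models' recall by 74.34% and achieves an attack success rate of 96.71%, an average improvement of 68.76% in attack success rate over the baseline methods. The experiments demonstrate that current vulnerability detection models are at risk of attack and that further research is needed to improve their robustness.
Keywords: adversarial attack; vulnerability detection; reinforcement learning; code transformation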
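Solving the flexible job-shop scheduling problem with a reinforcement-learning-improved genetic algorithm Cited by: 4
16
Authors: 陈祉烨, 胡毅, 刘俊, 王军, 张曦阳. 《科学技术与工程》 (PKU Core), 2024, No. 25, pp. 10848-10856 (9 pages)
Traditional genetic algorithms applied to the flexible job-shop scheduling problem tend to fall into local optima, cannot adjust their parameters intelligently, and have poor local search ability. To address this, a flexible job-shop scheduling model that minimizes the maximum completion time is established, and a reinforcement learning improved genetic algorithm (RLIGA) is proposed to solve it. First, reinforcement learning dynamically adjusts key parameters during the iterations of the genetic algorithm. Second, a discrete Lévy flight mechanism based on operation-encoding distance is introduced to improve the solution space. Finally, a variable neighbourhood search mechanism is introduced to strengthen the algorithm's local exploitation. Brandimarte benchmark instances are run in PyCharm to verify the algorithm's performance; the experiments show that the proposed algorithm solves more efficiently, escapes local optima more readily, and yields better solutions.
Keywords: reinforcement learning; genetic algorithm; discrete Lévy flight; operation-encoding distance; variable neighbourhood search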
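A survey of code comment quality issues in software Cited by: 1
17
Authors: 王潮, 徐卫伟, 周明辉. 《软件学报》 (EI, CSCD, PKU Core), 2024, No. 2, pp. 513-531 (19 pages)
Code comments are a key mechanism supporting collaboration among software developers and are widely used to improve development efficiency. However, because comments do not directly affect how software runs, developers often neglect them, which leads to comment quality problems that in turn hurt development efficiency. Quality problems in code comments hinder developers' understanding of the related code and may even cause misunderstandings that introduce defects, so the issue has attracted wide attention from researchers. Using a systematic literature review, this survey systematically analyzes recent work by researchers in China and abroad on code comment quality. It summarizes the state of research along three aspects (evaluation dimensions, measurement metrics, and improvement strategies for comment quality) and sets out the shortcomings of and challenges for current research, together with suggestions.
Keywords: code comments; software documentation; natural language processing; machine learning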
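A binary code fuzzing method based on deep reinforcement learning
18
Authors: 王栓奇, 赵健鑫, 刘驰, 武伟, 刘钊. 《计算机科学》 (CSCD, PKU Core), 2024, No. S01, pp. 852-858 (7 pages)
Vulnerability discovery is a major research direction in computer software security, and fuzzing is an important dynamic discovery method. To address problems in binary code vulnerability discovery, such as the large volume of assembly code making detection difficult and time-consuming and the low efficiency of fuzzing, a binary code fuzzing method based on deep reinforcement learning is proposed. First, the fuzzing process is modelled as a multi-step Markov decision process suitable for reinforcement learning, and a deep reinforcement learning model is built to assist the selection of mutation strategies, enabling their dynamic optimization. Then, a binary code fuzzing platform based on deep reinforcement learning is designed and built, using AFL for the fuzzing environment and the Keras-RL2 library and the OpenAI Gym framework for the deep reinforcement learning algorithm and the reinforcement learning environment. Finally, experiments verify the effectiveness and applicability of the proposed method and platform. The results show that the deep reinforcement learning model helps the fuzzing process cover more paths quickly and expose more vulnerabilities, significantly improving the efficiency of discovering and locating binary code vulnerabilities.
Keywords: binary code; vulnerability discovery; fuzzing; deep reinforcement learning; test platform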
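A comparative study of BATS code batch construction strategies driven by multiple reinforcement learning algorithms
19
Author: 苏天辰. 《社会科学理论与实践》, 2025, No. 3, pp. 130-139 (10 pages)
To address the low batch-construction efficiency and poor dynamic adaptability of BATS codes in wireless multi-hop networks, this paper proposes a hierarchical reinforcement learning optimization framework. Pre-training on the Tanner graph topology generates batch construction strategies with uniform coverage, and a hybrid BP-inactivation decoding algorithm is designed that reduces computational complexity by 30%. Experiments show that Rainbow-DQN raises the decoding success rate by 25% in stable networks, PPO reduces performance fluctuation by 40% over time-varying channels, and DDPG achieves low-latency inference on the order of 15 ms on edge devices. The scheme reaches 90.1% transmission reliability in NS-3 simulation, offering an efficient engineering path for 5G/6G networks.
Keywords: BATS code; reinforcement learning; multi-hop network; dynamic optimization; edge computing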
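Research on a concurrent transmission control method for heterogeneous wireless links based on adaptive network coding Cited by: 15
20
Authors: 赵夙, 王伟, 朱晓荣, 倪钦崟. 《电子与信息学报》 (EI, CSCD, PKU Core), 2022, No. 8, pp. 2777-2784 (8 pages)
With the rise of high-rate services such as high-definition live video and virtual reality, a single network can hardly meet users' service demands. Concurrent transmission over multiple heterogeneous links can effectively aggregate bandwidth and improve quality of service. In heterogeneous wireless networks, however, link conditions are complex and changeable and link quality varies across paths, so existing multipath concurrent transmission algorithms cannot adaptively make optimal decisions for complex network conditions. This paper proposes a multipath concurrent transmission control algorithm with adaptive network coding. It introduces Asynchronous Advantage Actor-Critic (A3C) reinforcement learning and, through adaptive network coding, intelligently selects the coding group size and the redundancy size according to current network conditions, thereby solving the packet reordering problem. Simulation results show that the algorithm raises the transmission rate by about 10% and improves user experience.
Keywords: wireless network; concurrent transmission; network coding; reinforcement learning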