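Abstract: In the ship-repair settlement systems and electronic price libraries built by some ship-repair enterprises, manually matching settlement codes is error-prone and time-consuming, which directly reduces settlement efficiency. To address this problem, a composite model based on multi-feature fusion is proposed for intelligent matching of ship-repair settlement codes. A Bidirectional Encoder Representations from Transformers (BERT) model represents the engineering-content text as word vectors, a Convolutional Neural Network (CNN) model extracts local features of the text, and a Bidirectional Long Short-Term Memory network with an attention mechanism (BiLSTM-Attention) extracts contextual features, from which the corresponding settlement code is obtained. Experimental results show that the proposed composite model significantly improves overall accuracy, indicating its advantage in complex text classification tasks.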
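The abstract does not say how the CNN local features and the BiLSTM-Attention contextual features are fused, or how a settlement code is chosen from them. The sketch below assumes simple concatenation followed by one linear scoring layer and an argmax, and spells out the attention-pooling and max-over-time steps in plain Python. The hidden states, query vector, weight matrix, and code labels are hypothetical toy inputs, not the authors' model.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], query: Vector) -> Vector:
    """Attention pooling over BiLSTM outputs:
    score_t = query . h_t, weights = softmax(scores), result = sum_t weight_t * h_t."""
    scores = [sum(q * h for q, h in zip(query, h_t)) for h_t in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    return [sum(w * h_t[i] for w, h_t in zip(weights, hidden)) for i in range(dim)]


def max_over_time(feature_maps: List[Vector]) -> Vector:
    """CNN max-over-time pooling: keep the strongest response of each filter."""
    return [max(fm) for fm in feature_maps]


def predict_code(local_feats: Vector, context_feats: Vector,
                 weights: List[Vector], codes: List[str]) -> str:
    """ASSUMED fusion: concatenate both feature vectors, score each candidate
    settlement code with one linear layer, and return the highest-scoring code."""
    fused = local_feats + context_feats
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    best = max(range(len(codes)), key=lambda i: logits[i])
    return codes[best]


# Toy usage with hypothetical values
local = max_over_time([[0.1, 0.9, 0.3], [0.4, 0.2]])          # [0.9, 0.4]
context = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # [0.5, 0.5]
print(predict_code(local, context,
                   [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]],
                   ["CODE-A", "CODE-B"]))
```

In a real system these steps would run on tensors inside a deep-learning framework, with the BERT embeddings feeding both branches; the plain-Python version only makes the arithmetic of each step visible.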
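Abstract: Automatic program repair (APR) plays an important role in software engineering, but existing APR methods based on large language models (LLMs) have clear shortcomings in search and localization, execution efficiency, and cost control. To address these issues, AutoVulnFix, an adaptive program repair system based on lightweight intelligent search, is proposed. The system relies on three core techniques: a streamlined prompt strategy, a dual-track intelligent search algorithm, and a two-level cache. Specifically, it first reduces token overhead through intelligent context management and precise problem descriptions, then improves response speed through lazy loading and an intelligent caching strategy, and finally combines a fast search track with an intelligent search track, adaptively selecting the most suitable search strategy according to task complexity. A comprehensive comparative evaluation with five LLMs on a dataset of 231 code-defect samples shows that AutoVulnFix improves repair accuracy by 4.2% on average, reduces execution time by 25.8% on average, and reduces token overhead by 21.4% on average. The system offers an effective solution for the practical deployment of APR.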
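The abstract names a two-level cache with lazy loading and an adaptive choice between a fast and an intelligent search track, but gives no implementation details. The sketch below is one hypothetical reading: a small LRU level in front of a larger backing store, a loader that is called only when both levels miss, and a made-up complexity score (suspicious files plus failing tests) compared against an arbitrary threshold. None of these names, heuristics, or values come from AutoVulnFix.

```python
from collections import OrderedDict
from typing import Callable, Dict


class TwoLevelCache:
    """HYPOTHETICAL two-level cache: small LRU level (l1) in front of a larger store (l2)."""

    def __init__(self, l1_capacity: int, loader: Callable[[str], str]) -> None:
        self.l1: "OrderedDict[str, str]" = OrderedDict()
        self.l2: Dict[str, str] = {}
        self.l1_capacity = l1_capacity
        self.loader = loader  # called lazily, only on a miss in both levels
        self.loads = 0        # number of times the (expensive) loader ran

    def get(self, key: str) -> str:
        if key in self.l1:
            self.l1.move_to_end(key)  # mark as most recently used
            return self.l1[key]
        if key in self.l2:
            value = self.l2[key]
        else:
            value = self.loader(key)  # lazy load on first access
            self.loads += 1
            self.l2[key] = value
        self._put_l1(key, value)
        return value

    def _put_l1(self, key: str, value: str) -> None:
        self.l1[key] = value
        self.l1.move_to_end(key)
        if len(self.l1) > self.l1_capacity:
            self.l1.popitem(last=False)  # evict least recently used; it remains in l2


def choose_track(num_suspicious_files: int, num_failing_tests: int,
                 threshold: int = 3) -> str:
    """HYPOTHETICAL complexity heuristic: simple tasks take the fast track."""
    complexity = num_suspicious_files + num_failing_tests
    return "fast" if complexity <= threshold else "intelligent"


cache = TwoLevelCache(l1_capacity=1, loader=lambda path: f"<contents of {path}>")
cache.get("a.py")
cache.get("b.py")
cache.get("a.py")  # served from l2 without calling the loader again
print(cache.loads, choose_track(1, 1), choose_track(4, 2))
```

The point of the two levels is that an entry evicted from the small fast level is still available without repeating the expensive load, which is where a repair agent would save repeated file reads and prompt construction.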
Funding: supported in part by the National Natural Science Foundation of China (61640006, 61572188); the Natural Science Foundation of Shaanxi Province, China (2015JM6307, 2016JQ6011); and the Science and Technology Project of Xi’an City (2017088CG/RC051(CADX002)).
Abstract: In distributed cloud storage systems, multiple simultaneous node failures are inevitable. Existing regenerating codes, including minimum storage regenerating (MSR) codes and minimum bandwidth regenerating (MBR) codes, are mainly designed to repair a single failed node or a few failed nodes and cannot meet the repair needs of distributed cloud storage systems. In this paper, we present locally minimum storage regenerating (LMSR) codes to recover multiple failed nodes simultaneously. Specifically, the nodes of a distributed cloud storage system are divided into multiple local groups, and a (4, 2) or (5, 3) MSR code is constructed in each local group. We also study the grouping method for storage nodes and the repair process for failed nodes within local groups. Theoretical analysis shows that LMSR codes achieve the same storage overhead as MSR codes. Furthermore, simulations verify that, compared with MSR codes, LMSR codes effectively reduce repair bandwidth and disk I/O overhead.
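To make the (4, 2) and (5, 3) choice concrete, the sketch below evaluates the standard MSR point of the cut-set bound for regenerating codes: each node stores α = M/k, each of d helpers sends β = M/(k(d − k + 1)), and the repair bandwidth is γ = dβ. It assumes d = n − 1 helpers inside a local group and takes M as the data encoded by that group. The grouping function, which splits nodes into groups of five and four, is a hypothetical illustration; the abstract does not describe the paper's actual grouping method.

```python
from fractions import Fraction
from typing import List


def msr_repair_bandwidth(n: int, k: int, d: int, file_size: Fraction) -> Fraction:
    """Repair bandwidth gamma = d * beta at the MSR point of the cut-set bound."""
    if not (k <= d <= n - 1):
        raise ValueError("need k <= d <= n - 1")
    beta = file_size / (k * (d - k + 1))  # data sent by each helper node
    return d * beta


def storage_overhead(n: int, k: int) -> Fraction:
    """Stored data / original data = n * alpha / M = n / k at the MSR point."""
    return Fraction(n, k)


def local_group_sizes(num_nodes: int) -> List[int]:
    """HYPOTHETICAL grouping: split nodes into groups of 5 (for (5, 3) codes)
    and 4 (for (4, 2) codes), preferring as many groups of 5 as possible."""
    for fives in range(num_nodes // 5, -1, -1):
        rest = num_nodes - 5 * fives
        if rest % 4 == 0:
            return [5] * fives + [4] * (rest // 4)
    raise ValueError(f"{num_nodes} nodes cannot be split into groups of 4 and 5")


M = Fraction(1)  # normalised data size per local group
print(msr_repair_bandwidth(4, 2, 3, M), msr_repair_bandwidth(5, 3, 4, M))
print(local_group_sizes(13))
```

With d = n − 1, repairing one node in a (4, 2) group downloads 3M/4 and in a (5, 3) group 2M/3, versus M for conventional erasure-code repair. Keeping groups this small also bounds how many disks each repair touches, which is the intuition behind the lower disk I/O reported in the abstract.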
Funding: Supported by the 973 Project of China (No. 2012CB315803), the Research Fund for the Doctoral Program of Higher Education of China (No. 20100002110033), and the Open Research Fund of the National Mobile Communications Research Laboratory, Southeast University (No. 2011D11).
Abstract: Erasure codes are widely used as the redundancy scheme in distributed storage systems. When a storage node fails, the repair process often requires transferring a large amount of data. Regenerating codes and hierarchical codes are two classes of codes proposed to reduce repair bandwidth cost: regenerating codes reduce the amount of data transferred by each helper node, while hierarchical codes reduce the number of nodes participating in the repair process. In this paper, we propose a "sub-code nesting framework" that combines the two. The resulting regenerating hierarchical code has a repair degree as low as that of hierarchical codes and a lower repair cost than hierarchical codes. Our code achieves exact regeneration of the failed node and additionally has low update complexity.
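The two levers described above multiply: repair traffic is the number of helper nodes times the data each helper sends. The sketch below compares conventional MDS repair, MSR-point regenerating repair (per-helper transfer α/(d − k + 1)), and a simplified local repair group of r helpers that each send a whole block. The parameters k = 6, d = 9, r = 3 are hypothetical, and modelling hierarchical codes as a fixed local group is a simplification, not the paper's construction.

```python
from fractions import Fraction
from typing import Dict


def repair_traffic(helpers: int, per_helper: Fraction) -> Fraction:
    """Repair bandwidth = number of helper nodes x data sent by each helper."""
    return helpers * per_helper


def compare_repair(k: int, d: int, r: int,
                   alpha: Fraction = Fraction(1)) -> Dict[str, Fraction]:
    """Contrast the two levers for one lost block of size alpha.

    k: data blocks of an MDS code; d: helpers of an MSR regenerating code (k <= d);
    r: helpers of a SIMPLIFIED local/hierarchical repair group (r < k).
    """
    return {
        # conventional MDS repair: k helpers each send a whole block
        "mds": repair_traffic(k, alpha),
        # regenerating code at the MSR point: more helpers, each sends less
        "msr": repair_traffic(d, alpha / (d - k + 1)),
        # hierarchical-style local repair: fewer helpers, whole blocks
        "local": repair_traffic(r, alpha),
    }


print(compare_repair(k=6, d=9, r=3))
```

Regenerating codes shrink the per-helper factor but raise the helper count, while local repair does the opposite; a nested design that applies regeneration inside a small sub-code aims to shrink both factors at once, which is the motivation the abstract gives for the sub-code nesting framework.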
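Abstract: With the widespread use of large-scale distributed storage systems, traditional fault-tolerant coding schemes suffer from high read-resource consumption and insufficient repair efficiency when repairing single-disk and double-disk failures. To address these problems, a hybrid-parity coding scheme with local repair properties, VC-code (vertical central symmetric code), is proposed. VC-code combines the fast-repair and load-balancing properties of horizontal and vertical array codes, designs a structure in which local horizontal parity and diagonal parity are interleaved, and adopts a vertically centrosymmetric parity layout to optimize data dependencies. This design significantly reduces the amount of data read when repairing single-disk and double-disk failures, and shortens repair chains to improve overall efficiency. Theoretical analysis shows that the data-read overhead of single- and double-disk failure recovery is greatly reduced. Experimental results further confirm the performance advantage: compared with RDP, LRRDP, and DRDP codes, VC-code reduces single-disk repair time by 10.45% to 29.57% and double-disk repair time by 6.35% to 33.24%.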
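The sketch below shows only the horizontal-parity part of such a scheme: XOR row parity over a group of data disks, rebuilding one lost disk from it, and counting the blocks read. It also illustrates why local parity reduces read volume, since parity computed over a smaller group means fewer surviving blocks must be read during repair. VC-code's diagonal parity, its vertically centrosymmetric layout, and double-disk recovery are not modelled; block values are toy integers.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple

Column = List[int]  # one disk, modelled as a column of integer blocks


def row_parity(data: List[Column]) -> Column:
    """Horizontal parity: XOR of the blocks in each row across the given disks."""
    rows = len(data[0])
    return [reduce(xor, (disk[i] for disk in data)) for i in range(rows)]


def recover_disk(data: List[Column], parity: Column, lost: int) -> Tuple[Column, int]:
    """Rebuild disk `lost` by XOR-ing the parity with the surviving disks in each row.

    Returns the rebuilt column and the number of blocks read (survivors + parity).
    """
    survivors = [disk for j, disk in enumerate(data) if j != lost]
    rebuilt = [reduce(xor, (disk[i] for disk in survivors), parity[i])
               for i in range(len(parity))]
    blocks_read = len(parity) * (len(survivors) + 1)
    return rebuilt, blocks_read


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
global_p = row_parity(data)                # parity over all three disks
print(recover_disk(data, global_p, 1))     # reads 9 blocks
local_p = row_parity(data[:2])             # "local" parity over a 2-disk group
print(recover_disk(data[:2], local_p, 0))  # reads 6 blocks
```

Single-failure repair cost in XOR array codes is dominated by how many surviving blocks each parity chain spans, so shortening those chains, as the abstract describes, translates directly into fewer blocks read.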
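Abstract: Software systems play an indispensable role across industries and carry large-scale, high-density data, yet defects in software systems have long troubled developers and constantly threaten the security of the data they hold. Automated program repair (APR) aims to help developers automatically fix defects in code during software development, saving development and maintenance costs and improving the confidentiality, availability, and integrity of the data in software systems. With the development of large language model (LLM) technology, many powerful code LLMs have emerged and have shown strong repair capability in APR, compensating for the weaknesses of traditional approaches in code understanding and patch generation and further advancing repair tools. This paper comprehensively surveys high-quality APR papers from recent years, summarizes the latest developments in the field, and systematically categorizes LLM-based APR techniques into two paradigms: the cloze (fill-in-the-blank) paradigm and the neural machine translation (NMT) paradigm. These techniques are compared and discussed in terms of model type, model size, the defect types repaired, the programming languages supported, and the strengths and weaknesses of each repair approach. The paper also reviews APR datasets and the metrics used to evaluate repair capability, and discusses existing empirical studies in depth. Finally, it analyzes current challenges in APR and future research directions.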
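The difference between the two paradigms lies mainly in what the model is given and what it must produce. The sketch below builds both kinds of input for a toy bug: in the cloze paradigm the suspicious line is replaced by a mask and the model generates only that span, while in the NMT paradigm the whole buggy snippet is the source "sentence" and the fixed snippet is the target. The `<mask>` token and the helper names are hypothetical; real code LLMs use model-specific infill tokens and prompt formats.

```python
from typing import List, Tuple

MASK = "<mask>"  # HYPOTHETICAL infill token; actual tokens depend on the model


def cloze_input(lines: List[str], buggy_line: int) -> str:
    """Cloze paradigm: replace the suspicious line with a mask token;
    the model is asked to generate only the masked span."""
    masked = lines[:buggy_line] + [MASK] + lines[buggy_line + 1:]
    return "\n".join(masked)


def apply_infill(lines: List[str], buggy_line: int, generated: str) -> List[str]:
    """Splice a model-generated line back in to form a candidate patch."""
    return lines[:buggy_line] + [generated] + lines[buggy_line + 1:]


def nmt_pair(buggy: List[str], fixed: List[str]) -> Tuple[str, str]:
    """NMT paradigm: buggy code is the source sequence, fixed code the target."""
    return "\n".join(buggy), "\n".join(fixed)


buggy = ["def add(a, b):", "    return a - b"]
print(cloze_input(buggy, 1))
candidate = apply_infill(buggy, 1, "    return a + b")  # pretend model output
source, target = nmt_pair(buggy, candidate)
print(source, "=>", target, sep="\n")
```

The cloze form needs the fault location in advance but lets a pretrained infilling model work without repair-specific training, whereas the NMT form rewrites the whole snippet and usually relies on training on buggy/fixed pairs; candidate patches from either are then typically validated against the test suite.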