Funding: Supported by the Jiangxi Higher Education and Teaching Reform Project (JXJG-20-24-2), the Science and Technology Project of Jiangxi Education Department (GJJ212023), and the Jiangxi University of Technology Education and Teaching Reform Project (JY2104).
Abstract: Smart contracts, which execute automatically on decentralized platforms such as Ethereum, require high security and low gas consumption. As a result, developers have a strong demand for semantic code search tools that use natural language queries to find existing code snippets efficiently. However, existing code search models face a semantic gap between code and queries, and bridging it requires a large amount of training data. In this paper, we propose a fine-tuning approach to bridge the semantic gap in code search and improve search accuracy. We collect 80,723 distinct <comment, code snippet> pairs from Etherscan.io and use them to fine-tune, validate, and test the pre-trained CodeBERT model. Using the fine-tuned model, we develop a code search engine specifically for smart contracts. We evaluate the Recall@k and Mean Reciprocal Rank (MRR) of the fine-tuned CodeBERT model using different proportions of the fine-tuning data. Encouragingly, even a small amount of fine-tuning data produces satisfactory results. In addition, we compare the fine-tuned CodeBERT model with two state-of-the-art models. The experimental results show that the fine-tuned CodeBERT model performs better in terms of Recall@k and MRR. These findings highlight the effectiveness of our fine-tuning approach and its potential to significantly improve code search accuracy.
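The two evaluation metrics named in this abstract can be illustrated with a minimal Python sketch. It assumes each query has exactly one correct snippet and that `ranks` holds the 1-based position of that snippet in the returned list; the function names and the sample ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct snippet appears in the top k results.

    Assumes one correct snippet per query and 1-based ranks.
    """
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank of the correct snippet over all queries."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct snippet for five queries.
sample_ranks = [1, 3, 2, 12, 1]
print("Recall@1:", recall_at_k(sample_ranks, 1))
print("Recall@5:", recall_at_k(sample_ranks, 5))
print("MRR:", mean_reciprocal_rank(sample_ranks))
```

Both metrics reward placing the correct snippet near the top, but MRR is sensitive to the exact position while Recall@k only checks whether the snippet falls within the first k results.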
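Abstract: Automatic program repair (APR) plays an important role in software engineering, but existing APR methods based on large language models have clear shortcomings in search and localization, execution efficiency, and cost control. To address this, we propose AutoVulnFix, an adaptive program repair system based on lightweight intelligent search. The system tackles these problems with three core techniques: a streamlined prompt strategy, a dual-track intelligent search algorithm, and a two-level cache. Specifically, it first reduces token overhead through intelligent context management and precise problem descriptions, then improves response speed through lazy loading and intelligent caching, and finally combines a fast search track with an intelligent search track, adaptively selecting the best search strategy according to task complexity. In a comprehensive comparative evaluation of five large language models on a dataset of 231 code-defect samples, AutoVulnFix improved repair accuracy by 4.2% on average, shortened execution time by 25.8% on average, and reduced token overhead by 21.4% on average. The system offers an effective solution for the practical deployment of APR.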
Funding: Supported by the Science and Technology Project of Jiangxi Education Department (GJJ161151) and the School-Level Team Building Project (JXTD1404).
Abstract: Existing query expansion (QE) methods sometimes fail to find the source code version that users most want, because noise causes over-expansion. To solve this problem, we propose a QE method based on evolving contexts (EC), that is, the terms added or deleted during code evolution together with their dependent terms. When expanding a query, we append the added terms as relevant terms and exclude the deleted terms as noisy terms. We also develop a QE-integrating framework based on Support Vector Machine (SVM) ranking, called QESR, to integrate multiple QE methods simultaneously. Our experiment shows that QESR outperforms the state-of-the-art QE methods CodeHow and Query Expansion based on Crowd Knowledge (QECK) by 13%-16% in terms of precision when the first query result is inspected.
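The expansion rule described in this abstract (append added terms, exclude deleted terms) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `candidates` argument (expansion terms suggested by some other QE method), and the sample terms are hypothetical, the handling of dependent terms is omitted, and the abstract does not say whether deleted terms are also removed from the user's original query, so the sketch leaves the original query untouched.

```python
from typing import Iterable, List


def expand_query(
    query_terms: Iterable[str],
    candidates: Iterable[str],
    added_terms: Iterable[str],
    deleted_terms: Iterable[str],
) -> List[str]:
    """Expand a query using evolving contexts.

    Candidate expansion terms that were deleted during code evolution are
    treated as noise and dropped; terms that were added are appended as
    relevant. Original query terms are always kept (an assumption).
    """
    deleted = set(deleted_terms)
    expanded: List[str] = []
    seen = set()

    def push(term: str) -> None:
        # Keep the first occurrence of each term and preserve order.
        if term not in seen:
            seen.add(term)
            expanded.append(term)

    for term in query_terms:
        push(term)
    for term in candidates:
        if term not in deleted:  # exclude noisy (deleted) terms
            push(term)
    for term in added_terms:  # append relevant (added) terms
        push(term)
    return expanded


# Hypothetical example: "xml" was removed from the code base, "stream" was added.
print(expand_query(["parse", "json"], ["xml", "decode", "reader"], ["stream"], ["xml"]))
```

Filtering happens before appending, so a noisy candidate never reaches the final query, which is one simple way to limit the over-expansion the abstract describes.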
Funding: Supported by the Commission of Science, Technology and Industry for National Defence of China under Grant No. A1420061264, the National Natural Science Foundation of China under Grant No. 60934002, and the General Armament Department under Grant No. 51317040102.
Abstract: Test point selection for an integer-coded fault wise table is a discrete optimization problem. The global minimum set of test points can only be guaranteed by an exhaustive search, which is computationally expensive. In this paper, the problem is first formulated as a heuristic depth-first graph search problem, and the graph node expansion method and rules are given. Rollout strategies, which can be combined with heuristic graph search algorithms, are then applied to obtain solutions superior to those of greedy heuristic algorithms at a lower computational cost than the optimal strategies. The proposed rollout-based test point selection algorithm is illustrated and tested using an analog circuit and a set of simulated integer-coded fault wise tables. The computational results suggest that the rollout policies are significantly better than the other strategies.
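The general idea of a rollout strategy over a greedy base heuristic can be sketched in Python as follows. This is not the paper's algorithm: the abstract does not specify the heuristic or the graph expansion rules, so the sketch assumes a simple greedy heuristic (pick the test point that splits the faults into the most distinguishable groups), and the table layout, function names, and sample table are all hypothetical.

```python
from typing import List, Sequence

# table[fault][test_point] holds the integer code of that fault at that test point.
FaultTable = Sequence[Sequence[int]]


def num_groups(table: FaultTable, selected: Sequence[int]) -> int:
    """Number of fault groups distinguishable using only the selected test points."""
    return len({tuple(row[t] for t in selected) for row in table})


def greedy_complete(table: FaultTable, selected: Sequence[int]) -> List[int]:
    """Base heuristic: keep adding the test point that yields the most groups."""
    chosen = list(selected)
    n_points = len(table[0])
    target = num_groups(table, range(n_points))  # best isolation achievable
    while num_groups(table, chosen) < target:
        remaining = [t for t in range(n_points) if t not in chosen]
        best = max(remaining, key=lambda t: num_groups(table, chosen + [t]))
        chosen.append(best)
    return chosen


def rollout_select(table: FaultTable) -> List[int]:
    """Rollout: score each candidate by the size of its greedy completion."""
    chosen: List[int] = []
    n_points = len(table[0])
    target = num_groups(table, range(n_points))
    while num_groups(table, chosen) < target:
        remaining = [t for t in range(n_points) if t not in chosen]
        # Simulate the base heuristic from each candidate and keep the one
        # whose completed solution uses the fewest test points.
        best = min(remaining, key=lambda t: len(greedy_complete(table, chosen + [t])))
        chosen.append(best)
    return chosen


# Hypothetical table: 4 faults (rows) by 3 test points (columns).
sample_table = [
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 2],
]
print("greedy :", greedy_complete(sample_table, []))
print("rollout:", rollout_select(sample_table))
```

Each rollout step costs one greedy completion per remaining test point, which is more work than plain greedy selection but far less than enumerating every subset of test points.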
Funding: Supported by the National Natural Science Foundation of China (60473085).
Abstract: A new way of indexing and processing twig patterns in XML documents is proposed in this paper. Every path in an XML document can be transformed into a sequence of labels by Structure-Encoded, which constructs a one-to-one correspondence between the XML tree and the sequence. Based on the identifying characteristics of nodes in the XML tree, the elements are classified and clustered. During query processing, the twig pattern is also transformed into its Structure-Encoded sequence. By performing subsequence matching on the set of sequences in the XML documents, all occurrences of the path in the XML documents are refined. Using the index, the number of elements retrieved is minimized. The search results, presented in a pertinent format, provide more structural information without any false dismissals or false alarms. The index also supports keyword search. Experimental results indicate that the index is highly efficient and achieves high precision.
Abstract: An improved technique with fractional sampling based on two samples per chip, according to the Nyquist criterion, has been employed by the authors to enhance the performance of code synchronization in UMTS (or W-CDMA) systems. In this paper, we investigate the theoretical rationale for this promising behavior. The performance is analyzed for several wireless channels, in the presence of typical pedestrian and vehicular scenarios of IMT2000/UMTS cellular systems.
Abstract: The concepts of an ordered codebook and the priority of a code vector are proposed in this paper. The statistical properties of a signal are investigated through its coded sequence. Experimental results are presented that provide some insight into the statistical properties of vector-quantized sequences. Based on these concepts and experimental results, a fast search method for the vector quantization of correlated information sources, such as Gauss-Markov sources, is proposed, and simulation results show its efficiency.