Smart contracts,which automatically execute on decentralized platforms like Ethereum,require high security and low gas consumption.As a result,developers have a strong demand for semantic code search tools that utiliz...Smart contracts,which automatically execute on decentralized platforms like Ethereum,require high security and low gas consumption.As a result,developers have a strong demand for semantic code search tools that utilize natural language queries to efficiently search for existing code snippets.However,existing code search models face a semantic gap between code and queries,which requires a large amount of training data.In this paper,we propose a fine-tuning approach to bridge the semantic gap in code search and improve the search accuracy.We collect 80723 different pairs of<comment,code snippet>from Etherscan.io and use these pairs to fine-tune,validate,and test the pre-trained CodeBERT model.Using the fine-tuned model,we develop a code search engine specifically for smart contracts.We evaluate the Recall@k and Mean Reciprocal Rank(MRR)of the fine-tuned CodeBERT model using different proportions of the finetuned data.It is encouraging that even a small amount of fine-tuned data can produce satisfactory results.In addition,we perform a comparative analysis between the fine-tuned CodeBERT model and the two state-of-the-art models.The experimental results show that the finetuned CodeBERT model has superior performance in terms of Recall@k and MRR.These findings highlight the effectiveness of our finetuning approach and its potential to significantly improve the code search accuracy.展开更多
The existing query expansion(QE) methods cannot find the most users-requested source code version at times due to the over-expansion resulting from noises. To solve this problem, we propose a QE method based on evolvi...The existing query expansion(QE) methods cannot find the most users-requested source code version at times due to the over-expansion resulting from noises. To solve this problem, we propose a QE method based on evolving contexts(EC) that are added/deleted terms and their dependent terms during code evolution. On expanding a query, we appended the added terms as relevant terms, and excluded the deleted terms as noisy terms. We also developed a QE-integrating framework based on the Support Vector Machine(SVM) Ranking, called QESR, to simultaneously integrate multiple QE methods. Our experiment shows that QESR outperforms the state-of-the-art QE methods CodeHow and Query Expansion based on Crowd Knowledge(QECK) by 13%-16% in terms of precision when the first query result is inspected.展开更多
Test points selection for integer-coded fault wise table is a discrete optimization problem. The global minimum set of test points can only be guaranteed by an exhaustive search which is eompurationally expensive. In ...Test points selection for integer-coded fault wise table is a discrete optimization problem. The global minimum set of test points can only be guaranteed by an exhaustive search which is eompurationally expensive. In this paper, this problem is formulated as a heuristic depth-first graph search problem at first. The graph node expanding method and rules are given. Then, rollout strategies are applied, which can be combined with the heuristic graph search algorithms, in a computationally more efficient manner than the optimal strategies, to obtain solutions superior to those using the greedy heuristic algorithms. The proposed rollout-based test points selection algorithm is illustrated and tested using an analog circuit and a set of simulated integer-coded fault wise tables. Computa- tional results are shown, which suggest that the rollout strategy policies are significantly better than other strategies.展开更多
A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-o...A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-one correspondence between XML tree and sequence. Base on identifying characteristics of nodes in XML tree, the elements are classified and clustered. During query proceeding, the twig pattern is also transformed into its Structure-Encoded. By performing subsequence matching on the set of sequences in XML documents, all the occurrences of path in the XML documents are refined. Using the index, the numbers of elements retrieved are minimized. The search results with pertinent format provide more structure information without any false dismissals or false alarms. The index also supports keyword search Experiment results indicate the index has significantly efficiency with high precision.展开更多
An improved technique with a fractional sampling based on two samples per chip, according to the Nyquist criterion, has been employed by the authors to enhance the performance in the code synchronization of UMTS (or W...An improved technique with a fractional sampling based on two samples per chip, according to the Nyquist criterion, has been employed by the authors to enhance the performance in the code synchronization of UMTS (or W-CDMA) systems. In this paper, we investigate on the theoretical rationale of such a promising behavior. The performance is analyzed for several wireless channels, in the presence of typical pedestrian and vehicular scenarios of the IMT2000/UMTS cellular systems.展开更多
基金Supported by Jiangxi Higher Education and Teaching Reform Project(JXJG-20-24-2)Science and Technology Project of Jiangxi Education Department(GJJ212023)Jiangxi University of Technology Education and Teaching Reform Project(JY2104)
文摘Smart contracts,which automatically execute on decentralized platforms like Ethereum,require high security and low gas consumption.As a result,developers have a strong demand for semantic code search tools that utilize natural language queries to efficiently search for existing code snippets.However,existing code search models face a semantic gap between code and queries,which requires a large amount of training data.In this paper,we propose a fine-tuning approach to bridge the semantic gap in code search and improve the search accuracy.We collect 80723 different pairs of<comment,code snippet>from Etherscan.io and use these pairs to fine-tune,validate,and test the pre-trained CodeBERT model.Using the fine-tuned model,we develop a code search engine specifically for smart contracts.We evaluate the Recall@k and Mean Reciprocal Rank(MRR)of the fine-tuned CodeBERT model using different proportions of the finetuned data.It is encouraging that even a small amount of fine-tuned data can produce satisfactory results.In addition,we perform a comparative analysis between the fine-tuned CodeBERT model and the two state-of-the-art models.The experimental results show that the finetuned CodeBERT model has superior performance in terms of Recall@k and MRR.These findings highlight the effectiveness of our finetuning approach and its potential to significantly improve the code search accuracy.
基金Supported by the Science and Technology Project of Jiangxi Education Department(GJJ161151)the School-Level Team Building Project(JXTD1404)
文摘The existing query expansion(QE) methods cannot find the most users-requested source code version at times due to the over-expansion resulting from noises. To solve this problem, we propose a QE method based on evolving contexts(EC) that are added/deleted terms and their dependent terms during code evolution. On expanding a query, we appended the added terms as relevant terms, and excluded the deleted terms as noisy terms. We also developed a QE-integrating framework based on the Support Vector Machine(SVM) Ranking, called QESR, to simultaneously integrate multiple QE methods. Our experiment shows that QESR outperforms the state-of-the-art QE methods CodeHow and Query Expansion based on Crowd Knowledge(QECK) by 13%-16% in terms of precision when the first query result is inspected.
基金supported by Commission of Science Technology and Industry for National Defence of China under Grant No.A1420061264National Natural Science Foundation of China under Grant No.60934002General Armament Department under Grand No.51317040102)
文摘Test points selection for integer-coded fault wise table is a discrete optimization problem. The global minimum set of test points can only be guaranteed by an exhaustive search which is eompurationally expensive. In this paper, this problem is formulated as a heuristic depth-first graph search problem at first. The graph node expanding method and rules are given. Then, rollout strategies are applied, which can be combined with the heuristic graph search algorithms, in a computationally more efficient manner than the optimal strategies, to obtain solutions superior to those using the greedy heuristic algorithms. The proposed rollout-based test points selection algorithm is illustrated and tested using an analog circuit and a set of simulated integer-coded fault wise tables. Computa- tional results are shown, which suggest that the rollout strategy policies are significantly better than other strategies.
基金Supported by the National Natural Science Foundation of China (60473085)
文摘A new way of indexing and processing twig patterns in an XML documents is proposed in this paper. Every path in XML document can be transformed into a sequence of labels by Structure-Encoded that constructs a one-to-one correspondence between XML tree and sequence. Base on identifying characteristics of nodes in XML tree, the elements are classified and clustered. During query proceeding, the twig pattern is also transformed into its Structure-Encoded. By performing subsequence matching on the set of sequences in XML documents, all the occurrences of path in the XML documents are refined. Using the index, the numbers of elements retrieved are minimized. The search results with pertinent format provide more structure information without any false dismissals or false alarms. The index also supports keyword search Experiment results indicate the index has significantly efficiency with high precision.
文摘An improved technique with a fractional sampling based on two samples per chip, according to the Nyquist criterion, has been employed by the authors to enhance the performance in the code synchronization of UMTS (or W-CDMA) systems. In this paper, we investigate on the theoretical rationale of such a promising behavior. The performance is analyzed for several wireless channels, in the presence of typical pedestrian and vehicular scenarios of the IMT2000/UMTS cellular systems.