Abstract: In the current era of software development, software reuse is one of the most widely used ways to save time. Software is often reused by copying and pasting code, a practice known as code cloning. Cloning causes problems such as harder debugging and more time spent debugging and maintaining the code. Various clone detection algorithms have been proposed in the literature, but they take considerable time and space to find clones, and most of them do not scale. This paper targets that problem. The authors propose a framework that identifies clones in less time than many popular code clone detection algorithms. The framework also addresses a key issue in code clone detection, namely the detection of near-miss (Type-3) and semantic (Type-4) clones, achieving accuracies of 95.52% and 92.80%, respectively. The approach has two phases. The first phase converts any code into an intermediate representation, namely hash-inspired abstract syntax trees. In the second phase, these abstract syntax trees are passed to a novel algorithm, “Similarity-based self-adjusting hash inspired abstract syntax tree”, which determines how similar the code fragments are. The proposed method shows considerable improvement over existing code clone identification methods.
Funding: Sponsored by the National Natural Science Foundation of China (Grant No. 61173021).
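The abstract above does not explain how its hash-inspired AST algorithm works. The sketch below is therefore not the authors' algorithm. It only illustrates the general idea of comparing code through hashed syntax trees, under a few assumptions. The input is Python source parsed with the standard `ast` module. Every subtree is hashed bottom-up from node types only. Two fragments are scored with a Dice coefficient over their multisets of subtree hashes. The function names `subtree_hashes` and `hashed_ast_similarity`, the use of SHA-1, and the Dice measure are all illustrative choices.

```python
import ast
import hashlib
from collections import Counter


def subtree_hashes(source: str) -> Counter:
    """Hash every AST subtree bottom-up from node types only (illustrative).

    Identifier names and literal values are not part of the hash, so two
    fragments that differ only in naming produce the same multiset of hashes.
    """
    hashes: Counter = Counter()

    def visit(node: ast.AST) -> str:
        child_hashes = [visit(child) for child in ast.iter_child_nodes(node)]
        text = type(node).__name__ + "(" + ",".join(child_hashes) + ")"
        digest = hashlib.sha1(text.encode()).hexdigest()
        hashes[digest] += 1
        return digest

    visit(ast.parse(source))
    return hashes


def hashed_ast_similarity(a: str, b: str) -> float:
    """Dice coefficient over the subtree-hash multisets of two code fragments."""
    ha, hb = subtree_hashes(a), subtree_hashes(b)
    shared = sum((ha & hb).values())  # multiset intersection keeps minimum counts
    return 2 * shared / (sum(ha.values()) + sum(hb.values()))
```

For example, `hashed_ast_similarity("def add(a, b):\n    return a + b\n", "def plus(x, y):\n    return x + y\n")` scores 1.0, because only names differ. Changing `+` to `*` lowers the score without driving it to zero. A detector built this way would report a clone pair when the score exceeds some threshold. The abstract does not give the threshold or explain how the authors' algorithm "self-adjusts".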
Abstract: Software often contains many code clones, i.e., code fragments that are similar to one another. Over the past decades, researchers have proposed a number of state-of-the-art clone detection methods, and code clones exhibit some relationship with software evolution. To explore the relationships between clones and their evolution, we propose a framework that clusters clones with the Fuzzy C-means clustering method. First, we detect all clones using NiCad and build clone genealogies across multiple versions of the software. Second, we extract metrics that describe the clones and their evolution. Finally, we cluster the clone vectors, which are built from different metrics for different purposes. Experimental results on six open-source software packages show that the relationships among clone lifetime, number of changes, clone pattern, and other attributes can help developers understand clones.
Funding: This work was supported by the National Natural Science Foundation of China (61363017). The author is Liu, D.S., and the website is https://isisn.nsfc.gov.cn.
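The clustering step described above can be sketched as a plain Fuzzy C-means loop. The sketch assumes that NiCad detection and genealogy construction have already run, so each clone is just a numeric metric vector. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, and the example metrics (lifetime in versions, number of changes) are hypothetical choices, not details taken from the paper. The loop alternates the two standard updates. Each centre moves to the membership-weighted mean of the points. Each membership is then recomputed from relative distances, so every point's memberships sum to 1.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means clustering of numeric feature vectors (illustrative sketch).

    Returns (centers, u), where u[k][i] is the membership of point k in
    cluster i; every row of u sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised per point.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: average of the points weighted by u ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / sw
                            for d in range(dim)])

        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        new_u = []
        for k in range(n):
            dists = [math.dist(points[k], ctr) for ctr in centers]
            zeros = dists.count(0.0)
            if zeros:  # the point coincides with one or more centres
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dists[i] / dists[j]) ** (2 / (m - 1))
                                 for j in range(c))
                       for i in range(c)]
            new_u.append(row)

        shift = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

As a usage example, hypothetical clone vectors such as `[[1, 0], [2, 1], [1, 1], [20, 9], [21, 10], [19, 10]]` with `c=2` separate into a short-lived, rarely changed group and a long-lived, frequently changed group. Taking the argmax of each membership row yields hard labels. The soft memberships are what distinguish Fuzzy C-means from k-means: a clone can partly belong to several evolution patterns.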
Abstract: Reusing code fragments by copying and pasting them, with or without minor adaptation, is a common activity in software development. As a result, software systems often contain sections of very similar code, called code clones. Code clones can reduce software development costs and risks; however, recent studies have also reported negative impacts. To manage and make effective use of clones, we design an approach that recommends clones for refactoring based on a Bayesian network. First, clones are detected from the source code. Second, the clones that need to be refactored are identified, and static and evolution features are extracted to build a feature database. Finally, a Bayesian network classifier is trained and its classification results are evaluated. On more than 640 refactoring examples from five open-source software systems written in C, we observe a considerable improvement: the accuracy of the approach exceeds 90%. We believe our approach can provide software developers with more accurate and reasonable refactoring and maintenance advice.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. U20A6003 and 62237001; the Guangdong Science and Technology Plan Project under Grant No. 2021B1212100004; the Guangdong Natural Science Fund Project under Grant No. 2021A1515011243; and the Guangdong Joint Fund of the National Natural Science Foundation of China under Grant Nos. U1801263 and U1701262.
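The abstract above does not specify the structure of its Bayesian network or its exact features. As a minimal sketch, the code below uses naive Bayes, the simplest Bayesian network classifier, in which the class node is the sole parent of every feature node. It assumes discrete features. The class name `NaiveBayesClassifier`, the Laplace smoothing parameter `alpha`, and the example feature names (`similarity`, `changes`) and labels (`refactor`, `keep`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Naive Bayes over discrete features: the class is the only parent of
    each feature in the Bayesian network (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # (feature, label) -> value counts
        self.feature_values = defaultdict(set)      # feature -> values seen in training
        for row, label in zip(rows, labels):
            for feat, val in row.items():
                self.feature_counts[(feat, label)][val] += 1
                self.feature_values[feat].add(val)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior P(label)
            for feat, val in row.items():
                counts = self.feature_counts[(feat, label)]
                k = len(self.feature_values[feat])
                # Smoothed log likelihood of P(feat = val | label).
                score += math.log((counts[val] + self.alpha) / (count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

As a usage example, suppose the classifier is trained on hypothetical rows such as `{"similarity": "high", "changes": "many"}` labelled `"refactor"` and `{"similarity": "low", "changes": "few"}` labelled `"keep"`. It then recommends refactoring for new clones whose features resemble the first group. A richer Bayesian network would add edges between features, for example between evolution features that are not independent given the class.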
Abstract: Modifying a code segment may give rise to a consistency issue when the segment belongs to a clone group of closely similar code segments. Recent studies have shown that such consistent changes can incur extra maintenance costs when clones must be checked for consistency, and can introduce defects if developers forget to change clones consistently when needed. To address this problem, researchers have proposed predicting clone consistency in advance from handcrafted attributes, notably with machine learning methods. Although these attributes help predict clone consistency to some extent, such approaches are generally weak and unsatisfactory in practice. These limitations are especially severe in a project's infancy, when there is not enough within-project data to model clone consistency behavior and cross-project data have not proved helpful for prediction. In this paper, we propose the Clone Hierarchical Attention Neural Network (CHANN), which represents code clones and their evolution from a hierarchical perspective of code, context, and code evolution, thereby improving the effectiveness of clone consistency prediction. To assess CHANN, we conduct experiments on a dataset collected from eight open-source projects. The results show that CHANN is highly effective at predicting clone consistency, with precision, recall, and F-measure all around 82%. These findings support our hypothesis that the hierarchical neural network can help developers predict clone consistency effectively in cross-project incubation, when insufficient data are available early in software development.
Funding: Supported by the National Natural Science Foundation of China (No. 61802435).
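The core mechanism behind a hierarchical attention network can be sketched in a few lines: attention pooling applied twice, first over tokens and then over statements. This is not CHANN's architecture. Its code, context, and evolution levels are not reproduced here. The sketch also makes simplifying assumptions. In a trained model, the token embeddings and the context vectors `token_context` and `statement_context` would be learned, and a learned projection usually precedes the scoring step. Here they are plain inputs. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Score each vector by its dot product with a context vector, turn the
    scores into softmax weights, and return the weighted sum of the vectors."""
    scores = [sum(x * q for x, q in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, vectors))
            for d in range(len(vectors[0]))]


def hierarchical_encode(fragment, token_context, statement_context):
    """Encode a code fragment given as a list of statements, each statement a
    list of token embedding vectors (illustrative sketch).

    Level 1 pools the tokens of each statement into a statement vector;
    level 2 pools the statement vectors into one fragment vector.
    """
    statement_vectors = [attention_pool(tokens, token_context) for tokens in fragment]
    return attention_pool(statement_vectors, statement_context)
```

With zero context vectors, every weight is uniform and the encoding reduces to a mean of means. A context vector aligned with one token or statement concentrates nearly all of the weight on it. That learned focus is what lets such a model highlight the parts of a clone most relevant to a consistency prediction.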
Abstract: Information technology greatly facilitates people's lives, but it also brings many security issues, such as code plagiarism, software infringement, and malicious code. To address these problems, reverse engineering is applied to analyze large amounts of binary code manually, which is very time-consuming. Moreover, because various obfuscation techniques have matured, the disassembly generated from the same function can differ greatly in opcodes and control flow graph under different obfuscation options. This paper proposes a method inspired by natural language processing to perform semantic similarity matching of binary code at both basic-block and function granularity. In the similarity matching task on binary code produced with different LLVM obfuscation options, the evaluation metric reaches 99%, which is better than existing techniques.
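The "opcodes as words" framing in the abstract above can be illustrated with a very simple bag-of-n-grams baseline. This is not the paper's method, which the abstract does not detail. The sketch assumes that disassembly is available as one instruction string per line, with the opcode first. It drops operands, counts opcode n-grams per basic block, and compares blocks by cosine similarity. Function-level similarity pools the counts of all of a function's blocks. The function names and the default `n=2` are hypothetical. A baseline this simple would likely be defeated by instruction-substitution obfuscation, because that changes the opcodes themselves.

```python
import math
from collections import Counter


def opcode_ngrams(block, n=2):
    """Treat a basic block's opcodes as words and count its opcode n-grams.

    Operands are dropped, so blocks that differ only in registers or stack
    offsets map to the same n-grams.
    """
    ops = [insn.split()[0].lower() for insn in block if insn.strip()]
    if len(ops) < n:
        return Counter([tuple(ops)]) if ops else Counter()
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def block_similarity(block_a, block_b, n=2):
    """Basic-block granularity: compare the opcode n-gram counts of two blocks."""
    return cosine(opcode_ngrams(block_a, n), opcode_ngrams(block_b, n))


def function_similarity(blocks_a, blocks_b, n=2):
    """Function granularity: pool the n-gram counts of all of a function's blocks."""
    pooled_a = sum((opcode_ngrams(b, n) for b in blocks_a), Counter())
    pooled_b = sum((opcode_ngrams(b, n) for b in blocks_b), Counter())
    return cosine(pooled_a, pooled_b)
```

For example, `["mov eax, [rbp-4]", "add eax, 1", "mov [rbp-4], eax", "ret"]` and the same block using `ecx` and `[rbp-8]` score 1.0. A prologue/epilogue block such as `["push rbp", "mov rbp, rsp", "pop rbp", "ret"]` shares no bigrams with either and scores 0.0. Learned embeddings replace these raw counts precisely so that semantically equivalent but syntactically different instruction sequences still land close together.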