Software projects are becoming larger and more complicated. Managing those projects is based on several software development methodologies. One of those methodologies is software version control, which is used in the ...Software projects are becoming larger and more complicated. Managing those projects is based on several software development methodologies. One of those methodologies is software version control, which is used in the majority of worldwide software projects. Although existing version control systems provide sufficient functionality in many situations, they are lacking in terms of semantics and structure for source code. It is commonly believed that improving software version control can contribute substantially to the development of software. We present a solution that considers a structural model for matching source code that can be used in version control.展开更多
The existing software bug localization models treat the source file as natural language, which leads to the loss of syntactical and structure information of the source file. A bug localization model based on syntactic...The existing software bug localization models treat the source file as natural language, which leads to the loss of syntactical and structure information of the source file. A bug localization model based on syntactical and semantic information of source code is proposed. Firstly, abstract syntax tree(AST) is divided based on node category to obtain statement sequence. The statement tree is encoded into vectors to capture lexical and syntactical knowledge at the statement level.Secondly, the source code is transformed into vector representation by the sequence naturalness of the statement. Therefore,the problem of gradient vanishing and explosion caused by a large AST size is obviated when using AST to the represent source code. Finally, the correlation between bug reports and source files are comprehensively analyzed from three aspects of syntax, semantics and text to locate the buggy code. Experiments show that compared with other standard models, the proposed model improves the performance of bug localization, and it has good advantages in mean reciprocal rank(MRR), mean average precision(MAP) and Top N Rank.展开更多
In order to deal with the complex association relationships between classes in an object-oriented software system,a novel approach for identifying refactoring opportunities is proposed.The approach can be used to dete...In order to deal with the complex association relationships between classes in an object-oriented software system,a novel approach for identifying refactoring opportunities is proposed.The approach can be used to detect complex and duplicated many-to-many association relationships in source code,and to provide guidance for further refactoring.In the approach,source code is first transformed to an abstract syntax tree from which all data members of each class are extracted,then each class is characterized in connection with a set of association classes saving its data members.Next,classes in common associations are obtained by comparing different association classes sets in integrated analysis.Finally,on condition of pre-defined thresholds,all class sets in candidate for refactoring and their common association classes are saved and exported.This approach is tested on 4 projects.The results show that the precision is over 96%when the threshold is 3,and 100%when the threshold is 4.Meanwhile,this approach has good execution efficiency as the execution time taken for a project with more than 500 classes is less than 4 s,which also indicates that it can be applied to projects of different scales to identify their refactoring opportunities effectively.展开更多
In the recent era of software development,reusing software is one of the major activities that is widely used to save time.To reuse software,the copy and paste method is used and this whole process is known as code cl...In the recent era of software development,reusing software is one of the major activities that is widely used to save time.To reuse software,the copy and paste method is used and this whole process is known as code cloning.This activity leads to problems like difficulty in debugging,increase in time to debug and manage software code.In the literature,various algorithms have been developed to find out the clones but it takes too much time as well as more space to figure out the clones.Unfortunately,most of them are not scalable.This problem has been targeted upon in this paper.In the proposed framework,authors have proposed a new method of identifying clones that takes lesser time to find out clones as compared with many popular code clone detection algorithms.The proposed framework has also addressed one of the key issues in code clone detection i.e.,detection of near-miss(Type-3)and semantic clones(Type-4)with significant accuracy of 95.52%and 92.80%respectively.The present study is divided into two phases,the first method converts any code into an intermediate representation form i.e.,Hashinspired abstract syntax trees.In the second phase,these abstract syntax trees are passed to a novel approach“Similarity-based self-adjusting hash inspired abstract syntax tree”algorithm that helps in knowing the similarity level of codes.The proposed method has shown a lot of improvement over the existing code clones identification methods.展开更多
In this paper, a method to initiate, develop and visualize an abstract syntax tree (AST) in C++ source code is presented. The approach is in chronological order starting with collection of program codes as a string an...In this paper, a method to initiate, develop and visualize an abstract syntax tree (AST) in C++ source code is presented. The approach is in chronological order starting with collection of program codes as a string and split into individual characters using regular expression. This will be followed by separating the token grammar using best first search (BFS) algorithm to determine node having lowest value, lastly followed by graph presentation of intermediate representation achieved with the help of graph visualization software (GraphViz) while former is implemented using python programming language version 3. The efficacy of our approach is used in analyzing C++ code and yielded a satisfactory result.展开更多
文摘Software projects are becoming larger and more complicated. Managing those projects is based on several software development methodologies. One of those methodologies is software version control, which is used in the majority of worldwide software projects. Although existing version control systems provide sufficient functionality in many situations, they are lacking in terms of semantics and structure for source code. It is commonly believed that improving software version control can contribute substantially to the development of software. We present a solution that considers a structural model for matching source code that can be used in version control.
基金supported by the National Key R&D Program of China (2018YFB1702700)。
文摘The existing software bug localization models treat the source file as natural language, which leads to the loss of syntactical and structure information of the source file. A bug localization model based on syntactical and semantic information of source code is proposed. Firstly, abstract syntax tree(AST) is divided based on node category to obtain statement sequence. The statement tree is encoded into vectors to capture lexical and syntactical knowledge at the statement level.Secondly, the source code is transformed into vector representation by the sequence naturalness of the statement. Therefore,the problem of gradient vanishing and explosion caused by a large AST size is obviated when using AST to the represent source code. Finally, the correlation between bug reports and source files are comprehensively analyzed from three aspects of syntax, semantics and text to locate the buggy code. Experiments show that compared with other standard models, the proposed model improves the performance of bug localization, and it has good advantages in mean reciprocal rank(MRR), mean average precision(MAP) and Top N Rank.
文摘In order to deal with the complex association relationships between classes in an object-oriented software system,a novel approach for identifying refactoring opportunities is proposed.The approach can be used to detect complex and duplicated many-to-many association relationships in source code,and to provide guidance for further refactoring.In the approach,source code is first transformed to an abstract syntax tree from which all data members of each class are extracted,then each class is characterized in connection with a set of association classes saving its data members.Next,classes in common associations are obtained by comparing different association classes sets in integrated analysis.Finally,on condition of pre-defined thresholds,all class sets in candidate for refactoring and their common association classes are saved and exported.This approach is tested on 4 projects.The results show that the precision is over 96%when the threshold is 3,and 100%when the threshold is 4.Meanwhile,this approach has good execution efficiency as the execution time taken for a project with more than 500 classes is less than 4 s,which also indicates that it can be applied to projects of different scales to identify their refactoring opportunities effectively.
文摘In the recent era of software development,reusing software is one of the major activities that is widely used to save time.To reuse software,the copy and paste method is used and this whole process is known as code cloning.This activity leads to problems like difficulty in debugging,increase in time to debug and manage software code.In the literature,various algorithms have been developed to find out the clones but it takes too much time as well as more space to figure out the clones.Unfortunately,most of them are not scalable.This problem has been targeted upon in this paper.In the proposed framework,authors have proposed a new method of identifying clones that takes lesser time to find out clones as compared with many popular code clone detection algorithms.The proposed framework has also addressed one of the key issues in code clone detection i.e.,detection of near-miss(Type-3)and semantic clones(Type-4)with significant accuracy of 95.52%and 92.80%respectively.The present study is divided into two phases,the first method converts any code into an intermediate representation form i.e.,Hashinspired abstract syntax trees.In the second phase,these abstract syntax trees are passed to a novel approach“Similarity-based self-adjusting hash inspired abstract syntax tree”algorithm that helps in knowing the similarity level of codes.The proposed method has shown a lot of improvement over the existing code clones identification methods.
文摘In this paper, a method to initiate, develop and visualize an abstract syntax tree (AST) in C++ source code is presented. The approach is in chronological order starting with collection of program codes as a string and split into individual characters using regular expression. This will be followed by separating the token grammar using best first search (BFS) algorithm to determine node having lowest value, lastly followed by graph presentation of intermediate representation achieved with the help of graph visualization software (GraphViz) while former is implemented using python programming language version 3. The efficacy of our approach is used in analyzing C++ code and yielded a satisfactory result.