Graph analysis can be done at scale with Spark GraphX, which loads data into memory and runs graph analysis in parallel. To do this, data must be taken out of graph databases and put into memory. Given the limits on memory size, accelerating the graph analysis process requires first reducing the graph data to a suitable size without losing too much similarity to the original graph. This paper presents our method of data cleaning on the software graph. We use the SEQUITUR data compression algorithm to find hot code paths and store them as a whole-paths directed acyclic graph. Hot code paths are an inherent regularity of a program. About 10 to 200 hot code paths account for 40%-99% of a program's execution cost. These hot paths are acyclic and contribute more than 0.1%-1.0% of some execution metric. We expand the hot code paths to a suitable size that is good for runtime and keeps similarity to the original graph.
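The thresholds quoted in this abstract can be illustrated with a minimal sketch. It assumes that a per-path execution cost profile is already available as a dictionary. The function name `select_hot_paths`, the input layout, and the sample numbers are hypothetical and are not taken from the paper, and the sketch does not implement SEQUITUR itself. It only shows how paths above a contribution threshold might be selected and how much of the total cost they cover.

```python
def select_hot_paths(path_costs, threshold=0.01):
    """Pick paths whose share of the total cost exceeds `threshold`.

    path_costs: dict mapping an acyclic path (tuple of block names) to its
                execution cost (hypothetical input format, for illustration).
    Returns (hot, coverage): hot is a list of (path, share) sorted by
    descending share, coverage is the summed share of the hot paths.
    """
    total = sum(path_costs.values())
    if total <= 0:
        return [], 0.0
    hot = [
        (path, cost / total)
        for path, cost in path_costs.items()
        if cost / total > threshold
    ]
    hot.sort(key=lambda item: item[1], reverse=True)
    coverage = sum(share for _, share in hot)
    return hot, coverage


if __name__ == "__main__":
    # Made-up profile: two dominant paths and one rarely executed path.
    profile = {
        ("A", "B", "D"): 700,
        ("A", "C", "D"): 295,
        ("A", "E"): 5,
    }
    hot, coverage = select_hot_paths(profile, threshold=0.01)
    for path, share in hot:
        print(" -> ".join(path), f"{share:.1%}")
    print(f"coverage: {coverage:.1%}")
```

Lowering `threshold` toward 0.001 admits more paths and raises coverage, which is one way to read the 0.1%-1.0% range mentioned in the abstract.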
Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in the field of software engineering come from different platforms and communities and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data, and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on user feedback to improve its accuracy. To deal with the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. The proposed MEIM is applied to the Spring Boot framework, where it demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
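As a rough illustration of the TF-IDF keyword extraction mentioned in this abstract, the sketch below ranks the terms of each document by a plain TF-IDF score. The function name `tfidf_keywords`, the pre-tokenized input, and the sample tokens are assumptions made for this example. The abstract does not specify the weighting variant or the preprocessing used by the authors, so this is the textbook formula, not their pipeline.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank terms per document by TF-IDF.

    docs: list of token lists (tokenization is assumed to be done already).
    Uses tf = count / document length and idf = ln(N / document frequency).
    Returns, for each document, its top_k (term, score) pairs.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    ranked_per_doc = []
    for tokens in docs:
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Sort by descending score, then alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked_per_doc.append(ranked[:top_k])
    return ranked_per_doc


if __name__ == "__main__":
    # Made-up token lists loosely themed on Spring Boot documentation.
    docs = [
        ["spring", "boot", "starter", "spring"],
        ["spring", "bean", "bean"],
        ["boot", "actuator"],
    ]
    for keywords in tfidf_keywords(docs, top_k=2):
        print([term for term, _ in keywords])
```

Terms that occur in every document receive a score of zero under this formula, so they never rank above document-specific terms, which is the property that makes the top-ranked terms plausible candidates for entity annotation.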
Identifying composite crosscutting concerns (CCs) is a research task and a challenge in aspect mining. In this paper, we propose a scatter-based graph clustering approach to identify composite CCs. Inspired by state-of-the-art link analysis techniques, we propose a two-state model to approximate how CCs tangle with core modules. According to this model, we obtain scatter and centralization scores for each program element. In particular, the scatter scores are used to select CC seeds. Furthermore, to identify composite CCs, we adopt a novel similarity measurement and develop an undirected graph clustering method to group these seeds. Finally, we compare the approach with previous work and illustrate its effectiveness in identifying composite CCs.
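The seed-grouping step can be pictured with a small sketch. The abstract does not define its similarity measurement or its clustering procedure, so this example swaps in two common stand-ins and labels them as such: Jaccard similarity over the set of modules each seed appears in, and connected components of the undirected graph whose edges join seeds above a similarity threshold. The names `jaccard`, `cluster_seeds`, `min_similarity`, and the sample seeds are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (stand-in for the paper's measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_seeds(seed_modules, min_similarity=0.5):
    """Group seeds by connected components of a thresholded similarity graph.

    seed_modules: dict mapping a seed name to the set of modules it touches
                  (hypothetical input format).
    Returns a sorted list of clusters, each a sorted list of seed names.
    """
    seeds = sorted(seed_modules)
    parent = {seed: seed for seed in seeds}

    def find(seed):
        # Union-find lookup with path halving.
        while parent[seed] != seed:
            parent[seed] = parent[parent[seed]]
            seed = parent[seed]
        return seed

    # Add an undirected edge (merge components) for each similar pair.
    for i, first in enumerate(seeds):
        for second in seeds[i + 1:]:
            if jaccard(seed_modules[first], seed_modules[second]) >= min_similarity:
                parent[find(first)] = find(second)

    clusters = {}
    for seed in seeds:
        clusters.setdefault(find(seed), []).append(seed)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Made-up seeds: three related to logging and transactions, one to security.
    seeds = {
        "log_enter": {"Order", "Cart", "User"},
        "log_exit": {"Order", "Cart", "User"},
        "begin_tx": {"Order", "Cart"},
        "check_auth": {"User", "Admin"},
    }
    print(cluster_seeds(seeds, min_similarity=0.5))
```

Each resulting cluster plays the role of one candidate composite CC in this simplified picture. A real implementation would replace both stand-ins with the measures defined in the paper.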
Intelligent software development has become one of the most important research trends in software engineering. In this paper, we put forward two key concepts for the first time: the intelligent development environment (IntelliDE) and the software knowledge graph. IntelliDE is an ecosystem in which software big data are aggregated, mined, and analyzed to provide intelligent assistance throughout the life cycle of software development. We present its architecture and discuss its key research issues and challenges. The software knowledge graph is a software knowledge representation and management framework that plays an important role in IntelliDE. We study its concept and introduce concrete details and examples to show how it can be constructed and leveraged.
Funding (software graph data cleaning paper): This research work is supported by the Hunan Provincial Education Science 13th Five-Year Plan (Grant No. XJK016BXX001), the Social Science Foundation of Hunan Province (Grant No. 17YBA049), and the Hunan Provincial Natural Science Foundation of China (Grant No. 2017JJ2016). The work is also supported by the Open Foundation for University Innovation Platform from Hunan Province, China (Grant No. 16K013), the 2011 Collaborative Innovation Center of Big Data for Financial and Economical Asset Development and Utility in Universities of Hunan Province, and the National Students Platform for Innovation and Entrepreneurship Training (Grant No. 201811532010).
Funding (software knowledge entity extraction paper): Zhifang Liao: Ministry of Science and Technology Key Research and Development Project (2018YFB003800); Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology (Hunan University of Finance and Economics), 2017TP1025; HNNSF 2018JJ2535. Shengzong Liu: NSF 61802120.
Funding (composite crosscutting concerns paper): Supported by the National Pre-research Project (513150601).