
A cluster-analysis-based feature-selection method for software defect prediction (cited by: 25)
Abstract: Software defect prediction mines historical software repositories to build models that predict potentially defective program modules in a project under test. However, redundant and irrelevant features in the collected datasets can degrade the performance of these models. This paper proposes FECAR, a feature-selection method based on cluster analysis. FECAR first clusters the original features according to feature-feature correlation (FFC). Then, within each cluster, it ranks the features from high to low by feature-class relevance (FCR) and selects a specified number of them. In the empirical study, symmetric uncertainty is used to compute FFC, and information gain, chi-square, or ReliefF is used to compute FCR. Using real-world projects such as Eclipse and the NASA datasets as subjects, the study analyzes the performance of defect-prediction models after applying FECAR, as well as the redundancy rate and selection proportion of the feature subsets that FECAR selects. The results confirm the effectiveness of FECAR.
Source: Scientia Sinica Informationis (《中国科学:信息科学》), CSCD, Peking University Core Journal, 2016, No. 9, pp. 1298-1320 (23 pages)
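Funding: National Natural Science Foundation of China (Grant Nos. 61373012, 61321491, 91218302, 61202006); National Basic Research Program of China (973 Program) (Grant No. 2009CB320705); Natural Science Research Project of Jiangsu Higher Education Institutions (Grant No. 12KJB520014); Open Project of the State Key Laboratory for Novel Software Technology, Nanjing University (Grant No. KFKT2016B18); Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing University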
Keywords: software quality assurance; defect prediction; data mining; feature selection; cluster analysis
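
The two-phase procedure described in the abstract (cluster features by FFC, then rank within each cluster by FCR and keep the top few) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which clustering algorithm FECAR uses, so the greedy "leader" clustering below is an assumption chosen for brevity, as are the function names, the `threshold` and `per_cluster` parameters, and the toy data. Features are assumed to be already discretized. Symmetric uncertainty serves as the FFC measure and information gain as the FCR measure, both as named in the abstract.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def information_gain(x, y):
    """IG(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)); 0 if both are constant."""
    denom = entropy(x) + entropy(y)
    return 2.0 * information_gain(x, y) / denom if denom > 0 else 0.0


def cluster_features(features, threshold):
    """Assumed clustering step (not taken from the paper): greedy leader
    clustering. A feature joins the first cluster whose leader it is
    correlated with (SU >= threshold); otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of feature names, leader first
    for name, column in features.items():
        for cluster in clusters:
            leader = features[cluster[0]]
            if symmetric_uncertainty(column, leader) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def select_features(features, labels, threshold=0.5, per_cluster=1):
    """Rank each cluster by information gain with the class label and
    keep the top `per_cluster` features from every cluster."""
    selected = []
    for cluster in cluster_features(features, threshold):
        ranked = sorted(
            cluster,
            key=lambda f: information_gain(features[f], labels),
            reverse=True,
        )
        selected.extend(ranked[:per_cluster])
    return selected


# Hypothetical toy data: three discretized features over eight modules.
features = {
    "loc_bucket": [0, 0, 1, 1, 2, 2, 0, 1],
    "loc_copy": [0, 0, 1, 1, 2, 2, 0, 1],  # duplicate of loc_bucket (redundant)
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 1, 1, 0, 1]  # 1 = defective module

selected = select_features(features, labels)
print(selected)
```

On this toy data the duplicated column falls into the same cluster as its original, so only one of the pair is kept. Swapping `information_gain` for a chi-square or ReliefF score in `select_features` would give the other FCR variants mentioned in the abstract.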

