Abstract
Irrelevant and redundant features in software defect datasets degrade the performance of models that predict the number of software faults. To address this problem, this paper proposes HFSNFP, a hybrid feature selection method for number-of-faults prediction. First, HFSNFP uses the ReliefF algorithm to compute the relevance between each feature and the number of faults, and selects the top m most relevant features. Then, it groups these m features with spectral clustering according to the pairwise correlation between features. Finally, it applies a wrapper-based search that picks the most relevant features from each resulting cluster in turn to form the final feature subset. Experimental results show that, compared with five existing filter-based feature selection methods, HFSNFP increases the detection rate (PD) while reducing the false alarm rate (PF), and achieves better G-measure and RMSE values. Compared with two existing wrapper-based feature selection methods, HFSNFP maintains number-of-faults prediction performance while significantly reducing the running time of feature selection.
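The four reported measures can be illustrated with a minimal sketch. The abstract does not give their definitions, so the code below assumes the forms commonly used in defect prediction: PD is recall on faulty modules, PF is the false alarm rate on clean modules, G-measure is the harmonic mean of PD and 1 - PF, and RMSE is computed on the raw fault counts. The function name `evaluate` and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import math


def evaluate(actual, predicted, threshold=0.5):
    """Return PD, PF, G-measure and RMSE for predicted fault counts.

    Assumption (not from the paper): a module counts as faulty when its
    fault count exceeds `threshold`, for both actual and predicted values.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal in length")

    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        is_faulty = a > threshold
        flagged = p > threshold
        if is_faulty and flagged:
            tp += 1
        elif is_faulty:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1

    # PD: share of faulty modules that were flagged.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # PF: share of clean modules that were wrongly flagged.
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # G-measure: harmonic mean of PD and (1 - PF).
    denom = pd + (1.0 - pf)
    g_measure = 2.0 * pd * (1.0 - pf) / denom if denom else 0.0
    # RMSE on the raw counts, independent of the threshold.
    rmse = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
    return {"PD": pd, "PF": pf, "G-measure": g_measure, "RMSE": rmse}


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    print(evaluate([0, 2, 0, 1], [0.0, 1.0, 1.0, 0.0]))
```

Higher PD and G-measure and lower PF and RMSE indicate better prediction, which is the direction of improvement the abstract reports for HFSNFP.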
Source
Application Research of Computers (《计算机应用研究》)
CSCD
PKU Core Journal (北大核心)
2018, Issue 2, pp. 487-492, 502 (7 pages)
Funding
Hubei University Quality Course Program (013665, 150145)
Keywords
number of software faults prediction
feature selection
spectral clustering
wrapper-based feature selection