
A novel unbalanced data classification method for software defects (cited by 7)

Abstract: Software defect prediction data suffer from class imbalance, low prediction accuracy, and high feature dimensionality. To address these problems, this paper proposes RUS-RSMOTE-PCA-Vote (random undersampling, random synthetic minority oversampling technique, principal component analysis, and voting), a classification method for unbalanced software defect data. First, random undersampling (RUS) reduces the number of non-defective samples. SMOTE oversampling is then applied to the undersampled data. During oversampling, an influence factor posFac (position factor) is introduced to guide the synthesis of new samples. posFac takes the distribution of the overall sample into account. Next, PCA reduces the dimensionality of the data set produced by RUS-RSMOTE hybrid sampling. Finally, Vote combines k-nearest neighbors (KNN), a decision tree, and a support vector machine (SVM) into an ensemble classifier. Experiments were run on the NASA (National Aeronautics and Space Administration) data set and compared against existing unbalanced-data classification methods. The proposed method achieves better F-value, G-mean, and AUC, effectively improving classification performance on the software defect prediction data set.
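The abstract describes three preprocessing steps: random undersampling of non-defective samples, SMOTE oversampling of defective samples, and then PCA. These can be sketched with NumPy roughly as follows.

This is a minimal sketch under stated assumptions, not the authors' implementation:

- The abstract does not define posFac or the exact RSMOTE rule. The oversampling step below is therefore plain SMOTE interpolation, with no posFac.
- Label 1 is assumed to mark defective modules and 0 non-defective ones.
- Oversampling to exact class parity, k = 5 neighbours, and two principal components are illustrative choices.
- All function names are hypothetical.

```python
import numpy as np


def random_undersample(X, y, majority_label, n_keep, rng):
    """RUS: randomly keep n_keep majority-class (non-defective) rows, keep all others."""
    maj_idx = np.flatnonzero(y == majority_label)
    other_idx = np.flatnonzero(y != majority_label)
    kept = rng.choice(maj_idx, size=n_keep, replace=False)
    idx = np.concatenate([kept, other_idx])
    return X[idx], y[idx]


def smote(X_min, n_new, k, rng):
    """Plain SMOTE (assumption: no posFac weighting). Each synthetic point lies on the
    segment between a minority sample and one of its k nearest minority neighbours."""
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)                      # cannot have more neighbours than other points
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                     # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]       # indices of the k nearest minority samples
    base = rng.integers(0, len(X_min), size=n_new)  # random seed sample for each new point
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                    # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def pca_fit_transform(X, n_components):
    """PCA via SVD: centre the data and project it onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def rus_smote_pca(X, y, n_keep, k=5, n_components=2, seed=0):
    """Hypothetical pipeline: RUS, then SMOTE up to class parity, then PCA."""
    rng = np.random.default_rng(seed)
    X, y = random_undersample(X, y, majority_label=0, n_keep=n_keep, rng=rng)
    n_new = max(0, int((y == 0).sum() - (y == 1).sum()))
    X_syn = smote(X[y == 1], n_new, k, rng)
    X = np.vstack([X, X_syn])
    y = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return pca_fit_transform(X, n_components), y


# Toy usage: 180 non-defective and 20 defective modules with 10 random features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = np.array([0] * 180 + [1] * 20)
X_new, y_new = rus_smote_pca(X, y, n_keep=60)
print(X_new.shape, np.bincount(y_new))  # (120, 2) [60 60]
```

Undersampling first shrinks the majority class before any synthetic points are made. As a result, SMOTE only has to generate the remaining gap (40 points here) rather than enough points to match all 180 non-defective samples.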
Authors: LIU Wenying (刘文英), LIN Yalin (林亚林), LI Kewen (李克文), LEI Yongxiu (雷永秀) (College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, Shandong 266580, China)
Source: Journal of Shandong University of Science and Technology (Natural Science) (《山东科技大学学报(自然科学版)》), CAS, Peking University Core Journal, 2021, No. 2, pp. 84-94 (11 pages)
Funding: National Natural Science Foundation of China (61673396); Natural Science Foundation of Shandong Province (ZR2017MF032).
Keywords: software defect prediction; unbalanced data; hybrid sampling; feature dimensionality reduction; ensemble classifier
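The final Vote step and the three evaluation criteria named in the abstract (F-value, G-mean, AUC) can be illustrated in plain Python. The abstract does not specify these details, so the sketch rests on the following assumptions:

- "Vote" is taken to mean hard majority voting over the KNN, decision-tree, and SVM outputs.
- The three base classifiers appear as hard-coded toy prediction lists rather than trained models.
- F-value is the F1 score (β = 1) of the defective class.
- G-mean is the square root of recall times specificity.
- AUC is computed by using the fraction of base classifiers that vote "defective" as a ranking score.

The function names are hypothetical.

```python
import math
from collections import Counter


def majority_vote(*predictions):
    """Hard voting (assumed): per sample, return the label predicted by most base classifiers."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def vote_scores(*predictions, positive=1):
    """Fraction of base classifiers voting `positive` for each sample (assumed AUC score)."""
    return [sum(p == positive for p in labels) / len(labels) for labels in zip(*predictions)]


def confusion(y_true, y_pred, positive=1):
    """Counts of true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def f_value(y_true, y_pred, positive=1):
    """F1 score of the defective class: harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall on defective and recall on non-defective samples."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(recall * specificity)


def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage with made-up predictions standing in for KNN, decision tree and SVM.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
knn = [1, 0, 1, 1, 0, 0, 1, 0]
tree = [1, 1, 0, 1, 0, 0, 0, 0]
svm = [0, 0, 1, 1, 0, 1, 1, 0]
y_pred = majority_vote(knn, tree, svm)                 # [1, 0, 1, 1, 0, 0, 1, 0]
print(f_value(y_true, y_pred), g_mean(y_true, y_pred))  # 0.75 0.75
print(auc(y_true, vote_scores(knn, tree, svm)))         # 0.71875
```

With three base classifiers and two classes, a strict majority always exists. The tie-breaking behaviour of `Counter.most_common` therefore never affects the vote in this setting.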
