
Research on software defect prediction based on machine learning
Cited by: 2
Abstract As machine learning spreads into more and more fields, software testing remains a critical part of the software development process. To address the class imbalance and accuracy problems that arise in software defect prediction, this paper proposes a solution based on supervised learning. The method applies sample balancing, combining the synthetic minority over-sampling technique (SMOTE) with the edited nearest neighbor (ENN) algorithm, and tests a range of algorithms: locally weighted learning (LWL), J48, C4.8, random forest, Bayes net (BN), multilayer feedforward neural network (MFNN), support vector machine (SVM), and naive Bayes (naive Bayes key, NB-K). The algorithms are applied to three datasets (KK1, KK3, and PK2) from the NASA database, and their results are compared and analyzed in detail. The results show that the random forest model combined with SMOTE and ENN handles class imbalance efficiently while avoiding overfitting, providing an effective solution to class imbalance in software defect prediction.
Authors YU Hao; ZHANG Ying; LI Qian; JIANG Libiao; SHANG Yunpeng (GAC Aion New Energy Automobile Co., Ltd., Guangzhou 511400, P. R. China; Syncore Autotech Co., Ltd., Guangzhou 510335, P. R. China; The Fifth Research Institute of Electronics, Ministry of Industry and Information Technology, Guangzhou 510463, P. R. China; School of Mechanical Engineering and Robotics, Guangzhou City University of Technology, Guangzhou 510800, P. R. China; Institute of Engineering Research, Guangzhou City University of Technology, Guangzhou 510800, P. R. China; School of Mechanical & Automotive Engineering, South China University of Technology, Guangzhou 510641, P. R. China)
Source Journal of Chongqing University (Peking University Core Journal), 2025, No. 2, pp. 10-21 (12 pages)
Funding National Natural Science Foundation of China (61602345).
Keywords software defect prediction; machine learning; class imbalance; XGBoost; random forest
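
The abstract combines SMOTE over-sampling with ENN cleaning before training a classifier. The record gives no implementation details, so the following is only a minimal pure-Python sketch of the two resampling steps as they are commonly defined, not the authors' code. The function names, the neighbour count `k`, the `seed`, and the toy data are all assumptions made for illustration; the classifier itself (for example a random forest) is left out.

```python
import math
import random
from collections import Counter


def nearest(point, pool, k):
    """Return indices of the (up to) k points in pool closest to point."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = others[rng.choice(nearest(base, others, k))]
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def enn(X, y, k=3):
    """Edited nearest neighbour: drop every sample whose label disagrees
    with the majority label among its k nearest neighbours."""
    keep_X, keep_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        votes = Counter(rest_y[j] for j in nearest(x, rest_X, k))
        if votes.most_common(1)[0][0] == label:
            keep_X.append(x)
            keep_y.append(label)
    return keep_X, keep_y


def smote_enn(X, y, minority_label, k=3, seed=0):
    """Balance the classes with SMOTE, then clean the result with ENN."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    synthetic = smote(minority, n_new, k, seed)
    X_bal = list(X) + synthetic
    y_bal = list(y) + [minority_label] * len(synthetic)
    return enn(X_bal, y_bal, k)


if __name__ == "__main__":
    # Hypothetical toy data: label 1 = defective module (minority), 0 = clean.
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4),
         (0.4, 0.2), (0.1, 0.4), (0.4, 0.4), (2.0, 2.0), (2.1, 2.2), (2.2, 1.9)]
    y = [0] * 8 + [1] * 3
    X_res, y_res = smote_enn(X, y, minority_label=1)
    print(Counter(y_res))
```

The resampled `X_res` and `y_res` would then be passed to whichever classifier is being evaluated. Resampling is normally applied to the training split only, so that the test data keeps its original class distribution.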