Abstract
Code smells gradually degrade software quality and reduce the understandability and maintainability of software. To detect code smells in software structure, this paper proposes a detection method based on CK metrics that combines two-step feature selection with soft-voting ensemble learning. First, feature selection is performed: the Pearson correlation coefficient is used to remove redundant features, and XGBoost feature importance is then used to select the most relevant of the remaining metrics. Then, to address the poor generalization of a single machine learning model, a soft-voting ensemble built from five mature machine learning models is proposed to carry out the code smell classification task. The experiments are based on CK metrics and use a dataset covering 7 open-source projects and 4 types of code smell. The results show that the proposed method reduces the feature dimensionality and outperforms the other classification models on the performance metrics, improving the F1 score by up to 3.24% and the AUC by up to 2.32%.
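The two ideas named in the abstract, Pearson-based redundancy removal and soft voting, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the correlation threshold of 0.9, the toy metric values, and the five probability vectors are assumptions made up for illustration, the abstract does not name the five base models, and the XGBoost importance step is omitted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.9):
    """Keep a metric only if |r| with every already-kept metric is <= threshold.

    `columns` maps a metric name to its list of values. Insertion order
    decides which of two highly correlated metrics survives.
    The threshold is an assumed value, not taken from the paper.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities across models; return (label, averages)."""
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * p[c] for w, p in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=avg.__getitem__)
    return label, avg


# Toy CK-style metrics for five classes (values are invented).
metrics = {
    "wmc": [1, 2, 3, 4, 5],
    "rfc": [2, 4, 6, 8, 10.5],  # almost a multiple of wmc, so redundant
    "dit": [3, 1, 4, 1, 5],
}
selected = drop_redundant(metrics, threshold=0.9)

# [P(clean), P(smelly)] from five hypothetical base models for one sample.
probs = [[0.6, 0.4], [0.3, 0.7], [0.45, 0.55], [0.2, 0.8], [0.55, 0.45]]
label, avg = soft_vote(probs)

print(selected)  # metrics that survive the correlation filter
print(label, avg)  # predicted class index and averaged probabilities
```

Soft voting averages predicted probabilities rather than counting class votes, so a confident model carries more influence on the result than a model that is close to undecided.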
Authors
HUANG Chenjun (黄晨峻)
GAO Jianhua (高建华)
Department of Computer Science and Technology, Shanghai Normal University, Shanghai 200234, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Peking University Core Journal (北大核心)
2025, Issue 2, pp. 504-512 (9 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61672355).