Abstract
Most real-world data mining problems involve imbalanced data sets: the classes are unevenly distributed, and the class of interest is often the minority. A model trained on a data set containing few minority-class examples usually predicts the majority classes with high accuracy but the minority class poorly. This paper presents a novel ensemble method, SMOTEBoostSVM, which uses SMOTE to synthesize additional minority-class samples, uses SVM, which has strong classification and generalization performance, as the weak classifier, and builds the ensemble classifier with the AdaBoost algorithm. Experimental results show that the SMOTEBoostSVM ensemble classifier classifies imbalanced data sets better than classifiers that use SMOTE, AdaBoost, or SVM alone.
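The oversampling step named in the abstract can be illustrated with a minimal sketch. It covers only SMOTE-style interpolation between a minority sample and one of its nearest minority neighbors; the AdaBoost loop and the SVM weak classifier are not shown. The function name `smote`, its parameters (`n_new`, `k`, `seed`), and the toy data are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbors.

    Simplified, hypothetical helper: each synthetic point lies on the segment
    between a randomly chosen minority sample and one of its k nearest
    minority neighbors.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Toy minority class with four 2-D samples (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

In a full pipeline of the kind the abstract describes, points like `new_points` would be added to the training set before or during boosting; how the paper interleaves the two steps is not stated in this record.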
Source
《系统工程》
CSCD
Peking University Core Journal (北大核心)
2008, Issue 5, pp. 116-119 (4 pages)
Systems Engineering