
Effects of Several Evaluation Metrics on Imbalanced Data Learning (cited by 23 articles)
Abstract: As most traditional classifiers obtained by optimizing the accuracy metric are unsuitable for imbalanced data learning (IDL), this paper performs "meta-learning" on the support vector machine (SVM) model to study how IDL is affected by the following evaluation metrics: accuracy, balanced accuracy, the geometric mean, the F1 score, information gain, AUC (area under the ROC curve), and two new metrics proposed in this paper, GAF and GBF. Simulation experiments were conducted on 16 imbalanced datasets from the UCI repository, and a statistical analysis of the results shows that: (1) the metrics differ significantly in how they affect classifier performance; (2) even for an advanced learning method such as the SVM, the classifier selected by maximizing accuracy is readily biased toward predicting the majority class; (3) optimizing on the other metrics yields bias-rectified SVM classifiers with better overall performance, especially in their ability to predict the minority class; and (4) the SVM classifiers obtained by optimizing the GAF and GBF metrics show stable and good performance.
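For reference, the standard metrics named in the abstract can all be computed from the binary confusion matrix. The sketch below is a generic illustration, not code from the paper; the GAF and GBF metrics are defined in the paper itself and are not reproduced here.

```python
import math

def confusion(y_true, y_pred, positive=1):
    """Confusion-matrix counts, taking the minority label as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn) if tp + fn else 0.0   # minority-class recall (sensitivity)
    spec = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall (specificity)
    prec = tp / (tp + fp) if tp + fp else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sens + spec) / 2,
        "g_mean": math.sqrt(sens * spec),        # geometric mean of the two recalls
        "f1": 2 * prec * sens / (prec + sens) if prec + sens else 0.0,
    }

# A degenerate classifier that always predicts the majority class (label 0)
# reaches 0.9 accuracy on a 9:1 sample, while the other metrics expose it:
# balanced accuracy falls to 0.5, and G-mean and F1 are both zero.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
m = metrics(*confusion(y_true, y_pred))
```

Because the G-mean multiplies the two per-class recalls, a classifier that ignores the minority class scores zero, which is why such metrics are attractive for IDL.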
Source: Journal of South China University of Technology (Natural Science Edition), 2010, No. 4: 147-155 (9 pages); indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Industry-University-Research Cooperation Project of the Ministry of Education and Guangdong Province (2007B090400031); Guangdong Universities Outstanding Young Innovative Talent Cultivation Program (LYM08074).
Keywords: evaluation metric; imbalanced data learning; support vector machine; GAF metric; GBF metric
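Finding (2) of the abstract can be illustrated with a toy threshold-selection experiment (all scores and labels below are invented for illustration and are not from the paper): on overlapping, imbalanced scores, the threshold that maximizes accuracy sacrifices minority recall, while the threshold that maximizes the geometric mean preserves it.

```python
import math

def evaluate(threshold, scores, labels):
    """Accuracy and G-mean of the rule 'predict 1 when score >= threshold'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(preds, labels) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(preds, labels) if p == 1 and t == 0)
    acc = (tp + tn) / len(labels)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return acc, math.sqrt(sens * spec)

# 12 majority examples (label 0) and 3 minority examples (label 1) whose
# scores overlap, so no threshold separates the classes perfectly.
labels = [0] * 12 + [1] * 3
scores = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8,  # majority
          0.55, 0.65, 0.9]                                              # minority
thresholds = sorted(set(scores)) + [1.1]  # 1.1 = "always predict majority"

best_by_acc = max(thresholds, key=lambda t: evaluate(t, scores, labels)[0])
best_by_gmean = max(thresholds, key=lambda t: evaluate(t, scores, labels)[1])
# Maximizing accuracy selects threshold 0.9, which recovers only 1 of the
# 3 minority examples; maximizing the G-mean selects 0.55, recovering all 3.
```

The paper's protocol is analogous but operates on SVM model selection rather than on a score threshold.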
References (15)

  • 1 Chawla N V, Japkowicz N, Kotcz A. Editorial: special issue on learning from imbalanced data sets [J]. ACM SIGKDD Explorations Newsletter, 2004, 6(1): 1-6.
  • 2 Yang Q, Wu X. 10 challenging problems in data mining research [J]. International Journal of Information Technology & Decision Making, 2006, 5(4): 597-604.
  • 3 Sokolova M, Japkowicz N, Szpakowicz S. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation [C] // Proceedings of the 2006 Australian Joint Conference on Artificial Intelligence (AI 2006). Hobart: Springer, 2006: 1015-1021.
  • 4 Caruana R, Niculescu-Mizil A. Data mining in metric space: an empirical analysis of supervised learning performance criteria [C] // Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004). Seattle: ACM Press, 2004: 69-78.
  • 5 Ferri C, Hernández-Orallo J, Modroiu R. An experimental comparison of performance measures for classification [J]. Pattern Recognition Letters, 2009, 30(1): 27-38.
  • 6 Vapnik V N. Statistical Learning Theory [M]. New York: John Wiley & Sons, 1998.
  • 7 Duda R O, Hart P E, Stork D G. Pattern Classification [M]. 2nd ed. New York: John Wiley & Sons, 2001.
  • 8 Yan L, Dodier R, Mozer M C, et al. Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic [C] // Proceedings of the 20th International Conference on Machine Learning (ICML 2003). Washington: AAAI Press, 2003: 848-855.
  • 9 Wang Yong, Hu Baogang. A study of comprehensively evaluating the classification ability of kernel functions by applying statistical methods [J]. Chinese Journal of Computers, 2008, 31(6): 942-952.
  • 10 Veropoulos K, Campbell C, Cristianini N. Controlling the sensitivity of support vector machines [C] // Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. Stockholm: Morgan Kaufmann, 1999: 55-60.


