
Evolutionary under-sampling based bagging ensemble method for imbalanced data classification (Cited by: 12)

Abstract: In the class-imbalanced learning scenario, traditional machine learning algorithms that optimize overall accuracy tend to achieve poor classification performance, especially for the minority class, in which we are most interested. Many effective approaches have been proposed to solve this problem. Among them, bagging ensemble methods that integrate under-sampling techniques have demonstrated better performance than several alternatives, including bagging ensemble methods integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measure to ensure the classification performance of each individual classifier, which limits the ensemble performance that can be achieved. Evolutionary under-sampling (EUS), a novel under-sampling technique, has been successfully applied to search for the best majority-class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method, EUS-Bag, by designing a new fitness function that considers three factors to make EUS better suited to the framework. With our fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean, and AUC all demonstrate its superior performance.
Source: Frontiers of Computer Science (SCIE, EI, CSCD), 2018, Issue 2, pp. 331-350 (20 pages). Chinese journal title: 中国计算机科学前沿(英文版)
Funding and acknowledgements: We would like to express our gratitude to both the associate editor and the anonymous reviewers for their constructive comments, which improved the quality of our manuscript to a large extent. This work was supported by the National Natural Science Foundation of China (Grant No. 61501229) and the Fundamental Research Funds for the Central Universities (NS2015091, NS2014067, NJ20160013).
Keywords: class imbalanced problem, under-sampling, bagging, evolutionary under-sampling, ensemble learning, machine learning, data mining
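The abstract contrasts EUS-Bag with bagging ensembles that under-sample the majority class at random. As a rough illustration of that baseline only (not of EUS-Bag, whose three-factor fitness function is not described on this page), the following minimal sketch builds balanced bags by random under-sampling, uses a 1-nearest-neighbor base classifier per bag, combines them by majority vote, and computes recall and geometric mean. All function names, the choice of Euclidean distance, and the equal-size balancing rule are assumptions made for the example.

```python
import math
import random
from collections import Counter


def under_sample(majority, minority, rng):
    # Keep every minority example and draw an equally sized random
    # subset of the majority class (assumed balancing rule).
    return rng.sample(majority, len(minority)) + list(minority)


def build_bags(majority, minority, n_bags, seed=0):
    # One balanced training set per base classifier.
    rng = random.Random(seed)
    return [under_sample(majority, minority, rng) for _ in range(n_bags)]


def nn_predict(train, x):
    # 1-nearest-neighbor prediction; train is a list of (features, label).
    nearest = min(train, key=lambda item: math.dist(item[0], x))
    return nearest[1]


def bagging_predict(bags, x):
    # Majority vote over the base classifiers; use an odd number of
    # bags to avoid ties in the two-class case.
    votes = Counter(nn_predict(bag, x) for bag in bags)
    return votes.most_common(1)[0][0]


def recall_and_gmean(y_true, y_pred, positive=1):
    # Recall is the true positive rate on the minority (positive) class;
    # geometric mean is sqrt(recall * true negative rate).
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, math.sqrt(recall * specificity)
```

In this sketch the diversity among base classifiers comes only from the random majority subsets; nothing checks how well each individual classifier performs, which is the gap the abstract says EUS-Bag is meant to address.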
