
A focal-loss-based CGAN ensemble classification method for imbalanced data
Abstract  For the case of imbalanced data, an ensemble classification method based on conditional generative adversarial networks (CGAN) with focal loss was investigated using gradient boosting decision trees (GBDT). The method first reduces the imbalance ratio with CGAN, then combines the weight balancing of the focal loss with the GBDT algorithm to appropriately increase the attention paid to minority-class samples, further improving the classifier's performance. The properties of the method were studied and several theoretical results were obtained. It was proved that, under certain conditions, the empirical conditional distribution generated by CGAN converges to the conditional distribution of the corresponding population; that the empirical risk of the focal-loss CGAN method converges to the expected risk; and that the method's estimator converges to the function that minimizes the expected risk. Experimental results show that the focal-loss CGAN method performs well.
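The focal-loss weighting described in the abstract can be sketched as follows. This is a minimal illustrative implementation of the standard binary focal loss, not a reproduction of the paper's method: the exact coupling with CGAN sampling and GBDT is not shown, and the `gamma` and `alpha` values are assumed defaults.

```python
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights easy examples so hard
    (often minority-class) samples dominate the loss.

    y_true : array of 0/1 labels
    p_pred : array of predicted probabilities for class 1
    """
    # p_t is the predicted probability of the *true* class
    p_t = np.where(y_true == 1, p_pred, 1.0 - p_pred)
    # alpha_t is the class-balancing weight
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified samples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A confident correct prediction incurs far less loss
# than a confident wrong one on the same minority sample.
easy = focal_loss(np.array([1]), np.array([0.9]))[0]
hard = focal_loss(np.array([1]), np.array([0.1]))[0]
```

In a GBDT framework such as the one the paper builds on, a loss like this would typically be supplied as a custom objective via its gradient and Hessian with respect to the raw score.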
Authors  CUI Wenquan; YU Houying; HOU Xiaotian (Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China)
Source  Journal of University of Science and Technology of China (JUSTC), indexed in CAS, CSCD, and the Peking University Core list, 2020, No. 7: 968-976 (9 pages)
Funding  Supported by the National Natural Science Foundation of China (71873128)
Keywords  imbalanced data; conditional generative adversarial networks (CGAN); focal loss; ensemble learning
