Abstract
This paper studies ensemble classification for imbalanced data. Building on conditional generative adversarial networks (CGAN), it investigates a CGAN method with focal loss that uses gradient boosting decision trees (GBDT). The method first uses a CGAN to reduce the imbalance ratio. It then combines the weight balancing of the focal loss with the GBDT algorithm to increase the attention paid to minority-class samples, which further improves the performance of the classifier. Theoretical properties of the method are also established. It is proved that, under certain conditions, the empirical conditional distribution generated by the CGAN converges to the conditional distribution of the corresponding population; that the empirical risk of the CGAN method with focal loss converges to the expected risk; and that the estimator produced by the method converges to the function that minimizes the expected risk. Experimental results show that the CGAN method with focal loss performs well.
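The abstract does not state the loss formula, so the following is only a minimal sketch of the standard binary focal loss, given to illustrate how it shifts weight toward minority-class and hard-to-classify samples. The function name `focal_loss`, the hyperparameters `gamma` and `alpha`, their default values, and the convention that label 1 is the minority class are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.75, eps=1e-12):
    """Mean binary focal loss (illustrative sketch, not the paper's code).

    y_true : labels in {0, 1}; label 1 is ASSUMED to be the minority class.
    p_pred : predicted probabilities P(y = 1).
    gamma  : focusing parameter; gamma = 0 gives weighted cross-entropy.
    alpha  : weight on class 1; class 0 receives 1 - alpha.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithm stays finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # Probability the model assigns to the true class of each sample.
    p_t = np.where(y == 1, p, 1.0 - p)
    # Class-balancing weight for each sample.
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma = 0` and `alpha = 0.5` the expression reduces to half the usual cross-entropy. Larger `gamma` reduces the contribution of samples that are already classified with high confidence, and `alpha > 0.5` gives more weight to the class labeled 1.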
Authors
CUI Wenquan (崔文泉); YU Houying (余厚莹); HOU Xiaotian (侯晓天)
Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
Funding
Supported by the National Natural Science Foundation of China (71873128)
Keywords
imbalanced data
conditional generative adversarial networks(CGAN)
focal loss
ensemble learning