Abstract
Artificial intelligence can effectively identify risks and improve decision-making efficiency in credit risk assessment. However, existing credit risk data commonly suffer from class imbalance. This biases models toward the majority class at prediction time and undermines the accuracy and reliability of the assessment. To address this problem, a hybrid generative model, VCTGAN, is proposed for synthesizing high-quality balanced datasets. VCTGAN combines a variational autoencoder (VAE) with a conditional tabular generative adversarial network (CTGAN). First, the latent variables of the VAE learn the key features and underlying distribution of the real data, and the resulting structured latent variables serve as the input to the original CTGAN. Second, a self-attention mechanism is introduced into the generator to better capture the salient features of the imbalanced data. Finally, a contrastive loss module is added to the discriminator to enlarge the inter-class differences of the generated data and thereby improve its quality. Systematic experiments were run on two benchmark datasets, Taiwan Credit and Give Me Some Credit. They yield best classification accuracies of 89.91% and 96.89%, respectively, and show that the improved method handles credit data imbalance significantly better than traditional methods. Ablation experiments further verify the contribution of each component and confirm the soundness and effectiveness of the proposed model. The method not only generates high-quality balanced datasets but also improves the model's ability to recognize minority classes. It offers a new technical solution to the data imbalance problem in the financial domain.
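The abstract does not state the exact form of the contrastive loss added to the discriminator. As a minimal sketch, assume a supervised contrastive (SupCon-style) loss computed on L2-normalized discriminator feature embeddings with class labels. Under that assumption, samples of the same class are pulled together and samples of different classes are pushed apart. The function names, the temperature of 0.1 and the toy embeddings below are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical value; the paper's setting is not given
) -> float:
    """SupCon-style loss over a batch of (assumed) discriminator features.

    For each anchor i with at least one same-class sample, average
    -log(exp(s_ip / t) / sum_{a != i} exp(s_ia / t)) over its positives p,
    where s is cosine similarity. Then average over those anchors.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner contributes nothing
        logits = [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        # Denominator runs over every other sample; use log-sum-exp for stability.
        others = [logits[j] for j in range(n) if j != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(x - m) for x in others))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print("classes separated:", supervised_contrastive_loss(feats, [0, 0, 1, 1]))
    print("classes mixed:   ", supervised_contrastive_loss(feats, [0, 1, 0, 1]))
```

With the toy features, the loss is near zero when the labels match the two clusters and much larger when they do not. In a GAN, a term like this would be added to the discriminator's adversarial loss with some weighting. That weighting is also an assumption not specified in the abstract.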
Authors
WANG Yiqun (王轶群); WANG Xiao (王笑); GAO Yancheng (高燕程)
School of Artificial Intelligence, Gansu University of Political Science and Law, Lanzhou 730070, China
Source
Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》)
Peking University Core Journal (北大核心)
2026, No. 2, pp. 561-573 (13 pages)
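Funding
2025 Gansu Province Young Doctoral Support Project for Higher Education Institutions (2025QB-076)
Innovation Fund for Higher Education Institutions of Gansu Province (2023B-118)
University-level Research Innovation Project of Gansu University of Political Science and Law (GZF2022XZD07)
Industry Support Program for Higher Education Institutions (CYZC-2024-24)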