Optimization of Tabular Generative Adversarial Networks and Classification for Imbalanced Credit Risk Data

Abstract: Artificial intelligence can effectively identify risks and improve decision-making efficiency in credit risk assessment. However, existing credit risk data generally suffer from class imbalance, which biases models toward the majority class at prediction time and undermines the accuracy and reliability of the assessment. To address this problem, a hybrid generative model (VCTGAN) that combines a variational autoencoder (VAE) with a conditional tabular generative adversarial network (CTGAN) is proposed for synthesizing high-quality balanced datasets. First, the VAE learns the key features and latent distribution of the real data and produces structured latent variables that serve as input to the original CTGAN. Second, a self-attention mechanism is introduced into the generator to better capture the salient features of the imbalanced data. Third, a contrastive loss module is added to the discriminator to strengthen the inter-class differences in the generated data and thereby improve its quality. In systematic experiments on two benchmark datasets, Taiwan Credit and Give Me Some Credit, the method achieves best classification accuracies of 89.91% and 96.89%, respectively, and the results show that it clearly outperforms traditional methods in handling imbalanced credit data. Ablation experiments further verify the contribution of each component to performance and confirm the rationality and effectiveness of the proposed method. The model not only generates high-quality balanced datasets but also improves the classifier's ability to recognize the minority class, providing a new technical solution to the data imbalance problem in the financial field.
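The abstract does not state the exact form of the contrastive loss added to the discriminator. As a minimal sketch, assuming the classic pairwise margin formulation (which may differ from the paper's), the idea of pulling same-class samples together and pushing different-class samples apart could look like the following. The function name `pairwise_contrastive_loss`, the `margin` parameter, and the toy feature vectors are illustrative assumptions, not taken from the paper.

```python
import math


def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Average pairwise contrastive loss over all sample pairs.

    Assumed formulation (not confirmed by the source):
    - same-class pairs are penalised by their squared distance;
    - different-class pairs are penalised only when they lie
      closer together than `margin`.
    """
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = math.dist(features[i], features[j])  # Euclidean distance
            if labels[i] == labels[j]:
                total += d ** 2
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical 2-D features for two classes (0 = majority, 1 = minority).
labels = [0, 0, 1, 1]
# Classes form two tight, well-separated clusters.
separated = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.1, 0.0]]
# Same points, but each cluster now mixes both classes.
mixed = [[0.0, 0.0], [3.0, 0.0], [0.1, 0.0], [3.1, 0.0]]

loss_separated = pairwise_contrastive_loss(separated, labels)
loss_mixed = pairwise_contrastive_loss(mixed, labels)
```

Under this assumed formulation, `loss_separated` is smaller than `loss_mixed`, so minimizing the term favors generated samples whose classes remain distinguishable.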

Authors: WANG Yiqun; WANG Xiao; GAO Yancheng (School of Artificial Intelligence, Gansu University of Political Science and Law, Lanzhou 730070, China)

Affiliation: School of Artificial Intelligence, Gansu University of Political Science and Law

Source: Journal of Frontiers of Computer Science and Technology (计算机科学与探索), PKU Core Journal, 2026, Issue 2, pp. 561-573 (13 pages)

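Funding: 2025 Gansu Provincial Universities Young Doctor Support Project (2025QB-076); Innovation Fund of Higher Education Institutions of Gansu Province (2023B-118); University-Level Scientific Research Innovation Project of Gansu University of Political Science and Law (GZF2022XZD07); Industry Support Plan Project for Higher Education Institutions (CYZC-2024-24).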

Keywords: conditional tabular generative adversarial network (CTGAN); generative model; imbalanced datasets; machine learning; credit risk assessment

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

