Abstract
Traditional credit scoring models are mainly based on supervised learning. In practical lending problems, however, labeled samples are often costly, difficult, and slow to obtain, whereas unlabeled samples are abundant. To make full use of the information in unlabeled samples during modeling, this paper proposes a credit scoring model based on semi-supervised generalized additive (SSGA) logistic regression. The model can handle linearly inseparable problems, uses both labeled and unlabeled sample information, and simultaneously estimates the model parameters and selects the significant variables. Simulation experiments show that the proposed model significantly outperforms the supervised model in both extrapolation prediction and variable selection. Finally, the model is applied to assessing the default risk of personal credit loans.
Authors
FANG Kuangnan, CHEN Zilan
(Department of Statistics, School of Economics, Xiamen University, Xiamen 361000, China; The MOE Key Laboratory of Econometrics (Xiamen University), Xiamen 361000, China; Student Affairs Office, Xiamen University, Xiamen 361000, China)
Source
Systems Engineering-Theory & Practice, 2020, No. 2, pp. 392-402 (11 pages)
Indexed in: EI, CSSCI, CSCD, Peking University Core Journals
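Funding
National Natural Science Foundation of China, General Program (71471152)
Fundamental Research Funds for the Central Universities (20720171095, 20720181003)
Major Project of the National Bureau of Statistics of China (2019LD02)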
Keywords
semi-supervised learning
generalized additive logistic regression
credit scoring
unlabeled sample