Abstract
The distribution characteristics of data often affect a model's partitioning results, and cluster analysis is an effective way to reveal that distribution. This paper first compares three families of clustering algorithms (hard-partition clustering, model-based clustering, and fuzzy clustering) to find a method suited to credit data analysis. It then preprocesses the data with the Pauta criterion (the 3σ rule), based on the central tendency and dispersion of each variable, uses a genetic algorithm to optimize the model parameters, and proposes the GAσFCM algorithm, which is tailored to the distribution characteristics of credit evaluation data. The proposed algorithm improves classification accuracy by nearly 3 percentage points over the traditional FCM algorithm and, to avoid the influence of imbalanced samples on clustering, is biased to a certain degree toward selecting positive samples. The comparative analysis shows that the GAσFCM fuzzy clustering algorithm suits the feature distribution of credit risk assessment, effectively improves assessment accuracy, and captures changes in the credit of listed companies dynamically and sensitively, making it a useful complement to credit risk management and control methods.
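The two building blocks named in the abstract, 3σ (Pauta) outlier screening and fuzzy c-means (FCM), can be illustrated with a minimal NumPy sketch. This is not the authors' GAσFCM implementation: the function names `pauta_filter` and `fcm` are hypothetical, the genetic-algorithm parameter search is omitted, and the cluster count `c` and fuzzifier `m` are simply fixed by hand as assumptions.

```python
import numpy as np


def pauta_filter(X, k=3.0):
    """Keep rows whose every feature lies within k standard deviations of its mean.

    Illustrative 3-sigma screening; the paper's exact preprocessing may differ.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    mask = (np.abs(X - mu) <= k * sigma).all(axis=1)
    return X[mask], mask


def fcm(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means with Euclidean distance (assumed defaults: c=2, m=2)."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix, each row summing to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance of every sample to every center, floored to avoid 0 division.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        # Standard FCM membership update.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In a pipeline of the kind the abstract describes, `pauta_filter` would be applied to the credit indicators first and `fcm` to the retained rows; a genetic algorithm could then search over parameters such as `m` or the initial centers, which this sketch leaves out.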
Authors
Liu Ying (刘颖)
Tang Yuman (唐毓蔓)
School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, China
Source
Statistics & Decision (《统计与决策》)
CSSCI
Peking University Core Journal
2020, No. 2, pp. 34-38 (5 pages)
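Funding
National Natural Science Foundation of China, Young Scientists Fund (61402193)
Natural Science Foundation Project of the Jilin Provincial Department of Science and Technology (20180101337JC)
Jilin Province Social Science Fund Project (2019B67)
Open Fund Project of the Key Laboratory of Jilin Provincial Universities (201702)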
Keywords
fuzzy clustering
normal distribution
noise data
genetic algorithm
credit risk