Abstract
The high dimensionality and strong correlation among variables in credit risk data for listed companies seriously degrade the accuracy of credit risk models. Drawing on existing variable selection algorithms and the requirements of credit risk models, this paper designs a new nonparametric variable selection method. Applied to the credit risk variables of listed companies, the method removes noise variables and linearly correlated (collinear) variables from the data set. The paper also proposes a "forward and backward" algorithm for finding the optimal solution of the method when the number of variables is large. Using the Logistic regression model as an example, an empirical analysis of the credit risk of listed companies shows that, compared with previous variable selection methods, the proposed method effectively reduces the data dimension and removes correlation among variables while improving the model's reliability and predictive accuracy.
Source
《数理统计与管理》
CSSCI
Peking University Core Journal (北大核心)
2012, No. 6, pp. 1117-1124 (8 pages)
Journal of Applied Statistics and Management
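Funding
National Natural Science Foundation of China, Young Scientists Fund (71001095)
Specialized Research Fund for the Doctoral Program of Higher Education (20103402120010)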