Abstract
In response to the financial data fraud and sudden collapses of listed companies that have occurred repeatedly in recent years, this paper builds a model for identifying and predicting financial data fraud by listed companies in different industries of the Chinese market. A series of data analysis and machine learning algorithms is applied to screen the key indicators for fraud identification and to tune model parameters. The decision tree algorithm is finally selected as the best method for fraud identification and prediction, reaching a precision of 0.949 on the test set. For industries with few data samples, feature selection and fraud identification are carried out on the cluster to which each such industry belongs, which makes fraud identification and prediction possible for listed companies across industries.
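The abstract does not give the authors' indicators, code, or tuning grid. As a minimal sketch, assuming each company is described by a few numeric financial indicators and a 0/1 fraud label, the pure-Python CART-style tree below (Gini impurity, depth limit) shows how a decision tree separates companies, how a parameter such as `max_depth` could be tuned on validation precision, and how precision itself is computed (the share of companies flagged as fraud that really are fraud). All feature values and helper names (`build_tree`, `tune_depth`, and so on) are hypothetical; a real study would more likely use a library implementation such as scikit-learn's `DecisionTreeClassifier`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build_tree(X, y, max_depth, depth=0):
    """Grow a CART-style tree; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def precision(y_true, y_pred, positive=1):
    """Share of companies predicted as fraud (positive) that really are fraud."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != positive and b == positive)
    return tp / (tp + fp) if tp + fp else 0.0


def tune_depth(X_tr, y_tr, X_val, y_val, depths=(1, 2, 3, 4)):
    """Pick max_depth by validation precision (ties go to the shallower tree)."""
    scores = {}
    for d in depths:
        candidate = build_tree(X_tr, y_tr, d)
        scores[d] = precision(y_val, [predict(candidate, r) for r in X_val])
    return max(scores, key=scores.get)


# Hypothetical indicators per company: [receivables / revenue, operating cash flow / net profit]
X = [[0.10, 1.2], [0.15, 1.1], [0.20, 0.9], [0.55, 0.3], [0.60, 0.2], [0.70, 0.4]]
y = [0, 0, 0, 1, 1, 1]  # 1 = fraud (invented labels)
tree = build_tree(X, y, max_depth=3)

X_test = [[0.12, 1.0], [0.65, 0.25], [0.18, 0.8], [0.50, 0.5]]
y_test = [0, 1, 0, 0]
y_pred = [predict(tree, r) for r in X_test]
print(tree, y_pred, precision(y_test, y_pred))
```

In practice `tune_depth` would be run on a validation split kept apart from the test set, so that the reported test precision is not biased by the tuning.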
Authors
LU Xinyi, GUO Jiayi, FANG Boping, MA Dan, SONG Tao
(School of Science, Huzhou University, Huzhou 313000, China; Huzhou Key Laboratory of Data Modeling and Analysis, Huzhou 313000, China)
Source
Journal of Huzhou University, 2023, No. 4, pp. 14-20 (7 pages)
Funding
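National Natural Science Foundation of China (12271158)
Natural Science Foundation of Zhejiang Province (Z22A013952)
Zhejiang Provincial College Students' Science and Technology Innovation Activity Plan, Xinmiao Talent Program (2022R431A016)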
Keywords
financial data
fraud identification
indicator screening
decision tree algorithm
K-means clustering
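The abstract handles industries with few data samples through the cluster each industry belongs to, and the keywords name K-means as the clustering method. As a minimal sketch, assuming each industry is summarised by a hypothetical vector of average indicator values, plain K-means (Lloyd's algorithm) groups similar industries so that a small industry can borrow training data from its cluster peers. Industry names, numbers, and the deterministic seeding (first k points) are illustrative assumptions, not the authors' setup.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's K-means; the first k points seed the centroids so runs are repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical industry profiles: [mean indicator A, mean indicator B]
industries = {
    "Manufacturing": [0.30, 0.12],
    "Software": [0.05, 0.25],
    "Retail": [0.28, 0.10],
    "Media": [0.07, 0.22],
    "Fishery": [0.27, 0.11],  # pretend this industry has too few fraud samples
}
names = list(industries)
labels, centroids = kmeans(list(industries.values()), k=2)
cluster_of = dict(zip(names, labels))

# The small industry borrows training data from the other members of its cluster.
peers = [n for n in names if cluster_of[n] == cluster_of["Fishery"] and n != "Fishery"]
print(cluster_of, peers)
```

With real indicators on different scales, the profiles would normally be standardised before clustering, and k chosen by a criterion such as the elbow method rather than fixed in advance.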