Abstract
Anomaly detection based on weighted information entropy cannot guarantee accuracy on high-dimensional data. To address this problem, this paper proposes an anomaly detection algorithm for high-dimensional data based on information gain. First, information gain is combined with a Top-k algorithm to select the top M attributes of the data set under test, which reduces its dimensionality. Second, the data are clustered with a K-means algorithm whose K initial centers are chosen to be as far apart from one another as possible, which reduces the number of iterations. Together, these two steps yield an anomaly detection algorithm for high-dimensional data. Experimental results show that, with the data dimensionality reduced, recall and precision improve by 53.65% and 29.49%, respectively, over the weighted information entropy algorithm, and that the method effectively detects outliers in high-dimensional data.
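The two preprocessing steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes discrete attributes and a labeled sample for computing information gain, and it reads "centers as far apart as possible" as a farthest-first traversal seeded with the most distant pair of points. The function names (`top_m_attributes`, `farthest_first_centers`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_m_attributes(rows, labels, m):
    """Indices of the m attributes with the highest information gain."""
    gains = [
        information_gain([row[j] for row in rows], labels)
        for j in range(len(rows[0]))
    ]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:m]


def farthest_first_centers(points, k):
    """Pick k mutually distant initial centers (assumed interpretation)."""
    # Seed with the most distant pair of points; O(n^2), fine for a sketch.
    a, b = max(
        ((p, q) for i, p in enumerate(points) for q in points[i + 1:]),
        key=lambda pair: math.dist(*pair),
    )
    centers = [a, b][:k]
    # Repeatedly add the point farthest from its nearest chosen center.
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers
```

In use, `top_m_attributes(rows, labels, m)` would give the column indices to keep, and `farthest_first_centers(points, k)` would supply the starting centers for an ordinary K-means loop over the reduced data. How the paper scores points as anomalous after clustering is not stated in the abstract, so that step is left out here.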
Authors
陈晓
阎少宏
葛子轩
史冰冰
CHEN Xiao; YAN Shao-hong; GE Zi-xuan; SHI Bing-bing (College of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China; Hebei Province Key Laboratory of Data Science and Application, Tangshan, Hebei 063210, China; Tangshan Key Laboratory of Data Science, Tangshan, Hebei 063210, China; College of Electrical Engineering, North China University of Science and Technology, Tangshan, Hebei 063210, China; School of Artificial Intelligence, North China University of Science and Technology, Tangshan, Hebei 063210, China)
Source
《新一代信息技术》
2021, No. 18, pp. 1-4, 20 (5 pages in total)
New Generation of Information Technology
Funding
Fuzzy Mathematics (Project No. KCJS2020053).