Abstract
To identify outliers accurately and improve the utilization of multidimensional data, this study designs an outlier mining method for multidimensional data based on an improved clustering algorithm. First, the autocorrelation function used in data mining is obtained and combined with the similar data among the data items to arrange their attribute distributions and extract the association features of the multidimensional data. Second, semantic rule clustering is applied to the extracted association features so that the outliers in the multidimensional data stand out. Third, an identification method based on cloud segmentation entropy is introduced, and outliers are mined using the extreme values (the maximum and minimum) of the multidimensional data. Comparative experiments show that the designed method not only mines outliers in multidimensional data, but also keeps the number of outlier features it finds highly consistent with the actual values, while ensuring that the mining results are not affected by increases in data volume.
Author
武江毅
WU Jiangyi (Sichuan Vocational and Technical College of Chemical Industry, Luzhou, Sichuan 646300, China)
Source
《智能物联技术》
2025, No. 4, pp. 50-55 (6 pages)
Technology of IoT & AI
Keywords
improved clustering algorithm
association feature extraction
mining methods
feature clustering
outliers
multidimensional data