Local Outlier Mining Algorithm for Heterogeneous Data Based on Neighborhood Density

Cited by: 7
Abstract: As datasets grow in size, dimensionality, and complexity, mining their outliers becomes increasingly difficult. To address this, a local outlier mining algorithm based on neighborhood density is proposed. First, the high-dimensional data are partitioned into regions according to the computing performance of the nodes, and the quality of the partition is evaluated from the data distribution along each dimension. Next, kernel density is used to describe local density: the occurrence count of the data is obtained from a Gaussian distribution, and the neighborhood density of each data point is computed from it. The outlier degree of each data point is then calculated from the relationship between its neighborhood and its density, and is used to identify outliers in heterogeneous data. Finally, to handle possible outlier misjudgments, an outlier score is computed, and weight-based pruning is applied to improve the detection performance of this step. Simulation results on artificial and UCI datasets show that the accuracy of outlier mining is almost unaffected when the data volume and dimensionality change, and that mining time and coverage are significantly better than those of the compared methods. In addition, the algorithm maintains good mining accuracy and efficiency on heterogeneous data of different types and complexities.
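The abstract does not give the paper's formulas. The following is therefore only a minimal Python sketch of the general idea it describes (kernel-density local density plus a neighborhood-relative outlier degree), under stated assumptions: the neighborhood is taken to be the k nearest neighbors, local density is a Gaussian kernel estimate over that neighborhood, and the outlier degree is an LOF-style ratio of the neighbors' mean density to the point's own density. The function names and the parameters `k`, `bandwidth`, and `eps` are hypothetical. Region segmentation, the outlier-score correction, and weight-based pruning are not modeled.

```python
import math

# Hypothetical sketch: LOF-style outlier degree built on a Gaussian
# kernel density estimate. This is NOT the paper's exact method.


def k_nearest(data, i, k):
    """Indices of the k nearest neighbors of data[i] (Euclidean, excluding i)."""
    dists = sorted(
        (math.dist(data[i], data[j]), j) for j in range(len(data)) if j != i
    )
    return [j for _, j in dists[:k]]


def kernel_density(point, neighbors, bandwidth):
    """Mean Gaussian kernel value between `point` and each of its neighbors."""
    d = len(point)
    norm = (2 * math.pi * bandwidth ** 2) ** (d / 2)
    total = 0.0
    for q in neighbors:
        sq = sum((a - b) ** 2 for a, b in zip(point, q))
        total += math.exp(-sq / (2 * bandwidth ** 2)) / norm
    return total / len(neighbors)


def outlier_degrees(data, k=3, bandwidth=1.0, eps=1e-300):
    """Outlier degree = mean density of the neighbors / own density.

    Values near 1 mean the point is about as dense as its neighborhood;
    values much larger than 1 mean it lies in a sparser region.
    `eps` guards against a density that underflows to zero.
    """
    neigh = [k_nearest(data, i, k) for i in range(len(data))]
    density = [
        kernel_density(data[i], [data[j] for j in neigh[i]], bandwidth)
        for i in range(len(data))
    ]
    return [
        (sum(density[j] for j in neigh[i]) / len(neigh[i])) / max(density[i], eps)
        for i in range(len(data))
    ]


# Usage: a small 2-D cluster plus one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = outlier_degrees(data, k=3)
most_outlying = max(range(len(data)), key=scores.__getitem__)
```

Dividing by the point's own density makes the degree relative rather than absolute, so a point in a uniformly sparse cluster is not flagged merely for being sparse; only points much sparser than their own neighbors receive large degrees.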
Authors: WANG Xiao-hui; SONG Xue-kun; WANG Xiao-chuan (Henan University of Chinese Medicine, Zhengzhou, Henan 450046, China; Zhengzhou University, Zhengzhou, Henan 450001, China)
Source: Computer Simulation (《计算机仿真》), Peking University Core Journal, 2021, No. 7, pp. 281-285 (5 pages)
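Funding: National Natural Science Foundation of China Youth Program (61702164, 81703946); Henan Province Key Science and Technology Research Program (172102310535); Henan Province Young Backbone Teacher Training Program for Higher Education Institutions (2020GGJS104).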
Keywords: Outlier mining; Region segmentation; Neighborhood density; Heterogeneous data; Outlier score