Abstract
Outlier detection is an important and valuable knowledge discovery task. When detecting outliers in large data sets, the technique used to select the sample data set is critical. This paper proposes a new density-biased sampling technique as a means of data reduction to speed up outlier detection in large data sets, and presents an outlier detection algorithm based on density-biased sampling. The algorithm identifies outliers in the low-density (sparse) regions of the sample data set. The method is analyzed and evaluated both theoretically and experimentally, and the analysis and experiments indicate that the algorithm is effective.
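The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of one common way density-biased sampling can be set up, not the paper's algorithm. It assumes a fixed grid as the density estimate and keeps each point with probability proportional to n^(-exponent), where n is the number of points in its grid cell, so sparse regions are over-represented in the sample. The names `density_biased_sample`, `low_density_candidates`, `cell_size`, `exponent`, and `max_cell_count` are hypothetical.

```python
import random
from collections import defaultdict


def density_biased_sample(points, cell_size, target_size, exponent=0.5, seed=0):
    """Sample points so that sparse grid cells are over-represented.

    Assumption (not from the paper): density is estimated by counting
    points per fixed-size grid cell.
    Returns a list of (point, cell_count) pairs.
    """
    rng = random.Random(seed)

    # Bucket every point into its grid cell.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(p)

    # Normalizer chosen so the expected sample size equals target_size
    # (before probabilities are clipped to 1).
    norm = sum(len(members) ** (1.0 - exponent) for members in cells.values())

    sample = []
    for members in cells.values():
        n = len(members)
        keep_prob = min(1.0, target_size * n ** (-exponent) / norm)
        for p in members:
            if rng.random() < keep_prob:
                sample.append((p, n))
    return sample


def low_density_candidates(sample, max_cell_count):
    """Return sampled points whose grid cell held at most max_cell_count points."""
    return [p for p, n in sample if n <= max_cell_count]
```

With `exponent=0` this reduces to uniform sampling; larger exponents shift more of the sample toward sparse cells, which is where outlier candidates are expected to lie.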
Source
《计算机科学》
CSCD
Peking University Core Journal List (北大核心)
2004, Issue 10, pp. 206-208 (3 pages)
Computer Science
Funding
Project funded by the Chongqing Municipal Education Commission (030201)