Abstract
Traditional outlier mining algorithms cannot effectively mine outliers in data streams. To handle the unbounded input and dynamic change of data streams, a new distance-based algorithm for mining data stream outliers is proposed. Changes in the probability distribution of the stream are detected dynamically using Hoeffding's inequality and the central limit theorem for independent and identically distributed (i.i.d.) variables. The detection results are then used to adaptively adjust the size of the sliding window over which outliers are mined. Experimental results on a synthetic data set and on the real KDD-CUP99 data set show that the algorithm mines outliers in data streams effectively.
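The abstract does not specify the exact change test or the window-adjustment rule. The following is therefore only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm.

It makes these assumptions:

- Data are one-dimensional numbers.
- A point is reported as an outlier in a DB(k, r) style: fewer than `min_neighbors` window points lie within `radius` of it.
- The window is split into an older and a newer half, and their means are compared in two ways:
  - with a Hoeffding bound. For n i.i.d. values in an interval of width R, the sample mean is within ε = R·sqrt(ln(2/δ) / (2n)) of its expectation with probability at least 1 - δ. This holds only if `value_range` really bounds the data.
  - with a CLT-based z-test.
- A change is declared when either test fires; combining them this way is also an assumption.
- On a change the older half is discarded. Otherwise the window grows up to `max_window` and then slides.

All names (`AdaptiveWindowOutlierDetector`, `hoeffding_change`, `clt_change` and their parameters) are hypothetical.

```python
import math
from collections import deque
from statistics import mean, pvariance


def hoeffding_eps(n, value_range, delta):
    """Two-sided Hoeffding bound: with probability >= 1 - delta, the mean of n i.i.d.
    values in an interval of width value_range is within eps of its expectation."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_change(old, new, value_range, delta):
    # Union bound: give each half delta/2, then compare the mean gap to the summed bounds.
    eps = (hoeffding_eps(len(old), value_range, delta / 2)
           + hoeffding_eps(len(new), value_range, delta / 2))
    return abs(mean(old) - mean(new)) > eps


def clt_change(old, new, z_crit):
    # Large-sample z-test on the difference of means (CLT for i.i.d. samples).
    se = math.sqrt(pvariance(old) / len(old) + pvariance(new) / len(new))
    if se == 0.0:
        return mean(old) != mean(new)
    return abs(mean(old) - mean(new)) / se > z_crit


class AdaptiveWindowOutlierDetector:
    """Hypothetical sketch: distance-based outliers over a sliding window that
    shrinks to its newer half whenever a change in the stream mean is detected."""

    def __init__(self, radius, min_neighbors, value_range,
                 min_window=20, max_window=200, delta=0.05, z_crit=3.0):
        self.radius = radius                # neighborhood radius r (assumed parameter)
        self.min_neighbors = min_neighbors  # neighbor threshold k (assumed parameter)
        self.value_range = value_range      # assumed width of the interval bounding the data
        self.min_window = min_window
        self.max_window = max_window
        self.delta = delta
        self.z_crit = z_crit
        self.window = deque()

    def _is_outlier(self, x):
        # Count window points within `radius` of x; too few means x is an outlier.
        neighbors = sum(1 for y in self.window if abs(x - y) <= self.radius)
        return neighbors < self.min_neighbors

    def process(self, x):
        """Judge x against the current window, then add it and adapt the window size."""
        outlier = len(self.window) >= self.min_window and self._is_outlier(x)
        self.window.append(x)
        if len(self.window) > self.max_window:
            self.window.popleft()  # no change seen: window slides at max_window
        if len(self.window) >= 2 * self.min_window:
            items = list(self.window)
            half = len(items) // 2
            old, new = items[:half], items[half:]
            if (hoeffding_change(old, new, self.value_range, self.delta)
                    or clt_change(old, new, self.z_crit)):
                self.window = deque(new)  # change detected: keep only the recent half
        return outlier


if __name__ == "__main__":
    normal = [(i % 10) / 10 for i in range(150)]
    stream = normal[:100] + [50.0] + normal[100:]
    det = AdaptiveWindowOutlierDetector(radius=0.15, min_neighbors=2, value_range=1.0)
    flagged = [i for i, x in enumerate(stream) if det.process(x)]
    print("flagged positions:", flagged)
```

An extreme value inside the window breaks the bounded-range assumption behind the Hoeffding test, so it can itself trigger a window shrink. Because a shrink keeps at least `min_window` points, the neighbor counts stay meaningful after each adjustment.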
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2010, No. 11, pp. 2949-2951, 2973 (4 pages)
Journal of Computer Applications
Funding
Supported by the National Natural Science Foundation of China (60873037)