Algorithm for mining data stream outliers based on distance (cited 3 times)
Abstract: Traditional outlier mining algorithms cannot effectively mine outliers in data streams. To address the unbounded input and dynamic change of data streams, a new distance-based algorithm for mining data stream outliers is proposed. Changes in the probability distribution of the data stream are detected dynamically using the Hoeffding theorem and the central limit theorem for independent identically distributed variables, and the detection results are used to adaptively adjust the size of the sliding window over which outliers are mined. Experimental results show that the algorithm effectively mines outliers in data streams on both an artificial data set and the real KDD-CUP99 data set.
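The abstract names the ingredients (a distance-based outlier test, change detection via the Hoeffding bound and the i.i.d. central limit theorem, and an adaptively sized sliding window) but not the exact statistics or update rule. The Python sketch below is therefore only an illustration under assumptions of our own, not the authors' algorithm. It assumes that values are one-dimensional and bounded in a known interval of width `value_range` (R), and that a value is an outlier when fewer than `k` points of the current window lie within `radius` of it. The window is split into an older and a newer half, and a change is declared when the gap between the half means exceeds the sum of their Hoeffding radii R·sqrt(ln(2/δ)/(2n)), or exceeds `z` standard errors in a two-sample z-test justified by the central limit theorem. On a change the older half is discarded; otherwise the window slides once it reaches `max_size`. All class, function, and parameter names are hypothetical.

```python
# Hypothetical sketch: names, parameters, and the halving policy are assumptions,
# not taken from the paper.
import math
from collections import deque
from statistics import fmean, pvariance


def hoeffding_eps(n: int, value_range: float, delta: float) -> float:
    """Hoeffding radius: with probability >= 1 - delta, the mean of n i.i.d.
    samples bounded in an interval of width value_range is within eps of the true mean."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_change(old, new, value_range: float, delta: float) -> bool:
    """Flag a change when the two sub-window means differ by more than the sum of their
    Hoeffding radii (false-alarm probability <= 2 * delta per check if nothing changed)."""
    gap = abs(fmean(old) - fmean(new))
    limit = hoeffding_eps(len(old), value_range, delta) + hoeffding_eps(len(new), value_range, delta)
    return gap > limit


def clt_change(old, new, z: float = 3.0) -> bool:
    """Two-sample z-test on the means, using the CLT normal approximation."""
    se = math.sqrt(pvariance(old) / len(old) + pvariance(new) / len(new))
    gap = abs(fmean(old) - fmean(new))
    return gap > z * se if se > 0 else gap > 0


class AdaptiveWindowOutlierDetector:
    """Distance-based outlier test over a sliding window whose size adapts to detected changes."""

    def __init__(self, radius, k, value_range, min_size=20, max_size=200, delta=0.01, z=3.0):
        self.radius, self.k = radius, k
        self.value_range, self.delta, self.z = value_range, delta, z
        self.min_size, self.max_size = min_size, max_size
        self.window = deque()
        self.shrinks = 0  # number of detected distribution changes

    def process(self, x: float) -> bool:
        # x is an outlier if fewer than k window points lie within radius of it
        # (no verdict until the window holds at least min_size points).
        neighbors = sum(1 for y in self.window if abs(y - x) <= self.radius)
        is_outlier = len(self.window) >= self.min_size and neighbors < self.k
        self.window.append(x)
        if len(self.window) > self.max_size:
            self.window.popleft()  # window at maximum size: slide it by dropping the oldest value
        self._adapt()
        return is_outlier

    def _adapt(self) -> None:
        n = len(self.window)
        if n < 2 * self.min_size:
            return  # too few points for two meaningful halves
        data = list(self.window)
        half = n // 2
        old, new = data[:half], data[half:]
        if hoeffding_change(old, new, self.value_range, self.delta) or clt_change(old, new, self.z):
            # Distribution change: keep only the newer half of the window.
            for _ in range(half):
                self.window.popleft()
            self.shrinks += 1


# Usage on a made-up stream: regime A in [0, 0.9], one spike at 5.0, then regime B in [8, 8.9].
stream = [(i % 10) * 0.1 for i in range(100)] + [5.0] + [8 + (i % 10) * 0.1 for i in range(100)]
detector = AdaptiveWindowOutlierDetector(radius=0.5, k=3, value_range=10.0, max_size=100)
flags = [detector.process(x) for x in stream]
print("spike flagged:", flags[100], "changes detected:", detector.shrinks)
```

On this synthetic stream the spike at 5.0 is reported, and at least one distribution change is registered once the shifted values dominate the newer half; the first few values after the shift may still be flagged until the window adapts. Discarding the entire older half is just one simple adaptation policy; a finer-grained choice of cut point is an equally plausible reading of the abstract.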
Source: Journal of Computer Applications (《计算机应用》; indexed in CSCD and the Peking University Core Journals list), 2010, No. 11, pp. 2949-2951, 2973 (4 pages)
Funding: National Natural Science Foundation of China (60873037)
Keywords: data stream; outlier; Hoeffding theorem; sliding window