An outlier mining algorithm based on information entropy

Cited by: 7
Abstract Outlier mining aims to discover anomalous patterns that are relatively sparse and isolated within massive volumes of data, but traditional outlier mining methods are strongly influenced by human factors. This paper presents a new outlier mining algorithm built on an outlier measurement factor derived from information entropy. The algorithm first uses information entropy to compute the outlier measurement factor of each data object, then uses that factor to gauge how strongly each object deviates from the rest of the data, and detects outliers on that basis. The authors report that this effectively removes the influence of subjective human factors on outlier detection, and that defining outliers through the measurement factor gives them a clear interpretation. Experiments on UC Irvine (UCI) data sets and on high-dimensional star spectrum data are reported to confirm the feasibility and effectiveness of the algorithm.
Source CAAI Transactions on Intelligent Systems (《智能系统学报》), 2010, No. 2, pp. 150-155 (6 pages)
Funding Shanxi Province Youth Science Foundation project (2008021028)
Keywords outlier; information entropy; outlier measure factor; data mining
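
The abstract describes the pipeline only in outline: compute an entropy-based factor for each record, then rank records by that factor. The record does not give the paper's actual definition of the outlier measurement factor, so the sketch below substitutes one plausible stand-in: the drop in total dataset entropy when a record is removed, in the spirit of entropy-based outlier mining on categorical data. The function names (`attribute_entropy`, `dataset_entropy`, `outlier_factors`, `top_outliers`) and the leave-one-out factor itself are assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute column."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def dataset_entropy(records):
    """Sum of per-attribute entropies (attributes treated as independent)."""
    if not records:
        return 0.0
    return sum(attribute_entropy(list(col)) for col in zip(*records))


def outlier_factors(records):
    """Assumed factor: entropy of the full data minus entropy without record i.

    A large positive value means that removing the record makes the data
    much more orderly, which is what we expect of an outlier.
    """
    base = dataset_entropy(records)
    factors = []
    for i in range(len(records)):
        rest = records[:i] + records[i + 1:]
        factors.append(base - dataset_entropy(rest))
    return factors


def top_outliers(records, k):
    """Indices of the k records with the largest assumed outlier factor."""
    factors = outlier_factors(records)
    order = sorted(range(len(records)), key=lambda i: factors[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Five identical records plus one that differs in every attribute.
    records = [("a", "x")] * 5 + [("b", "y")]
    print(top_outliers(records, 1))
```

On this toy data the last record ranks first, because removing it leaves a dataset with zero entropy. The ranking needs no distance threshold or neighbourhood size, which matches the abstract's stated goal of reducing human-chosen parameters; the number of outliers to report, `k`, is still supplied by the caller in this sketch. The leave-one-out loop is quadratic in the number of records and is meant for illustration only.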