
Outlier detection algorithm based on shared nearest neighbor (cited by: 2)
Abstract: To identify outliers in datasets with mixed attributes, this paper proposes an outlier detection algorithm based on shared nearest neighbors (SNN). The algorithm computes the SNN similarity between the clusters produced by incremental clustering, which allows it both to find clusters of arbitrary shape and to detect global outliers in large, high-dimensional datasets of varying density. Its time complexity is approximately linear in the dataset size and in the number of attributes, which gives it good scalability. Experimental results on synthetic and real datasets show that the proposed algorithm detects outliers effectively.
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2012, No. 7, pp. 2426-2428, 2453 (4 pages)
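Funding: National Natural Science Foundation of China (61163017); Doctoral Research Fund of Zhengzhou University of Light Industry (2010BSJJ039); Henan Province Science and Technology Key Project (122102210125); Natural Science Basic Research Program of the Education Department of Henan Province (12B520051)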
Keywords: shared nearest neighbor (SNN); outlier detection; arbitrary shape cluster; mixed attributes
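
The shared-nearest-neighbor idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the record does not give the cluster-level similarity, the incremental clustering step, or the handling of mixed attributes. The code below instead assumes the classic point-level SNN similarity (two points are similar only if each is in the other's k-nearest-neighbor list, and their similarity is the number of neighbors they share), numeric attributes only, and Euclidean distance. The function names and the averaging score are hypothetical, chosen for illustration.

```python
from math import dist


def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared-neighbor count, counted only when i and j are mutual neighbors."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_outlier_scores(points, k):
    """Assumed score: mean SNN similarity to a point's own k neighbors.

    A low score means the point shares few neighbors with the points
    around it, so it is a candidate outlier.
    """
    neighbors = k_nearest(points, k)
    return [
        sum(snn_similarity(neighbors, i, j) for j in neighbors[i]) / k
        for i in range(len(points))
    ]


# Two small dense groups plus one isolated point (index 8).
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (11, 10), (10, 11), (11, 11),
    (5, 30),
]
scores = snn_outlier_scores(points, k=3)
outlier = min(range(len(points)), key=scores.__getitem__)
print(outlier, scores)
```

The mutual-neighbor condition matters in this toy data: the isolated point still has three nearest neighbors, but none of them list it in return, so its score drops to zero while points inside either group keep a positive score. The brute-force neighbor search here is quadratic and does not reflect the near-linear complexity claimed in the abstract.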

References (10)

  • 1. PATCHA A, PARK J. An overview of anomaly detection techniques: existing solutions and latest technological trends[J]. Computer Networks, 2007, (12): 3448-3470.
  • 2. TSAI C, LIN C. A triangle area based nearest neighbors approach to intrusion detection[J]. Pattern Recognition, 2010, (01): 222-229.
  • 3. LI Cunhua, SUN Zhihui. GridOF: an efficient outlier detection algorithm for large-scale datasets[J]. Journal of Computer Research and Development, 2003, 40(11): 1586-1592. Cited by: 28
  • 4. JIANG Sheng-yi, SONG Xiao-yu. A clustering-based method for unsupervised intrusion detections[J]. Pattern Recognition Letters, 2006, (05): 802-810.
  • 5. BREUNIG M, KRIEGEL H, NG R. LOF: Identifying density-based local outliers[A]. New York: ACM, 2000. 93-104.
  • 6. SU Xiaoke, LAN Yang. An efficient outlier detection algorithm for mixed-attribute data[J]. Journal of Chinese Computer Systems, 2010, 31(11): 2282-2286. Cited by: 2
  • 7. LI Xia, JIANG Sheng-yi. A novel fast clustering algorithm[A]. 2009. 284-288.
  • 8. ASUNCION A, NEWMAN D. UCI machine learning repository[EB/OL]. http://www.ics.uci.edu/~mlearn/MLRepository.html, 2007.
  • 9. JIANG Shengyi, LI Qinghua, ZHAO Yanxi. A two-phase anomaly detection method[J]. Journal of Chinese Computer Systems, 2005, 26(7): 1237-1240. Cited by: 7
  • 10. JIANG Shengyi, JIANG Lingmin. An efficient anomaly detection method[J]. Computer Engineering, 2007, 33(7): 166-168. Cited by: 7

