Outlier Detection Algorithm on Shadowed Sets Clustering (融合Shadowed Sets聚类的离群点检测算法)

Cited by: 3
Abstract: This paper proposes a new definition of outliers based on the overall, macroscopic characteristics of a data set. Building on these macroscopic data patterns, it defines a new outlier factor (COF) that accounts for both how far a data point deviates from the data patterns and how uncertain the point's own cluster assignment is. The paper also gives a new optimization objective for Shadowed Sets that pays more attention to the accuracy of the core when a fuzzy set is converted into a shadowed set. Based on Shadowed Sets clustering, it then develops an outlier detection algorithm that performs clustering and outlier detection at the same time. Experiments on synthetic data and the Iris data set show that the algorithm achieves good detection performance.
Source: Journal of Frontiers of Computer Science and Technology (计算机科学与探索), CSCD, 2012, No. 11, pp. 985-993 (9 pages)
Funding: National Natural Science Foundation of China (60872152)
Keywords: outlier; clustering; Shadowed Sets
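
As background for the Shadowed Sets step mentioned in the abstract, the following is a minimal sketch of the classical threshold selection from Pedrycz's original shadowed-set formulation. It is not the new optimization objective proposed in this paper, whose formula the abstract does not give. A threshold alpha in (0, 0.5) maps memberships at or below alpha to 0 (excluded), memberships at or above 1 - alpha to 1 (core), and everything in between to the shadow; alpha is chosen so that the membership mass removed and added balances the size of the shadow. The function names, the grid search, and the sample membership values are assumptions made for illustration.

```python
def shadowed_threshold(memberships, steps=49):
    """Grid-search alpha in (0, 0.5) for the classical balance objective.

    Objective (classical formulation, assumed here for illustration):
        V(alpha) = | reduced + elevated - shadow_size |
    where
        reduced     = sum of u      over u <= alpha       (lowered to 0)
        elevated    = sum of 1 - u  over u >= 1 - alpha   (raised to 1)
        shadow_size = number of u with alpha < u < 1 - alpha
    """
    best_alpha, best_v = None, float("inf")
    for k in range(1, steps + 1):
        alpha = 0.5 * k / (steps + 1)  # 0.01, 0.02, ..., 0.49 for steps=49
        reduced = sum(u for u in memberships if u <= alpha)
        elevated = sum(1.0 - u for u in memberships if u >= 1.0 - alpha)
        shadow_size = sum(1 for u in memberships if alpha < u < 1.0 - alpha)
        v = abs(reduced + elevated - shadow_size)
        if v < best_v:
            best_alpha, best_v = alpha, v
    return best_alpha, best_v


def shadow_regions(memberships, alpha):
    """Label each membership value as 'excluded', 'shadow', or 'core'."""
    labels = []
    for u in memberships:
        if u <= alpha:
            labels.append("excluded")
        elif u >= 1.0 - alpha:
            labels.append("core")
        else:
            labels.append("shadow")
    return labels


if __name__ == "__main__":
    # Hypothetical fuzzy memberships of seven points in one cluster.
    u = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95]
    alpha, v = shadowed_threshold(u)
    print(alpha, v, shadow_regions(u, alpha))
```

Points that land in the shadow are those whose cluster assignment is uncertain, which is the kind of uncertainty the abstract says the outlier factor takes into account alongside deviation from the clusters.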