Outlier Detection Algorithm on Shadowed Sets Clustering (融合Shadowed Sets聚类的离群点检测算法)

Cited by: 3

Abstract: This paper proposes a new definition of outliers based on the overall, macroscopic characteristics of a data set. From the macroscopic data pattern it defines a new outlier factor (COF) that accounts for both how far a data point deviates from the data pattern and how uncertain the point's own cluster assignment is. It introduces a new optimization objective for Shadowed Sets that places more emphasis on the accuracy of the core when a fuzzy set is converted into a shadowed set. Building on Shadowed Sets clustering, it then presents an outlier detection algorithm that performs clustering and outlier detection at the same time. Experiments on synthetic data and the Iris data set indicate that the algorithm detects outliers well.
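The abstract does not state the paper's new Shadowed Sets objective, so the sketch below illustrates only the classical threshold search from Pedrycz (1998) that such objectives modify: choose a threshold alpha so that the membership removed (values at or below alpha set to 0) plus the membership added (values at or above 1 - alpha set to 1) balances the number of points left in the shadow. The function names `shadowed_threshold` and `shadow_regions`, the grid of candidate thresholds, and the sample memberships are assumptions made for illustration, not taken from the paper.

```python
def shadowed_threshold(memberships, steps=49):
    """Grid-search alpha in (0, 0.5) for the classical Pedrycz balance objective.

    This is NOT the paper's new objective; it is the standard baseline.
    """
    best_alpha, best_cost = None, float("inf")
    for i in range(1, steps + 1):
        alpha = 0.5 * i / (steps + 1)  # candidates strictly inside (0, 0.5)
        # Membership lost when low values are reduced to 0.
        reduced = sum(u for u in memberships if u <= alpha)
        # Membership gained when high values are elevated to 1.
        elevated = sum(1.0 - u for u in memberships if u >= 1.0 - alpha)
        # Number of points whose membership becomes the whole interval [0, 1].
        shadow = sum(1 for u in memberships if alpha < u < 1.0 - alpha)
        cost = abs(reduced + elevated - shadow)
        if cost < best_cost:  # keep the first (smallest) alpha on ties
            best_alpha, best_cost = alpha, cost
    return best_alpha


def shadow_regions(memberships, alpha):
    """Label each membership value as 'core', 'shadow', or 'excluded'."""
    labels = []
    for u in memberships:
        if u >= 1.0 - alpha:
            labels.append("core")
        elif u <= alpha:
            labels.append("excluded")
        else:
            labels.append("shadow")
    return labels


if __name__ == "__main__":
    # Hypothetical fuzzy memberships of five points in one cluster.
    u = [0.02, 0.30, 0.55, 0.80, 0.97]
    alpha = shadowed_threshold(u)
    print(alpha, shadow_regions(u, alpha))
```

Points labeled "shadow" are the ones whose cluster assignment is uncertain, which is the kind of uncertainty the abstract says the outlier factor takes into account; how the paper combines it with deviation from the clusters is not given here.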

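Authors: Wang Dan, Mao Ziyang, Wu Mengda

Affiliation: Department of Mathematics and Systems Science, College of Science, National University of Defense Technology

Source: Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2012, No. 11, pp. 985-993 (9 pages)

Funding: National Natural Science Foundation of China (60872152)

Keywords: outlier; clustering; Shadowed Sets

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]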

