Abstract
This paper proposes a new definition of outliers based on the overall, macroscopic characteristics of a data set, and defines a new outlier factor (COF) on the macroscopic data pattern. The factor accounts for both how far a data point deviates from the data pattern (the clusters) and the uncertainty of the point's own cluster assignment. The paper also proposes a new optimization objective for Shadowed Sets that pays more attention to the accuracy of the core when a fuzzy set is converted into a shadowed set. Building on Shadowed Sets clustering, it further develops an outlier detection algorithm that performs clustering and outlier detection at the same time, combining the advantages of COF and Shadowed Sets in a hybrid framework. Experiments on synthetic data and the Iris data set show that the algorithm achieves good detection performance.
Source
《计算机科学与探索》
CSCD
2012, Issue 11, pp. 985-993 (9 pages)
Journal of Frontiers of Computer Science and Technology
Funding
National Natural Science Foundation of China (60872152)
Keywords
outlier
clustering
Shadowed Sets