摘要
相似度聚类方法(Similarity-based clustering method,SCM)因其简单易实现和具有鲁棒性而广受关注.但由于内含相似度聚类算法(Similarity clustering algorithm,SCA)的高时间复杂度和凝聚型层次聚类(Agglomerative hierarchicalclustering,AHC)的高空间复杂度,SCM不适用大数据集场合.本文首先发现了SCM和核密度估计问题的本质联系,并以此入手,通过快速压缩集密度估计器(Fast reduced set density estimator,FRSDE)和基于图的松弛聚类(Graph-based relaxedclustering,GRC)算法提出了快速自适应相似度聚类方法(Fast adaptive similarity-based clustering method,FASCM).相比于原SCM,该方法的主要优点是:1)其总体渐近时间复杂度与样本容量呈线性关系;2)不依赖于人工经验的干预,具有了自适应性.由此,FASCM适用于大数据集环境.该方法的有效性在图像分割应用中进行了验证.
Similarity-based clustering method (SCM) has received much attention because it is robust and can be implemented simply and easily. However, because of its high time complexity of the embedded similarity clustering algorithm (SCA) and high space complexity of the embedded agglomerative hierarchical clustering (AHC), SCM is impractical for large data sets. In this paper, the relationship is revealed between SCM and the kernel density estimation of samples, a novel fast adaptive similarity-based clustering method (FASCM) is accordingly proposed by adopting fast reduced set density estimator (FRSDE) and graph-based relaxed clustering (GRC). The distinctive advantages of FMSSC over MSSC exist in: 1) its asymptotic linear time complexity with the data size; 2) independent on artificial experience and its adaptability. Thus, FASCM is practical for large datasets. Its effectiveness has also been demonstrated in image segmentation examples.
出处
《自动化学报》
EI
CSCD
北大核心
2011年第2期179-187,共9页
Acta Automatica Sinica
基金
国家自然科学基金(60903100
60975027
60773206)资助~~
关键词
相似度聚类
密度估计
时间复杂度
图像分割
Similarity-based clustering, density estimator, time complexity, image segmentation