Fast Adaptive Similarity-based Clustering Using Sparse Parzen Window Density Estimation

Cited by: 6

Abstract: The similarity-based clustering method (SCM) has received much attention because it is robust and simple to implement. However, SCM is impractical for large data sets because of the high time complexity of its embedded similarity clustering algorithm (SCA) and the high space complexity of its embedded agglomerative hierarchical clustering (AHC). This paper first reveals the intrinsic connection between SCM and kernel density estimation. Building on that connection, it proposes a fast adaptive similarity-based clustering method (FASCM) that combines the fast reduced set density estimator (FRSDE) with the graph-based relaxed clustering (GRC) algorithm. Compared with the original SCM, FASCM has two main advantages: 1) its overall asymptotic time complexity is linear in the sample size; 2) it does not depend on manual, experience-based intervention and is therefore adaptive. FASCM is thus practical for large data sets. Its effectiveness is verified in image segmentation applications.
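The connection between a similarity-based objective and kernel density estimation that the abstract refers to can be illustrated with a minimal sketch. It assumes a plain Gaussian similarity with a fixed bandwidth `h`, which is a simplification and not necessarily the exact similarity measure or parameterization used by SCM. The function names (`total_similarity`, `parzen_density`, `sparse_density`) and the sample data are hypothetical, and the weighted estimator only shows the general shape of a sparse (reduced set) estimate; it does not implement FRSDE.

```python
import math


def gaussian_kernel(x, z, h):
    """Gaussian similarity between points x and z with bandwidth h."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * h * h))


def total_similarity(z, samples, h):
    """Sum of similarities between a candidate centre z and every sample."""
    return sum(gaussian_kernel(x, z, h) for x in samples)


def parzen_density(z, samples, h):
    """Gaussian Parzen window estimate: (1/n) * sum_j N(z; x_j, h^2 I)."""
    d = len(z)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    return norm * total_similarity(z, samples, h) / len(samples)


def sparse_density(z, centers, weights, h):
    """Weighted kernel estimate over a reduced set of centres.

    Assumes the weights are non-negative and sum to 1. Evaluation costs
    O(m) for m centres instead of O(n) for all n samples.
    """
    d = len(z)
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)
    return norm * sum(w * gaussian_kernel(c, z, h)
                      for c, w in zip(centers, weights))


if __name__ == "__main__":
    # Hypothetical two-cluster toy data.
    samples = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (3.0, 3.1), (3.2, 2.9)]
    h = 0.5
    for z in [(0.0, 0.0), (3.0, 3.0), (1.5, 1.5)]:
        print(z, total_similarity(z, samples, h), parzen_density(z, samples, h))
```

Under these assumptions the density estimate is the total similarity multiplied by a constant that does not depend on `z`, so both functions peak at the same candidate centres. That is the sense in which searching for high-similarity centres resembles searching for modes of a kernel density estimate.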

Authors: Qian Pengjiang, Wang Shitong, Deng Zhaohong

Affiliations: School of Information Engineering, Jiangnan University; School of Digital Media, Jiangnan University

Source: Acta Automatica Sinica (《自动化学报》), indexed in EI, CSCD, and the Peking University Core Journals list, 2011, No. 2, pp. 179-187 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (60903100, 60975027, 60773206)

Keywords: similarity-based clustering, density estimation, time complexity, image segmentation

Classification: TP311.13 [Automation and Computer Technology: Computer Software and Theory]
