A hierarchical clustering algorithm of categorical attributive data using quantum mechanism
Abstract: Inspired by quantum mechanics in physics and building on agglomerative hierarchical clustering, this paper proposes CQHC, a quantum mechanism-based hierarchical clustering algorithm for categorical attribute data. CQHC introduces a new dissimilarity measure and the concept of a clustering measure scale step, and it redefines the cluster validity function CVF in terms of the compactness index AIAD and the discreteness index AIED. The algorithm first partitions the data samples at different levels of granularity to generate initial clusters, and then dynamically merges these initial clusters, using CVF as the evaluation criterion, until the clustering is complete. Simulation experiments were run on two real data sets: the linearly separable soybean disease data set and the linearly inseparable zoo data set. Compared with several existing algorithms, the proposed algorithm achieved higher clustering accuracy and correctly detected the optimal number of clusters, which shows that it is effective and feasible.
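The abstract does not define the quantum-mechanism dissimilarity, AIAD, AIED or CVF, so the following is only a minimal Python sketch of the general agglomerative idea under stated assumptions. It swaps in the simple matching dissimilarity for the paper's measure, uses a hypothetical initial partition that groups identical records, merges clusters bottom-up with average linkage, and selects the number of clusters with the average silhouette width (a standard index combining compactness and separation, not the paper's CVF). The names `dissim`, `avg_link`, `silhouette`, `agglomerate` and the toy `data` are illustrative only.

```python
from itertools import combinations

# Toy categorical records (hypothetical example data, not from the paper).
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "medium", "square"),
]


def dissim(a, b):
    """Simple matching dissimilarity: share of attributes whose categories differ.
    Stand-in for the paper's quantum-mechanism-based measure (assumption)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def avg_link(c1, c2, records):
    """Average-linkage distance between two clusters of record indices."""
    return sum(dissim(records[i], records[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def silhouette(clusters, records):
    """Mean silhouette width (higher is better); singleton members score 0.
    Used here instead of the paper's CVF, whose definition is not given."""
    total = 0.0
    for ci, c in enumerate(clusters):
        if len(c) == 1:
            continue
        for i in c:
            # a: mean dissimilarity to the record's own cluster (compactness)
            a = sum(dissim(records[i], records[j]) for j in c if j != i) / (len(c) - 1)
            # b: mean dissimilarity to the nearest other cluster (separation)
            b = min(sum(dissim(records[i], records[j]) for j in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            m = max(a, b)
            total += (b - a) / m if m > 0 else 0.0
    return total / len(records)


def agglomerate(records):
    """Bottom-up merging; returns the partition with the best silhouette."""
    # Hypothetical initial partition: identical records start in one cluster.
    groups = {}
    for idx, rec in enumerate(records):
        groups.setdefault(tuple(rec), []).append(idx)
    clusters = list(groups.values())

    best, best_score = None, float("-inf")
    while len(clusters) >= 2:
        # Silhouette is only meaningful for 2 <= k <= n - 1 clusters.
        if len(clusters) < len(records):
            score = silhouette(clusters, records)
            if score > best_score:
                best, best_score = [sorted(c) for c in clusters], score
        # Merge the closest pair of clusters (i < j, so deleting j keeps i valid).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_link(clusters[p[0]], clusters[p[1]], records))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return best, best_score


partition, score = agglomerate(data)
print(partition, round(score, 3))  # [[0, 1, 2], [3, 4, 5]] 0.778
```

On this toy data the sketch settles on two clusters (the "red" and "blue" groups). On real data such as the soybean disease or zoo sets, the number of clusters it selects depends entirely on the substituted dissimilarity and validity index, so it should not be read as reproducing the paper's CQHC results.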
Source: Journal of Lanzhou University of Technology (《兰州理工大学学报》), indexed in CAS and the Peking University core journal list, 2009, no. 5, pp. 89-94 (6 pages).
Funding: Natural Science Foundation of Gansu Province (0809RJZA005)
Keywords: categorical attribute; quantum mechanism; hierarchical clustering; clustering measure scale step; cluster validity function
