Rough K-Modes Clustering Algorithm (cited by 6)
Abstract: Michael K. Ng et al. proposed a new K-Modes clustering algorithm that uses a heuristic dissimilarity measure based on relative frequency, which improves clustering accuracy. Its shortcoming is that, when computing the frequencies of attribute categories within each cluster, it assumes that every object in the cluster contributes equally to the cluster center. To account for the different influence of individual objects on the cluster center, this paper proposes a rough K-Modes algorithm that uses the upper and lower approximations of rough sets to measure how important each object is within its cluster. The algorithm achieves better clustering results than the new K-Modes algorithm and, with equivalent clustering results, has lower computational complexity than the rough-set-based improved K-Modes algorithm of Bai Liang et al. Experiments on several UCI data sets demonstrate the effectiveness of the proposed algorithm.
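To make the idea in the abstract concrete, the Python sketch below combines two pieces. The first is the relative-frequency dissimilarity of Ng et al.: on an attribute where the object differs from the mode the cost is 1, and where it matches the cost is 1 minus the fraction of cluster members sharing that category. The second replaces plain counts with weighted counts, so objects can contribute unequally to the cluster center. The abstract does not give the paper's exact rule for using lower and upper approximations, so the rule here is a hypothetical stand-in: an object whose nearest mode is clearly closer than any other (gap larger than a hypothetical threshold eps) goes into that cluster's lower approximation with weight w_lower, and otherwise it joins the boundary region of every near cluster with weight w_boundary. The names rough_kmodes, cluster_stats, eps, w_lower and w_boundary are illustrative assumptions, not the authors' notation or method.

```python
from collections import defaultdict


def cluster_stats(members, weights, n_attrs):
    """Weighted category counts, weighted size and mode of one cluster."""
    freq = [defaultdict(float) for _ in range(n_attrs)]
    size = 0.0
    for x, w in zip(members, weights):
        size += w
        for j, v in enumerate(x):
            freq[j][v] += w
    # Mode: the category with the largest weighted count on each attribute.
    mode = tuple(max(f, key=f.get) for f in freq)
    return mode, freq, size


def ng_dissimilarity(x, mode, freq, size):
    """Relative-frequency dissimilarity (Ng et al.) between object x and a mode."""
    d = 0.0
    for j, (xj, zj) in enumerate(zip(x, mode)):
        # A mismatch costs 1; a match costs 1 minus the category's relative frequency.
        d += 1.0 if xj != zj else 1.0 - freq[j][zj] / size
    return d


def rough_kmodes(data, init_modes, n_iter=10, eps=0.2, w_lower=1.0, w_boundary=0.5):
    """Hypothetical rough-set-weighted K-Modes loop; eps must be >= 0."""
    n_attrs = len(data[0])
    # Seed each cluster with its initial mode, counted once with weight 1.
    stats = [cluster_stats([m], [1.0], n_attrs) for m in init_modes]
    labels = []
    for _ in range(n_iter):
        members = [[] for _ in init_modes]
        weights = [[] for _ in init_modes]
        labels = []
        for x in data:
            dists = [ng_dissimilarity(x, *s) for s in stats]
            order = sorted(range(len(dists)), key=dists.__getitem__)
            best = order[0]
            labels.append(best)
            # Hypothetical rule: clusters within eps of the best one are "near".
            near = [c for c in order if dists[c] - dists[best] <= eps]
            if len(near) == 1:
                members[best].append(x)
                weights[best].append(w_lower)  # lower approximation
            else:
                for c in near:
                    members[c].append(x)
                    weights[c].append(w_boundary)  # boundary region
        # Recompute weighted statistics; keep the old ones for an empty cluster.
        new_stats = [cluster_stats(m, w, n_attrs) if m else s
                     for m, w, s in zip(members, weights, stats)]
        if new_stats == stats:
            break
        stats = new_stats
    return labels, [s[0] for s in stats]
```

With the default parameters, an object that is equally dissimilar to two modes (for example ("a", "y") against modes ("a", "x") and ("b", "y")) is added to both clusters with weight 0.5, so it pulls each mode less than an unambiguous member does. Setting w_boundary equal to w_lower removes this down-weighting, which is the equal-contribution assumption the abstract criticizes.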
Source: Journal of Computer Applications (《计算机应用》), 2011, No. 1, pp. 97-100 (4 pages). Indexed in CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (60805042); Natural Science Foundation of Fujian Province (2010J01329).
Keywords: clustering; K-Modes algorithm; rough set; cluster center; clustering accuracy

References (16; the first 10 are listed)

  • [1] HAN JIAWEI, KAMBER M. Data mining: concepts and techniques[M]. San Francisco, USA: Morgan Kaufmann, 2001.
  • [2] HUANG ZHEXUE. Extensions to the k-means algorithm for clustering large data sets with categorical values[C]// Data Mining and Knowledge Discovery. Netherlands: Kluwer Academic Publishers, 1998: 283-304.
  • [3] HUANG ZHEXUE, MICHAEL K NG. A fuzzy k-modes algorithm for clustering categorical data[J]. IEEE Transactions on Fuzzy Systems, 1999, 7(4): 446-452.
  • [4] PALMER C R, FALOUTSOS C. Electricity based external similarity of categorical attributes[C]// PAKDD '03: Proceedings of the 7th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, LNAI 2637. Berlin: Springer-Verlag, 2003: 486-500.
  • [5] LE SI QUANG, HO TU BAO. A conditional probability distribution-based dissimilarity measure for categorical data[C]// PAKDD '04: Proceedings of the 8th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, LNAI 3056. Berlin: Springer-Verlag, 2004: 580-589.
  • [6] CHENG V, LI C-H, KWOK J T, et al. Dissimilarity learning for nominal data[J]. Pattern Recognition, 2004, 37(7): 1471-1477.
  • [7] LEE S-G, YUN D-K. Clustering categorical and numerical data: a new procedure using multidimensional scaling[J]. International Journal of Information Technology and Decision Making, 2003, 2(1): 135-160.
  • [8] LI CEN, BISWAS GAUTAM. Unsupervised learning with mixed numeric and nominal data[J]. IEEE Transactions on Knowledge and Data Engineering, 2002, 14(4): 673-690.
  • [9] AHMAD A, DEY L. A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set[J]. Pattern Recognition Letters, 2007, 28(1): 110-118.
  • [10] HE ZENGYOU, DENG SHENGCHUN, XU XIAOFEI. Improving k-modes algorithm considering frequencies of attribute values in mode[C]// Proceedings of the International Conference on Computational Intelligence and Security, LNCS 3801. Berlin: Springer-Verlag, 2005: 157-162.
