Rough K-Modes Clustering Algorithm (粗糙K-Modes聚类算法)

Cited by: 6


Abstract: Michael K. Ng et al. proposed a new K-Modes clustering algorithm that uses a heuristic dissimilarity measure based on relative frequency, which effectively improves clustering accuracy. Its drawback is that, when computing the frequency of each attribute category within a cluster, it assumes that every object in the cluster contributes equally to the clustering. To account for the different influence that objects have on the cluster center, this paper proposes a rough K-Modes algorithm that uses the upper and lower approximations of rough set theory to measure how important each object is within its cluster. The proposed algorithm achieves better clustering results than the new K-Modes algorithm and, while maintaining clustering quality, has lower computational complexity than the rough-set-based improved K-Modes algorithm of Bai Liang et al. Experiments on several UCI data sets demonstrate the good performance of the proposed algorithm.
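To make the abstract's two ideas concrete, here is a minimal Python sketch, not the authors' implementation. `frequency_dissimilarity` follows the relative-frequency measure of Ng et al. as commonly stated: a mismatch on an attribute costs 1, and a match costs 1 minus the relative frequency of the mode's category in the cluster. The optional `weights` argument is an assumption added for illustration: it stands in for the per-object importance that the paper derives from rough-set upper and lower approximations, whose exact formula the abstract does not give. The function names and the example weights are hypothetical.

```python
from collections import Counter


def frequency_dissimilarity(x, mode, cluster, weights=None):
    """Relative-frequency dissimilarity between object x and a cluster mode.

    With weights=None every object contributes equally (the uniform
    assumption the abstract criticises). Passing weights replaces the
    plain count by a weighted count; how the paper derives such weights
    from rough approximations is NOT reproduced here.
    """
    if weights is None:
        weights = [1.0] * len(cluster)
    total = sum(weights)
    d = 0.0
    for j, (xj, zj) in enumerate(zip(x, mode)):
        if xj != zj:
            d += 1.0  # categories differ on attribute j
        else:
            # (weighted) share of cluster objects carrying the mode's category
            match = sum(w for obj, w in zip(cluster, weights) if obj[j] == zj)
            d += 1.0 - match / total
    return d


def weighted_mode(cluster, weights):
    """Per attribute, pick the category with the largest total weight."""
    mode = []
    for j in range(len(cluster[0])):
        score = Counter()
        for obj, w in zip(cluster, weights):
            score[obj[j]] += w
        mode.append(max(score, key=score.get))
    return tuple(mode)


# Toy cluster of four objects with two categorical attributes.
cluster = [("a", "x"), ("a", "y"), ("b", "x"), ("a", "x")]
uniform = [1.0, 1.0, 1.0, 1.0]
# Hypothetical importance weights: the third object counts half as much.
assumed = [1.0, 1.0, 0.5, 1.0]

mode = weighted_mode(cluster, uniform)
d_uniform = frequency_dissimilarity(("a", "y"), mode, cluster)
d_weighted = frequency_dissimilarity(("a", "y"), mode, cluster, assumed)
```

Down-weighting the third object raises the weighted share of category `a` on the first attribute, so `d_weighted` comes out smaller than `d_uniform`. This is the kind of effect a per-object importance measure has on the distance to the cluster center.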

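Authors: Li Renkan (李仁侃), Ye Dongyi (叶东毅)

Affiliation: College of Mathematics and Computer Science, Fuzhou University

Source: Journal of Computer Applications (《计算机应用》), 2011, No. 1, pp. 97-100 (4 pages); indexed in CSCD and the Peking University Core Journal list (北大核心)

Funding: National Natural Science Foundation of China (60805042); Natural Science Foundation of Fujian Province (2010J01329)

Keywords: clustering; K-Modes algorithm; rough set; cluster center; clustering accuracy

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]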


