
Rough Set Clustering Algorithm Based on Entropy and Information Granularity (cited by: 6)
Abstract Most clustering algorithms handle either numeric data or categorical data, but not data with mixed attributes. To address this, a clustering algorithm based on entropy and information granularity is proposed, operating on different granular partitions within the framework of rough set theory. Using a similarity relation, the algorithm computes the entropy of each data point and selects the point with the minimum entropy as a cluster center; all data points whose similarity to that center exceeds a threshold β are aggregated to form the numeric granule structure. The entropy values do not need to be adjusted during clustering, which shortens the computation time. In parallel, the indiscernibility relation of rough sets is used to form the character granule structure. Clustering of mixed-attribute data is achieved by repeatedly adjusting and merging these two granule structures. Experimental comparison shows that the algorithm is effective and feasible: with β = 0.8, its clustering validity reaches a maximum of 0.96, higher than that of the other clustering algorithms under the same conditions.
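The two granule-forming steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the abstract gives no formulas, so the similarity measure `exp(-alpha * distance)`, the per-point entropy formula, the parameter `alpha`, and all function names are assumptions introduced here. The step that adjusts and merges the two granule structures is not specified in the abstract and is deliberately left out.

```python
import math
from collections import defaultdict


def similarity(x, y, alpha=1.0):
    """Assumed similarity: exp(-alpha * Euclidean distance), in (0, 1]."""
    return math.exp(-alpha * math.dist(x, y))


def point_entropy(i, points, alpha=1.0):
    """Assumed entropy of point i with respect to all other points."""
    total = 0.0
    for j, q in enumerate(points):
        if j == i:
            continue
        s = similarity(points[i], q, alpha)
        for p in (s, 1.0 - s):
            if 0.0 < p < 1.0:  # treat 0 * log(0) as 0
                total -= p * math.log2(p)
    return total


def numeric_granules(points, beta=0.8, alpha=1.0):
    """Group numeric points around minimum-entropy centers.

    Entropies are computed once and never updated, as the abstract states.
    """
    entropy = [point_entropy(i, points, alpha) for i in range(len(points))]
    remaining = set(range(len(points)))
    granules = []
    while remaining:
        # The remaining point with the smallest entropy becomes the center.
        center = min(remaining, key=lambda i: (entropy[i], i))
        members = {center} | {
            i for i in remaining
            if similarity(points[center], points[i], alpha) > beta
        }
        granules.append(sorted(members))
        remaining -= members
    return granules


def indiscernibility_classes(rows):
    """Group row indices whose categorical attribute values are all equal."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row)].append(idx)
    return list(classes.values())


if __name__ == "__main__":
    numeric = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    categorical = [("red", "S"), ("blue", "M"), ("red", "S")]
    print(numeric_granules(numeric, beta=0.8))
    print(indiscernibility_classes(categorical))
```

With this assumed similarity, a larger β yields smaller, tighter numeric granules, while the categorical granules depend only on exact equality of attribute values.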
Source Journal of Xi'an Jiaotong University (《西安交通大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2005, No. 4, pp. 343-346 (4 pages)
Funding National High Technology Research and Development Program of China (grant 2003AA1Z2610).
Keywords rough set; cluster analysis; information granularity; data mining; entropy; information analysis; iterative methods; optimization


