Abstract
Most clustering algorithms handle either numeric data or categorical data, but not data with mixed attributes. To address this, a clustering algorithm based on entropy and information granularity is proposed that operates on partitions of different granularity within the framework of rough set theory. Using a similarity relation, the algorithm computes the entropy of each data point and selects the point with the minimum entropy as a cluster center; all data points whose similarity to that center exceeds a threshold β are then aggregated to form the numeric granule structure. The entropy values of the data points do not need to be adjusted during clustering, which shortens the computation time. In addition, the indiscernibility relation of rough set theory is used to form the character granule structure. Clustering of data with mixed attributes is achieved by repeatedly adjusting and merging these two granule structures. Experimental comparisons show that the algorithm is effective and feasible: with β = 0.8, its clustering validity reaches a maximum of 0.96, which is higher than that of the other clustering algorithms under the same conditions.
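The two granulation steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the exponential similarity `exp(-alpha * distance)`, the Shannon-style entropy, the parameter `alpha`, and all function names are assumptions borrowed from a common entropy-based clustering formulation. The iterative adjustment and merging of the two granule structures is not shown.

```python
import math
from collections import defaultdict


def similarity_matrix(points, alpha):
    """Pairwise similarity; exp(-alpha * Euclidean distance) is an ASSUMED form."""
    n = len(points)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            sim[i][j] = sim[j][i] = math.exp(-alpha * d)
    return sim


def point_entropies(sim):
    """Entropy of each point from its similarities to all others (ASSUMED formula)."""
    def h(s):
        # Binary-entropy term; defined as 0 at the endpoints s = 0 and s = 1.
        if s <= 0.0 or s >= 1.0:
            return 0.0
        return -(s * math.log2(s) + (1.0 - s) * math.log2(1.0 - s))

    n = len(sim)
    return [sum(h(sim[i][j]) for j in range(n) if j != i) for i in range(n)]


def numeric_granules(points, beta, alpha=1.0):
    """Group points around minimum-entropy centers using the threshold beta."""
    sim = similarity_matrix(points, alpha)
    entropy = point_entropies(sim)  # computed once, never updated afterwards
    remaining = set(range(len(points)))
    granules = []
    while remaining:
        # The unassigned point with the smallest entropy becomes the next center.
        center = min(remaining, key=lambda i: (entropy[i], i))
        members = sorted(
            j for j in remaining if j == center or sim[center][j] > beta
        )
        granules.append(members)
        remaining -= set(members)
    return granules


def indiscernibility_classes(rows):
    """Objects with identical categorical values fall into the same class."""
    classes = defaultdict(list)
    for idx, row in enumerate(rows):
        classes[tuple(row)].append(idx)
    return list(classes.values())


if __name__ == "__main__":
    numeric = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
    categorical = [("a", "x"), ("b", "x"), ("a", "x")]
    print(numeric_granules(numeric, beta=0.8))
    print(indiscernibility_classes(categorical))
```

Computing the entropies once, before the loop, mirrors the abstract's remark that entropy values need not be adjusted during clustering. A larger β gives smaller, tighter numeric granules.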
Source
Journal of Xi'an Jiaotong University (《西安交通大学学报》)
EI
CAS
CSCD
Peking University Core Journals
2005, No. 4, pp. 343-346 (4 pages)
Funding
Supported by the National High Technology Research and Development Program of China (2003AA1Z2610).
Keywords
Rough set
Entropy
Cluster analysis
Information granularity
Data mining
Entropy
Information analysis
Iterative methods
Optimization