Abstract
Cluster analysis is widely used in fields such as information retrieval and data mining. K-means is a relatively simple and fast clustering algorithm, but it has two drawbacks: the number of clusters must be set in advance, and the initial centroids are chosen at random, so the resulting clustering may not be optimal. To address the random choice of initial centroids, this paper proposes an initial-centroid selection algorithm based on density and shared nearest neighbor (SNN) similarity. Experiments show that the algorithm produces higher-quality and more stable clustering results. However, the improved algorithm still requires an SNN similarity threshold to be set in advance and has a relatively high computational cost, so it remains to be improved.
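The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that SNN similarity is the number of shared k-nearest neighbors, that a point's density is the number of points whose SNN similarity to it reaches the threshold, and that centroids are picked greedily in order of decreasing density. The function names and the parameters `k` and `threshold` are hypothetical.

```python
import math

def knn_sets(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        ranked = sorted(others, key=lambda j: math.dist(p, points[j]))
        result.append(set(ranked[:k]))
    return result

def snn_similarity(neighbors, i, j):
    """Assumed definition: number of neighbors shared by points i and j."""
    return len(neighbors[i] & neighbors[j])

def select_initial_centroids(points, n_clusters, k, threshold):
    """Pick up to n_clusters dense points that are mutually dissimilar under SNN."""
    n = len(points)
    neighbors = knn_sets(points, k)
    # Assumed density: how many other points are SNN-similar to point i.
    density = [
        sum(1 for j in range(n)
            if j != i and snn_similarity(neighbors, i, j) >= threshold)
        for i in range(n)
    ]
    # Visit points from densest to sparsest; ties keep their index order.
    order = sorted(range(n), key=lambda i: -density[i])
    chosen = []
    for i in order:
        # Accept a point only if it is not SNN-similar to any chosen centroid.
        if all(snn_similarity(neighbors, i, c) < threshold for c in chosen):
            chosen.append(i)
        if len(chosen) == n_clusters:
            break
    return [points[i] for i in chosen]
```

The returned points could then seed a standard K-means run in place of random initial centroids. The all-pairs loops cost on the order of n² similarity evaluations, which is in line with the high computational cost the abstract mentions.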
Source
《沈阳师范大学学报(自然科学版)》
CAS
2009, No. 4, pp. 448-450 (3 pages)
Journal of Shenyang Normal University: Natural Science Edition
Funding
Supported by the National Natural Science Foundation of China (60970112)
Keywords
clustering
K-means clustering algorithm
initial centroid
density
shared nearest neighbor (SNN) similarity