Abstract
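The k-means clustering algorithm partitions n data objects in the d-dimensional space R^d into K clusters. It computes the distance from each data object to each of the K cluster centers and assigns the object to the nearest cluster. The traditional direct k-means algorithm chooses its initial centers at random, and different initial centers produce different clustering results. To address this shortcoming, we propose a clustering initialization method based on sorting and partitioning. The method is simple and easy to implement. We applied it to real and simulated data sets. The experiments show that for data that are not high-dimensional it is a simple and effective method, and that it substantially improves clustering accuracy and efficiency.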
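The abstract does not spell out the sort-and-partition rule. The following is therefore only a minimal sketch under an assumed rule, not the authors' method. Points are sorted by the sum of their coordinates, the sorted list is cut into K contiguous blocks of near-equal size, and each block's mean becomes an initial center. Standard Lloyd iterations (assign each point to its nearest center, then recompute means) follow. The names `sort_partition_init`, `assign`, `mse` and `kmeans` and the coordinate-sum sort key are all hypothetical illustrations.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))


def sort_partition_init(data: Sequence[Sequence[float]], k: int) -> List[Point]:
    """HYPOTHETICAL sort-and-partition initialization (assumed rule, not the paper's):
    sort points by coordinate sum, split the sorted list into k contiguous
    near-equal blocks, and use each block's mean as an initial center."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= number of points")
    ordered = sorted(data, key=sum)  # deterministic: no random choice involved
    # For n >= k every slice below is non-empty.
    return [mean(ordered[i * n // k:(i + 1) * n // k]) for i in range(k)]


def assign(data: Sequence[Sequence[float]], centers: Sequence[Point]) -> List[int]:
    """Index of the nearest center for each point (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in data]


def mse(data: Sequence[Sequence[float]], centers: Sequence[Point]) -> float:
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data) / len(data)


def kmeans(data, k, init="sorted", max_iter=100, seed=0):
    """Lloyd iterations starting from the sorted-partition or a random initialization."""
    if init == "sorted":
        centers = sort_partition_init(data, k)
    else:  # "direct" k-means baseline: k data points chosen at random
        centers = [tuple(p) for p in random.Random(seed).sample(list(data), k)]
    labels = assign(data, centers)
    for _ in range(max_iter):
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(mean(members) if members else centers[c])
        centers = new_centers
        new_labels = assign(data, centers)
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return centers, labels


data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centers, labels = kmeans(data, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because the sort is deterministic, repeated runs on the same data return the same centers. Random initialization does not have this property, which is the shortcoming the abstract describes. A coordinate-sum key is only one possible choice, and it is sensitive to feature scaling.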
In k-means clustering, we are given a set of n data points in the d-dimensional space R^d and an integer K. The problem is to determine a set of K points in R^d, called centers, that minimizes the mean squared distance from each data point to its nearest center. The direct k-means algorithm chooses its initial centers randomly, and different initial centers lead to different results. To address this deficiency of the direct k-means algorithm, we propose a novel initialization method based on sorting and partitioning. We apply it to real as well as simulated data, and the experiments show that it is a simple and efficient way to improve clustering accuracy and efficiency.
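The objective stated above can be written as the following formula. The notation is introduced here for illustration: x_1, ..., x_n are the data points and c_1, ..., c_K are the centers. Each Lloyd iteration (nearest-center assignment followed by recomputing cluster means) cannot increase this quantity. However, it only converges to a local minimum, and that minimum depends on the initial centers.

```latex
\min_{c_1,\dots,c_K \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} \min_{1 \le k \le K} \lVert x_i - c_k \rVert^2
```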
Source
Microelectronics & Computer (《微电子学与计算机》)
CSCD
Peking University Core Journal (北大核心)
2013, No. 6, pp. 80-83, 87 (5 pages)