K-means uses the sum-of-squared error as the objective function to minimize within-cluster distances.We show that,as a consequence,it also maximizes between-cluster variances.This means that the two measures do not pr...K-means uses the sum-of-squared error as the objective function to minimize within-cluster distances.We show that,as a consequence,it also maximizes between-cluster variances.This means that the two measures do not provide complementary information and that using only one is enough.Based on this property,we propose a new objective function called cluster overlap,which is measured intuitively as the proportion of points shared between the clusters.We adopt the new function within k-means and present an algorithm called overlap k-means.It is an alternative way to design a k-means algorithm.A localized variant is also provided by limiting the overlap calculation to the neighboring points.展开更多
文摘K-means uses the sum-of-squared error as the objective function to minimize within-cluster distances.We show that,as a consequence,it also maximizes between-cluster variances.This means that the two measures do not provide complementary information and that using only one is enough.Based on this property,we propose a new objective function called cluster overlap,which is measured intuitively as the proportion of points shared between the clusters.We adopt the new function within k-means and present an algorithm called overlap k-means.It is an alternative way to design a k-means algorithm.A localized variant is also provided by limiting the overlap calculation to the neighboring points.