Abstract: Clustering on categorical variables has received intensive attention. In datasets with categorical features, some features perform markedly better than others in the clustering procedure. In this paper, we propose a simple method that finds such distinctive features by comparing the pooled within-cluster mean relative difference, then partitions the data on those features and gives the subspace of each subgroup. Applications to the zoo and soybean data sets illustrate the performance of the proposed method.
Abstract: K-means uses the sum-of-squared error as the objective function to minimize within-cluster distances. We show that, as a consequence, it also maximizes between-cluster variances. This means that the two measures do not provide complementary information and that using only one is enough. Based on this property, we propose a new objective function called cluster overlap, which is measured intuitively as the proportion of points shared between the clusters. We adopt the new function within k-means and present an algorithm called overlap k-means. It is an alternative way to design a k-means algorithm. A localized variant is also provided by limiting the overlap calculation to the neighboring points.
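The claim that minimizing within-cluster error also maximizes between-cluster variance follows from a standard identity: for any partition whose centroids are the cluster means, the total sum of squares equals the within-cluster sum plus the size-weighted between-cluster sum. Since the total is fixed by the data, lowering one term raises the other. The sketch below checks the identity numerically with NumPy. The function name `sse_decomposition` and the random data are illustrative assumptions; nothing here is taken from the paper, and the overlap objective itself is not implemented.

```python
import numpy as np

def sse_decomposition(points, labels):
    """Return (total, within, between) sums of squares for a labeled partition."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    grand_mean = points.mean(axis=0)
    # Total sum of squares around the grand mean: depends on the data only.
    total = ((points - grand_mean) ** 2).sum()
    within = 0.0
    between = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        centroid = members.mean(axis=0)
        # Within-cluster term: the sum-of-squared error that k-means minimizes.
        within += ((members - centroid) ** 2).sum()
        # Between-cluster term: centroid spread, weighted by cluster size.
        between += len(members) * ((centroid - grand_mean) ** 2).sum()
    return total, within, between

# Hypothetical data: an arbitrary (even random) labeling satisfies the identity.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
labels = rng.integers(0, 3, size=200)
total, within, between = sse_decomposition(X, labels)
print(np.isclose(total, within + between))
```

Because the identity holds for every labeling, reporting both quantities adds no information beyond reporting one of them, which is the point the abstract makes.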
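Abstract: To address the problem of extracting target edges from complex backgrounds, a new edge-extraction method based on the gradient magnitude histogram and within-class variance is proposed: the CAGH (cluster algorithm based on gradient histogram) algorithm. The algorithm first analyzes the features of the gradient magnitude histogram obtained after non-maximum gradient suppression to locate the region where edges are concentrated, then determines a gradient threshold from the within-class variance and uses that threshold to identify edges. The method is applied in license plate recognition to extract plate edges from complex backgrounds and is compared with the Sobel and Canny algorithms, among others. The results show that the CAGH algorithm is highly adaptable and efficient, and that it extracts single-pixel edges with good connectivity and independence, which benefits subsequent feature extraction and pattern recognition.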
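The abstract does not spell out how the within-class variance yields the gradient threshold. As a rough illustration only, the sketch below uses an Otsu-style search: it scans the bin edges of a gradient magnitude histogram and keeps the split that minimizes the weighted within-class variance of the two resulting groups. This is a stand-in technique, not the CAGH procedure itself. The function name `within_class_variance_threshold`, the bin count, and the synthetic magnitudes are all assumptions, and the non-maximum suppression step is omitted.

```python
import numpy as np

def within_class_variance_threshold(values, bins=64):
    """Otsu-style threshold: the bin edge minimizing weighted within-class variance."""
    values = np.asarray(values, dtype=float).ravel()
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    prob = hist / hist.sum()
    best_t, best_var = None, np.inf
    for i in range(1, bins):
        w0, w1 = prob[:i].sum(), prob[i:].sum()
        if w0 == 0 or w1 == 0:
            continue  # skip splits that leave one class empty
        m0 = (prob[:i] * centers[:i]).sum() / w0
        m1 = (prob[i:] * centers[i:]).sum() / w1
        v0 = (prob[:i] * (centers[:i] - m0) ** 2).sum() / w0
        v1 = (prob[i:] * (centers[i:] - m1) ** 2).sum() / w1
        wcv = w0 * v0 + w1 * v1  # weighted within-class variance of this split
        if wcv < best_var:
            best_var, best_t = wcv, edges[i]
    return best_t

# Hypothetical gradient magnitudes: many weak responses plus a smaller edge group.
rng = np.random.default_rng(1)
low = rng.normal(1.0, 0.2, size=500)    # background or texture gradients
high = rng.normal(10.0, 0.5, size=100)  # strong edge gradients
t = within_class_variance_threshold(np.concatenate([low, high]))
edge_mask = np.concatenate([low, high]) >= t  # values kept as edge candidates
```

On real images the magnitudes would come from a gradient operator applied after suppression, and the histogram is rarely this cleanly bimodal, so the chosen threshold should be inspected rather than trusted blindly.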