Abstract
The traditional K-means algorithm is sensitive to the choice of initial cluster centers: clustering results fluctuate significantly with different initial inputs, and the algorithm is prone to converging to local optima. To eliminate this sensitivity, a new method for selecting the initial cluster centers is proposed. Principal component analysis (PCA) is first used to reduce the high-dimensional data to a two-dimensional plane. The Euclidean distance and vector angle parameters are then calculated for each data object, and a distance-angle hybrid evaluation model is built to select the k most dispersed data points as the initial cluster centers. Experimental results show that the proposed algorithm has certain advantages when handling high-dimensional data, produces better clustering results on non-cluster-shaped data sets in particular, and eliminates the sensitivity to the initial input.
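The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the distance-angle hybrid evaluation model, so the scoring rule below is an assumption rather than the authors' formula: a weighted sum of normalized Euclidean separation and normalized angular separation in the PCA plane, applied greedily. The function names `pca_2d` and `select_initial_centers` and the weight `alpha` are hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project X (n_samples, n_features >= 2) onto its first two principal axes."""
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T                         # shape (n_samples, 2)


def select_initial_centers(X, k, alpha=0.5):
    """Deterministically pick k dispersed points as initial K-means centers.

    ASSUMPTION: the hybrid score (alpha * distance + (1 - alpha) * angle) and
    the greedy selection are illustrative, not the model from the paper.
    """
    Z = pca_2d(X)                                # origin is the data centroid
    theta = np.arctan2(Z[:, 1], Z[:, 0])         # polar angle of every point
    # Start from the point farthest from the centroid.
    chosen = [int(np.argmax(np.linalg.norm(Z, axis=1)))]
    while len(chosen) < k:
        # Euclidean distance from every point to every chosen center.
        d = np.linalg.norm(Z[:, None, :] - Z[chosen][None, :, :], axis=2)
        # Angular difference to every chosen center, wrapped into [0, pi].
        dt = np.abs(theta[:, None] - theta[chosen][None, :])
        dt = np.minimum(dt, 2 * np.pi - dt)
        d_min = d.min(axis=1)                    # distance to nearest chosen center
        a_min = dt.min(axis=1)                   # angle to nearest chosen center
        d_scale = d_min.max() if d_min.max() > 0 else 1.0
        score = alpha * d_min / d_scale + (1 - alpha) * a_min / np.pi
        score[chosen] = -np.inf                  # never pick the same point twice
        chosen.append(int(np.argmax(score)))
    return X[chosen], chosen
```

Because no random numbers are involved, the same data always yields the same starting centers. The returned array could then be passed to any K-means implementation that accepts explicit initial centers.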
Authors
ZHOU Xiaodong (周晓东)
DONG Haiqing (董海清)
ZHANG Kunpeng (张昆鹏)
HOU Juncheng (侯俊丞)
SUN Shufeng (孙树峰)
Affiliations: Shanghai Police College, Shanghai 200137, China; Ctrip Network Technology (Shanghai) Co., Ltd., Shanghai 200335, China; Postal Savings Bank of China, Beijing 100808, China; Shanghai Research Institute of Criminal Science and Technology, Shanghai 200083, China
Source
Instrumentation Technology (《仪表技术》)
2025, No. 2, pp. 66-69, 73 (5 pages in total)
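Funding
Special Project of the Shanghai Municipal Science and Technology Management Committee (22DZ1200500)
Special Project of the Shanghai Smart Policing Collaborative Innovation Center (23xtcx04)
Project of the Shanghai Smart Policing Collaborative Innovation Center (XC202303006)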