Improved deep embedding clustering algorithm based on weighted Mahalanobis distance (cited by: 3)
Abstract  The Deep Embedded Clustering (DEC) algorithm measures the distance between embedded points with the Euclidean distance in the feature space obtained after dimensionality reduction, which tends to ignore the differing units (scales) and differing importance of the individual features. To address this, an improved DEC algorithm based on a weighted Mahalanobis distance is proposed, together with a Gap Statistic (GS) method based on the same weighted Mahalanobis distance for determining the optimal number of clusters. The algorithm uses the Mahalanobis distance weighted by information entropy as its distance metric: this normalizes the Euclidean distance computation and uses information entropy to increase the weights of the features that matter most for clustering. Experiments show that the improved DEC algorithm achieves higher accuracy than the original DEC algorithm, with clearly improved clustering on text datasets such as the UCI Reuters news dataset. The improved GS method is also shown to be feasible for determining the optimal number of clusters.
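The abstract does not give the formulas, so the following is only a minimal sketch of how an entropy-weighted Mahalanobis distance could replace the Euclidean distance in a DEC-style soft assignment. Everything specific here is an assumption rather than the authors' method: the weights come from the standard entropy weight method, the weighted form is taken to be (x - y)^T W^{1/2} Σ^{-1} W^{1/2} (x - y), and the function names `entropy_weights`, `weighted_mahalanobis_sq`, and `soft_assign` are hypothetical. The Student's t kernel in `soft_assign` follows the soft assignment used in the original DEC formulation.

```python
import numpy as np


def entropy_weights(Z):
    """Entropy weight method (assumed variant, not taken from the paper).

    Features whose min-max-scaled values are less uniformly spread
    (lower entropy) receive larger weights. Weights sum to 1.
    """
    Z = np.asarray(Z, dtype=float)
    n, d = Z.shape
    col_min = Z.min(axis=0)
    span = Z.max(axis=0) - col_min
    scaled = (Z - col_min) / np.where(span > 0, span, 1.0)
    col_sum = scaled.sum(axis=0)
    # Constant columns carry no information: treat them as uniform (entropy 1).
    P = np.where(col_sum > 0, scaled / np.where(col_sum > 0, col_sum, 1.0), 1.0 / n)
    log_p = np.log(np.where(P > 0, P, 1.0))  # 0 * log(0) is treated as 0
    entropy = -(P * log_p).sum(axis=0) / np.log(n)
    divergence = 1.0 - entropy
    total = divergence.sum()
    if total <= 1e-12:
        return np.full(d, 1.0 / d)  # fall back to equal weights
    return divergence / total


def weighted_mahalanobis_sq(Z, centers, weights, cov):
    """Squared weighted Mahalanobis distance from every point to every center.

    Assumed form: (z - mu)^T W^{1/2} cov^{-1} W^{1/2} (z - mu), W = diag(weights).
    Returns an array of shape (n_points, n_centers).
    """
    Z = np.asarray(Z, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sqrt_w = np.sqrt(np.asarray(weights, dtype=float))
    inv_cov = np.linalg.pinv(np.asarray(cov, dtype=float))  # pseudo-inverse for stability
    M = sqrt_w[:, None] * inv_cov * sqrt_w[None, :]
    diff = Z[:, None, :] - centers[None, :, :]
    return np.einsum("nkd,de,nke->nk", diff, M, diff)


def soft_assign(dist_sq, alpha=1.0):
    """DEC-style Student's t soft assignment computed from squared distances."""
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)
```

In this sketch the embedded points `Z` would be the encoder outputs, `cov` their sample covariance (for example `np.cov(Z, rowvar=False)`), and `centers` the current cluster centroids. With equal weights and an identity covariance the distance reduces to a scaled squared Euclidean distance, so the original DEC assignment is recovered up to a constant factor.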
Authors  YAN Zihan (颜子寒); ZHANG Zhengjun (张正军); WANG Yaping (王雅萍); JIN Yazhou (金亚洲); YAN Tao (严涛) (School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China)
Source  Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2019, Issue S02, pp. 122-126 (5 pages)
Funding  National Natural Science Foundation of China (61773014, 11671205)
Keywords  deep embedding clustering model; information entropy; weighted Mahalanobis distance; unsupervised learning; gap statistic