Abstract
Most existing tag clustering algorithms apply traditional methods such as K-means or single-linkage directly to the tag data, but the inherent drawbacks of K-means and single-linkage seriously degrade the quality of the clustering results. This paper presents a clustering algorithm named local centrality information passing clustering (LCIPC). The algorithm first builds a KNN directed neighbor graph G over the tag data, based on the similarities between tags. It then computes the local centrality of each tag with a KNN kernel density estimator. Next, it propagates the local centrality values through G with a random walk to produce a global centrality ranking. Finally, it runs a depth-first graph search over that ranking to discover the tag clusters. Experimental results on three real-world datasets show that LCIPC is able to obtain high-quality tag clustering results.
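The four steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no formulas, so the Gaussian kernel and its bandwidth `h`, the random walk with restart and its parameter `alpha`, and the rule that links each tag to its highest-ranked neighbor before the depth-first search are all assumptions, as are the function names.

```python
import numpy as np

def knn_graph(S, k):
    """Directed KNN graph: row i holds the k tags most similar to tag i."""
    S = S.astype(float)                    # astype returns a copy
    np.fill_diagonal(S, -np.inf)           # a tag is not its own neighbor
    return np.argsort(-S, axis=1)[:, :k]   # shape (n, k)

def local_centrality(S, nbrs, h=0.5):
    """ASSUMPTION: KNN kernel density with a Gaussian kernel on 1 - similarity."""
    d = 1.0 - np.take_along_axis(S, nbrs, axis=1)
    return np.exp(-(d ** 2) / (2 * h ** 2)).mean(axis=1)

def global_centrality(nbrs, local, alpha=0.85, iters=100):
    """ASSUMPTION: random walk with restart, restarting in proportion to local centrality."""
    n, k = nbrs.shape
    P = np.zeros((n, n))
    P[np.arange(n)[:, None], nbrs] = 1.0 / k   # row-stochastic transitions along KNN edges
    r0 = local / local.sum()
    r = r0.copy()
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1 - alpha) * r0
    return r

def dfs_clusters(nbrs, rank):
    """ASSUMPTION: each tag links to its highest-ranked neighbor when that neighbor
    outranks it; a depth-first search over these links yields the clusters."""
    n = len(rank)
    adj = [[] for _ in range(n)]
    for i in range(n):
        j = int(nbrs[i][np.argmax(rank[nbrs[i]])])
        if rank[j] > rank[i]:
            adj[i].append(j)
            adj[j].append(i)
    labels, next_label = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]                    # explicit stack: iterative DFS
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels

def lcipc_sketch(S, k):
    """Chain the four steps on a symmetric similarity matrix S with values in [0, 1]."""
    nbrs = knn_graph(S, k)
    rank = global_centrality(nbrs, local_centrality(S, nbrs))
    return dfs_clusters(nbrs, rank), rank
```

Under these assumptions, a tag that no neighbor outranks becomes the root of its own cluster, so the number of clusters falls out of the ranking rather than being fixed in advance as in K-means.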
Source
Journal of Harbin Engineering University
EI
CAS
CSCD
PKU Core Journal
2013, No. 4, pp. 499-504 (6 pages)
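Funding
Key Program of the National Natural Science Foundation of China (60775037, 60933013)
Natural Science Foundation of Anhui Province (1208085MF95, 11040606M151)
Key Project of the Anhui Provincial Department of Education (KJ2012A273, KJ2012A274)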