Research on Short Text Clustering Algorithm for User Generated Content (cited by: 6)

Abstract: Short texts in user generated content have features with weak semantic descriptive power, and the traditional K-means algorithm is sensitive to the choice of initial cluster centers. To address both problems, this paper expands the features of short texts using the concepts, link structure, and category system of Wikipedia, which supplements their semantic information. It then builds a weighted complex network over the text set from the semantic relations between texts, selects the initial cluster centers according to the synthetic characteristics of the network nodes, and applies K-means to partition the nodes into communities, which yields the short text clusters. Experimental results show that the proposed method effectively improves short text clustering.
Authors: Zhao Hui (赵辉), Liu Huailiang (刘怀亮)
Source: New Technology of Library and Information Service (《现代图书情报技术》), indexed in CSSCI and the Peking University Core Journals list, 2013, No. 9, pp. 88-92 (5 pages)
Keywords: Short text clustering; Feature extension; Complex network; K-means algorithm; User generated content
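
The pipeline outlined in the abstract (build a weighted network over the texts, pick initial centers from node properties, then run K-means) can be illustrated with a minimal Python sketch. This is not the authors' implementation: this record gives neither the Wikipedia-based feature expansion nor the formula for the "synthetic characteristics" of a node, so the sketch assumes plain bag-of-words cosine similarity as the edge weight and node strength (the sum of a node's edge weights) as the seed criterion. All function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words counts; a stand-in for Wikipedia-expanded features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_network(vectors, threshold=0.1):
    """Weighted adjacency matrix: an edge exists when similarity >= threshold."""
    n = len(vectors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                weights[i][j] = weights[j][i] = sim
    return weights


def pick_seeds(weights, k):
    """Assumed criterion: take nodes in order of decreasing strength,
    skipping any node directly linked to an already chosen seed."""
    strength = [sum(row) for row in weights]
    order = sorted(range(len(weights)), key=lambda i: -strength[i])
    seeds = []
    for i in order:
        if len(seeds) < k and all(weights[i][s] == 0.0 for s in seeds):
            seeds.append(i)
    for i in order:  # fallback when too few mutually unlinked nodes exist
        if len(seeds) < k and i not in seeds:
            seeds.append(i)
    return seeds


def kmeans(vectors, seeds, max_iter=20):
    """K-means with cosine similarity, started from the chosen seed texts."""
    centroids = [Counter(vectors[s]) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)  # summed counts; scale is irrelevant to cosine
                centroids[c] = total
    return labels


def cluster_short_texts(texts, k, threshold=0.1):
    """Full sketch: vectorize, build the network, seed, then run K-means."""
    vectors = [term_vector(t) for t in texts]
    weights = build_network(vectors, threshold)
    return kmeans(vectors, pick_seeds(weights, k))


if __name__ == "__main__":
    demo = ["cheap flights to paris", "paris flights booking cheap",
            "python list sort", "sort a python list"]
    print(cluster_short_texts(demo, 2))
```

Seeding from strongly connected but mutually unlinked nodes is one plausible way to reduce the sensitivity to initial centers that the abstract mentions. The paper's own criterion may combine several node measures and should be taken from the full text.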