News topic recognition of Chinese microblog based on word co-occurrence graph (基于词共现图的中文微博新闻话题识别)
Cited by: 31
Abstract: Traditional topic detection algorithms are designed mainly for long texts such as news web pages and blogs, and cannot effectively handle sparse microblog data. This paper presents a method based on a word co-occurrence graph for recognizing news topics in microblogs. First, after the microblog data are preprocessed, keywords (topic words) are extracted by combining two factors: relative word frequency and word frequency increase rate. Second, a word co-occurrence graph is built from the co-occurrence degree between keywords; each cluster in the graph that is not connected to the others is treated as one news topic, and each topic is represented by the few keywords in its cluster that carry the most information. Finally, experiments on a microblog data set show that the method recognizes news topics in microblogs, which verifies its effectiveness.
Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), PKU Core journal, 2012, No. 5, pp. 444-449 (6 pages)
Funding: National Natural Science Foundation of China (70671039); Fundamental Research Funds for the Central Universities (12MS121)
Keywords: microblog; news topics; news topic recognition; keywords; word co-occurrence graph
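
The pipeline outlined in the abstract (keyword extraction, co-occurrence graph, one topic per disconnected cluster) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything numeric here is an assumption made for illustration: the weighted sum controlled by `alpha`, the bounded increase rate `(n - p) / (n + p)`, the co-occurrence degree defined as shared posts divided by the smaller document frequency, and the thresholds `top_k`, `min_degree`, and `min_size`. All function names are hypothetical, and posts are assumed to be already segmented into word lists.

```python
from collections import Counter
from itertools import combinations


def extract_keywords(current, previous, top_k=6, alpha=0.5):
    """Score words by relative frequency and frequency increase rate.

    ASSUMPTION: the weighted sum and the bounded increase rate are
    illustrative; the abstract names the two factors but not the formula.
    """
    cur = Counter(w for post in current for w in set(post))
    prev = Counter(w for post in previous for w in set(post))
    if not cur:
        return {}
    max_cur = max(cur.values())
    scores = {}
    for word, n in cur.items():
        relative = n / max_cur                          # in (0, 1]
        increase = (n - prev[word]) / (n + prev[word])  # in (-1, 1]
        scores[word] = alpha * relative + (1 - alpha) * increase
    ranked = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    return {w: scores[w] for w in ranked}


def build_graph(posts, keywords, min_degree=0.5):
    """Link two keywords when their co-occurrence degree is high enough.

    ASSUMPTION: degree = shared posts / smaller document frequency.
    """
    doc_freq, pair_freq = Counter(), Counter()
    for post in posts:
        present = sorted(set(post) & set(keywords))
        doc_freq.update(present)
        pair_freq.update(combinations(present, 2))
    graph = {w: set() for w in keywords}
    for (a, b), n in pair_freq.items():
        if n / min(doc_freq[a], doc_freq[b]) >= min_degree:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the mutually disconnected clusters of the graph (DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def detect_topics(current, previous, words_per_topic=2, min_size=2):
    """Represent each cluster by its highest-scoring keywords."""
    scores = extract_keywords(current, previous)
    graph = build_graph(current, scores)
    topics = []
    for component in connected_components(graph):
        if len(component) >= min_size:  # ASSUMPTION: skip isolated words
            ranked = sorted(component, key=lambda w: (-scores[w], w))
            topics.append(ranked[:words_per_topic])
    return sorted(topics)


if __name__ == "__main__":
    now = [["earthquake", "rescue"], ["earthquake", "rescue", "sichuan"],
           ["olympics", "gold"], ["olympics", "gold", "london"]]
    before = [["lunch", "today"]]
    print(detect_topics(now, before))
```

Here `current` holds the posts of the time window under analysis and `previous` the posts of the preceding window, so a word that is frequent now but was rare before scores highest. Using connected components means no topic count has to be fixed in advance, which matches the abstract's description of treating each disconnected cluster as a topic.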