Hot Microblog Topic Tracking Based on Cascaded Conditional Random Fields
Abstract: Because microblog text is sparse, traditional topic tracking techniques capture only part of the microblogs belonging to a topic, and with low accuracy. Topics also drift during tracking. To address both problems, this paper proposes a hot microblog topic tracking method based on cascaded conditional random fields (CCRFs). The method first uses an identification model to mark microblogs that may be related to the topic. The source hot microblogs and the marked microblogs then serve as the observation sequence and the state sequence, respectively, of a classification model that classifies the marked microblogs by relevance. Next, an adaptive model is constructed to update the identification model and mitigate data sparsity; new observation sequences are selected from the related microblogs, and the remaining microblogs form the new state sequence for iterative classification. Experiments show that the method improves the comprehensive F-measure by 4.13% on average over traditional methods.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2016, No. 4, pp. 56-59, 102 (5 pages)
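Funding: National Natural Science Foundation of China (81360230); Innovation Fund for Technology-Based SMEs of the Ministry of Science and Technology (13C26215305404)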
Keywords: topic tracking; topic drifting; cascaded conditional random fields (CCRFs); topic dictionary
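
The iterative loop outlined in the abstract (identify candidates, classify them against the source microblogs, then adapt and repeat) can be sketched roughly as follows. This is only an illustrative approximation and not the authors' CCRF implementation: the identification stage is replaced here by a plain topic-dictionary lookup, the classification stage by Jaccard word overlap, and all names (`identify`, `relevance`, `track`, `threshold`, `top_terms`, `max_rounds`) are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Whitespace tokenizer (assumption: real microblog text needs word segmentation)."""
    return text.lower().split()


def identify(posts, topic_dict):
    """Stage 1 stand-in: mark posts that share at least one term with the topic dictionary."""
    return [p for p in posts if topic_dict & set(tokenize(p))]


def relevance(post, seeds):
    """Stage 2 stand-in: highest Jaccard overlap between the post and any seed post."""
    words = set(tokenize(post))
    best = 0.0
    for seed in seeds:
        seed_words = set(tokenize(seed))
        union = words | seed_words
        if union:
            best = max(best, len(words & seed_words) / len(union))
    return best


def track(seeds, stream, topic_dict, threshold=0.1, top_terms=5, max_rounds=5):
    """Iteratively track a topic; returns (tracked posts, final topic dictionary)."""
    seeds = list(seeds)
    topic_dict = set(topic_dict)
    remaining = list(stream)
    tracked = []
    for _ in range(max_rounds):
        candidates = identify(remaining, topic_dict)
        related = [p for p in candidates if relevance(p, seeds) >= threshold]
        if not related:
            break  # nothing new was accepted, so further rounds cannot change anything
        tracked.extend(related)
        remaining = [p for p in remaining if p not in related]
        # Adaptive step: grow the dictionary with frequent terms from accepted posts,
        # so later rounds can follow a topic whose vocabulary has drifted.
        counts = Counter(w for p in related for w in tokenize(p))
        topic_dict |= {w for w, _ in counts.most_common(top_terms)}
        # Accepted posts join the seed set used for the next round of classification.
        seeds = seeds + related
    return tracked, topic_dict
```

For example, with the seed "earthquake hits city center" and the dictionary `{"earthquake"}`, a post such as "rescue teams reach damage zone" is missed in the first round but picked up in the second, after "damage" enters the dictionary via an accepted post. That is the kind of drift the adaptive step is meant to follow.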