Burst Topic Detection Oriented Large-Scale Microblogs Streams (original title: 面向大规模微博消息流的突发话题检测)
Cited by: 15
Abstract: Emergent events spread quickly through microblogs and can have a large impact, so bursts of public opinion draw close attention from governments and enterprises. Existing burst topic detection methods consider only a single type of entity, such as words or tags, and cannot handle bursts driven by new or colloquial words, pictures, and links, all of which are common in Chinese microblogs. To address this problem, we propose a real-time burst topic detection framework for multiple entity types over large-scale microblog streams. Unlike existing methods, it does not require Chinese word segmentation; new words are instead generated as a final step. Within the framework, the window size is adjusted dynamically according to the message stream, and the burst weight of each entity is measured by its spread influence. A high-order co-clustering algorithm based on non-negative matrix decomposition then clusters entities, messages, and users simultaneously, so that detecting a burst topic also yields its related messages and participating users, which can be used to analyze the cause of the burst. Comparative experiments on a large Sina Weibo dataset show that the algorithm is more accurate and detects burst topics earlier than existing algorithms.
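Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 2, pp. 512-521 (10 pages)
Funding: National High Technology Research and Development Program of China ("863" Program) (2012AA012802); National Natural Science Foundation of China (61170242)
Keywords: burst topic detection; microblogs; co-clustering; influence; large scale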
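
The abstract names co-clustering based on non-negative matrix decomposition but gives no formulas, so the following is only a minimal two-way sketch of the general idea, not the paper's high-order algorithm. It assumes a toy entity-by-message count matrix, uses the standard multiplicative update rules for non-negative matrix factorization, and assigns each row and column to its strongest factor. The names `nmf`, `co_cluster`, and `entity_by_message`, as well as the data, are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Approximate a non-negative n x m matrix x as w (n x k) times h (k x m).

    Uses multiplicative updates that minimise the squared reconstruction error.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start so that no entry is stuck at zero.
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


def co_cluster(x, k, **kwargs):
    """Label every row and every column of x with its dominant factor index."""
    w, h = nmf(x, k, **kwargs)
    row_labels = [max(range(k), key=lambda c: row[c]) for row in w]
    col_labels = [max(range(k), key=lambda c: h[c][j]) for j in range(len(x[0]))]
    return row_labels, col_labels


# Hypothetical toy data: rows are entities, columns are messages, and each
# value counts how often the entity occurs in the message.
entity_by_message = [
    [4, 4, 0, 0, 0],
    [2, 2, 0, 0, 0],
    [0, 0, 3, 3, 3],
    [0, 0, 1, 1, 1],
]
entity_labels, message_labels = co_cluster(entity_by_message, k=2)
print(entity_labels, message_labels)
```

Because rows and columns share the same factors, an entity cluster and a message cluster with the same label belong to the same candidate topic. A three-way version that also clusters users would need additional matrices (for example message-by-user) factored jointly, which this sketch does not attempt.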