
Research on Microblog Hot Topic Detection Method Based on Term Energy Change

Cited by: 7
Abstract With the rapid growth of microblogging, detecting hot topics on microblogs has become an active research area. Taking the strongly real-time nature of microblogs as its starting point, the paper studies how the energy of terms changes across time windows and proposes a hot topic detection method based on changes in term energy. Building on traditional topic life-cycle (aging) theory, the method partitions microblog posts into windows in chronological order. It borrows the concept of acceleration from physics, using a term's acceleration to describe how the term's speed changes between adjacent windows. A term's acceleration and its weight are combined into a compound weight, which is better suited to quantifying term energy. On top of single-conditional probability, the method uses a double-conditional-probability context similarity measure and adds a document distribution similarity to reduce the chance of confusing topics. Experiments show that the method is effective, stable, and robust. Compared with the single-conditional-probability context similarity model, the improved context similarity model yields better clustering across different keyword detection methods.
Source Netinfo Security (《信息网络安全》), 2015, No. 10, pp. 46-52 (7 pages)
Funding National Natural Science Foundation of China [61402112]; Fujian Province security research project [828398]
Keywords hot topic detection; term energy; acceleration; context similarity
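
The abstract names the main ingredients of the method but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the paper's actual definitions. It assumes that a term's "speed" is the change in its document frequency between adjacent windows, that acceleration is the change in that speed, that the compound weight is a linear mix controlled by a hypothetical parameter `alpha`, and that the double-conditional similarity averages P(a|b) and P(b|a). All function names are illustrative, and the document distribution similarity is omitted.

```python
from collections import Counter


def doc_frequency(window):
    """Fraction of posts in one window that contain each term."""
    counts = Counter()
    for tokens in window:
        counts.update(set(tokens))  # count each term once per post
    total = len(window)
    return {term: n / total for term, n in counts.items()} if total else {}


def acceleration(windows, term):
    """Change in speed over the last three windows (assumed definition)."""
    if len(windows) < 3:
        raise ValueError("need at least three windows")
    f0, f1, f2 = (doc_frequency(w).get(term, 0.0) for w in windows[-3:])
    # speed = difference in frequency between adjacent windows
    return (f2 - f1) - (f1 - f0)


def compound_weight(windows, term, alpha=0.5):
    """Mix positive acceleration with the term's current weight.

    `alpha` is a hypothetical mixing parameter, not taken from the paper.
    """
    weight = doc_frequency(windows[-1]).get(term, 0.0)
    return alpha * max(acceleration(windows, term), 0.0) + (1 - alpha) * weight


def double_conditional_similarity(window, a, b):
    """Average of P(a|b) and P(b|a), estimated from post co-occurrence."""
    n_a = sum(1 for tokens in window if a in tokens)
    n_b = sum(1 for tokens in window if b in tokens)
    n_ab = sum(1 for tokens in window if a in tokens and b in tokens)
    if n_a == 0 or n_b == 0:
        return 0.0
    return 0.5 * (n_ab / n_a + n_ab / n_b)
```

Here each window is a list of posts and each post is a list of tokens, already ordered by time. Using both conditional probabilities makes the measure symmetric, so a rare term that always appears with a frequent term does not get a perfect score from one direction alone.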


