Trend Prediction for Microblog Based on Classification Modeling of Heat Curves (基于热度曲线分类建模的微博热门话题预测)

Cited by: 6
Abstract: Keeping track of the hot topics that the public cares about is an important prerequisite for business innovation and marketing. Most existing methods rely on processing unstructured data or on repeatedly traversing the sample set, which makes them computationally complex. Starting from the statistical properties of topics, this paper proposes a nonparametric method built on structured data. First, a heat curve is constructed for each topic to characterize its degree of diffusion and its degree of focused attention. Then, these heat curves, which vary widely in shape, are classified and modeled to obtain the common features and development patterns of each class of curve. Finally, a weighted-vote rule over the classification model predicts whether a new topic will become a hot topic. Data collection and experiments on the Sina Weibo platform show that the method uses a simple data structure, performs well, has low complexity, and is easy to control.
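Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2015, No. 1, pp. 27-34 (8 pages)
Funding: Supported by the National Basic Research Program of China (973 Program, No. 2013CB329603), the National Natural Science Foundation of China (No. 71071047), and the Humanities and Social Sciences Foundation of the Ministry of Education (No. 12YJC630073)
Keywords: Heat Curve, Classification Modeling, Weighted-Vote, Trend Prediction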
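
The abstract does not give the heat-curve definition, the classifier, or the voting weights, so the following is only a minimal sketch of the general idea of weighted voting over labeled heat curves. The Euclidean distance, the inverse-distance weights, the parameter `k`, the function names, and the toy curves are all assumptions made for illustration and are not taken from the paper.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length heat curves."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_hot(curve, labeled_curves, k=3):
    """Weighted vote over the k labeled curves nearest to `curve`.

    labeled_curves is a list of (curve, is_hot) pairs. The
    inverse-distance weight is an assumed scheme, not the paper's rule.
    """
    nearest = sorted(labeled_curves,
                     key=lambda item: distance(curve, item[0]))[:k]
    votes = {True: 0.0, False: 0.0}
    for ref, is_hot in nearest:
        # Closer reference curves contribute larger votes.
        votes[is_hot] += 1.0 / (1.0 + distance(curve, ref))
    return votes[True] > votes[False]


# Hypothetical toy data: rising curves labeled hot, flat curves labeled not hot.
training = [
    ([1, 4, 9, 16], True),
    ([2, 5, 10, 18], True),
    ([1, 2, 2, 1], False),
    ([2, 2, 1, 1], False),
]

print(predict_hot([1, 5, 9, 17], training))
print(predict_hot([1, 2, 1, 1], training))
```

In this toy setup a new curve is compared only with stored numeric curves, which reflects the abstract's point that the method works on structured data rather than on raw text.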