
Discovering topic from Chinese microblog based on hidden topics analysis (基于隐主题分析的中文微博话题发现)

Cited by: 19
Abstract: To handle high-dimensional, sparse Chinese microblog data, this paper proposes a multi-step method for discovering news topics. First, drawing on how microblogs spread, it selects the posts with high news value in each time window. Next, it applies a hidden topic model to mine the latent topic information in the microblog content and clusters the texts on that basis. Finally, it uses frequent itemset mining on the clustering results to obtain the keyword sets that best represent each topic. The algorithm achieves both dimensionality reduction and topic discovery on Chinese microblog data reasonably well, and experiments on a real microblog dataset verify the method's effectiveness.
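The final step of the method, frequent itemset mining over the posts of one cluster, can be illustrated with a minimal sketch. The abstract does not say which mining algorithm the authors use, so the level-wise Apriori search below is an assumption. The function name `frequent_itemsets`, its parameters `min_support` and `max_size`, and the toy cluster of English tokens (standing in for segmented Chinese words) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support=2, max_size=3):
    """Return {frozenset(words): support} for every word set that appears
    in at least `min_support` documents, with at most `max_size` words.

    Assumption: a plain Apriori search; the paper may use another miner.
    """
    # Each post becomes a set of words, so repeated words count once.
    transactions = [frozenset(doc) for doc in docs]

    # Level 1: single words that are frequent on their own.
    counts = Counter(word for t in transactions for word in t)
    current = {frozenset([w]): c for w, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Candidate generation: join frequent sets of the previous level and
        # keep a candidate only if all of its smaller subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Support counting: number of posts that contain the whole candidate.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        result.update(current)
    return result


# Hypothetical cluster of already-tokenized posts about one news event.
cluster = [
    ["earthquake", "sichuan", "rescue"],
    ["earthquake", "sichuan", "donation"],
    ["earthquake", "rescue", "team"],
    ["sichuan", "earthquake", "rescue"],
]
itemsets = frequent_itemsets(cluster, min_support=3)

# Show larger keyword sets first, then higher support.
for words, support in sorted(itemsets.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(words), support)
```

In this sketch the largest frequent itemsets of a cluster would serve as the topic's keyword set. The support threshold trades recall for precision and would need tuning on real data.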
Source: Application Research of Computers (《计算机应用研究》), CSCD, Peking University Core Journal, 2014, No. 3, pp. 700-704 (5 pages)
Funding: National Key Technology R&D Program of China (2012BAH18B05); Sichuan University Young Teachers' Research Start-up Fund (2013SCU11017)
Keywords: Chinese microblog; topic discovery; hidden topic model; text clustering; frequent itemset mining


