
Mining Hot Topics of User Comment Based on LDA Model: Principle & Approach (cited by 21)
Abstract: This paper proposes a method for mining hot topics in user comments that combines the Latent Dirichlet Allocation (LDA) model with natural language processing techniques. To verify the method, 22,157 restaurant reviews were taken as samples, the model parameters were estimated by Gibbs sampling, and the hot topics of the reviews and their associated hot words were obtained. The nine topics obtained in the experiment reflect the hot topics in the restaurant reviews relatively well and broadly agree with what diners care about in real life, showing that the model identifies hot topics effectively.
Source: Information Studies: Theory & Application, 2010, No. 5, pp. 103-106 (4 pages). Indexed in CSSCI and the Peking University core journal list.
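Funding: National Natural Science Foundation of China (grant no. 70903047); Shanghai Key Discipline Construction Project (project no. S30501 J50504); Shanghai Third-Phase Undergraduate Education Highland Construction Project (E-Commerce). This paper is one of the research outcomes of these projects.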
Keywords: topic detection; hot topic mining; user comment; model
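The abstract names two technical ingredients: the LDA topic model and Gibbs sampling to estimate its parameters. The following is a minimal collapsed Gibbs sampler for LDA in pure Python, offered as an illustrative sketch rather than the authors' implementation. It assumes the reviews have already been word-segmented and stop-word filtered (the record does not describe the paper's NLP preprocessing). The function names `lda_gibbs`, `hot_words` and `topic_shares`, the hyperparameters `alpha=0.1` and `beta=0.01`, and the use of average document-topic proportion as a topic's "heat" score are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word tokens; segmentation and
    stop-word removal are assumed to have been done beforehand.
    Returns (phi, theta, vocab): phi[k][v] estimates p(word v | topic k),
    theta[d][k] estimates p(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    n_k = [0] * K                        # total tokens assigned to topic k
    z = []                               # current topic of every token

    # Initialise every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional: p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the topic-word and document-topic distributions.
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return phi, theta, vocab


def hot_words(phi, vocab, top_n=10):
    """Return the top_n most probable words of each topic."""
    ranked = []
    for row in phi:
        order = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)
        ranked.append([vocab[v] for v in order[:top_n]])
    return ranked


def topic_shares(theta):
    """Average topic proportion over all documents (an assumed 'heat' score)."""
    num_docs = len(theta)
    return [sum(row[k] for row in theta) / num_docs for k in range(len(theta[0]))]
```

For a setting like the one in the abstract, one would call `lda_gibbs(segmented_reviews, num_topics=9)` and then read each topic's hot words with `hot_words(phi, vocab)`. Here `segmented_reviews` is a hypothetical name for the tokenised review corpus. The sampler costs O(iterations × tokens × K) in pure Python, so a corpus of 22,157 reviews would in practice call for a vectorised or compiled implementation.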
  • Related literature

References (14)

  • [1] ALLAN J, CARBONELL J, DODDINGTON G, et al. Topic detection and tracking pilot study: final report[C]//Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop. Lansdowne, Virginia, 1998: 194-218.
  • [2] LEEK T, SCHWARTZ R M, SISTA S. Probabilistic approaches to topic detection and tracking[C]//Topic Detection and Tracking: Event-based Information Organization. Massachusetts: Kluwer Academic, 2002: 67-83.
  • [3] CHEN K Y, LUESUKPRASERT L, CHOU S C T. Hot topic extraction based on timeline analysis and multidimensional sentence modeling[J]. IEEE Transactions on Knowledge and Data Engineering, 2007(19): 1016-1025.
  • [4] ZENG Yiling, XU Hongbo. Research on the discovery of hot information on the web[J]. Journal on Communications, 2007, 28(12): 141-146. (in Chinese) Cited by: 29
  • [5] ZHOU Yadong, SUN Qindong, GUAN Xiaohong, LI Wei, TAO Jing. Internet hot topic extraction based on word correlation in traffic content[J]. Journal of Xi'an Jiaotong University, 2007, 41(10): 1142-1145. (in Chinese) Cited by: 27
  • [6] LUO Yaping, WANG Cong, ZHOU Yanquan. A hot topic discovery model based on attention degree[M]//XIAO Guozheng, HE Yanxiang, SUN Maosong. Research on Chinese Computing Technology and Language Issues. Beijing: Publishing House of Electronics Industry, 2007: 402-408. (in Chinese)
  • [7] LIU Xingxing, HE Tingting, GONG Haijun, CHEN Long. Design of a web hot event discovery system[J]. Journal of Chinese Information Processing, 2008, 22(6): 80-85. (in Chinese) Cited by: 31
  • [8] OKA M, ABE H, KATO K. Extracting topics from Weblogs through frequency segments[C]//Proceedings of the WWW2006 Workshop on Web Intelligence, 2006: 22-26.
  • [9] YE Hui-min, CHENG Wei, DAI Guan-zhong. Design and implementation of on-line hot topic discovery model[J]. Wuhan University Journal of Natural Sciences, 2006, 11(1): 21-26. Cited by: 14
  • [10] BLEI D M, NG A Y, JORDAN M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003(3).

Secondary references (45)

Co-citing literature (99)

Co-cited literature (309)

  • [1] WU Gang, TANG Jie, LI Juanzi, WANG Kehong. Fine-grained semantic web retrieval[J]. Journal of Tsinghua University (Science and Technology), 2005, 45(S1): 1865-1872. (in Chinese) Cited by: 11
  • [2] ZHANG Yuntao, GONG Ling, WANG Yongcheng. Automatic extraction of text topic sentences based on a comprehensive method[J]. Journal of Shanghai Jiao Tong University, 2006, 40(5): 771-774. (in Chinese) Cited by: 16
  • [3] XU Dezhi, WANG Huaimin. Research on ontology-based computation of semantic similarity between concepts[J]. Computer Engineering and Applications, 2007, 43(8): 154-156. (in Chinese) Cited by: 34
  • [4] TAN Songbo, WANG Yuefen. Chinese text classification corpus TanCorpv1.0[EB/OL]. (2007-08-29)[2008-01-20]. http://www.searchforum.org.cn/tansongbo/corpus.htm.
  • [5] China Internet Network Information Center. The 29th Statistical Report on Internet Development in China[EB/OL]. (2012-01-16). http://www.cnnic.net.cn/dtygg/dtgg/201201/t20120116_23667.html.
  • [6] Yang C C, Tobun D N. Analyzing and visualizing Web opinion development and social interactions with density-based clustering[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2011, 41(6): 1144-1155.
  • [7] Dumais S, Furnas G, Landauer T, et al. Using latent semantic analysis to improve access to textual information[C]//Proceedings of Computer Human Interaction. Washington: ACM, 1988: 281-285.
  • [8] Hofmann T. Probabilistic latent semantic indexing[C]//Proceedings of the 22nd Annual International SIGIR Conference on Research and Development in Information Retrieval. Berkeley, CA: Association for Computing Machinery, 1999: 50-57.
  • [9] Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation[J]. Journal of Machine Learning Research, 2003, 3(4-5): 993-1022.
  • [10] Phan X, Nguyen L, Horiguchi S. Learning to classify short and sparse text & web with hidden topics from large-scale data collections[C]//Proceedings of the 2008 WWW Conference. New York: ACM, 2008: 91-100.

Citing literature (21)

Secondary citing literature (256)
