
LDA-Based Opinion Spam Discovering (基于LDA模型的博客垃圾评论发现)

Times cited: 23
Abstract: As an emerging form of online media, the blog has greatly increased the openness of the Internet and has become one of its main information sources. As a result, spam comments (opinion spam) in blog spaces have multiplied, and identifying them has become an important problem. The paper first borrows techniques from email spam filtering and, taking the characteristics of blogs into account, applies a set of comment rules as a first-pass filter. For the comments that remain, it uses Latent Dirichlet Allocation (LDA), a generative model that extracts the latent topics of a text, to extract topics from the blog posts, and then uses this topic information to judge whether each comment is spam. Experiments show that the method identifies most spam comments, yielding good results and making blog information more accurate and useful for readers.
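The two-stage pipeline described in the abstract (rule-based pre-filtering, then LDA topics of the blog posts used to judge the remaining comments) can be sketched in Python as follows. This is not the authors' implementation: the paper's actual rules, similarity measure, and threshold are not given here, so the blacklist, the link limit, the cosine-similarity test, the 0.5 threshold, the "no shared vocabulary means off-topic" policy, and all function names are hypothetical. LDA is trained on the posts with collapsed Gibbs sampling; each comment's topic mixture is then estimated with a simple deterministic EM-style fold-in that keeps the topic-word distributions fixed, which is an approximation rather than full LDA inference.

```python
import math
import random
import re
from collections import Counter

# Stage 1: rule-based pre-filter. These rules are hypothetical stand-ins
# for the paper's blog-comment rules, which are not specified here.
LINK = re.compile(r"https?://\S+")
BLACKLIST = {"viagra", "casino", "replica"}  # hypothetical keyword list


def tokenize(text):
    """Lowercase Latin-letter tokens (Chinese text would need word segmentation)."""
    return re.findall(r"[a-z]+", text.lower())


def rule_says_spam(comment, max_links=2):
    """Flag comments with too many links or any blacklisted word."""
    if len(LINK.findall(comment)) > max_links:
        return True
    return any(w in BLACKLIST for w in tokenize(comment))


# Stage 2: LDA topics of the blog posts via collapsed Gibbs sampling.
def train_lda(docs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of token lists. Returns (word -> column index, phi as K rows of length V)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # tokens assigned to each topic
    ndk = [[0] * K for _ in docs]      # document-topic counts
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][idx[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], idx[w]
                # Remove the token, resample its topic from the full conditional, add it back.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    return idx, phi


def infer_theta(tokens, idx, phi, alpha=0.1, iters=50):
    """EM-style fold-in: topic mixture of `tokens` with phi held fixed (unknown words ignored)."""
    K = len(phi)
    counts = Counter(idx[w] for w in tokens if w in idx)
    theta = [1.0 / K] * K
    for _ in range(iters):
        acc = [alpha] * K
        for v, n in counts.items():
            resp = [theta[k] * phi[k][v] for k in range(K)]  # topic responsibilities for word v
            s = sum(resp)
            for k in range(K):
                acc[k] += n * resp[k] / s
        total = sum(acc)
        theta = [a / total for a in acc]
    return theta


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def is_spam(comment, post_tokens, idx, phi, threshold=0.5):
    """Rules first; otherwise flag comments whose topic mixture diverges from the post's."""
    if rule_says_spam(comment):
        return True
    tokens = tokenize(comment)
    if not any(t in idx for t in tokens):
        return True  # assumed policy: no vocabulary shared with the posts => off-topic
    sim = cosine(infer_theta(tokens, idx, phi), infer_theta(post_tokens, idx, phi))
    return sim < threshold
```

In use, one would call `idx, phi = train_lda(post_docs, K)` on the tokenized blog posts and then `is_spam(comment, post_tokens, idx, phi)` for each comment under that post. Comparing topic mixtures rather than raw words lets a comment that uses different but topically related words still count as on-topic; the choice of cosine similarity and of the threshold are tuning decisions this sketch does not settle.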
Source: Journal of Chinese Information Processing (《中文信息学报》), 2011, No. 1, pp. 41-47 (7 pages); indexed in CSCD and the Peking University Core Journals list.
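Funding: National Natural Science Foundation of China (60673039, 60973068); National Social Science Fund of China (08BTQ025); National High-Tech R&D Program of China (863 Program) (2006AA01Z151); Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; Specialized Research Fund for the Doctoral Program of Higher Education (20090041110002).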
Keywords: Blog; blog content; LDA; topic; opinion spam
