
基于SVM的新浪微博营销类水帖识别研究 (cited by: 5)

Research on Sina Microblogging Marketing Spam Review Detection Based on Support Vector Machine
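Abstract: This paper studies a classification algorithm for detecting spam posts. The method uses SimHash to treat duplicated posts as a problem similar to web-page deduplication. The degree of content duplication, together with other features such as posting density, similarity of account names, and the client used, serves to separate spam posts from normal posts. Interaction data under several automotive marketing accounts, downloaded through the Sina Weibo API, are used as the experimental data, with an SVM as the classifier. The experimental results show that the method is fairly effective at finding spam posts published by well-disguised paid posters.

Using large numbers of robot accounts to follow product microblog accounts and to comment on posts with marketing content is a typical spam problem on Sina Weibo. Such activity can change existing public opinion about the products involved and create fake hot topics. Based on the similar behaviors of a set of existing spam accounts, we attempt to identify these fake posts. Our method uses an SVM to classify posts according to text, time, client, and multiplicity among them. The test set consists of several marketing microblog accounts about automotive products, collected using the Sina Weibo API. The test results show that our method can find well-disguised reviews by spammers.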
Authors: 叶施仁 (Ye Shiren), 孙宁 (Sun Ning)
Source: Natural Science Journal of Xiangtan University (《湘潭大学自然科学学报》), indexed in CAS and the Peking University Core Journals list, 2015, No. 4, pp. 70-74 (5 pages)
Funding: National Natural Science Foundation of China (61272367)
Keywords: comment behavior, comment features, support vector machine, microblog spammers' review identification
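
The abstract says that repeated posts are handled like web-page deduplication using SimHash. The following is a minimal sketch of that idea, not the authors' implementation: the character-bigram tokenization, the 64-bit fingerprint built from MD5, and the Hamming-distance threshold of 3 are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib

BITS = 64  # fingerprint width (assumption; the paper's setting is not given)


def shingles(text, n=2):
    """Split text into character n-grams (assumed tokenization).

    Character n-grams avoid the need for Chinese word segmentation.
    """
    text = "".join(text.split())  # drop all whitespace
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def simhash(text):
    """Compute a 64-bit SimHash fingerprint of the text."""
    weights = [0] * BITS
    for token in shingles(text):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the hash
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, threshold=3):
    """Treat two posts as near-duplicates if their fingerprints are close.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return hamming_distance(simhash(text_a), simhash(text_b)) <= threshold
```

In a pipeline like the one the abstract describes, a duplication score derived from such fingerprints would be one input feature to the SVM, alongside posting density, account-name similarity, and client type.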
