Abstract
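This paper studies a classification algorithm for detecting spam posts (水帖). The method uses SimHash to treat repeated posting as a problem similar to web page deduplication. The degree of duplication in post content, together with other features such as posting density, the similarity of account names, and the client used, serves to separate spam posts from normal posts. The paper uses the Sina Weibo API to download interaction data from several automotive marketing accounts as experimental data, and uses an SVM as the classifier. The experimental results show that the method is fairly effective at finding spam posts published by well-disguised paid posters (水军).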
Using large numbers of robot accounts to follow product microblog accounts and comment on their marketing posts is a typical spam problem on Sina Weibo. This practice can distort existing public opinion about the products involved and create fake hot topics. Based on the similar behaviors of a set of existing spam accounts, we attempt to identify these fake posts. Our method uses an SVM to classify posts according to their text, timing, client, and the repetition among them. The test set consists of interaction data from several automotive marketing accounts, collected through the Sina Weibo API. The test results show that our method can find well-disguised reviews posted by spammers.
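The abstract treats repeated posting as a near-duplicate detection problem solved with SimHash. The following is a minimal Python sketch of that idea, not the paper's implementation: the character-bigram tokenization, the MD5-based token hash, the 64-bit fingerprint size, and the helper names `simhash`, `hamming_distance`, and `duplication_degree` are all assumptions made for illustration.

```python
import hashlib


def simhash(text, bits=64, ngram=2):
    """Return a SimHash fingerprint of `text` as an integer.

    Assumptions of this sketch: tokens are character n-grams (so no
    Chinese word segmenter is needed), every token has weight 1, and
    `bits` is a multiple of 8 no larger than 128 (the MD5 digest size).
    """
    tokens = [text[i:i + ngram] for i in range(max(len(text) - ngram + 1, 1))]
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    fingerprint = 0
    for b in range(bits):
        if weights[b] > 0:
            fingerprint |= 1 << b
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def duplication_degree(post, others, bits=64):
    """Hypothetical feature: 1.0 for an exact repeat, lower for less similar posts.

    Computed as 1 - (smallest Hamming distance to any other post) / bits.
    Returns 0.0 when there is nothing to compare against.
    """
    if not others:
        return 0.0
    fp = simhash(post, bits)
    nearest = min(hamming_distance(fp, simhash(o, bits)) for o in others)
    return 1.0 - nearest / bits
```

A value such as `duplication_degree(post, earlier_posts)` could then be placed in a feature vector alongside the posting-density, account-name-similarity, and client features before SVM training. How the paper actually combines these features is not stated in the abstract.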
Source
《湘潭大学自然科学学报》
CAS
Peking University Core Journals (北大核心)
2015, No. 4, pp. 70-74 (5 pages)
Natural Science Journal of Xiangtan University
Funding
National Natural Science Foundation of China (61272367)
Keywords
comment behavior
comment features
support vector machine
microblog spammers' review identification