Abstract:
Purpose: This paper aims to analyze the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.
Design/methodology/approach: Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection), and product-centric (spam-targeted product detection). In addition, to counter classifier bias, we employ four classifiers to obtain a more reliable reflection of the results. We also propose a new set of features and compare them against those of well-known related works. Experiments on two real-world datasets show the effectiveness of the different features in opinion spam detection.
Findings: Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those of models trained on behavioral features than to those of models trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on the features used in related works. Furthermore, a computation-time analysis of the feature extraction phase shows that behavioral features are more cost-efficient than textual ones.
Research limitations: The analyses conducted in this paper are limited to two well-known datasets, viz., Yelp Zip and Yelp NYC, from Yelp.com.
Practical implications: The results obtained in this paper can be used to improve the detection of opinion spam; researchers may work on developing feature engineering and selection techniques that focus more on metadata information.
Originality/value: To the best of our knowledge, this study is the first to consider three perspectives (review-, reviewer-, and product-centric) and four classifiers in analyzing the effectiveness of two major types of features for opinion spam detection. This study also introduces some novel features, which help to improve the performance of opinion spam detection methods.
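To make the behavioral/textual distinction concrete, the following is a minimal sketch of how the two feature families might be computed. The specific features (maximum reviews per day, mean rating deviation, token count, distinct-token ratio), the record layout, and the toy data are assumptions chosen for illustration; they are not the feature set proposed in the paper.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (reviewer_id, product_id, rating, date, text). All values are made up.
reviews = [
    ("u1", "p1", 5, "2024-01-01", "Great great great place!"),
    ("u1", "p2", 5, "2024-01-01", "Great place, best ever!"),
    ("u1", "p3", 5, "2024-01-01", "Best place ever!"),
    ("u2", "p1", 3, "2024-01-05", "Decent food, slow service."),
    ("u2", "p2", 4, "2024-02-10", "Nice staff and fair prices."),
]


def behavioral_features(records):
    """Per-reviewer metadata features; the review text is never read."""
    # Mean rating of each product, used as the reference for rating deviation.
    ratings_by_product = defaultdict(list)
    for _, product, rating, _, _ in records:
        ratings_by_product[product].append(rating)
    product_mean = {p: mean(r) for p, r in ratings_by_product.items()}

    reviews_per_day = defaultdict(lambda: defaultdict(int))
    deviations = defaultdict(list)
    for reviewer, product, rating, date, _ in records:
        reviews_per_day[reviewer][date] += 1
        deviations[reviewer].append(abs(rating - product_mean[product]))

    return {
        reviewer: {
            "max_reviews_per_day": max(reviews_per_day[reviewer].values()),
            "mean_rating_deviation": mean(deviations[reviewer]),
        }
        for reviewer in reviews_per_day
    }


def textual_features(text):
    """Per-review content features computed from the review text only."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    distinct_ratio = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {"length": len(tokens), "distinct_ratio": distinct_ratio}


if __name__ == "__main__":
    print(behavioral_features(reviews))
    print(textual_features(reviews[0][4]))
```

The behavioral function needs only counts and ratings, while the textual function must tokenize every review. This is one plausible reason why metadata-based features could be cheaper to extract, in line with the cost comparison the abstract reports.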
Abstract: Spam, or unsolicited email, constitutes a major threat to the Internet, corporations, and end users. Statistics show that about 70% to 80% of emails are spam. Several techniques have been implemented to react to spam on its arrival. These techniques filter the emails and place them in the users' Junk or Spam folders. Regardless of their accuracy, these techniques are all passive: they resemble someone who is being hit and tries by every means to shield themselves without fighting back. As the proverbs say, 'The best defense is a good offense' and 'Attack is the best form of defense'. Thus, we believe that attacking the spammers is the best way to minimize their impact. Spammers send millions of emails for several reasons, and these emails usually include links or images that direct the user to web pages or simply track the user. The proposed idea is to build software that collects these links from the users' Spam and Junk folders. The software then periodically and actively visits these links and any subsequent redirect links, as if a user had clicked on them or opened the email containing the tracking link. If this software is used by millions of users (for example, by being included in the major email providers), it will act as a storm of Distributed Denial of Service attacks on the spammers' servers, and their bandwidth will be completely consumed. In this case, no human can visit their sites because they will be unavailable. In this paper, we describe this approach and show its effectiveness. In addition, we present an application we have developed for this purpose.
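The link-collection step described above can be sketched with the Python standard library. This is an illustrative assumption about how such a collector might parse a message, not the authors' application. The function name `collect_links` and the sample message are hypothetical, and the sketch only parses text: it makes no network requests.

```python
from email import message_from_string
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href targets of <a> tags and src targets of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if (tag == "a" and name == "href") or (tag == "img" and name == "src"):
                self.links.append(value)


def collect_links(raw_email):
    """Return the distinct links found in the HTML parts of one raw message."""
    message = message_from_string(raw_email)
    collector = LinkCollector()
    for part in message.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer-decoded
        charset = part.get_content_charset() or "utf-8"
        collector.feed(payload.decode(charset, errors="replace"))
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(collector.links))


# Hypothetical message using the reserved ".invalid" top-level domain.
SAMPLE = (
    "From: promo@example.invalid\n"
    "Subject: Offer\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    '<html><body><a href="http://example.invalid/offer">Offer</a>'
    '<img src="http://example.invalid/pixel.gif">'
    '<a href="http://example.invalid/offer">Again</a></body></html>\n'
)

if __name__ == "__main__":
    print(collect_links(SAMPLE))
```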
Funding: Supported by the Natural Science Foundation of China (71772107); the Natural Science Foundation of Shandong Province of China (ZR2023MF070, ZR2020MF044, ZR202102230289); the Open Research Fund of Anhui Province Engineering Laboratory for Big Data Analysis and Early Warning Technology of Coal Mine Safety (No. CSBD2022-ZD01); the Shandong Education Quality Improvement Plan for Postgraduate (2021); and the SDUST Research Fund.
Abstract: Within the thriving e-commerce landscape, some unscrupulous merchants hire spammer groups to post misleading reviews or ratings, aiming to manipulate public perception and disrupt fair market competition. This phenomenon has prompted a heightened research focus on spammer group detection. In the e-commerce domain, current spammer group detection algorithms can be classified into three categories: Frequent Item Mining-based, graph-based, and burst-based algorithms. However, existing graph-based algorithms do not adequately consider the redundant relationships within co-review graphs, and they neglect overlapping members within spammer groups. To address these issues, we introduce an overlapping spammer group detection algorithm based on deep reinforcement learning, named DRL-OSG. First, the algorithm filters out highly suspicious products and obtains the set of reviewers who have reviewed these products. Second, taking these reviewers as nodes and their co-reviewing relationships as edges, we construct a homogeneous co-reviewing graph. Third, to efficiently identify and handle the redundant relationships that are accidentally formed between ordinary users and spammer group members, we propose the Auto-Sim algorithm, which is specifically tailored to dynamically optimize the co-reviewing graph by adjusting the reviewers' relationship network within it. Finally, candidate spammer groups are discovered using the Ego-Splitting overlapping clustering algorithm, which allows overlapping members to exist in these groups. These groups are then refined and ranked to derive the final list of spammer groups. Experimental results on real-life datasets show that the proposed DRL-OSG algorithm outperforms the baseline algorithms in Precision.
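The first two steps above (restricting to suspicious products, then linking reviewers who co-reviewed them) can be sketched as follows. The input format, the function names, and the edge weight (number of shared suspicious products) are assumptions for illustration. The fixed-threshold pruning is a deliberately simple stand-in and is not the Auto-Sim algorithm, which the abstract describes as an approach based on deep reinforcement learning.

```python
from collections import defaultdict
from itertools import combinations


def build_co_review_graph(review_pairs, suspicious_products):
    """Build a weighted co-reviewing graph over reviewers.

    review_pairs: iterable of (reviewer_id, product_id) tuples.
    Returns {(reviewer_a, reviewer_b): number of shared suspicious products},
    with reviewer_a < reviewer_b so each undirected edge appears once.
    """
    reviewers_by_product = defaultdict(set)
    for reviewer, product in review_pairs:
        if product in suspicious_products:
            reviewers_by_product[product].add(reviewer)

    edge_weights = defaultdict(int)
    for reviewers in reviewers_by_product.values():
        for a, b in combinations(sorted(reviewers), 2):
            edge_weights[(a, b)] += 1
    return dict(edge_weights)


def prune_weak_edges(edge_weights, min_shared=2):
    """Hypothetical stand-in for redundancy handling: drop edges backed by
    fewer than min_shared co-reviewed products (likely accidental links)."""
    return {edge: w for edge, w in edge_weights.items() if w >= min_shared}


if __name__ == "__main__":
    pairs = [
        ("r1", "p1"), ("r2", "p1"), ("r3", "p1"),
        ("r1", "p2"), ("r2", "p2"),
        ("r4", "p3"), ("r1", "p3"),  # p3 is not flagged as suspicious
    ]
    graph = build_co_review_graph(pairs, suspicious_products={"p1", "p2"})
    print(graph)
    print(prune_weak_edges(graph))
```

In this toy input, `r3` shares only one suspicious product with the others, so its edges are the kind of incidental relationship that a redundancy-handling step would be expected to remove.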
Funding: This paper is supported in part by the Natural Science Foundation of China (No. 71772107, 62072288) and the Shandong Natural Science Foundation of China (Grant No. ZR2019MF003, ZR2020MF044).
Abstract: It is not uncommon for malicious sellers to collude with fake reviewers (also called spammers) to write fake reviews for multiple products, either to demote competitors or to promote their own products' reputations, forming a gray industry chain. To detect spammer groups in a heterogeneous network with rich semantic information from both buyers and sellers, researchers have conducted extensive research using Frequent Item Mining-based and graph-based methods. However, these methods cannot detect spammer groups that mount cross-product attacks, and they do not jointly consider structural features, attribute features, and structure-attribute correlation, resulting in poorer detection performance. Therefore, we propose a collaborative training-based spammer group detection algorithm that constructs a heterogeneous induced sub-network based on the target product set to detect cross-product attack spammer groups. To jointly consider all available features, we use the collaborative training method to learn the feature representations of nodes. In addition, we use the DBSCAN clustering method to generate candidate groups, exclude innocent ones, and rank the remainder to obtain spammer groups. The experimental results on real-world datasets indicate that the overall detection performance of the proposed method is better than that of the baseline methods.
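The candidate-group step can be illustrated with a compact pure-Python DBSCAN. This is a sketch under stated assumptions: the two-dimensional points stand in for learned node representations, the `eps` and `min_pts` values are arbitrary, and the helper `candidate_groups` is hypothetical. It does not reproduce the paper's collaborative training or its ranking procedure.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks unclustered points."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def candidate_groups(labels, reviewer_ids):
    """Group reviewer ids by cluster label, leaving noise points out."""
    groups = {}
    for label, reviewer in zip(labels, reviewer_ids):
        if label != NOISE:
            groups.setdefault(label, []).append(reviewer)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up 2-D embeddings for seven reviewers.
    embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                  (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
                  (10.0, 10.0)]
    ids = ["r1", "r2", "r3", "r4", "r5", "r6", "r7"]
    labels = dbscan(embeddings, eps=0.5, min_pts=3)
    print(candidate_groups(labels, ids))
```

A density-based method suits this step because the number of groups does not need to be fixed in advance, and isolated reviewers fall out as noise rather than being forced into a group.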