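Web spamming is the practice of deliberately misleading search engines so that some pages receive a higher ranking than they deserve. In recent years, the sharp increase in web spam has degraded the quality of search engine results. This article first discusses the basic concepts and impact of spam, then analyzes current spamming techniques in detail, covering three types: term spamming, link spamming, and hiding techniques. We believe this analysis will be useful for developing appropriate countermeasures.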
Abstract: Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data makes spam detection significantly harder. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We evaluate the model on a benchmark SMS spam dataset. We apply pre-processing techniques to obtain clean, noise-free data and address the class imbalance problem with text augmentation techniques. In our experiments, the optimized, fine-tuned RoBERTa model, a variant of BERT (Bidirectional Encoder Representations from Transformers), achieved an accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. We also evaluate traditional machine learning models as a baseline for comparison. This analysis demonstrates the significant impact language models can have on complex text-based challenges within the cybersecurity landscape.
Funding: Funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. (GPIP:71-829-2024).
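The SMS abstract above mentions text augmentation as a remedy for class imbalance but does not say which augmentation is used. The following is a minimal sketch of one common approach, assuming simple word-swap and word-deletion perturbations to oversample the minority class; the function names `augment` and `balance`, the `p_delete` parameter, and the toy messages are all hypothetical and not taken from the paper.

```python
import random

def augment(text, rng, p_delete=0.1):
    """Return a perturbed copy of text: one random word swap, then random deletions."""
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    kept = [w for w in words if rng.random() > p_delete]
    # Fall back to the swapped words if every word happened to be deleted.
    return " ".join(kept or words)

def balance(samples, minority_label, seed=0):
    """Oversample the minority class with augmented copies until both classes match in size.

    Assumes a binary task: every label other than minority_label counts as the majority.
    """
    rng = random.Random(seed)
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    out = list(samples)
    added = 0
    while minority and len(minority) + added < len(majority):
        text, label = rng.choice(minority)
        out.append((augment(text, rng), label))
        added += 1
    return out

# Toy data (made up): four ham messages, one spam message.
data = [("ok see you at lunch", "ham")] * 4 + [("win a free prize now", "spam")]
balanced = balance(data, "spam")
labels = [label for _, label in balanced]
print(labels.count("ham"), labels.count("spam"))
```

Augmentation of this kind should be applied only to the training split, so that perturbed copies of a message do not leak into the evaluation data.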
Abstract: Email communication plays a crucial role in both personal and professional contexts; however, it is frequently compromised by spam, which reduces productivity and introduces considerable security risks. Current spam detection techniques often struggle to keep pace with the evolving tactics employed by spammers, resulting in user dissatisfaction and potential data breaches. To address this issue, we introduce the Divide and Conquer-Generative Adversarial Network Squeeze and Excitation-Based Framework (DaC-GANSAEBF), a deep-learning model designed to identify spam emails. The framework combines Generative Adversarial Networks (GAN), Squeeze and Excitation (SAE) modules, and a newly formulated Light Dual Attention (LDA) mechanism, which uses both global and local attention to discern intricate patterns within textual data. The approach improves efficiency and accuracy by segmenting scanned email content into smaller, independently evaluated components. The model was trained and validated on four publicly available benchmark datasets, achieving an average accuracy of 98.87% and outperforming leading methods in the field. These findings underscore the resilience and scalability of DaC-GANSAEBF, positioning it as a viable solution for contemporary spam detection systems. The framework can be integrated into existing technologies to enhance user security and reduce the risks associated with spam.
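The abstract above names Squeeze and Excitation modules without describing how they are wired into the framework. As a rough illustration of the general technique only, the sketch below implements a standard squeeze-and-excitation gate in plain Python: average each channel (squeeze), pass the result through a small bottleneck with ReLU and sigmoid (excitation), and rescale each channel by its gate. The weights `w1` and `w2` and the feature values are made-up assumptions, not parameters from the paper.

```python
import math

def squeeze_excite(features, w1, w2):
    """Apply a squeeze-and-excitation gate.

    features: list of channels, each a list of floats (one value per position).
    w1: bottleneck weights, shape (reduced, channels).
    w2: expansion weights, shape (channels, reduced).
    Returns (rescaled features, per-channel gates).
    """
    # Squeeze: global average pooling over positions, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    # Excitation, step 2: expansion linear layer followed by sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates

# Hypothetical input: 4 channels, 2 positions each.
features = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
w1 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]          # 4 -> 2
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]    # 2 -> 4
scaled, gates = squeeze_excite(features, w1, w2)
print([round(g, 3) for g in gates])
```

In a trained network `w1` and `w2` are learned, so the gates come to emphasize channels that help the classifier and suppress the rest.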
Abstract:
Purpose: This paper analyzes the effectiveness of two major types of features, metadata-based (behavioral) and content-based (textual), in opinion spam detection.
Design/methodology/approach: Based on spam-detection perspectives, our approach works in three settings: review-centric (spam detection), reviewer-centric (spammer detection) and product-centric (spam-targeted product detection). To reduce classifier bias, we employ four classifiers so that the results do not depend on a single model. In addition, we propose a new set of features and compare them against those of well-known related works. Experiments on two real-world datasets show the effectiveness of the different features in opinion spam detection.
Findings: Our findings indicate that behavioral features are both more efficient and more effective than textual features for detecting opinion spam across all three settings. In addition, models trained on hybrid features produce results closer to those trained on behavioral features than to those trained on textual features, further establishing behavioral features as the dominant indicators of opinion spam. The features used in this work improve on existing features used in related works. Furthermore, the computation time analysis for the feature extraction phase shows that behavioral features are more cost-efficient than textual features.
Research limitations: The analyses conducted in this paper are limited to two well-known datasets, viz., Yelp Zip and Yelp NYC of Yelp.com.
Practical implications: The results obtained in this paper can be used to improve the detection of opinion spam; researchers may work on improving and developing feature engineering and selection techniques focused more on metadata information.
Originality/value: To the best of our knowledge, this study is the first of its kind to consider three perspectives (review-centric, reviewer-centric and product-centric) and four classifiers to analyze the effectiveness of opinion spam detection using two major types of features. This study also introduces some novel features, which help to improve the performance of opinion spam detection methods.
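To make the distinction between behavioral and textual features in the abstract above concrete, here is a small reviewer-centric sketch. The three features shown (maximum reviews per day, mean rating deviation from the product average, and mean review length) are illustrative assumptions and not the paper's feature set; the record layout and all values are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (reviewer_id, product_id, date, rating, text)
reviews = [
    ("u1", "p1", "2014-01-01", 5, "Great place great food"),
    ("u1", "p2", "2014-01-01", 5, "Great place"),
    ("u1", "p3", "2014-01-01", 5, "Best ever"),
    ("u2", "p1", "2014-01-03", 3, "Decent food but slow service on weekends"),
    ("u3", "p1", "2014-02-10", 1, "Cold food"),
]

def product_means(rows):
    """Average rating per product, computed over all reviewers."""
    ratings = defaultdict(list)
    for _, product, _, rating, _ in rows:
        ratings[product].append(rating)
    return {p: sum(r) / len(r) for p, r in ratings.items()}

def reviewer_features(rows, reviewer):
    """Two behavioral features and one textual feature for a single reviewer.

    Assumes the reviewer has at least one review in rows.
    """
    means = product_means(rows)
    own = [r for r in rows if r[0] == reviewer]
    per_day = Counter(r[2] for r in own)
    return {
        # Behavioral: burstiness of posting activity.
        "max_reviews_per_day": max(per_day.values()),
        # Behavioral: how far the reviewer's ratings sit from the product average.
        "mean_rating_deviation": sum(abs(r[3] - means[r[1]]) for r in own) / len(own),
        # Textual: average review length in words.
        "mean_review_length": sum(len(r[4].split()) for r in own) / len(own),
    }

f1 = reviewer_features(reviews, "u1")
print(f1)
```

The behavioral features need only counts and ratings from metadata, while textual features require processing each review's text, which is consistent with the lower feature-extraction cost the abstract reports for behavioral features.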