Funding: Financially supported by the Deanship of Scientific Research and Graduate Studies at King Khalid University under research grant number R.G.P.2/21/46, and in part by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, under Grant KFU253116.
Abstract: Phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Several anti-phishing solutions exist, including heuristic detection, visual similarity detection, blacklists and whitelists, and machine learning (ML). However, phishing attempts remain a problem, and an effective anti-phishing strategy is still a work in progress. Furthermore, while most anti-phishing solutions report high accuracy on a given dataset, they suffer from a high number of false positives and are ineffective against zero-hour attacks. A high False Positive Rate (FPR) is costly because phishing sites that are treated as genuine can cause visitors to lose a lot of money. Feature selection is critical when developing phishing detection strategies. Good feature selection improves accuracy, whereas redundant features add noise to the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks: constant feature removal, duplicate feature removal, quasi-constant feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different machine learning classifiers (Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting, and Support Vector Machine (SVM)) and two types of ensemble models, stacking and majority voting, with the aim of achieving a low false positive rate. The stacked ensemble classifier (gradient boosting, random forest, support vector machine) achieves 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
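The first four filter steps named in this abstract (constant, quasi-constant, duplicate, and correlated feature removal) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholds (`quasi_threshold`, `corr_threshold`), and the toy feature columns are all assumptions chosen for illustration, and the label-dependent steps (mutual information and ANOVA) are left out.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def dominant_share(values):
    """Fraction of rows taken by the most common value in a column."""
    return Counter(values).most_common(1)[0][1] / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_features(columns, quasi_threshold=0.9, corr_threshold=0.95):
    """Return the names of the columns that survive the four filters.

    columns: dict mapping a feature name to its list of numeric values.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept = []
    for name, values in columns.items():
        # Constant (share == 1.0) and quasi-constant (share >= threshold) features.
        if dominant_share(values) >= quasi_threshold:
            continue
        # Exact duplicates of a feature that was already kept.
        if any(values == columns[k] for k in kept):
            continue
        # Features strongly correlated with a feature that was already kept.
        if any(abs(pearson(values, columns[k])) >= corr_threshold for k in kept):
            continue
        kept.append(name)
    return kept


# Hypothetical toy data: ten URLs described by six made-up features.
length = [54, 23, 88, 31, 76, 19, 95, 27, 64, 40]
columns = {
    "url_length": length,
    "url_length_copy": list(length),          # duplicate
    "scheme_is_http": [1] * 10,               # constant
    "has_at_symbol": [0] * 9 + [1],           # quasi-constant (90% zeros)
    "num_dots": [2, 3, 1, 4, 2, 5, 3, 1, 4, 2],
    "url_length_x2": [2 * v for v in length], # perfectly correlated
}
selected = filter_features(columns)
print(selected)
```

On this toy data only `url_length` and `num_dots` survive. The remaining features would then typically be ranked against the class labels, for example by mutual information or an ANOVA F-test, before training a classifier.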
Funding: Supported by Korea Electric Power Corporation (Grant Number: R18XA02).
Abstract: Detecting malicious Uniform Resource Locators (URLs) is crucial to preventing attackers from committing cybercrimes. Recent research has investigated the role of machine learning (ML) models in detecting malicious URLs. With ML algorithms, the features of URLs are first extracted, and then different ML models are trained. The limitation of this approach is that it requires manual feature engineering and does not consider the sequential patterns in the URL. Therefore, deep learning (DL) models are used to solve these issues, since they are able to perform featureless detection. Furthermore, DL models give better accuracy and generalization to newly designed URLs; however, the results of our study show that these models, like any other DL models, can be susceptible to adversarial attacks. In this paper, we examine the robustness of these models and demonstrate the importance of considering this susceptibility before applying such detection systems in real-world solutions. We propose and demonstrate a black-box attack based on scoring functions with a greedy search for the minimum number of perturbations leading to a misclassification. The attack is examined against different types of convolutional neural network (CNN)-based URL classifiers, and it causes a tangible decrease in accuracy, with a reduction of more than 56% in the accuracy of the best classifier among those selected for this work. Moreover, adversarial training shows promising results, reducing the influence of the attack on the robustness of the model to less than 7% on average.
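The idea of a score-based black-box attack with greedy search can be sketched as follows. This is only an illustration under stated assumptions, not the attack from the paper: `toy_malicious_score` is a hypothetical stand-in for a trained CNN classifier, the perturbation is a single-character replacement with a fixed filler, and the 0.5 decision threshold is assumed. The attacker only queries the score and never inspects the model's internals.

```python
SUSPICIOUS_TOKENS = ("login", "verify", "secure", "update")


def toy_malicious_score(url):
    """Hypothetical stand-in for a trained classifier: returns P(malicious)."""
    hits = sum(token in url.lower() for token in SUSPICIOUS_TOKENS)
    return min(1.0, 0.1 + 0.3 * hits)


def greedy_attack(score_fn, url, threshold=0.5, filler="x", max_changes=10):
    """Greedily replace one character at a time until the score drops below threshold.

    At every step each position is scored by how much replacing it lowers the
    classifier's output; the position with the largest drop is perturbed.
    Returns the perturbed URL and the number of characters changed.
    """
    current, changes = url, 0
    while score_fn(current) >= threshold and changes < max_changes:
        base = score_fn(current)
        best_index, best_drop = None, 0.0
        for i, ch in enumerate(current):
            if ch == filler:
                continue  # replacing the filler with itself changes nothing
            candidate = current[:i] + filler + current[i + 1:]
            drop = base - score_fn(candidate)
            if drop > best_drop:
                best_index, best_drop = i, drop
        if best_index is None:
            break  # no single-character change lowers the score
        current = current[:best_index] + filler + current[best_index + 1:]
        changes += 1
    return current, changes


original = "http://example.test/secure-login/verify"
adversarial, n_changes = greedy_attack(toy_malicious_score, original)
print(toy_malicious_score(original), adversarial, n_changes)
```

In this toy setting two character changes are enough to push the score below the threshold. A realistic attack would additionally constrain perturbations so that the modified URL remains usable by the attacker.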
Abstract: A URL (Uniform Resource Locator) is used to locate a digital resource. Attackers can use URLs to carry out a variety of attacks, with serious consequences for both individuals and organizations; in particular, they create malicious URLs to gain access to an organization's systems or sensitive information. It is therefore crucial to protect individuals and organizations against malicious URLs. In this study, a combination of machine learning and deep learning was used to predict malicious URLs. The research contributes to the field of cybersecurity by proposing a model that integrates the accuracy of machine learning with the swiftness of deep learning. Combining Random Forest (RF) and Multilayer Perceptron (MLP) yielded an efficient model with an accuracy of 81% and a training time of 33.78 s, offering a balanced solution for robust cybersecurity.
Abstract: The sudden growth of harmful web pages, including spam and phishing URLs, poses a greater threat to global cybersecurity than ever before. These URLs are commonly used to trick people into divulging confidential details or to stealthily deploy malware. To address this issue, we assessed the efficiency of popular machine learning and neural network models in identifying such harmful links. We employed two datasets: the PhiUSIIL dataset, which is specifically designed for phishing URL detection, and another dataset developed to uncover spam links by examining the wording and structure of every URL. Our strategy was to train and evaluate four classification models, namely Random Forest, Support Vector Machine (SVM), Naive Bayes, and Artificial Neural Networks (ANN), under two different feature engineering approaches: statistical text-based analysis and heuristic-based structural features. Random Forest and ANN consistently performed best. On the PhiUSIIL phishing dataset, accuracy reached 99.99%, and on the spam dataset it reached 99.62%. These results surpass previously reported findings and establish the efficacy of machine learning and neural networks in detecting malicious URLs. This work not only reinforces the strength of these widely used models but also sets a high bar for subsequent research and development in the field. More generally, it provides direction for building smarter, faster, and more precise tools that can spot online threats as they develop.
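To make "heuristic-based structural features" concrete, the sketch below derives a few such features from a raw URL with Python's standard library. The feature names and the choice of features are assumptions for illustration; they are not the PhiUSIIL feature set or the features used in the study.

```python
import re
from urllib.parse import urlsplit


def structural_features(url):
    """Extract a few illustrative structural features from a URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # A host written as a dotted-quad IPv4 address instead of a domain name.
        "host_is_ipv4": re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host) is not None,
        # Number of non-empty path segments.
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
    }


suspicious = structural_features("http://192.168.0.1/login/verify.php?id=42")
ordinary = structural_features("https://secure-pay.example.test/")
print(suspicious)
print(ordinary)
```

Each URL becomes a fixed-length record that a classifier such as Random Forest can consume directly, in contrast to the text-based approach, which would instead derive statistics from the tokens or characters of the URL string.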