Abstract: The sudden growth of harmful web pages, including spam and phishing URLs, poses a greater threat to global cybersecurity than ever before. These URLs are commonly used to trick people into divulging confidential details or to stealthily deploy malware. To address this issue, we assessed the effectiveness of popular machine learning and neural network models in identifying such harmful links. We employed two datasets: the PhiUSIIL dataset, which is specifically designed for phishing URL detection, and another dataset developed to uncover spam links by examining the wording and structure of every URL. We trained and evaluated four classification models, namely Random Forest, Support Vector Machine (SVM), Naive Bayes, and Artificial Neural Network (ANN), under two feature engineering approaches: statistical text-based analysis and heuristic-based structural features. Random Forest and ANN models consistently performed best. On the PhiUSIIL phishing dataset, the model achieved an accuracy of 99.99%, and on the spam dataset, it attained an accuracy of 99.62%. These results surpass previously reported findings and confirm the efficacy of machine learning and neural networks in detecting malicious URLs. This work reinforces the strength of these widely used models and sets a high bar for subsequent research and development in the field. More generally, it points the way toward smarter, faster, and more precise tools that can spot online threats as they develop.
Funding: Financially supported by the Deanship of Scientific Research and Graduate Studies at King Khalid University under research grant number (R.G.P.2/21/46), and in part by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia, under Grant KFU253116.
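To make the idea of heuristic-based structural features in the first abstract more concrete, the following is a minimal Python sketch of how a URL string might be turned into a handful of structural signals. The abstract does not list the features actually used, so the function name `structural_features` and every feature in it are illustrative assumptions, not the authors' feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical pattern and feature set, chosen for illustration only;
# the abstract does not say which structural features were used.
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def structural_features(url: str) -> dict:
    """Compute a few simple structural features of a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),                          # total characters
        "num_dots_in_host": host.count("."),             # rough subdomain depth
        "num_digits": sum(ch.isdigit() for ch in url),   # digits anywhere in the URL
        "has_at_symbol": "@" in url,                     # '@' anywhere in the URL
        "host_is_ipv4": bool(IPV4_HOST.match(host)),     # dotted-quad host (no range check)
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("https://example.com/login", "http://192.168.0.1/a@b"):
        print(example, structural_features(example))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to any of the classifiers named in the abstract; that step is omitted here.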
Abstract: Today, phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Several anti-phishing solutions exist, such as heuristic detection, visual similarity detection, blacklists and whitelists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is a work in progress. Furthermore, while most anti-phishing solutions achieve high accuracy on a given dataset, they suffer from an increased number of false positives and are ineffective against zero-hour attacks. A high False Positive Rate (FPR) is costly: phishing sites that are considered genuine can cause people to lose a lot of money by visiting them. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy, whereas duplicate features increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks, including constant feature removal, duplicate feature removal, quasi-constant feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different machine learning classifiers: Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, Support Vector Machine (SVM), and two types of ensemble models, stacking and majority voting, with the aim of achieving a low false positive rate. Stacked ensemble classifiers (gradient boosting, random forest, support vector machine) achieve 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
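As a rough illustration of the filter-based steps and the FPR metric named in the abstract above, the sketch below implements constant and quasi-constant feature removal, duplicate feature removal, and the false positive rate in plain Python. The function names, the 0.99 threshold, the column-dictionary layout, and the choice of label 1 as the positive class are all assumptions made for this example; the abstract does not specify them. The correlation, mutual information, and ANOVA steps are not shown.

```python
from typing import Dict, List, Sequence

# Assumed layout: feature name -> list of values, one per sample (non-empty).
Columns = Dict[str, List[float]]


def drop_quasi_constant(columns: Columns, threshold: float = 0.99) -> Columns:
    """Drop columns whose most frequent value covers >= threshold of the rows.

    A fully constant column is the special case where that share is 1.0.
    """
    kept = {}
    for name, values in columns.items():
        top_share = max(values.count(v) for v in set(values)) / len(values)
        if top_share < threshold:
            kept[name] = values
    return kept


def drop_duplicates(columns: Columns) -> Columns:
    """Keep only the first of any group of columns with identical values."""
    seen = set()
    kept = {}
    for name, values in columns.items():
        key = tuple(values)
        if key not in seen:
            seen.add(key)
            kept[name] = values
    return kept


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN), assuming label 1 is the positive class."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


if __name__ == "__main__":
    toy = {
        "const": [1, 1, 1, 1],    # constant: removed by the first filter
        "a": [0, 1, 0, 1],
        "a_copy": [0, 1, 0, 1],   # duplicate of "a": removed by the second filter
        "b": [3, 1, 4, 1],
    }
    selected = drop_duplicates(drop_quasi_constant(toy))
    print(sorted(selected))
    print(false_positive_rate([0, 0, 0, 1], [0, 1, 0, 1]))
```

On the toy data, only the columns `a` and `b` survive both filters, and the example predictions contain one false positive among three negative samples.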