Network Intrusion Detection System(NIDS)detection of minority class attacks is always a difficult task when dealing with attacks in complex network environments.To improve the detection capability of minority-class at...Network Intrusion Detection System(NIDS)detection of minority class attacks is always a difficult task when dealing with attacks in complex network environments.To improve the detection capability of minority-class attacks,this study proposes an intrusion detection method based on a two-layer structure.The first layer employs a CNN-BiLSTM model incorporating an attention mechanism to classify network traffic into normal traffic,majority class attacks,and merged minority class attacks.The second layer further segments the minority class attacks through Stacking ensemble learning.The datasets are selected from the generic network dataset CIC-IDS2017,NSL-KDD,and the industrial network dataset Mississippi Gas Pipeline dataset to enhance the generalization and practical applicability of the model.Experimental results show that the proposed model achieves an overall detection accuracy of 99%,99%,and 95%on the CIC-IDS2017,NSL-KDD,and industrial network datasets,respectively.It also significantly outperforms traditional methods in terms of detection accuracy and recall rate for minority class attacks.Compared with the single-layer deep learning model,the two-layer structure effectively reduces the false alarm rate while improving the minority-class attack detection performance.The research in this paper not only improves the adaptability of NIDS to complex network environments but also provides a new solution for minority-class attack detection in industrial network security.展开更多
Existing web-based security applications have failed in many situations due to the great intelligence of attackers.Among web applications,Cross-Site Scripting(XSS)is one of the dangerous assaults experienced while mod...Existing web-based security applications have failed in many situations due to the great intelligence of attackers.Among web applications,Cross-Site Scripting(XSS)is one of the dangerous assaults experienced while modifying an organization's or user's information.To avoid these security challenges,this article proposes a novel,all-encompassing combination of machine learning(NB,SVM,k-NN)and deep learning(RNN,CNN,LSTM)frameworks for detecting and defending against XSS attacks with high accuracy and efficiency.Based on the representation,a novel idea for merging stacking ensemble with web applications,termed“hybrid stacking”,is proposed.In order to implement the aforementioned methods,four distinct datasets,each of which contains both safe and unsafe content,are considered.The hybrid detection method can adaptively identify the attacks from the URL,and the defense mechanism inherits the advantages of URL encoding with dictionary-based mapping to improve prediction accuracy,accelerate the training process,and effectively remove the unsafe JScript/JavaScript keywords from the URL.The simulation results show that the proposed hybrid model is more efficient than the existing detection methods.It produces more than 99.5%accurate XSS attack classification results(accuracy,precision,recall,f1_score,and Receiver Operating Characteristic(ROC))and is highly resistant to XSS attacks.In order to ensure the security of the server's information,the proposed hybrid approach is demonstrated in a real-time environment.展开更多
Flood susceptibility modeling is crucial for rapid flood forecasting, disaster reduction strategies, evacuation planning, and decision-making. Machine learning(ML) models have proven to be effective tools for assessin...Flood susceptibility modeling is crucial for rapid flood forecasting, disaster reduction strategies, evacuation planning, and decision-making. Machine learning(ML) models have proven to be effective tools for assessing flood susceptibility. However, most previous studies have focused on individual models or comparative performance, underscoring the unique strengths and weaknesses of each model. In this study, we propose a stacking ensemble learning algorithm that harnesses the strengths of a diverse range of machine learning models. The findings reveal the following:(1) The stacking ensemble learning, using RF-XGBCB-LR model, significantly enhances flood susceptibility simulation.(2) In addition to rainfall,key flood drivers in the study area include NDVI, and impervious surfaces. Over 40% of the study area, primarily in the northeast and southeast, exhibits high flood susceptibility, with higher risks for populations compared to cropland.(3) In the northeast of the study area,heavy precipitation, low terrain, and NDVI values are key indicators contributing to high flood susceptibility, while long-duration precipitation, mountainous topography, and upper reach vegetation are the main drivers in the southeast. This study underscores the effectiveness of ML, particularly ensemble learning, in flood modeling. It identifies vulnerable areas and contributes to improved flood risk management.展开更多
Healthcare networks prove to be an urgent issue in terms of intrusion detection due to the critical consequences of cyber threats and the extreme sensitivity of medical information.The proposed Auto-Stack ID in the st...Healthcare networks prove to be an urgent issue in terms of intrusion detection due to the critical consequences of cyber threats and the extreme sensitivity of medical information.The proposed Auto-Stack ID in the study is a stacked ensemble of encoder-enhanced auctions that can be used to improve intrusion detection in healthcare networks.TheWUSTL-EHMS 2020 dataset trains and evaluates themodel,constituting an imbalanced class distribution(87.46% normal traffic and 12.53% intrusion attacks).To address this imbalance,the study balances the effect of training Bias through Stratified K-fold cross-validation(K=5),so that each class is represented similarly on training and validation splits.Second,the Auto-Stack ID method combines many base classifiers such as TabNet,LightGBM,Gaussian Naive Bayes,Histogram-Based Gradient Boosting(HGB),and Logistic Regression.We apply a two-stage training process based on the first stage,where we have base classifiers that predict out-of-fold(OOF)predictions,which we use as inputs for the second-stage meta-learner XGBoost.The meta-learner learns to refine predictions to capture complicated interactions between base models,thus improving detection accuracy without introducing bias,overfitting,or requiring domain knowledge of the meta-data.In addition,the auto-stack ID model got 98.41% accuracy and 93.45%F1 score,better than individual classifiers.It can identify intrusions due to its 90.55% recall and 96.53% precision with minimal false positives.These findings identify its suitability in ensuring healthcare networks’security through ensemble learning.Ongoing efforts will be deployed in real time to improve response to evolving threats.展开更多
Slope failures lead to catastrophic consequences in numerous countries and thus the stability assessment for slopes is of high interest in geotechnical and geological engineering researches.A hybrid stacking ensemble ...Slope failures lead to catastrophic consequences in numerous countries and thus the stability assessment for slopes is of high interest in geotechnical and geological engineering researches.A hybrid stacking ensemble approach is proposed in this study for enhancing the prediction of slope stability.In the hybrid stacking ensemble approach,we used an artificial bee colony(ABC)algorithm to find out the best combination of base classifiers(level 0)and determined a suitable meta-classifier(level 1)from a pool of 11 individual optimized machine learning(OML)algorithms.Finite element analysis(FEA)was conducted in order to form the synthetic database for the training stage(150 cases)of the proposed model while 107 real field slope cases were used for the testing stage.The results by the hybrid stacking ensemble approach were then compared with that obtained by the 11 individual OML methods using confusion matrix,F1-score,and area under the curve,i.e.AUC-score.The comparisons showed that a significant improvement in the prediction ability of slope stability has been achieved by the hybrid stacking ensemble(AUC?90.4%),which is 7%higher than the best of the 11 individual OML methods(AUC?82.9%).Then,a further comparison was undertaken between the hybrid stacking ensemble method and basic ensemble classifier on slope stability prediction.The results showed a prominent performance of the hybrid stacking ensemble method over the basic ensemble method.Finally,the importance of the variables for slope stability was studied using linear vector quantization(LVQ)method.展开更多
Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are g...Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are generated,reflecting the interaction between the TBM system and surrounding rock,and these data can be used to evaluate the rock mass quality.This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data.Based on the Songhua River water conveyance project,a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing.Then,through the tree-based feature selection method,10 key TBM operation parameters are selected,and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers.The preprocessed data are randomly divided into the training set(90%)and test set(10%)using simple random sampling.Besides stacking ensemble classifier,seven individual classifiers are established as the comparison.These classifiers include support vector machine(SVM),k-nearest neighbors(KNN),random forest(RF),gradient boosting decision tree(GBDT),decision tree(DT),logistic regression(LR)and multilayer perceptron(MLP),where the hyper-parameters of each classifier are optimised using the grid search method.The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers,and it shows a more powerful learning and generalisation ability for small and imbalanced samples.Additionally,a relative balance training set is obtained by the synthetic minority oversampling technique(SMOTE),and the influence of sample imbalance on the prediction performance is discussed.展开更多
Numerical simulation of concrete-faced rockfill dams(CFRDs)considering the spatial variability of rockfill has become a popular research topic in recent years.In order to determine uncertain rockfill properties effici...Numerical simulation of concrete-faced rockfill dams(CFRDs)considering the spatial variability of rockfill has become a popular research topic in recent years.In order to determine uncertain rockfill properties efficiently and reliably,this study developed an uncertainty inversion analysis method for rockfill material parameters using the stacking ensemble strategy and Jaya optimizer.The comprehensive implementation process of the proposed model was described with an illustrative CFRD example.First,the surrogate model method using the stacking ensemble algorithm was used to conduct the Monte Carlo stochastic finite element calculations with reduced computational cost and improved accuracy.Afterwards,the Jaya algorithm was used to inversely calculate the combination of the coefficient of variation of rockfill material parameters.This optimizer obtained higher accuracy and more significant uncertainty reduction than traditional optimizers.Overall,the developed model effectively identified the random parameters of rockfill materials.This study provided scientific references for uncertainty analysis of CFRDs.In addition,the proposed method can be applied to other similar engineering structures.展开更多
As a result of the increased number of COVID-19 cases,Ensemble Machine Learning(EML)would be an effective tool for combatting this pandemic outbreak.An ensemble of classifiers can improve the performance of single mac...As a result of the increased number of COVID-19 cases,Ensemble Machine Learning(EML)would be an effective tool for combatting this pandemic outbreak.An ensemble of classifiers can improve the performance of single machine learning(ML)classifiers,especially stacking-based ensemble learning.Stacking utilizes heterogeneous-base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results.However,building an ensemble often causes the model performance to decrease due to the increasing number of learners that are not being properly selected.Therefore,the goal of this paper is to develop and evaluate a generic,data-independent predictive method using stacked-based ensemble learning(GA-Stacking)optimized by aGenetic Algorithm(GA)for outbreak prediction and health decision aided processes.GA-Stacking utilizes five well-known classifiers,including Decision Tree(DT),Random Forest(RF),RIGID regression,Least Absolute Shrinkage and Selection Operator(LASSO),and eXtreme Gradient Boosting(XGBoost),at its first level.It also introduces GA to identify comparisons to forecast the number,combination,and trust of these base classifiers based on theMean Squared Error(MSE)as a fitness function.At the second level of the stacked ensemblemodel,a Linear Regression(LR)classifier is used to produce the final prediction.The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering,Johns Hopkins University,which consisted of 10,722 data samples.The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99%for the three selected countries.Furthermore,the proposed model achieved good performance when compared with existing baggingbased approaches.The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model 3946 CMC,2023,vol.74,no.2 to predict the epidemic trend for other countries when comparing preventive and control measures.展开更多
Recently,machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductormanufacturing.The existing approaches used in the wafer map pattern clas...Recently,machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductormanufacturing.The existing approaches used in the wafer map pattern classification include directly learning the image through a convolution neural network and applying the ensemble method after extracting image features.This study aims to classify wafer map defects more effectively and derive robust algorithms even for datasets with insufficient defect patterns.First,the number of defects during the actual process may be limited.Therefore,insufficient data are generated using convolutional auto-encoder(CAE),and the expanded data are verified using the evaluation technique of structural similarity index measure(SSIM).After extracting handcrafted features,a boosted stacking ensemble model that integrates the four base-level classifiers with the extreme gradient boosting classifier as a meta-level classifier is designed and built for training the model based on the expanded data for final prediction.Since the proposed algorithm shows better performance than those of existing ensemble classifiers even for insufficient defect patterns,the results of this study will contribute to improving the product quality and yield of the actual semiconductor manufacturing process.展开更多
Cross-Site Scripting(XSS)remains a significant threat to web application security,exploiting vulnerabilities to hijack user sessions and steal sensitive data.Traditional detection methods often fail to keep pace with ...Cross-Site Scripting(XSS)remains a significant threat to web application security,exploiting vulnerabilities to hijack user sessions and steal sensitive data.Traditional detection methods often fail to keep pace with the evolving sophistication of cyber threats.This paper introduces a novel hybrid ensemble learning framework that leverages a combination of advanced machine learning algorithms—Logistic Regression(LR),Support Vector Machines(SVM),eXtreme Gradient Boosting(XGBoost),Categorical Boosting(CatBoost),and Deep Neural Networks(DNN).Utilizing the XSS-Attacks-2021 dataset,which comprises 460 instances across various real-world trafficrelated scenarios,this framework significantly enhances XSS attack detection.Our approach,which includes rigorous feature engineering and model tuning,not only optimizes accuracy but also effectively minimizes false positives(FP)(0.13%)and false negatives(FN)(0.19%).This comprehensive methodology has been rigorously validated,achieving an unprecedented accuracy of 99.87%.The proposed system is scalable and efficient,capable of adapting to the increasing number of web applications and user demands without a decline in performance.It demonstrates exceptional real-time capabilities,with the ability to detect XSS attacks dynamically,maintaining high accuracy and low latency even under significant loads.Furthermore,despite the computational complexity introduced by the hybrid ensemble approach,strategic use of parallel processing and algorithm tuning ensures that the system remains scalable and performs robustly in real-time applications.Designed for easy integration with existing web security systems,our framework supports adaptable Application Programming Interfaces(APIs)and a modular design,facilitating seamless augmentation of current defenses.This innovation represents a significant advancement in cybersecurity,offering a scalable and effective solution for securing modern web applications against evolving threats.展开更多
Surveillance cameras have been widely used for monitoring in both private and public sectors as a security measure.Close Circuits Television(CCTV)Cameras are used to surveillance and monitor the normal and anomalous i...Surveillance cameras have been widely used for monitoring in both private and public sectors as a security measure.Close Circuits Television(CCTV)Cameras are used to surveillance and monitor the normal and anomalous incidents.Real-world anomaly detection is a significant challenge due to its complex and diverse nature.It is difficult to manually analyze because vast amounts of video data have been generated through surveillance systems,and the need for automated techniques has been raised to enhance detection accuracy.This paper proposes a novel deep-stacked ensemble model integrated with a data augmentation approach called Stack Ensemble Road Anomaly Detection(SERAD).SERAD is used to detect and classify the four most happening road anomalies,such as accidents,car fires,fighting,and snatching,through road surveillance videos with high accuracy.The SERAD adapted three pre-trained Convolutional Neural Networks(CNNs)models,namely VGG19,ResNet50 and InceptionV3.The stacking technique is employed to incorporate these three models,resulting in much-improved accuracy for classifying road abnormalities compared to individual models.Additionally,it presented a custom real-world Road Anomaly Dataset(RAD)comprising a comprehensive collection of road images and videos.The experimental results demonstrate the strength and reliability of the proposed SERAD model,achieving an impressive classification accuracy of 98.7%.The results indicate that the proposed SERAD model outperforms than the individual CNN base models.展开更多
BACKGROUND There is a lack of literature discussing the utilization of the stacking ensemble algorithm for predicting depression in patients with heart failure(HF).AIM To create a stacking model for predicting depress...BACKGROUND There is a lack of literature discussing the utilization of the stacking ensemble algorithm for predicting depression in patients with heart failure(HF).AIM To create a stacking model for predicting depression in patients with HF.METHODS This study analyzed data on 1084 HF patients from the National Health and Nutrition Examination Survey database spanning from 2005 to 2018.Through univariate analysis and the use of an artificial neural network algorithm,predictors significantly linked to depression were identified.These predictors were utilized to create a stacking model employing tree-based learners.The performances of both the individual models and the stacking model were assessed by using the test dataset.Furthermore,the SHapley additive exPlanations(SHAP)model was applied to interpret the stacking model.RESULTS The models included five predictors.Among these models,the stacking model demonstrated the highest performance,achieving an area under the curve of 0.77(95%CI:0.71-0.84),a sensitivity of 0.71,and a specificity of 0.68.The calibration curve supported the reliability of the models,and decision curve analysis confirmed their clinical value.The SHAP plot demonstrated that age had the most significant impact on the stacking model's output.CONCLUSION The stacking model demonstrated strong predictive performance.Clinicians can utilize this model to identify highrisk depression patients with HF,thus enabling early provision of psychological interventions.展开更多
Tailings produced by mining and ore smelting are a major source of soil pollution.Understanding the speciation of heavy metals(HMs)in tailings is essential for soil remediation and sustainable development.Given the co...Tailings produced by mining and ore smelting are a major source of soil pollution.Understanding the speciation of heavy metals(HMs)in tailings is essential for soil remediation and sustainable development.Given the complex and time-consuming nature of traditional sequential laboratory extraction methods for determining the forms of HMs in tailings,a rapid and precise identification approach is urgently required.To address this issue,a general empirical prediction method for HM occurrence was developed using machine learning(ML).The compositional information of the tailings,properties of the HMs,and sequential extraction steps were used as inputs to calculate the percentages of the seven forms of HMs.After the models were tuned and compared,extreme gradient boosting,gradient boosting decision tree,and categorical boosting methods were found to be the top three performing ML models,with the coefficient of determination(R^(2))values on the testing set exceeding 0.859.Feature importance analysis for these three optimal models indicated that electronegativity was the most important factor affecting the occurrence of HMs,with an average feature importance of 0.4522.The subsequent use of stacking as a model integration method enabled the ability of the ML models to predict HM occurrence forms to be further improved,and resulting in an increase of R^(2) to 0.879.Overall,this study developed a robust technique for predicting the occurrence forms in tailings and provides an important reference for the environmental assessment and recycling of tailings.展开更多
This study employs a stacking ensemble learning framework to establish a regression model for predicting the tribological properties of amide-based lubricating grease and determining the optimal additive ratios.Melami...This study employs a stacking ensemble learning framework to establish a regression model for predicting the tribological properties of amide-based lubricating grease and determining the optimal additive ratios.Melamine cyanuric acid(MCA)was selected as the thickener,and three extreme-pressure anti-wear additives were used to prepare the lubricating grease.The tribological performance was tested using an MFT-R4000 reciprocating friction and wear machine.Based on the tribological experimental data,the synthetic minority oversampling technique(SMOTE)was utilized for data augmentation,and a stacking ensemble algorithm with Bayesian optimization of hyperparameters was used to construct a predictive model for tribological performance.Subsequently,within this model framework,single and multi-objective optimization models were developed,and the fruit fly algorithm was employed to find the optimal additive combination ratios,which were experimentally validated.The results demonstrated that the learning framework based on the stacking ensemble model could effectively predict the tribological properties of amide-based lubricating grease in small sample datasets,with the R2 for the average friction coefficient prediction reaching 0.9939 and for the wear scar width prediction reaching 0.9535.In the experimental validation of the optimal additive ratios,the relative error of the friction coefficient ratio scheme was 0.51%,and the relative error of the wear scar width was 1.10%.This finding suggests that the learning framework provides a novel approach for predicting the performance of amide-based lubricating grease and studying additive combinations.展开更多
An anomaly-based intrusion detection system(A-IDS)provides a critical aspect in a modern computing infrastructure since new types of attacks can be discovered.It prevalently utilizes several machine learning algorithm...An anomaly-based intrusion detection system(A-IDS)provides a critical aspect in a modern computing infrastructure since new types of attacks can be discovered.It prevalently utilizes several machine learning algorithms(ML)for detecting and classifying network traffic.To date,lots of algorithms have been proposed to improve the detection performance of A-IDS,either using individual or ensemble learners.In particular,ensemble learners have shown remarkable performance over individual learners in many applications,including in cybersecurity domain.However,most existing works still suffer from unsatisfactory results due to improper ensemble design.The aim of this study is to emphasize the effectiveness of stacking ensemble-based model for A-IDS,where deep learning(e.g.,deep neural network[DNN])is used as base learner model.The effectiveness of the proposed model and base DNN model are benchmarked empirically in terms of several performance metrics,i.e.,Matthew’s correlation coefficient,accuracy,and false alarm rate.The results indicate that the proposed model is superior to the base DNN model as well as other existing ML algorithms found in the literature.展开更多
The exponential growth of Internet and network usage has neces-sitated heightened security measures to protect against data and network breaches.Intrusions,executed through network packets,pose a significant challenge...The exponential growth of Internet and network usage has neces-sitated heightened security measures to protect against data and network breaches.Intrusions,executed through network packets,pose a significant challenge for firewalls to detect and prevent due to the similarity between legit-imate and intrusion traffic.The vast network traffic volume also complicates most network monitoring systems and algorithms.Several intrusion detection methods have been proposed,with machine learning techniques regarded as promising for dealing with these incidents.This study presents an Intrusion Detection System Based on Stacking Ensemble Learning base(Random For-est,Decision Tree,and k-Nearest-Neighbors).The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms.The stacking ensemble technique increases performance by incorporating three base models(Random Forest,Decision Tree,and k-Nearest-Neighbors)and a meta-model represented by the Logistic Regression algorithm.Evaluated using the UNSW-NB15 dataset,the pro-posed IDS gained an accuracy of 96.16%in the training phase and 97.95%in the testing phase,with precision of 97.78%,and 98.40%for taring and testing,respectively.The obtained results demonstrate improvements in other measurement criteria.展开更多
Malicious traffic detection over the internet is one of the challenging areas for researchers to protect network infrastructures from any malicious activity.Several shortcomings of a network system can be leveraged by...Malicious traffic detection over the internet is one of the challenging areas for researchers to protect network infrastructures from any malicious activity.Several shortcomings of a network system can be leveraged by an attacker to get unauthorized access through malicious traffic.Safeguard from such attacks requires an efficient automatic system that can detect malicious traffic timely and avoid system damage.Currently,many automated systems can detect malicious activity,however,the efficacy and accuracy need further improvement to detect malicious traffic from multi-domain systems.The present study focuses on the detection of malicious traffic with high accuracy using machine learning techniques.The proposed approach used two datasets UNSW-NB15 and IoTID20 which contain the data for IoT-based traffic and local network traffic,respectively.Both datasets were combined to increase the capability of the proposed approach in detecting malicious traffic from local and IoT networks,with high accuracy.Horizontally merging both datasets requires an equal number of features which was achieved by reducing feature count to 30 for each dataset by leveraging principal component analysis(PCA).The proposed model incorporates stacked ensemble model extra boosting forest(EBF)which is a combination of tree-based models such as extra tree classifier,gradient boosting classifier,and random forest using a stacked ensemble approach.Empirical results show that EBF performed significantly better and achieved the highest accuracy score of 0.985 and 0.984 on the multi-domain dataset for two and four classes,respectively.展开更多
A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes.There are significantly more samples for day-to-day classes,while rare events such as poll...A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes.There are significantly more samples for day-to-day classes,while rare events such as polluted classes are uncommon.Consequently,the limited availability of minority outcomes lowers the classifier’s overall reliability.This study assesses the capability of machine learning(ML)algorithms in tackling imbalanced water quality data based on the metrics of precision,recall,and F1 score.It intends to balance the misled accuracy towards the majority of data.Hence,10 ML algorithms of its performance are compared.The classifiers included are AdaBoost,SupportVector Machine,Linear Discriminant Analysis,k-Nearest Neighbors,Naive Bayes,Decision Trees,Random Forest,Extra Trees,Bagging,and the Multilayer Perceptron.This study also uses the Easy Ensemble Classifier,Balanced Bagging,andRUSBoost algorithm to evaluatemulti-class imbalanced learning methods.The comparison results revealed that a highaccuracy machine learning model is not always good in recall and sensitivity.This paper’s stacked ensemble deep learning(SE-DL)generalization model effectively classifies the water quality index(WQI)based on 23 input variables.The proposed algorithm achieved a remarkable average of 95.69%,94.96%,92.92%,and 93.88%for accuracy,precision,recall,and F1 score,respectively.In addition,the proposed model is compared against two state-of-the-art classifiers,the XGBoost(eXtreme Gradient Boosting)and Light Gradient Boosting Machine,where performance metrics of balanced accuracy and g-mean are included.The experimental setup concluded XGBoost with a higher balanced accuracy and G-mean.However,the SE-DL model has a better and more balanced performance in the F1 score.The SE-DL model aligns with the goal of this study to ensure the balance between accuracy and completeness for each water quality class.The proposed algorithm is also capable of higher efficiency at a lower computational time against using the standard SyntheticMinority Oversampling Technique(SMOTE)approach to imbalanced datasets.展开更多
COVID-19 is a growing problem worldwide with a high mortality rate.As a result,the World Health Organization(WHO)declared it a pandemic.In order to limit the spread of the disease,a fast and accurate diagnosis is requ...COVID-19 is a growing problem worldwide with a high mortality rate.As a result,the World Health Organization(WHO)declared it a pandemic.In order to limit the spread of the disease,a fast and accurate diagnosis is required.A reverse transcript polymerase chain reaction(RT-PCR)test is often used to detect the disease.However,since this test is time-consuming,a chest computed tomography(CT)or plain chest X-ray(CXR)is sometimes indicated.The value of automated diagnosis is that it saves time and money by minimizing human effort.Three significant contributions are made by our research.Its initial purpose is to use the essential finetuning methodology to test the action and efficiency of a variety of vision models,ranging from Inception to Neural Architecture Search(NAS)networks.Second,by plotting class activationmaps(CAMs)for individual networks and assessing classification efficiency with AUC-ROC curves,the behavior of these models is visually analyzed.Finally,stacked ensembles techniques were used to provide greater generalization by combining finetuned models with six ensemble neural networks.Using stacked ensembles,the generalization of the models improved.Furthermore,the ensemble model created by combining all of the finetuned networks obtained a state-of-the-art COVID-19 accuracy detection score of 99.17%.The precision and recall rates were 99.99%and 89.79%,respectively,highlighting the robustness of stacked ensembles.The proposed ensemble approach performed well in the classification of the COVID-19 lesions on CXR according to the experimental results.展开更多
Today,phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers,passwords,and usernames.We can find several anti-phishing solutions,such as heuristic detection,...Today,phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers,passwords,and usernames.We can find several anti-phishing solutions,such as heuristic detection,virtual similarity detection,black and white lists,and machine learning(ML).However,phishing attempts remain a problem,and establishing an effective anti-phishing strategy is a work in progress.Furthermore,while most antiphishing solutions achieve the highest levels of accuracy on a given dataset,their methods suffer from an increased number of false positives.These methods are ineffective against zero-hour attacks.Phishing sites with a high False Positive Rate(FPR)are considered genuine because they can cause people to lose a lot ofmoney by visiting them.Feature selection is critical when developing phishing detection strategies.Good feature selection helps improve accuracy;however,duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm.Therefore,a combination of filter-based feature selection methods is proposed to detect phishing attacks,including constant feature removal,duplicate feature removal,quasi-feature removal,correlated feature removal,mutual information extraction,and Analysis of Variance(ANOVA)testing.The technique has been tested with differentMachine Learning classifiers:Random Forest,Artificial Neural Network(ANN),Ada-Boost,Extreme Gradient Boosting(XGBoost),Logistic Regression,Decision Trees,Gradient Boosting Classifiers,Support Vector Machine(SVM),and two types of ensemble models,stacking and majority voting to gain A low false positive rate is achieved.Stacked ensemble classifiers(gradient boosting,randomforest,support vector machine)achieve 1.31%FPR and 98.17%accuracy on Dataset 1,2.81%FPR and Dataset 3 shows 2.81%FPR and 97.61%accuracy,while Dataset 2 shows 3.47%FPR and 96.47%accuracy.展开更多
基金supported by the Institute of Information&Communications Technology Planning&Evaluation(IITP)—Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government(MSIT)(IITP-2025-RS-2022-00156334)in part by Liaoning Province Nature Fund Project(2024-BSLH-214).
文摘Network Intrusion Detection System(NIDS)detection of minority class attacks is always a difficult task when dealing with attacks in complex network environments.To improve the detection capability of minority-class attacks,this study proposes an intrusion detection method based on a two-layer structure.The first layer employs a CNN-BiLSTM model incorporating an attention mechanism to classify network traffic into normal traffic,majority class attacks,and merged minority class attacks.The second layer further segments the minority class attacks through Stacking ensemble learning.The datasets are selected from the generic network dataset CIC-IDS2017,NSL-KDD,and the industrial network dataset Mississippi Gas Pipeline dataset to enhance the generalization and practical applicability of the model.Experimental results show that the proposed model achieves an overall detection accuracy of 99%,99%,and 95%on the CIC-IDS2017,NSL-KDD,and industrial network datasets,respectively.It also significantly outperforms traditional methods in terms of detection accuracy and recall rate for minority class attacks.Compared with the single-layer deep learning model,the two-layer structure effectively reduces the false alarm rate while improving the minority-class attack detection performance.The research in this paper not only improves the adaptability of NIDS to complex network environments but also provides a new solution for minority-class attack detection in industrial network security.
基金supported by the National Research Foundation of Korea(NRF)grant funded by the Korea government(MEST)No.2015R1A3A2031159,2016R1A5A1008055.
文摘Existing web-based security applications have failed in many situations due to the great intelligence of attackers.Among web applications,Cross-Site Scripting(XSS)is one of the dangerous assaults experienced while modifying an organization's or user's information.To avoid these security challenges,this article proposes a novel,all-encompassing combination of machine learning(NB,SVM,k-NN)and deep learning(RNN,CNN,LSTM)frameworks for detecting and defending against XSS attacks with high accuracy and efficiency.Based on the representation,a novel idea for merging stacking ensemble with web applications,termed“hybrid stacking”,is proposed.In order to implement the aforementioned methods,four distinct datasets,each of which contains both safe and unsafe content,are considered.The hybrid detection method can adaptively identify the attacks from the URL,and the defense mechanism inherits the advantages of URL encoding with dictionary-based mapping to improve prediction accuracy,accelerate the training process,and effectively remove the unsafe JScript/JavaScript keywords from the URL.The simulation results show that the proposed hybrid model is more efficient than the existing detection methods.It produces more than 99.5%accurate XSS attack classification results(accuracy,precision,recall,f1_score,and Receiver Operating Characteristic(ROC))and is highly resistant to XSS attacks.In order to ensure the security of the server's information,the proposed hybrid approach is demonstrated in a real-time environment.
基金National Natural Science Foundation of China,No.42271037Key Research and Development Program Project of Anhui Province,No.2022m07020011+1 种基金The University Synergy Innovation Program of Anhui Province,No.GXXT-2021-048Science Foundation for Excellent Young Scholars of Anhui,No.2108085Y13。
文摘Flood susceptibility modeling is crucial for rapid flood forecasting, disaster reduction strategies, evacuation planning, and decision-making. Machine learning(ML) models have proven to be effective tools for assessing flood susceptibility. However, most previous studies have focused on individual models or comparative performance, underscoring the unique strengths and weaknesses of each model. In this study, we propose a stacking ensemble learning algorithm that harnesses the strengths of a diverse range of machine learning models. The findings reveal the following:(1) The stacking ensemble learning, using RF-XGBCB-LR model, significantly enhances flood susceptibility simulation.(2) In addition to rainfall,key flood drivers in the study area include NDVI, and impervious surfaces. Over 40% of the study area, primarily in the northeast and southeast, exhibits high flood susceptibility, with higher risks for populations compared to cropland.(3) In the northeast of the study area,heavy precipitation, low terrain, and NDVI values are key indicators contributing to high flood susceptibility, while long-duration precipitation, mountainous topography, and upper reach vegetation are the main drivers in the southeast. This study underscores the effectiveness of ML, particularly ensemble learning, in flood modeling. It identifies vulnerable areas and contributes to improved flood risk management.
基金funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2025R319),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia and Prince Sultan University for covering the article processing charges(APC)associated with this publicationResearchers Supporting Project Number(RSPD2025R1107),King Saud University,Riyadh,Saudi Arabia.
文摘Healthcare networks prove to be an urgent issue in terms of intrusion detection due to the critical consequences of cyber threats and the extreme sensitivity of medical information.The proposed Auto-Stack ID in the study is a stacked ensemble of encoder-enhanced auctions that can be used to improve intrusion detection in healthcare networks.TheWUSTL-EHMS 2020 dataset trains and evaluates themodel,constituting an imbalanced class distribution(87.46% normal traffic and 12.53% intrusion attacks).To address this imbalance,the study balances the effect of training Bias through Stratified K-fold cross-validation(K=5),so that each class is represented similarly on training and validation splits.Second,the Auto-Stack ID method combines many base classifiers such as TabNet,LightGBM,Gaussian Naive Bayes,Histogram-Based Gradient Boosting(HGB),and Logistic Regression.We apply a two-stage training process based on the first stage,where we have base classifiers that predict out-of-fold(OOF)predictions,which we use as inputs for the second-stage meta-learner XGBoost.The meta-learner learns to refine predictions to capture complicated interactions between base models,thus improving detection accuracy without introducing bias,overfitting,or requiring domain knowledge of the meta-data.In addition,the auto-stack ID model got 98.41% accuracy and 93.45%F1 score,better than individual classifiers.It can identify intrusions due to its 90.55% recall and 96.53% precision with minimal false positives.These findings identify its suitability in ensuring healthcare networks’security through ensemble learning.Ongoing efforts will be deployed in real time to improve response to evolving threats.
基金We acknowledge the funding support from Australia Research Council(Grant Nos.DP200100549 and IH180100010).
文摘Slope failures lead to catastrophic consequences in numerous countries and thus the stability assessment for slopes is of high interest in geotechnical and geological engineering researches.A hybrid stacking ensemble approach is proposed in this study for enhancing the prediction of slope stability.In the hybrid stacking ensemble approach,we used an artificial bee colony(ABC)algorithm to find out the best combination of base classifiers(level 0)and determined a suitable meta-classifier(level 1)from a pool of 11 individual optimized machine learning(OML)algorithms.Finite element analysis(FEA)was conducted in order to form the synthetic database for the training stage(150 cases)of the proposed model while 107 real field slope cases were used for the testing stage.The results by the hybrid stacking ensemble approach were then compared with that obtained by the 11 individual OML methods using confusion matrix,F1-score,and area under the curve,i.e.AUC-score.The comparisons showed that a significant improvement in the prediction ability of slope stability has been achieved by the hybrid stacking ensemble(AUC?90.4%),which is 7%higher than the best of the 11 individual OML methods(AUC?82.9%).Then,a further comparison was undertaken between the hybrid stacking ensemble method and basic ensemble classifier on slope stability prediction.The results showed a prominent performance of the hybrid stacking ensemble method over the basic ensemble method.Finally,the importance of the variables for slope stability was studied using linear vector quantization(LVQ)method.
基金funded by the National Natural Science Foundation of China(Grant No.41941019)the State Key Laboratory of Hydroscience and Engineering(Grant No.2019-KY-03)。
文摘Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines(TBMs).During the TBM tunnelling process,a large number of operation data are generated,reflecting the interaction between the TBM system and surrounding rock,and these data can be used to evaluate the rock mass quality.This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data.Based on the Songhua River water conveyance project,a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing.Then,through the tree-based feature selection method,10 key TBM operation parameters are selected,and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers.The preprocessed data are randomly divided into the training set(90%)and test set(10%)using simple random sampling.Besides stacking ensemble classifier,seven individual classifiers are established as the comparison.These classifiers include support vector machine(SVM),k-nearest neighbors(KNN),random forest(RF),gradient boosting decision tree(GBDT),decision tree(DT),logistic regression(LR)and multilayer perceptron(MLP),where the hyper-parameters of each classifier are optimised using the grid search method.The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers,and it shows a more powerful learning and generalisation ability for small and imbalanced samples.Additionally,a relative balance training set is obtained by the synthetic minority oversampling technique(SMOTE),and the influence of sample imbalance on the prediction performance is discussed.
基金supported by the National Natural Science Foundation of China(Grants No.51879185 and 52179139)the Open Fund of the Hubei Key Laboratory of Construction and Management in Hydropower Engineering(Grant No.2020KSD06).
文摘Numerical simulation of concrete-faced rockfill dams(CFRDs)considering the spatial variability of rockfill has become a popular research topic in recent years.In order to determine uncertain rockfill properties efficiently and reliably,this study developed an uncertainty inversion analysis method for rockfill material parameters using the stacking ensemble strategy and Jaya optimizer.The comprehensive implementation process of the proposed model was described with an illustrative CFRD example.First,the surrogate model method using the stacking ensemble algorithm was used to conduct the Monte Carlo stochastic finite element calculations with reduced computational cost and improved accuracy.Afterwards,the Jaya algorithm was used to inversely calculate the combination of the coefficient of variation of rockfill material parameters.This optimizer obtained higher accuracy and more significant uncertainty reduction than traditional optimizers.Overall,the developed model effectively identified the random parameters of rockfill materials.This study provided scientific references for uncertainty analysis of CFRDs.In addition,the proposed method can be applied to other similar engineering structures.
文摘As a result of the increased number of COVID-19 cases,Ensemble Machine Learning(EML)would be an effective tool for combatting this pandemic outbreak.An ensemble of classifiers can improve the performance of single machine learning(ML)classifiers,especially stacking-based ensemble learning.Stacking utilizes heterogeneous-base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results.However,building an ensemble often causes the model performance to decrease due to the increasing number of learners that are not being properly selected.Therefore,the goal of this paper is to develop and evaluate a generic,data-independent predictive method using stacked-based ensemble learning(GA-Stacking)optimized by aGenetic Algorithm(GA)for outbreak prediction and health decision aided processes.GA-Stacking utilizes five well-known classifiers,including Decision Tree(DT),Random Forest(RF),RIGID regression,Least Absolute Shrinkage and Selection Operator(LASSO),and eXtreme Gradient Boosting(XGBoost),at its first level.It also introduces GA to identify comparisons to forecast the number,combination,and trust of these base classifiers based on theMean Squared Error(MSE)as a fitness function.At the second level of the stacked ensemblemodel,a Linear Regression(LR)classifier is used to produce the final prediction.The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering,Johns Hopkins University,which consisted of 10,722 data samples.The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99%for the three selected countries.Furthermore,the proposed model achieved good performance when compared with existing baggingbased approaches.The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model 3946 CMC,2023,vol.74,no.2 to predict the epidemic trend for other countries when comparing preventive and control measures.
基金the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.NRF-2021R1A5A8033165)the“Human Resources Program in Energy Technology”of the Korea Institute of Energy Technology Evaluation and Planning(KETEP)and was granted financial resources from the Ministry of Trade,Industry&Energy,Republic of Korea(No.20214000000200).
文摘Recently,machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductormanufacturing.The existing approaches used in the wafer map pattern classification include directly learning the image through a convolution neural network and applying the ensemble method after extracting image features.This study aims to classify wafer map defects more effectively and derive robust algorithms even for datasets with insufficient defect patterns.First,the number of defects during the actual process may be limited.Therefore,insufficient data are generated using convolutional auto-encoder(CAE),and the expanded data are verified using the evaluation technique of structural similarity index measure(SSIM).After extracting handcrafted features,a boosted stacking ensemble model that integrates the four base-level classifiers with the extreme gradient boosting classifier as a meta-level classifier is designed and built for training the model based on the expanded data for final prediction.Since the proposed algorithm shows better performance than those of existing ensemble classifiers even for insufficient defect patterns,the results of this study will contribute to improving the product quality and yield of the actual semiconductor manufacturing process.
基金supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number(PNURSP2024R513),Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘Cross-Site Scripting(XSS)remains a significant threat to web application security,exploiting vulnerabilities to hijack user sessions and steal sensitive data.Traditional detection methods often fail to keep pace with the evolving sophistication of cyber threats.This paper introduces a novel hybrid ensemble learning framework that leverages a combination of advanced machine learning algorithms—Logistic Regression(LR),Support Vector Machines(SVM),eXtreme Gradient Boosting(XGBoost),Categorical Boosting(CatBoost),and Deep Neural Networks(DNN).Utilizing the XSS-Attacks-2021 dataset,which comprises 460 instances across various real-world trafficrelated scenarios,this framework significantly enhances XSS attack detection.Our approach,which includes rigorous feature engineering and model tuning,not only optimizes accuracy but also effectively minimizes false positives(FP)(0.13%)and false negatives(FN)(0.19%).This comprehensive methodology has been rigorously validated,achieving an unprecedented accuracy of 99.87%.The proposed system is scalable and efficient,capable of adapting to the increasing number of web applications and user demands without a decline in performance.It demonstrates exceptional real-time capabilities,with the ability to detect XSS attacks dynamically,maintaining high accuracy and low latency even under significant loads.Furthermore,despite the computational complexity introduced by the hybrid ensemble approach,strategic use of parallel processing and algorithm tuning ensures that the system remains scalable and performs robustly in real-time applications.Designed for easy integration with existing web security systems,our framework supports adaptable Application Programming Interfaces(APIs)and a modular design,facilitating seamless augmentation of current defenses.This innovation represents a significant advancement in cybersecurity,offering a scalable and effective solution for securing modern web applications against evolving threats.
基金funded by the King Saud University,Riyadh,Saudi Arabia for funding this work through Researchers Supporting Project Number-RSPD2024R893.
文摘Surveillance cameras have been widely used for monitoring in both private and public sectors as a security measure.Close Circuits Television(CCTV)Cameras are used to surveillance and monitor the normal and anomalous incidents.Real-world anomaly detection is a significant challenge due to its complex and diverse nature.It is difficult to manually analyze because vast amounts of video data have been generated through surveillance systems,and the need for automated techniques has been raised to enhance detection accuracy.This paper proposes a novel deep-stacked ensemble model integrated with a data augmentation approach called Stack Ensemble Road Anomaly Detection(SERAD).SERAD is used to detect and classify the four most happening road anomalies,such as accidents,car fires,fighting,and snatching,through road surveillance videos with high accuracy.The SERAD adapted three pre-trained Convolutional Neural Networks(CNNs)models,namely VGG19,ResNet50 and InceptionV3.The stacking technique is employed to incorporate these three models,resulting in much-improved accuracy for classifying road abnormalities compared to individual models.Additionally,it presented a custom real-world Road Anomaly Dataset(RAD)comprising a comprehensive collection of road images and videos.The experimental results demonstrate the strength and reliability of the proposed SERAD model,achieving an impressive classification accuracy of 98.7%.The results indicate that the proposed SERAD model outperforms than the individual CNN base models.
文摘BACKGROUND There is a lack of literature discussing the utilization of the stacking ensemble algorithm for predicting depression in patients with heart failure(HF).AIM To create a stacking model for predicting depression in patients with HF.METHODS This study analyzed data on 1084 HF patients from the National Health and Nutrition Examination Survey database spanning from 2005 to 2018.Through univariate analysis and the use of an artificial neural network algorithm,predictors significantly linked to depression were identified.These predictors were utilized to create a stacking model employing tree-based learners.The performances of both the individual models and the stacking model were assessed by using the test dataset.Furthermore,the SHapley additive exPlanations(SHAP)model was applied to interpret the stacking model.RESULTS The models included five predictors.Among these models,the stacking model demonstrated the highest performance,achieving an area under the curve of 0.77(95%CI:0.71-0.84),a sensitivity of 0.71,and a specificity of 0.68.The calibration curve supported the reliability of the models,and decision curve analysis confirmed their clinical value.The SHAP plot demonstrated that age had the most significant impact on the stacking model's output.CONCLUSION The stacking model demonstrated strong predictive performance.Clinicians can utilize this model to identify highrisk depression patients with HF,thus enabling early provision of psychological interventions.
基金financially supported by the Natural Science Foundation of Hunan Province,China(No.2024JJ2074)the National Natural Science Foundation of China(No.22376221)the Young Elite Scientists Sponsorship Program by CAST,China(No.2023QNRC001).
文摘Tailings produced by mining and ore smelting are a major source of soil pollution.Understanding the speciation of heavy metals(HMs)in tailings is essential for soil remediation and sustainable development.Given the complex and time-consuming nature of traditional sequential laboratory extraction methods for determining the forms of HMs in tailings,a rapid and precise identification approach is urgently required.To address this issue,a general empirical prediction method for HM occurrence was developed using machine learning(ML).The compositional information of the tailings,properties of the HMs,and sequential extraction steps were used as inputs to calculate the percentages of the seven forms of HMs.After the models were tuned and compared,extreme gradient boosting,gradient boosting decision tree,and categorical boosting methods were found to be the top three performing ML models,with the coefficient of determination(R^(2))values on the testing set exceeding 0.859.Feature importance analysis for these three optimal models indicated that electronegativity was the most important factor affecting the occurrence of HMs,with an average feature importance of 0.4522.The subsequent use of stacking as a model integration method enabled the ability of the ML models to predict HM occurrence forms to be further improved,and resulting in an increase of R^(2) to 0.879.Overall,this study developed a robust technique for predicting the occurrence forms in tailings and provides an important reference for the environmental assessment and recycling of tailings.
基金support extended for this academic work by the Beijing Natural Science Foundation(No.2232066)the Open Project Foundation of State Key Laboratory of Solid Lubrication(No.LSL-2212).
文摘This study employs a stacking ensemble learning framework to establish a regression model for predicting the tribological properties of amide-based lubricating grease and determining the optimal additive ratios.Melamine cyanuric acid(MCA)was selected as the thickener,and three extreme-pressure anti-wear additives were used to prepare the lubricating grease.The tribological performance was tested using an MFT-R4000 reciprocating friction and wear machine.Based on the tribological experimental data,the synthetic minority oversampling technique(SMOTE)was utilized for data augmentation,and a stacking ensemble algorithm with Bayesian optimization of hyperparameters was used to construct a predictive model for tribological performance.Subsequently,within this model framework,single and multi-objective optimization models were developed,and the fruit fly algorithm was employed to find the optimal additive combination ratios,which were experimentally validated.The results demonstrated that the learning framework based on the stacking ensemble model could effectively predict the tribological properties of amide-based lubricating grease in small sample datasets,with the R2 for the average friction coefficient prediction reaching 0.9939 and for the wear scar width prediction reaching 0.9535.In the experimental validation of the optimal additive ratios,the relative error of the friction coefficient ratio scheme was 0.51%,and the relative error of the wear scar width was 1.10%.This finding suggests that the learning framework provides a novel approach for predicting the performance of amide-based lubricating grease and studying additive combinations.
基金the National Research Foundation of Korea(NRF)grant funded by the Korea government(MSIT)(No.2019R1F1A1059346)This work was supported by the 2020 Research Fund(Project No.1.180090.01)of UNIST(Ulsan National Institute of Science and Technology).
文摘An anomaly-based intrusion detection system(A-IDS)provides a critical aspect in a modern computing infrastructure since new types of attacks can be discovered.It prevalently utilizes several machine learning algorithms(ML)for detecting and classifying network traffic.To date,lots of algorithms have been proposed to improve the detection performance of A-IDS,either using individual or ensemble learners.In particular,ensemble learners have shown remarkable performance over individual learners in many applications,including in cybersecurity domain.However,most existing works still suffer from unsatisfactory results due to improper ensemble design.The aim of this study is to emphasize the effectiveness of stacking ensemble-based model for A-IDS,where deep learning(e.g.,deep neural network[DNN])is used as base learner model.The effectiveness of the proposed model and base DNN model are benchmarked empirically in terms of several performance metrics,i.e.,Matthew’s correlation coefficient,accuracy,and false alarm rate.The results indicate that the proposed model is superior to the base DNN model as well as other existing ML algorithms found in the literature.
文摘The exponential growth of Internet and network usage has neces-sitated heightened security measures to protect against data and network breaches.Intrusions,executed through network packets,pose a significant challenge for firewalls to detect and prevent due to the similarity between legit-imate and intrusion traffic.The vast network traffic volume also complicates most network monitoring systems and algorithms.Several intrusion detection methods have been proposed,with machine learning techniques regarded as promising for dealing with these incidents.This study presents an Intrusion Detection System Based on Stacking Ensemble Learning base(Random For-est,Decision Tree,and k-Nearest-Neighbors).The proposed system employs pre-processing techniques to enhance classification efficiency and integrates seven machine learning algorithms.The stacking ensemble technique increases performance by incorporating three base models(Random Forest,Decision Tree,and k-Nearest-Neighbors)and a meta-model represented by the Logistic Regression algorithm.Evaluated using the UNSW-NB15 dataset,the pro-posed IDS gained an accuracy of 96.16%in the training phase and 97.95%in the testing phase,with precision of 97.78%,and 98.40%for taring and testing,respectively.The obtained results demonstrate improvements in other measurement criteria.
文摘Malicious traffic detection over the internet is one of the challenging areas for researchers to protect network infrastructures from any malicious activity.Several shortcomings of a network system can be leveraged by an attacker to get unauthorized access through malicious traffic.Safeguard from such attacks requires an efficient automatic system that can detect malicious traffic timely and avoid system damage.Currently,many automated systems can detect malicious activity,however,the efficacy and accuracy need further improvement to detect malicious traffic from multi-domain systems.The present study focuses on the detection of malicious traffic with high accuracy using machine learning techniques.The proposed approach used two datasets UNSW-NB15 and IoTID20 which contain the data for IoT-based traffic and local network traffic,respectively.Both datasets were combined to increase the capability of the proposed approach in detecting malicious traffic from local and IoT networks,with high accuracy.Horizontally merging both datasets requires an equal number of features which was achieved by reducing feature count to 30 for each dataset by leveraging principal component analysis(PCA).The proposed model incorporates stacked ensemble model extra boosting forest(EBF)which is a combination of tree-based models such as extra tree classifier,gradient boosting classifier,and random forest using a stacked ensemble approach.Empirical results show that EBF performed significantly better and achieved the highest accuracy score of 0.985 and 0.984 on the multi-domain dataset for two and four classes,respectively.
基金primarily supported by the Ministry of Higher Education through MRUN Young Researchers Grant Scheme(MY-RGS),MR001-2019,entitled“Climate Change Mitigation:Artificial Intelligence-Based Integrated Environmental System for Mangrove Forest Conservation,”received by K.H.,S.A.R.,H.F.H.,M.I.M.,and M.M.Asecondarily funded by the UM-RU Grant,ST065-2021,entitled Climate Smart Mitigation and Adaptation:Integrated Climate Resilience Strategy for Tropical Marine Ecosystem.
文摘A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes.There are significantly more samples for day-to-day classes,while rare events such as polluted classes are uncommon.Consequently,the limited availability of minority outcomes lowers the classifier’s overall reliability.This study assesses the capability of machine learning(ML)algorithms in tackling imbalanced water quality data based on the metrics of precision,recall,and F1 score.It intends to balance the misled accuracy towards the majority of data.Hence,10 ML algorithms of its performance are compared.The classifiers included are AdaBoost,SupportVector Machine,Linear Discriminant Analysis,k-Nearest Neighbors,Naive Bayes,Decision Trees,Random Forest,Extra Trees,Bagging,and the Multilayer Perceptron.This study also uses the Easy Ensemble Classifier,Balanced Bagging,andRUSBoost algorithm to evaluatemulti-class imbalanced learning methods.The comparison results revealed that a highaccuracy machine learning model is not always good in recall and sensitivity.This paper’s stacked ensemble deep learning(SE-DL)generalization model effectively classifies the water quality index(WQI)based on 23 input variables.The proposed algorithm achieved a remarkable average of 95.69%,94.96%,92.92%,and 93.88%for accuracy,precision,recall,and F1 score,respectively.In addition,the proposed model is compared against two state-of-the-art classifiers,the XGBoost(eXtreme Gradient Boosting)and Light Gradient Boosting Machine,where performance metrics of balanced accuracy and g-mean are included.The experimental setup concluded XGBoost with a higher balanced accuracy and G-mean.However,the SE-DL model has a better and more balanced performance in the F1 score.The SE-DL model aligns with the goal of this study to ensure the balance between accuracy and completeness for each water quality class.The proposed algorithm is also capable of higher efficiency at a lower computational time against using the standard SyntheticMinority Oversampling Technique(SMOTE)approach to imbalanced datasets.
基金The research is funded by the Researchers Supporting Project at King Saud University,(Project#RSP-2021/305).
文摘COVID-19 is a growing problem worldwide with a high mortality rate.As a result,the World Health Organization(WHO)declared it a pandemic.In order to limit the spread of the disease,a fast and accurate diagnosis is required.A reverse transcript polymerase chain reaction(RT-PCR)test is often used to detect the disease.However,since this test is time-consuming,a chest computed tomography(CT)or plain chest X-ray(CXR)is sometimes indicated.The value of automated diagnosis is that it saves time and money by minimizing human effort.Three significant contributions are made by our research.Its initial purpose is to use the essential finetuning methodology to test the action and efficiency of a variety of vision models,ranging from Inception to Neural Architecture Search(NAS)networks.Second,by plotting class activationmaps(CAMs)for individual networks and assessing classification efficiency with AUC-ROC curves,the behavior of these models is visually analyzed.Finally,stacked ensembles techniques were used to provide greater generalization by combining finetuned models with six ensemble neural networks.Using stacked ensembles,the generalization of the models improved.Furthermore,the ensemble model created by combining all of the finetuned networks obtained a state-of-the-art COVID-19 accuracy detection score of 99.17%.The precision and recall rates were 99.99%and 89.79%,respectively,highlighting the robustness of stacked ensembles.The proposed ensemble approach performed well in the classification of the COVID-19 lesions on CXR according to the experimental results.
基金financially supported by the Deanship of Scientific Research and Graduate Studies at King Khalid University under research grant number(R.G.P.2/21/46)in part by the Deanship of Scientific Research,Vice Presidency for Graduate Studies and Scientific Research,King Faisal University,Saudi Arabia,under Grant KFU253116.
文摘Today,phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers,passwords,and usernames.We can find several anti-phishing solutions,such as heuristic detection,virtual similarity detection,black and white lists,and machine learning(ML).However,phishing attempts remain a problem,and establishing an effective anti-phishing strategy is a work in progress.Furthermore,while most antiphishing solutions achieve the highest levels of accuracy on a given dataset,their methods suffer from an increased number of false positives.These methods are ineffective against zero-hour attacks.Phishing sites with a high False Positive Rate(FPR)are considered genuine because they can cause people to lose a lot ofmoney by visiting them.Feature selection is critical when developing phishing detection strategies.Good feature selection helps improve accuracy;however,duplicate features can also increase noise in the dataset and reduce the accuracy of the algorithm.Therefore,a combination of filter-based feature selection methods is proposed to detect phishing attacks,including constant feature removal,duplicate feature removal,quasi-feature removal,correlated feature removal,mutual information extraction,and Analysis of Variance(ANOVA)testing.The technique has been tested with differentMachine Learning classifiers:Random Forest,Artificial Neural Network(ANN),Ada-Boost,Extreme Gradient Boosting(XGBoost),Logistic Regression,Decision Trees,Gradient Boosting Classifiers,Support Vector Machine(SVM),and two types of ensemble models,stacking and majority voting to gain A low false positive rate is achieved.Stacked ensemble classifiers(gradient boosting,randomforest,support vector machine)achieve 1.31%FPR and 98.17%accuracy on Dataset 1,2.81%FPR and Dataset 3 shows 2.81%FPR and 97.61%accuracy,while Dataset 2 shows 3.47%FPR and 96.47%accuracy.