Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2025-RS-2022-00156334), and in part by the Liaoning Province Nature Fund Project (2024-BSLH-214).
Abstract: Detecting minority-class attacks has long been a difficult task for Network Intrusion Detection Systems (NIDS) operating in complex network environments. To improve the detection of minority-class attacks, this study proposes an intrusion detection method based on a two-layer structure. The first layer employs a CNN-BiLSTM model incorporating an attention mechanism to classify network traffic into normal traffic, majority-class attacks, and merged minority-class attacks. The second layer further subdivides the merged minority-class attacks through Stacking ensemble learning. The evaluation uses the general-purpose network datasets CIC-IDS2017 and NSL-KDD and the industrial network dataset Mississippi Gas Pipeline, chosen to test the generalization and practical applicability of the model. Experimental results show that the proposed model achieves overall detection accuracies of 99%, 99%, and 95% on the CIC-IDS2017, NSL-KDD, and industrial network datasets, respectively. It also significantly outperforms traditional methods in detection accuracy and recall for minority-class attacks. Compared with a single-layer deep learning model, the two-layer structure reduces the false alarm rate while improving minority-class attack detection. The work improves the adaptability of NIDS to complex network environments and provides a new solution for minority-class attack detection in industrial network security.
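The two-layer routing described in this abstract can be sketched as follows. This is a minimal illustration only: the label names and the two toy threshold functions are hypothetical stand-ins for the trained CNN-BiLSTM and Stacking models, which the abstract does not specify in detail.

```python
from typing import Callable, Sequence

# Coarse labels produced by the first layer (names are illustrative assumptions).
NORMAL = "normal"
MAJORITY_ATTACK = "majority_attack"
MINORITY_ATTACK = "minority_attack"  # merged group of all rare attack types


def two_layer_predict(
    sample: Sequence[float],
    first_layer: Callable[[Sequence[float]], str],
    second_layer: Callable[[Sequence[float]], str],
) -> str:
    """Classify coarsely first; refine only samples flagged as merged minority."""
    coarse = first_layer(sample)
    if coarse == MINORITY_ATTACK:
        # Only the merged minority group is passed to the second model.
        return second_layer(sample)
    return coarse


# Hypothetical stand-ins for the trained models (toy threshold rules).
def toy_first_layer(x: Sequence[float]) -> str:
    if x[0] < 0.3:
        return NORMAL
    return MAJORITY_ATTACK if x[0] < 0.7 else MINORITY_ATTACK


def toy_second_layer(x: Sequence[float]) -> str:
    return "minority_type_a" if x[1] < 0.5 else "minority_type_b"


print(two_layer_predict([0.9, 0.2], toy_first_layer, toy_second_layer))
```

A design consequence of this routing is that the second model is trained and evaluated only on the rare classes, so it is not swamped by normal traffic.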
Funding: The Industry-University Cooperation Project in Fujian Province University (No. 2022H6020).
Abstract: Self-powered neutron detectors (SPNDs) play a critical role in monitoring the safety margins and overall health of reactors, and therefore directly affect safe reactor operation. In this work, a novel fault identification method based on graph convolutional networks (GCN) and Stacking ensemble learning is proposed for SPNDs. The GCN is employed to extract the spatial neighborhood information of SPNDs at different positions, and residuals are obtained by nonlinear fitting of SPND signals. To fully extract the time-varying features from the residual sequences, a Stacking fusion model integrating various algorithms is developed; it identifies five SPND conditions: normal, drift, bias, precision degradation, and complete failure. The results demonstrate that integrating diverse base learners in the GCN-Stacking model offers advantages over a single model and enhances the stability and reliability of fault identification. Additionally, the GCN-Stacking model maintains higher accuracy in identifying faults at different reactor power levels.
Funding: National Natural Science Foundation of China, No. 42271037; Key Research and Development Program Project of Anhui Province, No. 2022m07020011; The University Synergy Innovation Program of Anhui Province, No. GXXT-2021-048; Science Foundation for Excellent Young Scholars of Anhui, No. 2108085Y13.
Abstract: Flood susceptibility modeling is crucial for rapid flood forecasting, disaster reduction strategies, evacuation planning, and decision-making. Machine learning (ML) models have proven to be effective tools for assessing flood susceptibility. However, most previous studies have focused on individual models or comparative performance, underscoring the unique strengths and weaknesses of each model. In this study, we propose a stacking ensemble learning algorithm that harnesses the strengths of a diverse range of machine learning models. The findings reveal the following: (1) Stacking ensemble learning using the RF-XGBCB-LR model significantly enhances flood susceptibility simulation. (2) In addition to rainfall, key flood drivers in the study area include NDVI and impervious surfaces. Over 40% of the study area, primarily in the northeast and southeast, exhibits high flood susceptibility, with higher risks for populations than for cropland. (3) In the northeast of the study area, heavy precipitation, low terrain, and NDVI values are key indicators contributing to high flood susceptibility, while long-duration precipitation, mountainous topography, and upper-reach vegetation are the main drivers in the southeast. This study underscores the effectiveness of ML, particularly ensemble learning, in flood modeling. It identifies vulnerable areas and contributes to improved flood risk management.
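For readers unfamiliar with stacking, the general technique can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the study's RF-XGBCB-LR model: the two base learners (`CentroidScorer`, `KNNScorer`), the meta-learner (`LogisticMeta`), and the toy data are all hypothetical stand-ins. The key step is that the meta-learner is trained on out-of-fold base predictions, so it never sees a prediction made by a model that was fitted on the same sample.

```python
import math
import random


class CentroidScorer:
    """Base learner 1: relative closeness to the positive-class centroid."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.c_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.c_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def predict_proba(self, X):
        scores = []
        for x in X:
            d_pos, d_neg = math.dist(x, self.c_pos), math.dist(x, self.c_neg)
            scores.append(d_neg / (d_pos + d_neg + 1e-12))
        return scores


class KNNScorer:
    """Base learner 2: fraction of positive labels among the k nearest points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_proba(self, X):
        scores = []
        for x in X:
            order = sorted(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))
            scores.append(sum(self.y[i] for i in order[: self.k]) / self.k)
        return scores


class LogisticMeta:
    """Meta-learner: logistic regression fitted by batch gradient descent."""

    def fit(self, Z, y, lr=1.0, epochs=1000):
        n, d = len(Z), len(Z[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(epochs):
            errors = [self._prob(z) - t for z, t in zip(Z, y)]
            for j in range(d):
                self.w[j] -= lr * sum(e * z[j] for e, z in zip(errors, Z)) / n
            self.b -= lr * sum(errors) / n
        return self

    def _prob(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))

    def predict(self, Z):
        return [1 if self._prob(z) >= 0.5 else 0 for z in Z]


def fit_stacking(X, y, base_factories, n_folds=4, seed=0):
    """Train base learners and a meta-learner on out-of-fold base predictions."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    Z = [[0.0] * len(base_factories) for _ in X]
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, make in enumerate(base_factories):
            model = make().fit(X_tr, y_tr)
            for i, p in zip(fold, model.predict_proba([X[i] for i in fold])):
                Z[i][j] = p  # prediction from a model that never saw sample i
    meta = LogisticMeta().fit(Z, y)
    bases = [make().fit(X, y) for make in base_factories]  # refit on all data
    return bases, meta


def predict_stacking(bases, meta, X):
    columns = [b.predict_proba(X) for b in bases]
    return meta.predict([list(row) for row in zip(*columns)])


# Toy data: two well-separated 2-D clusters (synthetic, for illustration only).
rng = random.Random(42)
X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(20)]
X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(20)]
y = [0] * 20 + [1] * 20
bases, meta = fit_stacking(X, y, [CentroidScorer, KNNScorer])
preds = predict_stacking(bases, meta, [[0.0, 0.0], [3.0, 3.0]])
print(preds)
```

In practice a library implementation with stronger base learners would normally replace these hand-written classes; the sketch only shows the data flow between the two levels.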
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During TBM tunnelling, a large amount of operation data is generated that reflects the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate rock mass quality. This study proposes a stacking ensemble classifier for real-time prediction of the rock mass class using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of these 10 features in the stable phase, after removing outliers, are used as classifier inputs. The preprocessed data are divided into a training set (90%) and a test set (10%) by simple random sampling. In addition to the stacking ensemble classifier, seven individual classifiers are built for comparison: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP). The hyper-parameters of each classifier are optimised by grid search. The prediction results show that the stacking ensemble classifier performs better than the individual classifiers and shows stronger learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained with the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on prediction performance is discussed.
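The core idea of SMOTE mentioned in this abstract is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that idea, not the study's pipeline; the function name `smote`, its parameters, and the toy points are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies.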
Funding: Supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2024R513), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Cross-Site Scripting (XSS) remains a significant threat to web application security, exploiting vulnerabilities to hijack user sessions and steal sensitive data. Traditional detection methods often fail to keep pace with the evolving sophistication of cyber threats. This paper introduces a novel hybrid ensemble learning framework that combines several machine learning algorithms: Logistic Regression (LR), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Deep Neural Networks (DNN). Using the XSS-Attacks-2021 dataset, which comprises 460 instances across various real-world traffic-related scenarios, the framework significantly enhances XSS attack detection. The approach, which includes rigorous feature engineering and model tuning, optimizes accuracy while keeping false positives (FP) at 0.13% and false negatives (FN) at 0.19%. The methodology has been rigorously validated, achieving an accuracy of 99.87%. The proposed system is scalable and efficient, capable of adapting to the increasing number of web applications and user demands without a decline in performance. It detects XSS attacks dynamically in real time, maintaining high accuracy and low latency even under significant loads. Furthermore, despite the computational complexity introduced by the hybrid ensemble approach, parallel processing and algorithm tuning ensure that the system remains scalable and performs robustly in real-time applications. Designed for easy integration with existing web security systems, the framework supports adaptable Application Programming Interfaces (APIs) and a modular design, facilitating seamless augmentation of current defenses. This work offers a scalable and effective solution for securing modern web applications against evolving threats.
Funding: Financially supported by the Natural Science Foundation of Hunan Province, China (No. 2024JJ2074), the National Natural Science Foundation of China (No. 22376221), and the Young Elite Scientists Sponsorship Program by CAST, China (No. 2023QNRC001).
Abstract: Tailings produced by mining and ore smelting are a major source of soil pollution. Understanding the speciation of heavy metals (HMs) in tailings is essential for soil remediation and sustainable development. Given the complex and time-consuming nature of traditional sequential laboratory extraction methods for determining the forms of HMs in tailings, a rapid and precise identification approach is urgently required. To address this issue, a general empirical prediction method for HM occurrence was developed using machine learning (ML). The compositional information of the tailings, properties of the HMs, and sequential extraction steps were used as inputs to calculate the percentages of the seven forms of HMs. After the models were tuned and compared, extreme gradient boosting, gradient boosting decision tree, and categorical boosting were found to be the three best-performing ML models, with coefficient of determination (R^2) values on the testing set exceeding 0.859. Feature importance analysis for these three models indicated that electronegativity was the most important factor affecting the occurrence of HMs, with an average feature importance of 0.4522. Subsequently using stacking as a model integration method further improved the ability of the ML models to predict HM occurrence forms, increasing R^2 to 0.879. Overall, this study developed a robust technique for predicting HM occurrence forms in tailings and provides an important reference for the environmental assessment and recycling of tailings.
Funding: Primarily supported by the Ministry of Higher Education through the MRUN Young Researchers Grant Scheme (MY-RGS), MR001-2019, entitled "Climate Change Mitigation: Artificial Intelligence-Based Integrated Environmental System for Mangrove Forest Conservation," received by K.H., S.A.R., H.F.H., M.I.M., and M.M.A.; secondarily funded by the UM-RU Grant, ST065-2021, entitled "Climate Smart Mitigation and Adaptation: Integrated Climate Resilience Strategy for Tropical Marine Ecosystem."
Abstract: A common difficulty in building prediction models with real-world environmental datasets is the skewed distribution of classes. There are significantly more samples for day-to-day classes, while rare events such as polluted classes are uncommon. Consequently, the limited availability of minority outcomes lowers the classifier's overall reliability. This study assesses the capability of machine learning (ML) algorithms in tackling imbalanced water quality data based on the metrics of precision, recall, and F1 score. It aims to correct for accuracy figures that are misleadingly biased toward the majority class. To this end, the performance of 10 ML algorithms is compared: AdaBoost, Support Vector Machine, Linear Discriminant Analysis, k-Nearest Neighbors, Naive Bayes, Decision Trees, Random Forest, Extra Trees, Bagging, and the Multilayer Perceptron. The study also uses the Easy Ensemble Classifier, Balanced Bagging, and RUSBoost algorithms to evaluate multi-class imbalanced learning methods. The comparison shows that a machine learning model with high accuracy does not always perform well in recall and sensitivity. The paper's stacked ensemble deep learning (SE-DL) generalization model effectively classifies the water quality index (WQI) based on 23 input variables. The proposed algorithm achieved averages of 95.69%, 94.96%, 92.92%, and 93.88% for accuracy, precision, recall, and F1 score, respectively. In addition, the proposed model is compared against two state-of-the-art classifiers, XGBoost (eXtreme Gradient Boosting) and Light Gradient Boosting Machine, using the additional performance metrics of balanced accuracy and G-mean. The experiments found that XGBoost achieved higher balanced accuracy and G-mean. However, the SE-DL model has a better and more balanced performance in the F1 score. The SE-DL model aligns with the goal of this study, which is to ensure a balance between accuracy and completeness for each water quality class. The proposed algorithm is also more efficient, requiring less computation time than the standard Synthetic Minority Oversampling Technique (SMOTE) approach to imbalanced datasets.
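The metrics named in this abstract can all be derived from a confusion matrix, which also shows why high accuracy can coexist with poor minority recall. The sketch below uses an invented two-class confusion matrix, not data from the study; the function name `imbalance_metrics` is likewise an assumption. G-mean is computed here as the geometric mean of the per-class recalls.

```python
import math


def imbalance_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    recalls, precisions, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        actual = sum(confusion[c])                        # samples truly in class c
        predicted = sum(confusion[r][c] for r in range(n))  # samples labelled c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    return {
        "accuracy": sum(confusion[c][c] for c in range(n)) / total,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,  # identical to balanced accuracy
        "macro_f1": sum(f1s) / n,
        "balanced_accuracy": sum(recalls) / n,
        "g_mean": math.prod(recalls) ** (1.0 / n),
    }


# Invented example: 90 majority samples all correct, 10 minority samples
# of which only 2 are recognised.
metrics = imbalance_metrics([[90, 0], [8, 2]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this example the accuracy is 0.92 while the balanced accuracy is only 0.60, because the minority-class recall of 0.20 is hidden by the large majority class.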
Funding: Funded by King Saud University, Riyadh, Saudi Arabia, through Researchers Supporting Project Number RSPD2024R893.
Abstract: Surveillance cameras are widely used for monitoring in both the private and public sectors as a security measure. Closed-Circuit Television (CCTV) cameras are used to surveil and monitor normal and anomalous incidents. Real-world anomaly detection is a significant challenge because anomalies are complex and diverse. Surveillance systems generate vast amounts of video data that are difficult to analyze manually, which has created a need for automated techniques that improve detection accuracy. This paper proposes a novel deep stacked ensemble model integrated with a data augmentation approach, called Stack Ensemble Road Anomaly Detection (SERAD). SERAD detects and classifies the four most frequent road anomalies (accidents, car fires, fighting, and snatching) in road surveillance videos with high accuracy. SERAD adapts three pre-trained Convolutional Neural Network (CNN) models, namely VGG19, ResNet50, and InceptionV3. The stacking technique is employed to combine these three models, resulting in much-improved accuracy for classifying road anomalies compared to the individual models. Additionally, the paper presents a custom real-world Road Anomaly Dataset (RAD) comprising a comprehensive collection of road images and videos. The experimental results demonstrate the strength and reliability of the proposed SERAD model, which achieves a classification accuracy of 98.7%. The results indicate that the proposed SERAD model outperforms the individual CNN base models.
Funding: Sponsored by the National Natural Science Foundation of China under Grants 62271264, 61972207, and 42175194, and by the Project through the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institution.
Abstract: Intrusion detection is an active research area in network security. Classical intrusion detection systems are usually based on supervised machine learning models. These offline-trained models usually perform better in the initial stages of system construction. However, due to the diversity and rapid development of intrusion techniques, the trained models often struggle to detect new attacks. In addition, even a small amount of noisy data in the training process often has a considerable impact on the performance of the intrusion detection system. To solve these problems, this paper proposes an adaptive intrusion detection system (IDS) based on active incremental learning. The IDS consists of two modules: an improved incremental stacking ensemble learning detection method, called the Multi-Stacking model, and an active learning query module. The stacking model can cope well with concept drift due to the diversity and generalization selection of its base classifiers, but its accuracy does not meet the requirements. The Multi-Stacking model improves accuracy by adding a voting layer on top of the original stacking. The active learning query module improves the detection of known attacks through the committee algorithm, and the improved KNN algorithm can better help detect unknown attacks. We tested the system on a recent industrial IoT dataset, with satisfactory results.
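The abstract does not say how the voting layer combines outputs. One common choice, assumed here purely for illustration, is hard majority voting over the labels produced by several stacking models; the function name `majority_vote` and the example labels are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label encountered first."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]


# Hypothetical labels from three stacking models for one traffic record.
print(majority_vote(["dos", "normal", "dos"]))
```

Other combination rules, such as weighting each model by its validation accuracy, would fit the same interface.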