The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To enhance the interpretability of network attack prediction models, this paper proposes a method that combines Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling the rapid and accurate identification and prediction of potential network attack types and thereby facilitating timely defense measures. The model achieved an accuracy of 0.977, a precision of 0.985, a recall of 0.975, and an F1 score of 0.979, performing better than the other models compared in the domain of network attack prediction. SHAP is used to analyze the black-box decision-making process of the model, providing interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model achieves superior accuracy and outstanding predictive capability. Moreover, the SHAP-based interpretability analysis significantly improves the model’s transparency and interpretability.
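SHAP attributes a single prediction to the input features using Shapley values from cooperative game theory. The following is a minimal brute-force sketch of exact Shapley values for a tiny model. It assumes that features absent from a coalition are replaced by one baseline vector. The SHAP library computes these values far more efficiently for tree ensembles such as LGBM. Because this version enumerates every coalition, it is only practical for a handful of features, and `toy_model` is a hypothetical stand-in for a trained classifier.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their real values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus a two-feature interaction.
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The attributions sum to `toy_model(x)` minus `toy_model(baseline)`, which is 8 (the efficiency property). The interaction term is split evenly between features 1 and 2, which is the kind of feature-relationship information a SHAP analysis surfaces.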
The first 2⁺ excited states of a nucleus directly reflect the interaction between the shell structure and the nucleus, providing insights into the validity of the shell model and into nuclear structure characteristics. Although the features of the first 2⁺ excited states can be measured for stable nuclei and calculated using nuclear models, significant uncertainty remains. This study employs a machine learning model based on the light gradient boosting machine (LightGBM) to investigate the first 2⁺ excited states. Specifically, the training of the LightGBM algorithm and the prediction of the first 2⁺ properties of 642 nuclei are presented. Furthermore, the LightGBM predictions were compared in detail with available experimental data, shell model calculations, and Bayesian neural network (BNN) predictions. The results revealed that the average difference between the LightGBM predictions and the experimental data was 18 times smaller than that of the shell model and only 70% of that of the BNN predictions. Taking the Mg, Ca, Kr, Sm, and Pb isotopes as examples, LightGBM was also observed to effectively reproduce the abrupt changes at magic numbers caused by shell effects, including energies as low as 0.04 MeV due to shape coexistence. Therefore, we believe that LightGBM-based machine learning can profoundly enhance our insight into nuclear structure and provide new avenues for nuclear physics research.
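LightGBM is a gradient-boosted decision-tree ensemble. Each new tree is fitted to the residuals (negative gradients) of the current ensemble and added with a small learning rate. The following is a minimal pure-Python sketch of that boosting loop for squared-error regression. It uses depth-1 trees (stumps) only to illustrate the mechanism. It omits LightGBM's histogram binning, leaf-wise growth, GOSS, and EFB, and all function names are illustrative rather than part of any library.

```python
def fit_stump(xs, residuals):
    """Return the single-split regressor that minimises squared error on the residuals."""
    best = None
    for f in range(len(xs[0])):
        for t in sorted({row[f] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[f] <= t]
            right = [r for row, r in zip(xs, residuals) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting for squared loss: every stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of 0.5*(y - p)^2
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, xs)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


xs = [[i] for i in range(10)]
ys = [0.0] * 5 + [10.0] * 5
model = gradient_boost(xs, ys)
print(round(model([0]), 2), round(model([9]), 2))  # 0.03 9.97
```

Each round shrinks the remaining residual by the factor `1 - learning_rate`, which is why the predictions approach the step targets (0 and 10) gradually rather than in one jump.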
Tree ensemble models have gained significant importance for addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf-wise growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost uses balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing-time efficiency. CatBoost’s algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector that stops training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against the LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and may be applicable to other datasets similar to diabetes data.
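The accuracy, recall, and F1-score used in the comparison above all come from the binary confusion matrix. The following is a minimal sketch, assuming label 1 is the positive (diabetic) class. The labels in the usage line are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0]))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```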
In this paper, an advanced and optimized Light Gradient Boosting Machine (LGBM) technique is proposed to identify intrusive activities in Internet of Things (IoT) networks. The major contributions are as follows: i) an optimized LGBM model has been developed to identify malicious activities in the IoT network; ii) an efficient evolutionary optimization approach has been adopted to find the optimal set of LGBM hyper-parameters for the problem at hand, in which a Genetic Algorithm (GA) with k-way tournament selection and uniform crossover is used to explore the hyper-parameter search space efficiently; iii) finally, the performance of the proposed model is evaluated against state-of-the-art ensemble learning and machine learning-based models to assess its overall generalization performance and efficiency. Simulation outcomes reveal that the proposed approach is superior to the other methods considered and is a robust approach to intrusion detection in IoT environments.
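The GA ingredients named above (k-way tournament selection and uniform crossover) can be sketched over a discrete hyper-parameter space. This is a minimal illustration, not the paper's implementation. The search space, mutation rate, elitism, and `toy_fitness` are all hypothetical. In practice the fitness would be the cross-validated detection score of an LGBM trained with those hyper-parameters.

```python
import random

# Hypothetical LightGBM-style search space; the paper's actual ranges are not given.
SPACE = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth": [-1, 5, 10, 20],
}


def tournament(pop, fitness, k, rng):
    """k-way tournament: sample k individuals and return the fittest."""
    return max(rng.sample(pop, k), key=fitness)


def uniform_crossover(a, b, rng):
    """Each gene is inherited from either parent with probability 0.5."""
    return {key: a[key] if rng.random() < 0.5 else b[key] for key in a}


def mutate(ind, rate, rng):
    """Resample each gene from the search space with probability `rate`."""
    return {key: rng.choice(SPACE[key]) if rng.random() < rate else val
            for key, val in ind.items()}


def genetic_search(fitness, pop_size=20, generations=15, k=3, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [{key: rng.choice(vals) for key, vals in SPACE.items()} for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, k, rng)
            p2 = tournament(pop, fitness, k, rng)
            children.append(mutate(uniform_crossover(p1, p2, rng), mutation_rate, rng))
        pop = children
        best = max(pop, key=fitness)
    return best


def toy_fitness(params):
    # Hypothetical stand-in for cross-validated detection accuracy (higher is better).
    return -(abs(params["num_leaves"] - 63) / 64
             + abs(params["learning_rate"] - 0.1) * 10
             + abs(params["max_depth"] - 10) / 20)


print(genetic_search(toy_fitness))
```

Larger `k` increases selection pressure. Uniform crossover lets good values for different hyper-parameters, found in different parents, be combined in one child.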
Instability and fracture of coal rock masses may result in serious hazards in underground coal mining. Acoustic emissions (AE) stimulated by internal structural fracture are expected to carry substantial information about the health condition of the rock mass. As a sensitive non-destructive testing method, AE is increasingly used to detect anomalous conditions in coal rock. This paper proposes an improved multi-resolution feature that extracts the AE waveform at different frequency resolutions using the Coiflet Wavelet Transform method (CWT). It further adopts an efficient Light Gradient Boosting Machine (LightGBM), built from several cascaded weak sub-classifier models, to merge AE features from different frequency views for recognizing anomalous damage in coal rock. The results show that the proposed method achieves excellent recognition performance on the anomalous damage levels of coal rock. It is an effective method for detecting critical stability conditions and, further, for predicting rock mass bursting in time.
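The multi-resolution idea is to split an AE waveform into frequency bands, level by level, and summarize each band as a feature. The following is a minimal sketch with a deliberate substitution: it uses the Haar wavelet instead of a Coiflet, because Haar's two-tap filters fit in a few lines, and it summarizes each band by its energy. The paper's actual filters and feature definitions may differ.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1 / math.sqrt(2)
    # Pairs of samples; a trailing odd sample is dropped in this sketch.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def band_energies(signal, levels):
    """Detail-band energy at each level (finest first), then the final approximation energy."""
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


waveform = [1, 2, 3, 4, 5, 6, 7, 8]
print([round(e, 3) for e in band_energies(waveform, levels=3)])  # [2.0, 8.0, 32.0, 162.0]
```

Because the transform is orthonormal, the band energies sum to the signal's total energy (204 here) when the length is a power of two. The resulting vector is a natural per-waveform feature row for a boosted-tree classifier.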
Global climate change and sea level rise have led to increased losses from flooding. Accurate flood prediction is essential to mitigating flood losses in coastal cities. Physically based models cannot satisfy the demand for real-time urban flood prediction because of their computational complexity. In this study, we propose a hybrid modeling approach for rapid prediction of urban floods that couples a physically based model with the light gradient boosting machine (LightGBM) model. A hydrological–hydraulic model built with the personal computer storm water management model (PCSWMM) was used to provide sufficient data for the LightGBM model. Variables related to rainfall, tide level, and the location of flood points were used as inputs to the LightGBM model. To improve prediction accuracy, the hyperparameters of the LightGBM model were optimized by a grid search algorithm with K-fold cross-validation. Taking Haidian Island, Hainan Province, China as a case study, the optimum values of the learning rate, number of estimators, and number of leaves of the LightGBM model were 0.11, 450, and 12, respectively. The Nash-Sutcliffe efficiency coefficient (NSE) of the LightGBM model on the test set is 0.9896, indicating that the LightGBM model makes reliable predictions and outperforms random forest (RF), extreme gradient boosting (XGBoost), and k-nearest neighbor (KNN). Based on the Gini index, the LightGBM model identified the tide-level variables as the dominant variables for predicting inundation depth in the study area. Given its superior performance and computational efficiency, the proposed LightGBM model provides a scientific reference for flood control in coastal cities.
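The tuning procedure described above is a grid over hyperparameters, with each candidate scored by K-fold cross-validation. It can be sketched together with the NSE metric. In this sketch, `fit_shrunk_slope` and its `shrink` parameter are a hypothetical toy model that stands in for LightGBM so that the loop runs without external libraries. A real search would vary `learning_rate`, `n_estimators`, and `num_leaves` instead, and would normally shuffle the rows before splitting.

```python
from itertools import product


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


def k_fold_indices(n, k):
    """Yield k contiguous, non-overlapping test-index blocks that cover range(n)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        yield list(range(start, start + size))
        start += size


def grid_search_cv(fit, grid, xs, ys, k=5):
    """Return the parameter combination with the best mean NSE over k folds."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for test_idx in k_fold_indices(len(xs), k):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(xs)) if i not in held_out]
            model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx], **params)
            scores.append(nse([ys[i] for i in test_idx], [model(xs[i]) for i in test_idx]))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def fit_shrunk_slope(xs, ys, shrink):
    # Hypothetical toy model: least-squares slope through the origin, scaled by `shrink`.
    slope = shrink * sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x


xs = list(range(1, 21))
ys = [2 * x for x in xs]
print(grid_search_cv(fit_shrunk_slope, {"shrink": [0.5, 1.0, 1.5]}, xs, ys))
# ({'shrink': 1.0}, 1.0)
```

An NSE of 1 means a perfect match. An NSE of 0 means the model is no better than predicting the mean of the observations.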
BACKGROUND: Prolonged emergency department length of stay (EDLOS) is becoming an increasingly critical problem. This study aims to develop a machine learning (ML) model to predict EDLOS, with EDLOS as the outcome variable and demographic characteristics, triage level, and medical resource utilization as predictive factors. METHODS: A retrospective analysis was performed on patients who visited the emergency department of the Second Affiliated Hospital of Guangzhou Medical University from March 2019 to September 2021, and a total of 321,012 cases were identified. After the inclusion and exclusion criteria were applied, 187,028 cases were included in the analysis. ML analysis was performed using R-squared (R²), and models were established with the predictive factors as independent variables and EDLOS as the dependent variable. The ML models were evaluated with the mean absolute error (MAE), root mean square error (RMSE), and R², enabling an objective comparative analysis. RESULTS: Among the six ML models compared, the light gradient boosting machine (LightGBM) model had the lowest MAE (443.519) and RMSE (826.783) and the highest R² (0.48), indicating better model fit and predictive performance. Among the top 10 predictive factors associated with EDLOS in the LightGBM model, emergency waiting time, age, and emergency arrival time had the most significant impact on EDLOS. CONCLUSION: The LightGBM model suggests that emergency waiting time, age, and emergency arrival time may be used to predict EDLOS.
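The three metrics used to compare the EDLOS models can be computed as in the following minimal sketch. The values are illustrative, not the study's data. RMSE penalizes large errors more heavily than MAE does, so a model with a few very long mispredicted stays can rank worse on RMSE than on MAE.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R² (coefficient of determination) for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return mae, rmse, r2


print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))  # (0.25, 0.5, 0.8)
```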
Blasting is well known as an effective method for fragmenting or moving rock in open-pit mines. To evaluate blasting quality, the rock size distribution is used as a critical criterion in blasting operations. A high percentage of oversized rocks generated by blasting can lead to economic and environmental damage. Therefore, this study proposed four novel intelligent models to predict the rock size distribution in mine blasting, in order to optimize the blasting parameters and the efficiency of blasting operations in open-pit mines. Accordingly, a nature-inspired algorithm (the firefly algorithm, FFA) was combined with different machine learning algorithms (gradient boosting machine (GBM), support vector machine (SVM), Gaussian process (GP), and artificial neural network (ANN)), yielding models abbreviated as FFA-GBM, FFA-SVM, FFA-GP, and FFA-ANN, respectively. Subsequently, the predictions of these models were compared with each other using three statistical indicators (mean absolute error, root-mean-squared error, and correlation coefficient) and a color intensity method. To develop and simulate the rock size in blasting operations, 136 blasting events and their images were collected and analyzed with the Split-Desktop software. Of these, 111 events were randomly selected for developing and optimizing the models, and the remaining 25 events were used to confirm the accuracy of the proposed models. Blast design parameters were used as input variables to predict the rock size. The results revealed that FFA is a robust optimization algorithm for estimating rock fragmentation in bench blasting. Among the models developed in this study, FFA-GBM provided the highest accuracy in predicting the size of fragmented rocks, while the other techniques (FFA-SVM, FFA-GP, and FFA-ANN) yielded lower computational stability and efficiency. Hence, the FFA-GBM model can be used as a powerful and precise soft computing tool in practical engineering cases aimed at improving the quality of blasting and rock fragmentation.
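In the FFA-based models, each candidate hyperparameter vector is a firefly whose brightness reflects its fitness. The following is a minimal sketch of the basic firefly algorithm minimizing a generic objective. The parameter values (`beta0`, `gamma`, `alpha` and its decay) are illustrative defaults, not the study's settings. The sphere function stands in for a model's validation error.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, generations=60,
                     beta0=1.0, gamma=0.1, alpha=0.2, alpha_decay=0.95, seed=0):
    """Basic firefly algorithm: each firefly moves toward every brighter (lower-cost) one."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    best_i = min(range(n_fireflies), key=cost.__getitem__)
    best_pos, best_cost = list(pos[best_i]), cost[best_i]
    for _ in range(generations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:
                    # Attractiveness decays with squared distance: beta0 * exp(-gamma * r^2).
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [
                        min(hi, max(lo, xi + beta * (xj - xi)
                                    + alpha * (rng.random() - 0.5) * (hi - lo)))
                        for xi, xj, (lo, hi) in zip(pos[i], pos[j], bounds)
                    ]
                    cost[i] = objective(pos[i])
                    if cost[i] < best_cost:
                        best_pos, best_cost = list(pos[i]), cost[i]
        alpha *= alpha_decay  # shrink the random-walk step each generation
    return best_pos, best_cost


def sphere(p):
    # Stand-in objective; in FFA-GBM this would be the model's validation error.
    return sum(v * v for v in p)


print(firefly_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)]))
```

The attraction term pulls the swarm toward good regions. The decaying random walk keeps the swarm exploring early on and lets it settle near the best solution later.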
Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process has an important influence on the quality of molten steel and on the control of steelmaking cost. Extensive research has been conducted on models to predict molten steel temperature. However, most researchers focus solely on improving model accuracy and neglect explainability. The present study aims to develop a high-precision, explainable model with improved reliability and transparency. eXtreme gradient boosting (XGBoost) and the light gradient boosting machine (LGBM) were used, together with Bayesian optimization and grey wolf optimization (GWO), to establish the prediction model. Different performance evaluation metrics and graphical representations were applied to compare the optimal XGBoost and LGBM models, obtained through the different hyperparameter optimization methods, with the other models. The findings indicated that the GWO-LGBM model outperformed the other methods in predicting molten steel temperature, with a prediction accuracy of 89.35% within an error range of ±5 °C. Tree-structure visualization and SHapley Additive exPlanations (SHAP) analysis were used to reveal the model’s learning and decision process and to clarify how strongly each variable influences the molten steel temperature. Consequently, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for its prediction results.
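The headline accuracy above (89.35% of predictions within ±5 °C) is a hit rate. It is the share of heats whose absolute prediction error falls inside the tolerance band. The following is a minimal sketch with made-up temperatures. It assumes that the band's edges count as hits.

```python
def hit_rate(actual, predicted, tolerance=5.0):
    """Fraction of predictions whose absolute error is within ±tolerance (edges inclusive)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


actual_c = [1600, 1590, 1580, 1575]     # measured temperatures, °C (illustrative)
predicted_c = [1603, 1598, 1580, 1569]  # model predictions, °C (illustrative)
print(hit_rate(actual_c, predicted_c))  # 0.5
```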
Artificial lift plays an important role in the petroleum industry in sustaining production flow rates and extending the lifespan of oil wells. One of the most popular artificial lift methods is the Electric Submersible Pump (ESP), because it can produce high flow rates even in very deep wells. Although ESPs are designed to work under extreme conditions such as corrosion, high temperature and high pressure, their lifespan is much shorter than expected. ESP failures cause production loss and increase replacement costs, because intervention work for an ESP is much more expensive than for other artificial lift methods, especially in offshore wells. Therefore, predicting ESP failures is highly valuable in oil production and contributes substantially to the design, construction and operation of oil wells. The contribution of this study is to use three machine learning algorithms (Decision Tree, Random Forest and Gradient Boosting Machine) to build predictive models of ESP lifespan using both dynamic and static ESP parameters. The results of these models were compared to find the most suitable model for predicting the ESP life cycle. In addition, this study evaluated the influence of various operating parameters to identify those with the greatest impact on ESP run life. The results of this study can provide a better understanding of ESP behavior so that early action can be taken to prevent potential ESP failures.
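The abstract does not say how the influence of each operating parameter was measured. One common, model-agnostic option is permutation importance. It shuffles one feature column and measures how much the prediction error grows. The following is a minimal sketch under that assumption, in which `run_life_model` is a hypothetical stand-in for a trained tree model.

```python
import random


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    base_error = mse(y, [predict(r) for r in rows])
    importances = []
    for f in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature f replaced by its shuffled value.
            shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
            increases.append(mse(y, [predict(r) for r in shuffled]) - base_error)
        importances.append(sum(increases) / n_repeats)
    return importances


def run_life_model(row):
    # Hypothetical model: run life depends only on the first operating parameter.
    return 100 - 2 * row[0]


rng_data = random.Random(1)
rows = [[rng_data.uniform(0, 10), rng_data.uniform(0, 10)] for _ in range(30)]
y = [run_life_model(r) for r in rows]
print(permutation_importance(run_life_model, rows, y))
```

The second parameter scores exactly zero because the model ignores it. Tree-based impurity importances, which scikit-learn-style Random Forest and GBM models also expose, are a faster but more biased alternative.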
Software Defined Networking (SDN) has emerged as a promising and exciting option for the future growth of the internet. SDN has increased the flexibility and transparency of managed, centralized, and controlled networks. On the other hand, these advantages create a more vulnerable environment with substantial risks, culminating in network difficulties, system paralysis, online banking fraud, and robbery. These issues have a significant detrimental impact on organizations, enterprises, and even economies. Accurate, high-performance, real-time detection systems are necessary to counter these threats. Extending intelligent machine learning methodologies to Intrusion Detection Systems (IDS) in SDN has stimulated the interest of numerous researchers over the last decade. In this paper, a novel HFS-LGBM IDS is proposed for SDN. First, a two-phase Hybrid Feature Selection (HFS) algorithm is applied to reduce the data dimension and obtain an optimal feature subset. In the first phase, the Correlation-based Feature Selection (CFS) algorithm is used to obtain a feature subset. In the second phase, the optimal feature set is obtained by applying Random Forest Recursive Feature Elimination (RF-RFE). A LightGBM algorithm is then used to detect and classify different types of attacks. Experimental results on the NSL-KDD dataset show that the proposed system outperforms existing methods in terms of accuracy, precision, recall and F-measure.
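The CFS phase scores a feature subset by Hall's merit. The merit rewards correlation with the target and penalizes correlation among the chosen features: Merit = k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean absolute feature-target correlation and r_ff is the mean absolute feature-feature correlation. The following is a minimal greedy forward-selection sketch. It assumes numeric features and Pearson correlation. NSL-KDD's categorical fields would first need encoding, and the RF-RFE second phase is not shown.

```python
import math
from itertools import combinations


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def cfs_merit(subset, features, target):
    """Hall's CFS merit of a feature subset (indices into `features`)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target, tol=1e-9):
    """Greedily add the feature that most improves the merit; stop when nothing helps."""
    selected, best_merit = [], 0.0
    while len(selected) < len(features):
        candidates = [i for i in range(len(features)) if i not in selected]
        best_i = max(candidates, key=lambda i: cfs_merit(selected + [i], features, target))
        merit = cfs_merit(selected + [best_i], features, target)
        if merit <= best_merit + tol:
            break
        selected.append(best_i)
        best_merit = merit
    return selected


target = [1, 3, 2, 5, 4, 6, 8, 7]
f0 = list(target)                 # perfectly relevant
f1 = [2 * v + 1 for v in target]  # equally relevant but fully redundant with f0
f2 = [4, 1, 3, 2, 4, 1, 3, 2]     # weakly related
print(cfs_forward([f0, f1, f2], target))
```

Only one of the two redundant features is kept, because adding its duplicate raises r_ff as much as r_cf and leaves the merit unchanged.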
Slope instability in hilly regions is a highly complex phenomenon, with triggering factors ranging from natural events to anthropogenic activities. Such failures cause disastrous losses of both property and life. It is necessary to understand the mechanism of these failures in order to mitigate such events and to predict slope vulnerability for better preparedness. Significant advances have already been made in slope stability analysis, and many valuable tools and techniques have been developed, such as limit equilibrium methods, finite element and finite difference methods, stochastic methods, and various combinations of these. In this study, machine learning tools are used to predict the factor of safety for rock slope stability in hilly regions. Three road-cut slopes were considered, and their stability was determined using both finite element (FE) and machine learning (ML) techniques. The two approaches are intertwined so that they supplement each other and enhance the reliability of the results. The geotechnical data were acquired through field investigations at the selected mountainous sites. Because the slopes at the site are rocky, the FE model uses the Generalized Hoek-Brown (GHB) material model with the shear strength reduction technique. The ML models implemented are Random Forest (RF) and Gradient Boosting Machine (GBM). Ample published data were used to train the ML models, while data from the current slope sites were used to test them. The ML analysis was carried out in three stages: a) without hyperparameter tuning, b) with hyperparameter tuning using GridSearchCV, and c) with a Pipeline incorporating Recursive Feature Elimination (RFE). Performance metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R² score, were evaluated to assess model accuracy. A slight discrepancy, within 10 percent, was found, which is expected given factors such as grid refinement and data volume and variability. Overall, the proposed ML model shows excellent agreement with the FE model results. This study selects relevant ML techniques to develop a purpose-built framework that can potentially validate rock slope stability results obtained using traditional methods.
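Recursive feature elimination repeatedly fits a model on the surviving features and drops the weakest one. The following is a minimal sketch that assumes a least-squares linear model on standardized features as the ranking model, so that it stays dependency-free. The study instead pairs RFE with RF or GBM models, likely through scikit-learn's `RFE` inside a `Pipeline` searched with `GridSearchCV`. The data here are synthetic.

```python
import random
from statistics import mean, pstdev


def standardize(col):
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def least_squares(cols, y):
    """Solve the normal equations (X^T X) b = X^T y with Gauss-Jordan elimination."""
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * z for x, z in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def rfe(features, y, n_keep):
    """Drop the feature with the smallest |standardized coefficient| until n_keep remain."""
    remaining = list(range(len(features)))
    y_mean = mean(y)
    yc = [v - y_mean for v in y]  # centred target, so no intercept term is needed
    while len(remaining) > n_keep:
        coefs = least_squares([standardize(features[j]) for j in remaining], yc)
        weakest = min(range(len(remaining)), key=lambda i: abs(coefs[i]))
        remaining.pop(weakest)
    return remaining


rng = random.Random(0)
x0, x1, x2 = ([rng.random() for _ in range(30)] for _ in range(3))
y = [3 * a + b for a, b in zip(x0, x1)]  # x2 is pure noise
print(rfe([x0, x1, x2], y, n_keep=1))  # [0]
```

Refitting after every removal is what distinguishes RFE from a one-shot ranking. The importance of a surviving feature can change once a correlated partner has been removed.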
Diabetic Retinopathy (DR) is a critical retinal disorder that is becoming more common with the constant rise in the number of diabetics, and it remains a major cause of blindness across the world. Early detection and timely treatment are essential to mitigate the effects of DR, such as retinal damage and vision impairment. Several conventional approaches have been proposed to detect DR early and accurately, but they are limited by data imbalance, interpretability, overfitting, convergence time, and other issues. To address these drawbacks and detect DR more accurately, a distributed Explainable Convolutional Neural network-enabled Light Gradient Boosting Machine (DE-ExLNN) is proposed in this research. The model combines an explainable Convolutional Neural Network (CNN) with a Light Gradient Boosting Machine (LightGBM) and achieves highly accurate DR detection. LightGBM serves as the detection model, and the explainable CNN addresses issues that conventional CNN classifiers could not resolve. A custom dataset containing both fundus and OCTA images collected from a real-time environment was created for this research, providing more accurate results than standard conventional DR datasets. On the custom dataset, the model achieves notable accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC) scores, underscoring the effectiveness of this approach. Evaluations against other standard datasets achieved an accuracy of 93.94%, sensitivity of 93.90%, specificity of 93.99%, and MCC of 93.88% for fundus images. For OCTA images, the model obtained an accuracy of 95.30%, sensitivity of 95.50%, specificity of 95.09%, and MCC of 95%. The results show that the combination of explainable CNN and LightGBM outperforms other methods. The inclusion of distributed learning enhances the model’s efficiency by reducing time consumption and complexity while facilitating feature extraction.
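MCC is reported alongside accuracy above because it uses all four confusion-matrix cells and stays informative when classes are imbalanced. It ranges from -1 to 1 (the results quote it as a percentage). The following is a minimal sketch of the standard binary formula. The study's multi-class averaging, if any, is not specified.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(mcc(50, 50, 0, 0))    # 1.0 (perfect classifier)
print(mcc(25, 25, 25, 25))  # 0.0 (no better than chance)
print(round(mcc(40, 45, 5, 10), 4))
```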
Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This prompts researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model using the light gradient boosting machine (LGBM), based on the Aviation Safety Reporting System (ASRS), has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which can boost classification performance and maintain data balance. The results show that the LGBM-HSCV model can significantly improve accuracy while alleviating data imbalance. The comparative approach comprises a vertical comparison with other cross-validation (CV) methods and a lateral comparison with different numbers of folds. In addition to the comparison, two further CV approaches based on the improved method are discussed: one with a different sampling and folding order, and the other with additional CV. According to the evaluation indices of the different methods, the LGBM-HSCV model proposed here is effective at detecting incident causes. The proposed improved model for imbalanced data classification may serve as a reference for similar data processing, and its accurate identification of civil aviation incident causes can help improve civil aviation safety.
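Resampling inside cross-validation must rebalance only the training folds, so that each held-out fold keeps the original class distribution and the evaluation stays honest. The following is a minimal sketch that assumes stratified folds and a simple hybrid of random under-sampling and random over-sampling to the mean class size. The paper's exact HSCV sampling scheme and its sampling/folding order may differ.

```python
import random
from collections import Counter


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idx in by_class.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def hybrid_resample(indices, labels, rng):
    """Under-sample larger classes and over-sample smaller ones to the mean class size."""
    groups = {}
    for i in indices:
        groups.setdefault(labels[i], []).append(i)
    target = round(sum(len(g) for g in groups.values()) / len(groups))
    out = []
    for g in groups.values():
        if len(g) >= target:
            out.extend(rng.sample(g, target))  # random under-sampling
        else:
            out.extend(g + [rng.choice(g) for _ in range(target - len(g))])  # over-sampling
    return out


def hybrid_sampling_cv(labels, k=5, seed=0):
    """Yield (balanced_train_indices, untouched_test_indices) for each fold."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield hybrid_resample(train, labels, rng), folds[f]


labels = [0] * 90 + [1] * 10
for train, test in hybrid_sampling_cv(labels, k=5):
    print(Counter(labels[i] for i in train), Counter(labels[i] for i in test))
```

With this 90:10 toy split, each training fold ends up with 40 samples per class, while each test fold keeps its natural 18:2 ratio.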
Aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This has drawn the research community to study aviation safety by applying data analysis techniques based on advanced machine learning algorithms. An ensemble classification model based on the Aviation Safety Reporting System (ASRS) has been proposed to analyze aviation safety, targeting the injured persons recorded in the system. The ensemble classification model contains two modules: a data-driven module consisting of data cleaning, feature selection, and imbalanced data division and reorganization; and a model-driven module built by stacking separately trained Random Forest (RF), XGBoost (XGB), and Light Gradient Boosting Machine (LGBM) models. The results indicate that the ensemble model can handle the data imbalance while vastly improving accuracy. As a single model on the ASRS-based imbalanced data, LGBM shows higher accuracy and faster runtime, while the ensemble model achieves the best classification performance overall. The ensemble model proposed for imbalanced data classification can provide a reference for similar data processing while helping to improve civil aviation safety.
The rapid advancement of artificial intelligence has introduced new vitality into cemented paste backfill (CPB) technology. However, current machine learning models for CPB strength prediction are generally forward-only and single-output, and they lack clarity on multi-feature interactions and an integrated full-process design framework. To this end, this study proposes a light gradient boosting machine (LightGBM) model, optimized by Optuna, for predicting CPB strength at multiple curing ages (3, 7, and 28 d). The model dataset comprised the unconfined compressive strength (UCS) results of 738 CPB specimens prepared with various types of tailings and mix proportions. SHapley Additive exPlanations (SHAP) was employed to elucidate the influence patterns and relative importance of the input features, while inherently accounting for their complex multi-feature interaction effects on the model output. Additionally, a simulated annealing (SA) algorithm was integrated with the predictive model to enable the inverse process, from a target UCS value to the optimal material mix proportions. The results demonstrated the effectiveness of Optuna in hyperparameter tuning, leading to an optimized LightGBM model that accurately predicted the multi-age CPB UCS (R² > 0.98). SHAP analysis identified key features, notably the high correlation of the water-cement ratio and the CaO content of the tailings with CPB strength. The SA algorithm effectively provided optimal CPB mix proportions that met the target 28-d UCS value while balancing multiple conflicting objectives, such as solid content and cement dosage. Finally, a user-friendly graphical user interface was developed to integrate these models and provide an accessible, visual, machine-learning-based CPB strength design tool for mining engineers.
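The inverse step, from a target UCS back to a mix design, can be sketched as simulated annealing over the mix variables. The objective here is the squared miss on the target plus a penalty on cement use. Everything model-specific in this sketch is an assumption. `predicted_ucs` is a hypothetical linear surrogate that stands in for the trained LightGBM model, and the variable names, bounds, penalty weight, and cooling schedule are illustrative.

```python
import math
import random


def predicted_ucs(wc_ratio, cement_pct):
    # Hypothetical linear surrogate (MPa); the study would call its trained model here.
    return 2.0 + 0.2 * cement_pct - 2.0 * wc_ratio


BOUNDS = {"wc_ratio": (0.3, 1.0), "cement_pct": (5.0, 25.0)}  # illustrative ranges


def cost(mix, target_ucs, cement_weight=0.01):
    # Squared miss on the target UCS plus a penalty that favours less cement.
    miss = predicted_ucs(**mix) - target_ucs
    return miss ** 2 + cement_weight * mix["cement_pct"]


def anneal(target_ucs, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    mix = {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}
    current = cost(mix, target_ucs)
    best_mix, best_cost = dict(mix), current
    temp = t0
    for _ in range(n_iter):
        # Propose a neighbour: perturb each variable by up to 5% of its range, then clip.
        cand = {k: min(hi, max(lo, mix[k] + rng.uniform(-0.05, 0.05) * (hi - lo)))
                for k, (lo, hi) in BOUNDS.items()}
        c = cost(cand, target_ucs)
        # Metropolis rule: always accept improvements, accept worse moves with prob e^(-d/T).
        if c < current or rng.random() < math.exp(-(c - current) / temp):
            mix, current = cand, c
            if c < best_cost:
                best_mix, best_cost = dict(cand), c
        temp *= cooling  # geometric cooling schedule
    return best_mix, best_cost


mix, c = anneal(target_ucs=3.0)
print({k: round(v, 3) for k, v in mix.items()}, round(predicted_ucs(**mix), 3))
```

Early high temperatures let the search escape poor regions. As the temperature drops, the search becomes nearly greedy and settles on a low-cement mix that approximately meets the target.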
The popularity of news, which conveys the newsworthy events of the day to people, is substantially important for spectators and audiences. People interact with news websites and share news links or their opinions. This study uses supervised machine learning techniques to predict news popularity in social media sources. These techniques basically consist of two phases: a) the training data are given as input to the classifier algorithm, and b) the performance of the trained algorithm is tested on the testing data. In this way, knowledge is discovered from the data. In this context, first, twelve datasets are obtained from a larger data collection within four categories: Economic, Microsoft, Obama and Palestine. Second, news popularity in social network services is predicted using the Gradient Boosted Trees, Multi-Layer Perceptron and Random Forest learning algorithms. The prediction performance of all algorithms is examined using the Mean Absolute Error, Root Mean Squared Error and R-squared evaluation metrics. The results show that most of the models built with these algorithms prove applicable to this task. Consequently, a comprehensive study of news popularity prediction using different techniques is presented, with conclusions drawn about the performance of the algorithms.
Funding for the LGBM and SHAP network attack prediction study: supported by the National Natural Science Foundation of China Project (No. 62302540); please visit their website at https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Funding: Supported by the National Key R&D Program of China (No. 2022YFA1603300); the Romanian Ministry of Research, Innovation and Digitalization under Contract PN 23.21.01.06; the ELI-RO project with Contract ELI-RORDI-2024-008 (AMAP); and a grant from the Romanian Ministry of Research, Innovation and Digitization, CNCS-UEFIS-CDI, with project numbers PN-Ⅲ-P4-PCE-2021-1014, PN-Ⅲ-P4-PCE-2021-0595, and PN-Ⅲ-P1-1.1-TE2021-1464 within PNCDI Ⅲ.
Abstract: The first 2⁺ excited states of nuclei directly reflect the interaction between shell structure and the nucleus, providing insight into the validity of the shell model and into nuclear structure. The properties of the first 2⁺ excited states can be measured for stable nuclei and calculated with nuclear models, but significant uncertainty remains. This study employs a machine learning model based on the light gradient boosting machine (LightGBM) to investigate the first 2⁺ excited states. Specifically, it presents the training of the LightGBM algorithm and its predictions of the first 2⁺ properties of 642 nuclei. The LightGBM predictions are then compared in detail with available experimental data, shell model calculations, and Bayesian neural network (BNN) predictions. The average difference between the LightGBM predictions and the experimental data is 18 times smaller than that of the shell model and only 70% of that of the BNN predictions. Taking Mg, Ca, Kr, Sm, and Pb isotopes as examples, LightGBM also effectively reproduces the abrupt changes at magic numbers caused by shell effects, including energies as low as 0.04 MeV arising from shape coexistence. We therefore believe that LightGBM-based machine learning can deepen our insight into nuclear structure and open new avenues for nuclear physics research.
Abstract: Tree ensemble models have become important tools for classification and prediction, and boosting ensemble techniques are commonly employed to forecast Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf-wise growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. CatBoost, in contrast, uses balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing-time efficiency. CatBoost's algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector that stops training early. This study develops a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). The hybrid model was evaluated against LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM, and it achieved the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). The proposed framework therefore holds promise for early diabetes prediction in the healthcare industry and may also apply to other datasets similar to the diabetes data.
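The leaf-wise growth strategy can be illustrated in a few lines. At each step the tree splits whichever leaf yields the largest loss reduction, instead of expanding a whole level at once. The sketch below is a deliberately simplified, hypothetical version for one feature and squared-error loss. Actual LightGBM works on gradient/Hessian histograms over many features and applies regularization and depth limits.

```python
def sse(values):
    """Sum of squared errors around the mean (squared-error loss of one leaf)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(leaf):
    """Return (gain, position) of the best split of a leaf sorted by x."""
    ys = [y for _, y in leaf]
    parent = sse(ys)
    best_gain, best_pos = 0.0, None
    for pos in range(1, len(leaf)):
        if leaf[pos - 1][0] == leaf[pos][0]:
            continue  # identical feature values cannot be separated
        gain = parent - sse(ys[:pos]) - sse(ys[pos:])
        if gain > best_gain:
            best_gain, best_pos = gain, pos
    return best_gain, best_pos

def grow_leaf_wise(xs, ys, num_leaves):
    """Grow one tree best-first until num_leaves leaves exist or no split helps."""
    leaves = [sorted(zip(xs, ys))]
    while len(leaves) < num_leaves:
        candidates = [(best_split(leaf), idx) for idx, leaf in enumerate(leaves)]
        (gain, pos), idx = max(candidates, key=lambda c: c[0][0])
        if pos is None:
            break  # no remaining split reduces the loss
        leaf = leaves.pop(idx)
        leaves[idx:idx] = [leaf[:pos], leaf[pos:]]  # keep leaves ordered by x
    return leaves

xs = [1, 2, 3, 4, 5, 6]
ys = [0.0, 0.0, 0.0, 10.0, 10.0, 20.0]
for leaf in grow_leaf_wise(xs, ys, num_leaves=3):
    print([y for _, y in leaf])
```

With `num_leaves=3`, the first split separates the zeros from the rest. The second split goes to the right-hand leaf only, because splitting the constant left leaf gives no loss reduction. A level-wise grower would have spent a split on both.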
Abstract: In this paper, an advanced and optimized Light Gradient Boosting Machine (LGBM) technique is proposed to identify intrusive activities in Internet of Things (IoT) networks. The major contributions are as follows: (i) an optimized LGBM model is developed to identify malicious activities in IoT networks; (ii) an efficient evolutionary optimization approach is adopted to find the optimal set of LGBM hyper-parameters for this problem, using a Genetic Algorithm (GA) with k-way tournament selection and uniform crossover to explore the hyper-parameter search space efficiently; (iii) the proposed model is evaluated against state-of-the-art ensemble learning and machine learning-based models to assess its overall generalization and efficiency. Simulation results show that the proposed approach outperforms the other methods considered and proves to be a robust approach to intrusion detection in IoT environments.
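The two GA operators named in contribution (ii), k-way tournament selection and uniform crossover, can be sketched as follows. This is a hypothetical minimal version. The search space, the elitism step, the mutation operator, and the toy `toy_score` function are all assumptions and not the paper's settings. In practice `toy_score` would be replaced by training LGBM and returning a validation score.

```python
import random

def tournament_select(pop, fitness, k, rng):
    """k-way tournament: sample k distinct individuals and return the fittest."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fitness[i])]

def uniform_crossover(a, b, rng):
    """Each gene (hyper-parameter) is swapped between the parents with probability 0.5."""
    c1, c2 = {}, {}
    for key in a:
        if rng.random() < 0.5:
            c1[key], c2[key] = a[key], b[key]
        else:
            c1[key], c2[key] = b[key], a[key]
    return c1, c2

def mutate(ind, space, rate, rng):
    """Resample each gene from its allowed values with probability `rate`."""
    return {k: rng.choice(space[k]) if rng.random() < rate else v for k, v in ind.items()}

def genetic_search(space, evaluate, pop_size=20, generations=15, k=3, rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [{key: rng.choice(vals) for key, vals in space.items()} for _ in range(pop_size)]
    for _ in range(generations):
        fitness = [evaluate(ind) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: fitness[i])
        next_pop = [pop[best_idx]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            p1 = tournament_select(pop, fitness, k, rng)
            p2 = tournament_select(pop, fitness, k, rng)
            for child in uniform_crossover(p1, p2, rng):
                next_pop.append(mutate(child, space, rate, rng))
        pop = next_pop[:pop_size]
    return max(pop, key=evaluate)

# Hypothetical LGBM-style search space and a stand-in fitness function.
space = {
    "num_leaves": [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "n_estimators": [100, 200, 400],
}

def toy_score(p):
    # Stand-in for validation accuracy; peaks at (63, 0.05, 200).
    return -(abs(p["num_leaves"] - 63) / 64
             + abs(p["learning_rate"] - 0.05) * 10
             + abs(p["n_estimators"] - 200) / 200)

print(genetic_search(space, toy_score))
```

Tournament size `k` controls selection pressure: larger tournaments favour the current best more strongly. Uniform crossover mixes hyper-parameters independently, which suits a search space with no natural gene ordering.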
Funding: This work is supported by the National Natural Science Foundation of China (No. 51875100, No. 61673108, No. 61674133). The authors would like to thank the anonymous reviewers and the associate editor, whose constructive comments helped improve the presentation of this work.
Abstract: Instability fracture of coal rock masses may cause serious hazards in underground coal mining. Acoustic emissions (AE) stimulated by internal structural fracture carry rich information about the health condition of the rock mass. As a sensitive non-destructive testing method, AE is increasingly used to detect anomalous conditions in coal rock. This paper proposes an improved multi-resolution feature that uses the Coiflet wavelet transform (CWT) to extract AE waveform information at different frequency resolutions. An efficient Light Gradient Boosting Machine (LightGBM) built from several cascaded weak sub-classifiers is then adopted. It fuses the AE features from the different frequency views to recognize anomalous damage in coal rock. The results show that the proposed method achieves excellent recognition performance on the anomalous damage levels of coal rock. It is an effective method for detecting critical stability and thus for predicting rock mass bursts in time.
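The idea of multi-resolution features can be illustrated with a discrete wavelet decomposition that records the energy of each frequency band. The sketch below uses the Haar wavelet only because its filters fit in two lines. It is a stand-in assumption, not the Coiflet-family transform used in the study. The function name and the test signal are also hypothetical.

```python
import math

def haar_band_energies(signal, levels):
    """Energy of each Haar detail band (finest first) plus the final approximation.

    Each level halves the time resolution and isolates a lower frequency band,
    which is what makes the resulting features multi-resolution.
    """
    approx = list(signal)
    energies = []
    for _ in range(levels):
        half = len(approx) // 2
        if half == 0:
            break
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies

# Hypothetical AE-like burst: an alternating high-frequency part plus a slower step.
burst = [1.0, -1.0, 1.0, -1.0, 2.0, 0.0, 2.0, 0.0]
print(haar_band_energies(burst, levels=3))
```

Because the Haar transform is orthonormal, the band energies sum to the signal's total energy when the length is divisible by 2**levels. Each band therefore reports what share of the waveform's energy sits at that resolution, and these shares can serve as classifier features.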
Funding: Supported by the State Key Laboratory of Hydraulic Engineering Simulation and Safety (Tianjin University) (Grant Number HESS-2106); Scientific and Technological Projects of Henan Province (Grant Number 222102320025); Key Scientific Research Project in Colleges and Universities of Henan Province of China (Grant Number 22B570003); National Natural Science Foundation of China (Grant Number 52109040, 51739009); Excellent Youth Fund of Henan Province of China (212300410088); and Science and Technology Innovation Talents Project of Henan Education Department of China (21HASTIT011).
Abstract: Global climate change and sea level rise have increased losses from flooding. Accurate flood prediction is essential for mitigating flood losses in coastal cities. Physically based models cannot meet the demand for real-time urban flood prediction because of their computational complexity. In this study, we propose a hybrid modeling approach for rapid urban flood prediction that couples a physically based model with the light gradient boosting machine (LightGBM) model. A hydrological–hydraulic model built with the personal computer storm water management model (PCSWMM) was used to generate sufficient data for the LightGBM model. Variables related to rainfall, tide level, and the location of flood points were used as inputs to the LightGBM model. To improve prediction accuracy, the hyperparameters of the LightGBM model were optimized by grid search with K-fold cross-validation. In a case study of Haidian Island, Hainan Province, China, the optimal learning rate, number of estimators, and number of leaves were 0.11, 450, and 12, respectively. The Nash-Sutcliffe efficiency coefficient (NSE) of the LightGBM model on the test set is 0.9896. This indicates that the model predicts reliably, and it outperforms random forest (RF), extreme gradient boosting (XGBoost), and k-nearest neighbor (KNN). Based on the Gini index, the LightGBM model identified the tide-level variables as the dominant variables for predicting inundation depth in the study area. Given its superior performance and computational efficiency, the proposed LightGBM model provides a scientific reference for flood control in coastal cities.
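Two pieces of this tuning loop can be sketched in plain Python: a grid search scored by K-fold cross-validation, and the Nash-Sutcliffe efficiency used to report accuracy. In this hypothetical version, a one-dimensional k-nearest-neighbour regressor stands in for LightGBM so that the example has no dependencies. With LightGBM, the grid would instead range over learning rate, number of estimators, and number of leaves. The interleaved fold assignment and the synthetic data are also assumptions.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is perfect; 0 equals predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

def kfold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(start, n, k)) for start in range(k)]

def knn_predict(train_x, train_y, query, n_neighbors):
    """Stand-in regressor: mean target of the nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def grid_search_cv(xs, ys, grid, k=5):
    """Return (best_param, scores), where scores maps each param to its mean fold NSE."""
    scores = {}
    for n_neighbors in grid:
        fold_scores = []
        for test_idx in kfold_indices(len(xs), k):
            held_out = set(test_idx)
            train = [i for i in range(len(xs)) if i not in held_out]
            tx, ty = [xs[i] for i in train], [ys[i] for i in train]
            preds = [knn_predict(tx, ty, xs[i], n_neighbors) for i in test_idx]
            fold_scores.append(nse([ys[i] for i in test_idx], preds))
        scores[n_neighbors] = sum(fold_scores) / k
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic rainfall/inundation-depth pairs (illustrative only).
rain = [float(r) for r in range(0, 40, 2)]
depth = [0.05 * r * r for r in rain]
print(grid_search_cv(rain, depth, grid=[1, 3, 5]))
```

Averaging NSE across folds means each hyperparameter setting is judged on data it was not fitted to. This is the purpose of pairing grid search with K-fold cross-validation.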
Funding: Supported by the Guangzhou Municipal Health Science and Technology General Program (20221A011083); the Key Medical Disciplines and Specialties Program of Guangzhou (2021-2023); and the Guangdong University Innovation Team Project (2024KCXTD029).
Abstract: BACKGROUND: Prolonged emergency department length of stay (EDLOS) is an increasingly crucial problem. This study aims to develop a machine learning (ML) model to predict EDLOS, with EDLOS as the outcome variable and demographic characteristics, triage level, and medical resource utilization as predictive factors. METHODS: A retrospective analysis was performed on patients who visited the emergency department of the Second Affiliated Hospital of Guangzhou Medical University from March 2019 to September 2021, and a total of 321,012 cases were identified. After applying the inclusion and exclusion criteria, 187,028 cases were included in the analysis. ML regression models were established with the predictive factors as independent variables and EDLOS as the dependent variable. The models were evaluated with the mean absolute error (MAE), root mean square error (RMSE), and R-squared (R²), enabling an objective comparative analysis. RESULTS: Among the six ML models compared, the light gradient boosting machine (LightGBM) model had the lowest MAE (443.519) and RMSE (826.783) and the highest R² (0.48), indicating better model fit and predictive performance. Among the top 10 predictive factors associated with EDLOS in the LightGBM model, emergency waiting time, age, and emergency arrival time had the most significant impact on EDLOS. CONCLUSION: The LightGBM model suggests that emergency waiting time, age, and emergency arrival time may be used to predict EDLOS.
Funding: Supported by the Center for Mining, Electro-Mechanical Research of Hanoi University of Mining and Geology (HUMG), Hanoi, Vietnam; financially supported by the Hunan Provincial Department of Education General Project (19C1744); the Hunan Province Science Foundation for Youth Scholars of China (2018JJ3510); and the Innovation-Driven Project of Central South University (2020CX040).
Abstract: Blasting is a well-known, effective method for fragmenting or moving rock in open-pit mines. The rock size distribution is a critical criterion for evaluating blasting quality. A high percentage of oversized rocks generated by blasting can lead to economic and environmental damage. This study therefore proposes four novel intelligent models to predict the rock size distribution in mine blasting, in order to optimize blasting parameters and the efficiency of blasting operations in open-pit mines. For this purpose, a nature-inspired algorithm, the firefly algorithm (FFA), was combined with four machine learning algorithms: gradient boosting machine (GBM), support vector machine (SVM), Gaussian process (GP), and artificial neural network (ANN). The resulting models are abbreviated as FFA-GBM, FFA-SVM, FFA-GP, and FFA-ANN, respectively. Their predictions were then compared using three statistical indicators (mean absolute error, root-mean-squared error, and correlation coefficient) and the color intensity method. To develop and simulate rock size in blasting operations, 136 blasting events and their images were collected and analyzed with the Split-Desktop software. Of these, 111 events were randomly selected to develop and optimize the models, and the remaining 25 blasting events were used to confirm their accuracy. Blast design parameters served as input variables to predict rock size. The results reveal that the FFA is a robust optimization algorithm for estimating rock fragmentation in bench blasting. Among the models developed in this study, FFA-GBM predicted the size of fragmented rocks most accurately, while the other techniques (FFA-SVM, FFA-GP, and FFA-ANN) showed lower computational stability and efficiency. Hence, the FFA-GBM model can serve as a powerful and precise soft computing tool in practical engineering cases to improve the quality of blasting and rock fragmentation.
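The firefly algorithm (FFA) used to tune these models works as follows. Each firefly is a candidate parameter vector. Dimmer fireflies move toward brighter (better) ones, with an attraction that decays with distance, plus a small random step. A minimal sketch for minimizing a function follows. The parameter names (beta0, gamma, alpha), their values, the step-decay factor, and the test objective are common textbook choices assumed here, not settings from the study. In the study, the objective would be a model's prediction error as a function of its hyperparameters.

```python
import math
import random

def firefly_minimize(objective, bounds, n_fireflies=15, iterations=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimize `objective` over a box; a lower cost means a brighter firefly."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    for _ in range(iterations):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves toward it
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    swarm[i] = [clip(xi + beta * (xj - xi) + alpha * (rng.random() - 0.5), lo, hi)
                                for xi, xj, (lo, hi) in zip(swarm[i], swarm[j], bounds)]
                    cost[i] = objective(swarm[i])
        alpha *= 0.97  # shrink the random step over time
    best = min(range(n_fireflies), key=lambda k: cost[k])
    return swarm[best], cost[best]

def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_x, best_cost = firefly_minimize(sphere, bounds)
print(best_x, best_cost)
```

The brightest firefly never moves unless another overtakes it, so the best cost in the swarm never gets worse from one move to the next.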
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 51974023 and 52374321); the State Key Laboratory of Advanced Metallurgy, University of Science and Technology Beijing (No. 41621005); and the Youth Science and Technology Innovation Fund of Jianlong Group-University of Science and Technology Beijing (No. 20231235).
Abstract: Accurate prediction of molten steel temperature in the ladle furnace (LF) refining process strongly influences molten steel quality and the control of steelmaking costs. Extensive research has been conducted on models for predicting molten steel temperature. However, most researchers focus solely on improving model accuracy and neglect explainability. The present study aims to develop a high-precision, explainable model with improved reliability and transparency. The prediction model was built from eXtreme gradient boosting (XGBoost) and light gradient boosting machine (LGBM), tuned with Bayesian optimization and grey wolf optimization (GWO). Various performance metrics and graphical representations were used to compare the optimal XGBoost and LGBM models from each hyperparameter optimization method with other models. The GWO-LGBM model outperformed the other methods in predicting molten steel temperature, with a prediction accuracy of 89.35% within an error range of ±5°C. Tree structure visualization and SHapley Additive exPlanations (SHAP) analysis were used to reveal the model's learning and decision process and to clarify how strongly each variable influences molten steel temperature. As a result, the explainability of the optimal GWO-LGBM model was enhanced, providing reliable support for its prediction results.
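An accuracy figure stated "within an error range of ±5°C" is presumably a hit rate: the share of heats whose absolute prediction error does not exceed the tolerance. A minimal sketch follows, assuming that definition. The temperatures are hypothetical.

```python
def hit_rate(actual, predicted, tolerance=5.0):
    """Fraction of predictions whose absolute error is within +/- tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

# Hypothetical end-point temperatures in degrees Celsius.
measured = [1595.0, 1602.0, 1588.0, 1610.0]
predicted = [1598.0, 1609.0, 1585.0, 1611.0]
print(hit_rate(measured, predicted))
```

Unlike MAE or RMSE, a hit rate depends on the chosen tolerance, so it is only comparable between studies that use the same band.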
Abstract: Artificial lift plays an important role in the petroleum industry in sustaining production flowrate and extending the lifespan of oil wells. One of the most popular artificial lift methods is the Electric Submersible Pump (ESP), because it can produce high flowrates even in very deep wells. ESPs are designed to work under extreme conditions such as corrosion, high temperature, and high pressure, yet their lifespan is much shorter than expected. ESP failures cause production loss and increase replacement costs, because intervention work on ESPs costs much more than on other artificial lift methods, especially in offshore wells. Predicting ESP failures is therefore highly valuable in oil production and contributes greatly to the design, construction, and operation of oil wells. This study uses three machine learning algorithms, Decision Tree, Random Forest, and Gradient Boosting Machine, to build predictive models of ESP lifespan from both dynamic and static ESP parameters. The results of these models were compared to find the most suitable model for predicting the ESP life cycle.
In addition, this study evaluated the influence of various operating parameters to identify those with the greatest impact on ESP run life. The results can provide a better understanding of ESP behavior so that early action can be taken to prevent potential ESP failures.
Abstract: Software Defined Networking (SDN) has emerged as a promising option for the future growth of the internet. SDN has increased the flexibility and transparency of managed, centralized, and controlled networks. However, these advantages also create a more vulnerable environment with substantial risks, culminating in network difficulties, system paralysis, online banking fraud, and theft. These issues have a significant detrimental impact on organizations, enterprises, and even economies. Accurate, high-performance, real-time systems are necessary to counter these risks. Over the last decade, many researchers have become interested in using SDN to extend Intrusion Detection Systems (IDS) with intelligent machine learning methods. In this paper, a novel HFS-LGBM IDS is proposed for SDN. First, a two-phase Hybrid Feature Selection algorithm is applied to reduce the data dimension and obtain an optimal feature subset. In the first phase, the Correlation-based Feature Selection (CFS) algorithm is used to obtain a feature subset. In the second phase, the optimal feature set is obtained by applying Random Forest Recursive Feature Elimination (RF-RFE). A LightGBM algorithm is then used to detect and classify different types of attacks. Experimental results on the NSL-KDD dataset show that the proposed system outperforms existing methods in accuracy, precision, recall, and F-measure.
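The CFS phase scores a candidate feature subset with a merit that rewards correlation with the class and penalizes redundancy among features. In Hall's formulation, merit = k·r̄cf / sqrt(k + k(k−1)·r̄ff). Here k is the subset size, r̄cf the mean feature-class correlation, and r̄ff the mean feature-feature inter-correlation. The sketch below assumes absolute Pearson correlation as the correlation measure; CFS implementations often use symmetrical uncertainty for discrete data instead. The feature names and toy data are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cfs_merit(features, target, subset):
    """Hall's CFS merit of a subset of feature names (absolute Pearson correlation)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Hypothetical flow features and a binary attack label.
label = [0, 0, 1, 1, 0, 1]
features = {
    "src_bytes": [10, 12, 90, 85, 11, 95],
    "src_bytes2": [20, 24, 180, 170, 22, 190],  # redundant copy (2 x src_bytes)
    "duration": [5, 1, 4, 2, 6, 3],
}
for subset in (["src_bytes"], ["src_bytes", "src_bytes2"], ["src_bytes", "duration"]):
    print(subset, round(cfs_merit(features, label, subset), 3))
```

Adding a perfectly redundant feature leaves the merit unchanged, which is why a CFS-style filter tends to drop duplicated traffic statistics before the more expensive RFE phase.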
Abstract: Slope instability in hilly regions is a highly complex phenomenon, with triggering factors ranging from natural events to anthropogenic activities. Such failures cause disastrous losses of both property and life. Understanding the mechanism of these failures is necessary to mitigate them and to predict slope vulnerability for better preparedness. Significant advances have already been made in slope stability analysis, and many valuable tools and techniques have been developed. These include limit equilibrium methods, finite element and finite difference methods, stochastic methods, and several combinations of these. In this study, machine learning tools are used to predict the factor of safety of rock slopes in hilly regions. Three road-cut slopes were considered, and their stability was determined using both finite element (FE) and machine learning (ML) techniques. The two approaches are combined so that they complement each other and enhance the reliability of the results. The geotechnical data were acquired through field investigations at the selected mountainous sites. Since the slopes at the site are rocky, the FE model uses the Generalized Hoek-Brown (GHB) material model with the shear strength reduction technique. Random Forest (RF) and Gradient Boosting Machine (GBM) models were used for the ML analysis. The ML models were trained on ample published data and tested on data from the current slope site. The ML analysis was carried out in three stages: (a) without hyperparameter tuning, (b) with hyperparameter tuning using GridSearchCV, and (c) with a pipeline incorporating Recursive Feature Elimination (RFE). Performance metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R² score, were evaluated to assess model accuracy. A slight discrepancy, within 10 percent, was found. This is expected given factors such as grid refinement, data volume, and data variability. Overall, the proposed ML model agrees very well with the FE model results. This study selects relevant ML techniques to develop a purpose-built framework that has the potential to validate rock slope stability results obtained with traditional methods.
Funding: Funded by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney; supported by the Research Funding Program, King Saud University, Riyadh, Saudi Arabia, under the Ongoing Research Funding program (ORF-2025-14).
Abstract: Diabetic Retinopathy (DR) is a critical retinal disorder that is becoming more common as the number of people with diabetes keeps rising, and it remains a leading cause of blindness worldwide. Early detection and timely treatment are essential to mitigate the effects of DR, such as retinal damage and vision impairment. Several conventional approaches have been proposed to detect DR early and accurately. However, they are limited by data imbalance, poor interpretability, overfitting, long convergence time, and other issues. To address these drawbacks and detect DR more accurately, this research proposes a distributed Explainable Convolutional Neural network-enabled Light Gradient Boosting Machine (DE-ExLNN). The model combines an explainable Convolutional Neural Network (CNN) with a Light Gradient Boosting Machine (LightGBM) and achieves highly accurate DR detection. LightGBM serves as the detection model, and the explainable CNN addresses issues that conventional CNN classifiers could not resolve. A custom dataset was created for this research from fundus and OCTA images collected in a real-time environment. It provides more accurate results than standard conventional DR datasets. On the custom dataset, the model achieves notable accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC) scores, underscoring the effectiveness of this approach. Evaluations on other standard datasets achieved an accuracy of 93.94%, a sensitivity of 93.90%, a specificity of 93.99%, and an MCC of 93.88% for fundus images. For OCTA images, the model achieved an accuracy of 95.30%, a sensitivity of 95.50%, a specificity of 95.09%, and an MCC of 95%. The results show that the combination of explainable CNN and LightGBM outperforms other methods. Distributed learning further improves the model's efficiency by reducing time consumption and complexity while facilitating feature extraction.
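MCC, reported above alongside accuracy, sensitivity, and specificity, uses all four confusion-matrix cells. This makes it more informative than accuracy on imbalanced data. A minimal sketch of the binary case follows; the counts in the example are hypothetical.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and Matthews Correlation Coefficient."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # recall on the DR-positive class
    specificity = tn / (tn + fp)  # recall on the DR-negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```

MCC ranges from -1 to 1 and reaches 1 only when both classes are predicted perfectly. When any row or column of the confusion matrix is empty, the denominator is zero, and this sketch returns 0 by convention.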
Funding: Supported by the National Natural Science Foundation of China Civil Aviation Joint Fund (U1833110) and Research on the Dual Prevention Mechanism and Intelligent Management Technology for Civil Aviation Safety Risks (YK23-03-05).
Abstract: Aviation accidents are currently one of the leading causes of significant injuries and deaths worldwide. This has prompted researchers to investigate aircraft safety using data analysis approaches based on advanced machine learning algorithms. To assess aviation safety and identify the causes of incidents, a classification model based on the light gradient boosting machine (LGBM) and the aviation safety reporting system (ASRS) has been developed. It is improved by k-fold cross-validation with a hybrid sampling model (HSCV), which can boost classification performance while keeping the data balanced. The results show that the LGBM-HSCV model significantly improves accuracy while alleviating data imbalance. The method is compared vertically with other cross-validation (CV) methods and laterally across different numbers of folds. Two further CV approaches based on the improved method are also discussed: one with a different order of sampling and folding, and one with additional CV. According to the assessment indices of the different methods, the proposed LGBM-HSCV model is effective at detecting incident causes. The improved model for imbalanced data classification may serve as a reference for similar data processing. Its accurate identification of civil aviation incident causes can also help improve civil aviation safety.
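When cross-validation is combined with resampling, the resampling should be applied only to the training folds, so that each validation fold keeps the original class distribution. The sketch below shows one plausible hybrid scheme. It randomly undersamples the larger classes and randomly oversamples the smaller ones to a common target size. The paper's exact sampling method and ordering are not given here, so this scheme, the function names, and the labels are all assumptions.

```python
import random
from collections import Counter

def hybrid_resample(indices, labels, rng):
    """Resample so every class reaches the mean class size (under- and oversampling)."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = round(sum(len(v) for v in by_class.values()) / len(by_class))
    out = []
    for members in by_class.values():
        if len(members) >= target:
            out.extend(rng.sample(members, target))  # undersample a large class
        else:
            out.extend(members + rng.choices(members, k=target - len(members)))  # oversample
    return out

def kfold_with_hybrid_sampling(labels, k=5, seed=0):
    """Yield (train_idx, valid_idx); only the training part is resampled."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        valid = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield hybrid_resample(train, labels, rng), valid

# Hypothetical imbalanced incident-cause labels.
labels = ["human"] * 40 + ["weather"] * 8 + ["equipment"] * 12
for train, valid in kfold_with_hybrid_sampling(labels):
    print(Counter(labels[i] for i in train), len(valid))
```

Resampling before splitting would let duplicated minority samples appear in both training and validation folds and inflate the scores. Keeping the resampling inside each fold avoids that leak.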
Funding: Supported by the Joint Fund of National Natural Science Foundation of China and Civil Aviation Administration of China (U1833110).
Abstract: Aviation accidents have become one of the major causes of severe injuries and fatalities around the world. This has drawn the research community to study aviation safety with data analysis techniques based on advanced machine learning algorithms. An ensemble classification model based on the Aviation Safety Reporting System (ASRS) is proposed to analyze aviation safety with respect to the people injured. The ensemble classification model contains two modules. The data-driven module covers data cleaning, feature selection, and the division and reorganization of imbalanced data. The model-driven module stacks Random Forest (RF), XGBoost (XGB), and Light Gradient Boosting Machine (LGBM). The results indicate that the ensemble model can handle the data imbalance while greatly improving accuracy. As a single model on the imbalanced ASRS-based data, LGBM shows higher accuracy and a faster run time, while the ensemble model achieves the best classification performance overall. The proposed ensemble model for imbalanced data classification can serve as a reference for similar data processing while helping to improve civil aviation safety.
Funding: Funded by the Basic Research Project of the Yunnan Provincial Department of Science and Technology (No. 202301AU070185); the National Natural Science Foundation of China (No. 52074137); and Yunnan Fundamental Research Projects (No. 202201AT070151).
Abstract: The rapid advancement of artificial intelligence has brought new vitality to cemented paste backfill (CPB) technology. However, current machine learning models for CPB strength prediction are generally forward-only and single-output. They also shed little light on multi-feature interactions and lack an integrated full-process design framework. To this end, this study proposes a light gradient boosting machine (LightGBM) model, optimized with Optuna, that predicts CPB strength at multiple curing ages (3, 7, and 28 d). The dataset comprised the unconfined compressive strength (UCS) results of 738 CPB specimens prepared with various types of tailings and mix proportions. SHapley Additive exPlanations (SHAP) were employed to elucidate the influence patterns and relative importance of the input features, inherently accounting for their complex multi-feature interaction effects on the model output. A simulated annealing (SA) algorithm was also integrated with the predictive model to enable the inverse process, from a target UCS value to the optimal material mix proportions. The results demonstrated that Optuna tuned the hyperparameters effectively, yielding an optimized LightGBM model that accurately predicted multi-age CPB UCS (R² > 0.98). SHAP analysis identified the key features, notably the strong correlation of the water-cement ratio and of the CaO content of the tailings with CPB strength. The SA algorithm provided optimal CPB mix proportions that met the target 28-d UCS value while balancing conflicting objectives such as solid content and cement dosage. Finally, a user-friendly graphical user interface was developed that integrates these models into an accessible, visual, machine-learning-based CPB strength design tool for mining engineers.
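The inverse step searches for a mix proportion whose predicted 28-d UCS matches a target. It can be sketched with simulated annealing over a surrogate model. Several elements are hypothetical: a closed-form function stands in for the trained LightGBM predictor, and the variable names, bounds, penalty weight, step size, and cooling schedule are illustrative assumptions. The cost adds a small penalty on cement dosage to mirror the trade-off described above.

```python
import math
import random

def surrogate_ucs(cement_pct, solids_pct):
    """Hypothetical stand-in for the trained 28-d UCS model (MPa)."""
    return 0.02 * cement_pct * (solids_pct - 60.0)

def anneal_mix(target, bounds, steps=4000, t0=1.0, cooling=0.999, penalty=0.001, seed=0):
    """Simulated annealing for a mix whose predicted UCS hits `target` with little cement."""
    rng = random.Random(seed)

    def cost(x):
        cement, solids = x
        return (surrogate_ucs(cement, solids) - target) ** 2 + penalty * cement

    current = [(lo + hi) / 2 for lo, hi in bounds]  # start mid-range
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        candidate = [min(hi, max(lo, v + rng.gauss(0.0, 0.05 * (hi - lo))))
                     for v, (lo, hi) in zip(current, bounds)]
        cand_cost = cost(candidate)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if cand_cost < current_cost or rng.random() < math.exp((current_cost - cand_cost) / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost

best, best_cost = anneal_mix(target=2.0, bounds=[(5.0, 25.0), (65.0, 80.0)])
print(best, surrogate_ucs(*best), best_cost)
```

Early on, the high temperature lets the search accept uphill moves and explore the mix space. As the temperature decays it behaves more like greedy descent, while `best` keeps the best mix seen so far.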