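Abstract: This work studies cost-sensitive extensions of boosting for cost-sensitive learning problems. It proposes a cost-sensitive boosting method based on cost-sensitive sampling: introducing cost-sensitive sampling into each iteration of the original boosting procedure minimizes the expected cost-sensitive loss. Within this learning framework, two cost-sensitive boosting algorithms are derived, and the inherent instability of existing algorithms is exposed and explained. Experiments on University of California, Irvine (UCI) datasets and the MIT Center for Biological & Computational Learning (CBCL) face dataset show that, for cost-sensitive classification problems, boosting with cost-sensitive sampling outperforms both the original boosting algorithm and existing cost-sensitive boosting algorithms.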
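The idea of resampling by cost inside each boosting round can be illustrated with a minimal sketch. This is not the paper's derivation: the way weights and costs are combined (a simple product), the decision-stump base learner, and the names `sampling_distribution`, `fit_stump`, and `boost` are all assumptions made for illustration, layered on a standard AdaBoost update.

```python
import math
import random

def sampling_distribution(weights, costs):
    """Assumed rule: sample example i with probability proportional to weight_i * cost_i."""
    scores = [w * c for w, c in zip(weights, costs)]
    total = sum(scores)
    return [s / total for s in scores]

def fit_stump(xs, ys):
    """Fit a 1-D threshold stump by minimising the error count on the given sample."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(1 for x, y in zip(xs, ys) if (pol if x >= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    _, thr, pol = best
    return lambda x: pol if x >= thr else -pol

def boost(xs, ys, costs, rounds=5, seed=0):
    """AdaBoost-style loop in which each round trains on a cost-sensitive resample."""
    rng = random.Random(seed)
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        # Cost-sensitive sampling step (hypothetical form).
        probs = sampling_distribution(weights, costs)
        idx = rng.choices(range(n), weights=probs, k=n)
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        # Weighted error on the full training set, clipped to keep the log finite.
        err = sum(w for w, x, y in zip(weights, xs, ys) if stump(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Standard AdaBoost reweighting followed by normalisation.
        weights = [w * math.exp(-alpha * y * stump(x)) for w, x, y in zip(weights, xs, ys)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

With equal boosting weights, an example whose misclassification cost is three times higher is drawn three times as often, so the base learner sees the costly class more frequently without any change to the learner itself.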
Funding: Supported by the Agency Natural Science Foundation of Fujian Province, China, No. 2022J011285 and No. 2023J011480.
Abstract: BACKGROUND: Severe esophagogastric varices (EGVs) significantly affect the prognosis of patients with hepatitis B because of the risk of life-threatening hemorrhage. Endoscopy is the gold standard for EGV detection, but it is invasive, costly and carries risks. Noninvasive predictive models using ultrasound and serological markers are essential for identifying high-risk patients and optimizing endoscopy utilization. Machine learning (ML) offers a powerful approach to analyzing complex clinical data and improving predictive accuracy. This study hypothesized that ML models using noninvasive ultrasound and serological markers can accurately predict the risk of EGVs in hepatitis B patients, thereby improving clinical decision-making. AIM: To construct and validate a noninvasive ML predictive model for EGVs in hepatitis B patients. METHODS: We retrospectively collected ultrasound and serological data from 310 eligible cases and randomly divided them into training (80%) and validation (20%) groups. Eleven ML algorithms were used to build predictive models. Model performance was evaluated using the area under the curve and decision curve analysis. The best-performing model was further analyzed using SHapley Additive exPlanation to interpret feature importance. RESULTS: Among the 310 patients, 124 were identified as high-risk for EGVs. The extreme gradient boosting model performed best, achieving an area under the curve of 0.96 in the validation set. The model also exhibited high sensitivity (78%), specificity (94%), positive predictive value (84%), negative predictive value (88%), F1 score (83%), and overall accuracy (86%). The top four predictive variables were albumin, prothrombin time, portal vein flow velocity and spleen stiffness. A web-based version of the model was developed for clinical use, providing real-time predictions for high-risk patients. CONCLUSION: We identified an efficient noninvasive predictive model using extreme gradient boosting for EGVs among hepatitis B patients. The model, presented as a web application, has potential for screening high-risk EGV patients and can aid clinicians in optimizing the use of endoscopy.
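The diagnostic metrics listed in the results all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name `diagnostic_metrics` and the counts in the usage note are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the high-risk class
    specificity = tn / (tn + fp)          # recall on the low-risk class
    ppv = tp / (tp + fp)                  # positive predictive value (precision)
    npv = tn / (tn + fn)                  # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For example, `diagnostic_metrics(tp=8, fp=1, tn=9, fn=2)` gives a sensitivity of 0.8, a specificity of 0.9 and an accuracy of 0.85 on these made-up counts.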
Funding: Supported by the Culture, Sports and Tourism R&D Program through the Korea Creative Content Agency grant funded by the Ministry of Culture, Sports and Tourism in 2024 (Project Name: Global Talent Training Program for Copyright Management Technology in Game Contents, Project Number: RS-2024-00396709, Contribution Rate: 100%).
Abstract: Cyber-physical systems (CPS) represent a sophisticated integration of computational and physical components that power critical applications such as smart manufacturing, healthcare, and autonomous infrastructure. However, their extensive reliance on internet connectivity makes them increasingly susceptible to cyber threats, potentially leading to operational failures and data breaches. Furthermore, CPS face significant threats related to unauthorized access, improper management, and tampering with the content they generate. In this paper, we propose an intrusion detection system (IDS) optimized for CPS environments. The system, named GWO-LightGBM, is a hybrid that combines a nature-inspired feature selection scheme, Grey Wolf Optimization (GWO), with the Light Gradient Boosting Machine (LightGBM) classifier. While gradient boosting methods have been explored in prior IDS research, our novelty lies in a hybrid approach that targets CPS-specific operational constraints, such as low-latency response and accurate detection of rare and critical attack types. We evaluate GWO-LightGBM against GWO-XGBoost, GWO-CatBoost, and an artificial neural network (ANN) baseline using the NSL-KDD and CIC-IDS-2017 benchmark datasets. The models are assessed across multiple metrics, including accuracy, precision, recall, and F1-score, with an emphasis on class-wise performance and training efficiency. The proposed GWO-LightGBM model achieves the highest overall accuracy on NSL-KDD (99.73%) and CIC-IDS-2017 (99.61%), and performs better at detecting minority classes such as Remote-to-Local (R2L) and Other attacks, which other classifiers commonly overlook. Moreover, the proposed model requires less training time, highlighting its practical feasibility and scalability for real-time CPS deployment.
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00222536).
Abstract: This study provides an in-depth comparative evaluation of landslide susceptibility using two distinct spatial units, slope units (SUs) and hydrological response units (HRUs), within Goesan County, South Korea. Leveraging the extreme gradient boosting (XGB) algorithm combined with Shapley Additive Explanations (SHAP), this work assesses the precision and clarity with which each unit predicts areas vulnerable to landslides. SUs are based on geomorphological features such as ridges and valleys, emphasizing slope stability and landslide triggers. Conversely, HRUs are established from a variety of hydrological factors, including land cover, soil type and slope gradient, to capture the dynamic water processes of the region. The methodological framework includes the systematic gathering, preparation and analysis of data, ranging from historical landslide occurrences to topographical and environmental variables such as elevation, slope angle and land curvature. The XGB algorithm used to construct the Landslide Susceptibility Model (LSM) was combined with SHAP for model interpretation, and the results were evaluated using Random Cross-validation (RCV) to ensure accuracy and reliability. To ensure optimal model performance, the XGB hyperparameters were tuned using Differential Evolution, considering only multicollinearity-free variables. The results show that both SUs and HRUs are effective for LSM, but their effectiveness varies with landscape characteristics. The XGB algorithm demonstrates strong predictive power, and SHAP makes the role of the influential variables more transparent. This work underscores the importance of selecting assessment units tailored to specific landscape characteristics for accurate LSM. Integrating advanced machine learning techniques with interpretative tools offers a robust framework for landslide susceptibility assessment, improving both predictive capability and model interpretability. Future research should integrate broader data sets and explore hybrid analytical models to strengthen the generalizability of these findings across varied geographical settings.
Abstract: BACKGROUND: Diabetic foot ulcer (DFU) is a serious and destructive complication of diabetes that has a high amputation rate and carries a huge social burden. Early detection of risk factors and early intervention are essential to reduce amputation rates. With the development of artificial intelligence technology, efficient interpretable predictive models can be generated in clinical practice to improve DFU care. AIM: To develop and validate an interpretable model for predicting amputation risk in DFU patients. METHODS: This retrospective study collected basic data from 599 patients with DFU at Beijing Shijitan Hospital between January 2015 and June 2024. The data set was randomly divided into a training set and a test set with fivefold cross-validation. Three binary-outcome models were built with the eXtreme Gradient Boosting (XGBoost) algorithm, taking risk factors as input to predict amputation probability. Model performance was optimized by tuning the hyperparameters. The predictive performance of the three models was expressed by sensitivity, specificity, positive predictive value, negative predictive value and area under the curve (AUC). The prediction results were visualized with SHapley Additive exPlanation (SHAP). RESULTS: A total of 157 (26.2%) patients underwent minor amputation during hospitalization and 50 (8.3%) had major amputation. All three XGBoost models demonstrated good discriminative ability, with AUC values > 0.7. The model for predicting major amputation achieved the highest performance [AUC = 0.977, 95% confidence interval (CI): 0.956-0.998], followed by the minor amputation model (AUC = 0.800, 95% CI: 0.762-0.838) and the non-amputation model (AUC = 0.772, 95% CI: 0.730-0.814). Feature importance ranking of the three models revealed the risk factors for minor and major amputation. Wagner grade 4/5, osteomyelitis, and high C-reactive protein were all considered important predictive variables. CONCLUSION: XGBoost effectively predicts diabetic foot amputation risk and provides interpretable insights to support personalized treatment decisions.
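The AUC used to compare the three models can be read as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The following is a minimal sketch of that pairwise definition; the function name `roc_auc` and the toy inputs in the usage note are illustrative assumptions, and the quadratic loop is meant for clarity rather than for large data sets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, because three of the four positive-negative pairs are ordered correctly.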
Funding: The National Natural Science Foundation of China (Nos. 22072050, 22372066 and 22301090), the Open Research Fund (No. 2024JYBKF05) of the Key Laboratory of Material Chemistry for Energy Conversion and Storage (HUST), Ministry of Education, and the China Postdoctoral Science Foundation (No. 2023M731189) provided financial support. The authors thank the Analytical and Testing Centre at Huazhong University of Science and Technology for measurement.
Abstract: Obtaining a large dissymmetry factor (g_lum) in organic circularly polarized luminescence (CPL) materials remains a great challenge. Although helical chirality and planar chirality are both common and efficient routes to enhancing CPL, they have not previously been combined to boost it. Here, a new tetraphenylethylene (TPE) tetracycle acid helicate bearing both helical and planar chirality was designed and synthesized. The synergy of the two kinds of chirality was used to boost CPL signals both in solution and in helical self-assemblies. In the presence of octadecylamine, the TPE helicate formed helical nanofibers that emitted strong CPL signals with an absolute g_lum value of up to 0.237. Subsequent addition of para-phenylenediamine increased the g_lum value further to 0.387, owing to the formation of larger helical nanofibers. Compared with that of the TPE helicate itself, the CPL signal of the self-assemblies was not only magnified 104-fold but also inverted, a very rare result for CPL-active materials. Interaction of the TPE helicate with xylylenediamine even gave a gel, which was transformed into a suspension by shaking. The suspension showed CPL signals 40-fold stronger than those of the gel, with the signal direction inverted relative to the gel. Using the synergy of helical and planar chirality to significantly boost CPL intensity provides a new strategy for preparing organic CPL materials with very large g_lum values.
Abstract: In a video that has mesmerized audiences worldwide, a humanoid robot displays a magical move of self-defense, executing a flawless 720-degree spinning kick to knock out a baton held in a human hand. This is Chinese company Unitree Robotics’ G1 robot, embodying the innovation that has propelled China forward as the world’s second largest economy.
Funding: Supported by the National Natural Science Foundation of China Project (No. 62302540); see https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Abstract: The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To improve the interpretability of network attack prediction models, this paper proposes a method that combines Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling rapid and accurate identification and prediction of potential network attack types and thereby facilitating timely defense measures. The model achieved an accuracy of 0.977, precision of 0.985, recall of 0.975, and an F1 score of 0.979, outperforming the other models compared in network attack prediction. SHAP is used to analyze the black-box decision-making process of the model, providing interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model exhibits superior accuracy and predictive capability. Moreover, the SHAP-based analysis significantly improves the model’s transparency and interpretability.
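Quantifying each feature's contribution, as SHAP does, rests on the Shapley value: the average marginal contribution of a feature over all orderings in which features can be added. The sketch below computes exact Shapley values by brute-force enumeration for a toy value function. It is not the algorithm the SHAP library uses for tree models, and `shapley_values` and `toy_value` are hypothetical names introduced for illustration.

```python
from itertools import permutations

def shapley_values(value_fn, n_features):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        present = set()
        prev = value_fn(frozenset(present))
        for feature in order:
            present.add(feature)
            cur = value_fn(frozenset(present))
            totals[feature] += cur - prev   # marginal contribution in this ordering
            prev = cur
    return [t / len(orderings) for t in totals]

def toy_value(subset):
    """Toy model output when only the features in `subset` are known (assumed values)."""
    v = 0.0
    if 0 in subset:
        v += 2.0                     # feature 0 acts alone
    if 1 in subset:
        v += 1.0                     # feature 1 has a small effect alone
    if 1 in subset and 2 in subset:
        v += 4.0                     # features 1 and 2 interact
    return v
```

On this toy function the attributions are 2.0, 3.0 and 2.0 for features 0, 1 and 2: the interaction term of 4.0 is split evenly between features 1 and 2, and the three values sum to the full-model output of 7.0.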
Funding: Supported by the National Key R&D Program of China (No. 2022YFA1603300); the Romanian Ministry of Research, Innovation and Digitalization under Contract PN 23.21.01.06; the ELI-RO project with Contract ELI-RORDI-2024-008 (AMAP); and a grant from the Romanian Ministry of Research, Innovation and Digitization, CNCS-UEFIS-CDI, with project numbers PN-Ⅲ-P4-PCE-2021-1014, PN-Ⅲ-P4-PCE-2021-0595, and PN-Ⅲ-P1-1.1-TE2021-1464 within PNCDI Ⅲ.
Abstract: The first 2^+ excited states of the nucleus directly reflect the interaction between the shell structure and the nucleus, providing insights into the validity of the shell model and nuclear structure characteristics. Although the features of the first 2^+ excited states can be measured for stable nuclei and calculated using nuclear models, significant uncertainty remains. This study employs a machine learning model based on a light gradient boosting machine (LightGBM) to investigate the first 2^+ excited states. Specifically, the training of the LightGBM algorithm and the prediction of the first 2^+ properties of 642 nuclei are presented. Furthermore, the LightGBM predictions were compared in detail with available experimental data, shell model calculations, and Bayesian neural network (BNN) predictions. The results revealed that the average difference between the LightGBM predictions and the experimental data was 18 times smaller than that obtained by the shell model and only 70% of that obtained with the BNN predictions. Considering Mg, Ca, Kr, Sm, and Pb isotopes as examples, it was also observed that LightGBM can effectively reproduce the abrupt changes at magic numbers caused by shell effects, with the energy being as low as 0.04 MeV due to shape coexistence. Therefore, we believe that LightGBM-based machine learning can deepen our insights into nuclear structure and provide new avenues for nuclear physics research.
Abstract: Tree ensemble models have gained significant importance in addressing classification and prediction challenges. Boosting ensemble techniques are commonly employed for forecasting Type-II diabetes mellitus. Light Gradient Boosting Machine (LightGBM) is a widely used algorithm known for its leaf growth strategy, loss reduction, and enhanced training precision. However, LightGBM is prone to overfitting. In contrast, CatBoost utilizes balanced base predictors known as decision tables, which mitigate overfitting risks and significantly improve testing time efficiency. CatBoost’s algorithm structure counteracts gradient boosting biases and incorporates an overfitting detector to stop training early. This study focuses on developing a hybrid model that combines LightGBM and CatBoost to minimize overfitting and improve accuracy by reducing variance. Bayesian hyperparameter optimization is used to find the best hyperparameters for the underlying learners. By fine-tuning the regularization parameter values, the hybrid model effectively reduces variance (overfitting). Comparative evaluation against LightGBM, CatBoost, XGBoost, Decision Tree, Random Forest, AdaBoost, and GBM algorithms demonstrates that the hybrid model has the best F1-score (99.37%), recall (99.25%), and accuracy (99.37%). Consequently, the proposed framework holds promise for early diabetes prediction in the healthcare industry and exhibits potential applicability to other datasets sharing similarities with diabetes.
Abstract: Boosting is one of the most representative ensemble prediction methods. It can be divided into two series: Boost-by-majority and AdaBoost. This paper briefly introduces the research status of Boosting and of one of its series, AdaBoost, and analyzes the typical AdaBoost algorithms.