Funding: Supported by the National Natural Science Foundation of China Project (No. 62302540). Please visit their website at https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Abstract: The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To enhance the interpretability of network attack prediction models, this paper proposes a method that combines Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators. This enables the rapid and accurate identification and prediction of potential network attack types and facilitates timely defense measures. The model achieved an accuracy of 0.977, precision of 0.985, recall of 0.975, and an F1 score of 0.979, performing better than the other models compared in the domain of network attack prediction. SHAP is used to analyze the black-box decision-making process of the model. It provides interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model exhibits superior accuracy and strong predictive capability. Moreover, the SHAP-based interpretability analysis significantly improves the model's transparency and interpretability.
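SHAP attributes a prediction to features using Shapley values. The sketch below illustrates the underlying definition by brute-force enumeration of feature coalitions for a tiny hypothetical model. It assumes a single all-zero baseline for "absent" features. This is not the `shap` library and not the authors' code. Practical explainers for LightGBM-sized models rely on tree-specific algorithms and background data instead, because exact enumeration grows as 2^n.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Assumption: a feature outside the coalition is replaced by its baseline value.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their observed value; others use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical model: feature 0 acts additively; features 1 and 2 interact.
toy_model = lambda z: 2 * z[0] + z[1] * z[2]
phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# phi is approximately [2.0, 3.0, 3.0]
```

The interaction term `z[1] * z[2]` contributes 6 to the prediction, and it is split equally between features 1 and 2. The attributions also sum to f(x) − f(baseline) = 8, which is the additivity property SHAP relies on.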
Funding: Supported by the Major National Science and Technology Programs in the "Thirteenth Five-Year" Plan period (Grant No. 2017ZX05032-002-004) and the Innovation Team Funding of Natural Science Foundation of Hubei Province, China (Grant No. 2021CFA031). Additional support came from the Chinese Scholarship Council (CSC) and Silk Road Institute, which provided stipend support.
Abstract: Accurate reservoir permeability determination is crucial in hydrocarbon exploration and production. Conventional methods that rely on empirical correlations and assumptions often suffer from high cost, long turnaround times, inaccuracies, and uncertainties. This study introduces a novel hybrid machine learning approach to predict the permeability of the Wangkwar formation in the Gunya oilfield, Northwestern Uganda. The group method of data handling with differential evolution (GMDH-DE) algorithm was chosen for three reasons: it can model complex, nonlinear relationships between variables, it reduces computation time, and it optimizes parameters through evolutionary algorithms. Training used 1953 samples from the Gunya-1 and Gunya-2 wells, and testing used 1563 samples from Gunya-3. On these data, GMDH-DE outperformed the group method of data handling (GMDH) and random forest (RF), predicting permeability with higher accuracy and lower computation time. During training, GMDH-DE achieved an R² of 0.9985, RMSE of 3.157, MAE of 2.366, and ME of 0.001. During testing, the ME, MAE, RMSE, and R² were 1.3508, 12.503, 21.3898, and 0.9534, respectively. GMDH-DE also reduced processing time by 41% compared with GMDH and RF. The model was further used to predict the permeability of the Mita Gamma well in the Mandawa basin, Tanzania, which lacks core data. Shapley additive explanations (SHAP) analysis identified thermal neutron porosity (TNPH), effective porosity (PHIE), and spectral gamma-ray (SGR) as the most critical parameters in permeability prediction. The GMDH-DE model therefore offers a novel, efficient, and accurate approach for fast permeability prediction, supporting hydrocarbon exploration and production.
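GMDH-DE uses differential evolution (DE) for parameter optimization. The abstract does not specify the DE variant, so the sketch below assumes the classic DE/rand/1/bin scheme. It applies DE to a hypothetical toy problem: fitting the coefficients of y = a + bx by minimizing mean squared error. It illustrates the optimizer only and is not the authors' GMDH-DE implementation. The default values of `pop_size`, `f` (mutation scale), `cr` (crossover rate), and `generations` are illustrative assumptions.

```python
import random

def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=400, seed=0):
    """Minimise `objective` over a box using the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the box constraints.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: choose three distinct individuals, none equal to i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for k in range(dim):
                # Binomial crossover between the mutant and the current individual.
                if rng.random() < cr or k == j_rand:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    lo, hi = bounds[k]
                    v = min(max(v, lo), hi)  # clip back into the box
                else:
                    v = pop[i][k]
                trial.append(v)
            # Greedy selection: keep the trial if it is no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Hypothetical noise-free data generated from y = 2 + 3x.
xs = [0.1 * i for i in range(20)]
ys = [2.0 + 3.0 * x for x in xs]

def mse(params):
    a, b = params
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best, err = differential_evolution(mse, bounds=[(-10.0, 10.0), (-10.0, 10.0)])
```

In a GMDH setting, the objective would instead be an external validation error of a candidate network layer. That mapping is an assumption, since the abstract does not describe it.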
Funding: Support was provided by The Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease both the performance and the interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies that have focused on predicting tunneling-induced settlement. This study applies four techniques to investigate how feature selection affects model performance when predicting the tunneling-induced maximum surface settlement (S_max): the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS), and the Boruta algorithm. The data set was compiled from two metro tunnel projects in Hangzhou, China, excavated with earth pressure balance (EPB) shields. It consists of 14 input features and a single output (i.e., S_max). The ML model trained on features selected by the Boruta algorithm performs best in both the training and testing phases. The features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions, and shield operation. The recently proposed Shapley additive explanations (SHAP) method is used to explore how the input features contribute to the output of a complex ML model. Larger settlements are observed during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low face pressure at the top of the shield increases the model's output (i.e., the predicted S_max).
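Sequential forward selection (SFS), one of the four techniques compared, greedily adds the feature that most improves a score. It stops when no remaining feature helps or when a size limit is reached. The sketch below is generic and takes an arbitrary `score` callable; in practice this would be a cross-validated model metric. The feature names and the additive toy score are hypothetical and are not taken from the study's data set.

```python
def sequential_forward_selection(features, score, max_features=None):
    """Greedy SFS: add the best-scoring feature until the score stops improving."""
    selected = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features
    while remaining and len(selected) < limit:
        # Evaluate every candidate added to the current subset.
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best_score:
            break  # no improvement: stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Hypothetical per-feature usefulness; a real score would come from model validation.
toy_weights = {"cover_depth": 0.5, "face_pressure": 0.3,
               "grout_volume": 0.15, "random_noise": -0.05}
toy_score = lambda subset: sum(toy_weights[f] for f in subset)

chosen, chosen_score = sequential_forward_selection(list(toy_weights), toy_score)
# chosen == ["cover_depth", "face_pressure", "grout_volume"]
```

Sequential backward selection (SBS) mirrors this procedure. It starts from the full feature set and removes the feature whose deletion hurts the score least.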
Funding: Funded by the University Teachers Innovation Fund Project of Gansu Province (2025A-001) and the Northwest Normal University Young Teachers' Scientific Research Ability Improvement Plan (NWNULKQN2024-20).
Abstract: Accurately revealing the spatial heterogeneity in the trade-offs and synergies of land use functions (LUFs), and their driving factors, is imperative for advancing sustainable land utilization and optimizing land use planning. This is especially critical for ecologically vulnerable inland river basins in arid regions. However, existing methods struggle to capture the complex nonlinear interactions among environmental factors and their multifaceted relationships with the trade-offs and synergies of LUFs in such basins. This study therefore focused on the middle reaches of the Heihe River Basin (MHRB), an arid inland river basin in northwestern China. Using land use, socioeconomic, meteorological, and hydrological data from 2000 to 2020, we analyzed the spatiotemporal patterns of LUFs and their trade-off and synergy relationships from the perspective of production, living, and ecological functions. We further employed an integrated Extreme Gradient Boosting (XGBoost)-SHapley Additive exPlanations (SHAP) framework to investigate the environmental factors influencing the spatial heterogeneity in these trade-offs and synergies. From 2000 to 2020, the production, living, and ecological functions of land use within the MHRB exhibited an increasing trend, with a distinct spatial pattern of "high in the southwest and low in the northeast". The trade-off and synergy relationships showed significant spatial heterogeneity: trade-offs dominated oasis areas with intensive human activity, while synergies prevailed elsewhere. During the study period, the synergies between production and living functions and between production and ecological functions were relatively robust, whereas the synergy between living and ecological functions remained weaker. Natural factors were the primary drivers of the trade-offs and synergies of LUFs: the digital elevation model (DEM), annual mean temperature, the Normalized Difference Vegetation Index (NDVI), and annual precipitation. Socioeconomic factors ranked second: population density, Gross Domestic Product (GDP), and land use intensity. Distance factors (distance to water bodies, residential areas, and roads) exerted minimal influence. Notably, the interactions among NDVI, annual mean temperature, DEM, and land use intensity had the most substantial impacts on the relationships among LUFs. This study provides novel perspectives and methods for unraveling the mechanisms underlying the spatial heterogeneity in the trade-offs and synergies of LUFs. It also offers scientific insights to inform regional land use planning and sustainable natural resource management in arid inland river basins.
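The abstract does not state how trade-offs and synergies were quantified. A common convention, assumed here, compares two function scores across spatial units. A positive correlation is read as a synergy and a negative one as a trade-off. The sketch uses Pearson's r on hypothetical per-unit scores. A spatially explicit analysis would repeat the calculation within each neighbourhood or window to map heterogeneity.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def classify_pair(a, b, threshold=0.0):
    """Label a function pair as synergy, trade-off, or neutral (assumed convention)."""
    r = pearson(a, b)
    if r > threshold:
        return "synergy", r
    if r < -threshold:
        return "trade-off", r
    return "neutral", r

# Hypothetical function scores for five spatial units.
production = [1.0, 2.0, 3.0, 4.0, 5.0]
living = [2.0, 4.0, 5.0, 8.0, 10.0]
ecological = [5.0, 4.0, 3.0, 2.0, 1.0]

prod_living = classify_pair(production, living)        # positive r -> "synergy"
prod_ecological = classify_pair(production, ecological)  # r == -1.0 -> "trade-off"
```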
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2023YFC3007203).
Abstract: This study addresses gaps in aftershock prediction research by proposing an interpretable hybrid machine learning model that leverages multi-source data. The model overcomes challenges related to the selection of influencing factors, the choice of model type, the visualization of prediction results, and the interpretability of the decision mechanism. It integrates mainshock factors, geological features, site characteristics, and terrain conditions using geographic information system (GIS) technology. By employing the stacking algorithm to optimize and combine XGBoost and LightGBM models, the proposed model significantly improves prediction performance. Aftershock hazard mapping visualizes the results and offers a robust tool for aftershock warning. The Shapley additive explanations (SHAP) model is used to explain the decision-making process from both global and local perspectives. Compared with the optimized XGBoost-CMA_ES and LightGBM-CMA_ES hybrid models, the stacking model increases the area under the curve (AUC) on the test set by 7.71% and 5.72%, respectively, with a maximum prediction accuracy of 0.9344. The hazard zoning map identifies high-risk areas mainly around fault lines and near the epicenter. As the hazard level rises, the proportion and density of aftershocks in these areas increase. The SHAP results identify distance to fault as the most critical factor. The study combines local explanations with on-site investigations, effectively visualizing the contributions of different factors to aftershocks. This research provides new tools and methods for enhancing aftershock warning and response.
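The comparison above is reported in terms of AUC. As a reminder of what that number measures, the sketch below computes ROC AUC directly from its probabilistic definition. This is the chance that a randomly chosen positive (aftershock) cell is scored higher than a randomly chosen negative one, with ties counted as one half. The labels and scores are hypothetical. This pairwise form is O(P·N) and is meant for illustration, not for large rasters.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5 (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = aftershock observed) and model scores.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```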
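Abstract: This study aimed to construct a risk prediction model for thyroid dysfunction in cancer patients treated with PD-1 inhibitors, to analyze the risk factors associated with PD-1 inhibitor-induced thyroid dysfunction, and to design a monitoring and early-warning system. Clinical data were collected from 1225 cancer patients who received PD-1 inhibitors at the Affiliated Tumor Hospital of Guangxi Medical University from 2020 to 2023. The data comprised 63 variables, including demographic characteristics, medical history, and laboratory tests. Four traditional machine learning models were compared using the top 10, 20, 30, 40, 50, and 60 variables ranked by correlation. Model performance was evaluated with the F1 score, sensitivity, accuracy, precision, specificity, and area under the curve (AUC). The models were interpreted visually with Shapley Additive Explanation (SHAP). The 10 variables most strongly correlated with thyroid-stimulating hormone were, in order: hydroxybutyrate dehydrogenase, lactate dehydrogenase, absolute lymphocyte count, aspartate aminotransferase, calcium ion, alkaline phosphatase, gamma-glutamyl transpeptidase, absolute monocyte count, red cell distribution width-SD, and cholinesterase. A risk prediction model for thyroid dysfunction in cancer patients treated with PD-1 inhibitors was established, and the factors influencing its predictions were explained at both the global and the local level.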
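Except for AUC, the evaluation metrics listed above all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions. The counts are hypothetical and are not from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall / true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}

# Hypothetical counts: 45 patients with thyroid dysfunction, 55 without.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
# accuracy 0.85, precision 0.8, F1 = 2*40 / (2*40 + 10 + 5) ≈ 0.842
```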
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023E01006, 2024TSYCCX0004).
Abstract: Arid mountain ecosystems are highly sensitive to hydrothermal stress and land use intensification. Yet in the Tianshan Mountains of Northwest China, it remains unclear where net primary productivity (NPP) degradation is likely to persist and what drives it. We integrated multi-source remote sensing with the Carnegie–Ames–Stanford Approach (CASA) model to estimate NPP during 2000–2020. We then assessed trend persistence using the Hurst exponent and identified key drivers and nonlinear thresholds with Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP). Total NPP averaged 55.74 Tg C/a and ranged from 48.07 to 65.91 Tg C/a from 2000 to 2020, while regional mean NPP rose from 138.97 to 160.69 g C/(m²·a). Land use transfer analysis showed that grassland expanded mainly at the expense of unutilized land and that cropland increased overall. NPP increased across 64.11% of the region during 2000–2020. However, persistence analysis suggested that 53.93% of the Tianshan Mountains was prone to continued NPP decline: 36.41% with significant projected decline and 17.52% with weak projected decline. These areas formed degradation hotspots concentrated in the central and northern Tianshan Mountains. In contrast, potential improvement was limited (strong persistent improvement: 4.97%; strong anti-persistent improvement: 0.36%). Driver attribution indicated that land use dominated NPP variability (mean absolute SHAP value = 29.54%), followed by precipitation (16.03%) and temperature (11.05%). SHAP dependence analyses showed that precipitation effects stabilized at 300.00–400.00 mm. Temperature exhibited an inverted U-shaped response with a peak near 0.00°C. These findings indicate that persistent degradation risk arises from hydrothermal constraints interacting with land use conversion. They highlight the need for threshold-informed, spatially targeted management to sustain carbon sequestration in arid mountain ecosystems.
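The Hurst exponent H summarizes trend persistence. H > 0.5 suggests that the current trend is likely to continue, H < 0.5 suggests reversal, and H ≈ 0.5 suggests no memory. The abstract does not give the estimator, so this sketch assumes classic rescaled-range (R/S) analysis. The method averages R/S over non-overlapping windows of each size n, then takes the slope of log(R/S) against log n. The `min_window` default and the test series are illustrative assumptions. For a per-pixel NPP series of 21 annual values, the usable window range is short, so the estimate is noisy.

```python
import math

def rescaled_range(chunk):
    """R/S statistic of one window, or None when the window is constant."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative, running = [], 0.0
    for value in chunk:
        running += value - mean              # cumulative deviation from the mean
        cumulative.append(running)
    r = max(cumulative) - min(cumulative)    # range of cumulative deviations
    s = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)  # population std
    return r / s if s > 0 else None

def hurst_exponent(series, min_window=4):
    """Estimate H as the OLS slope of log(mean R/S) versus log(window size)."""
    log_n, log_rs = [], []
    for n in range(min_window, len(series) // 2 + 1):
        stats = [rescaled_range(series[i:i + n])
                 for i in range(0, len(series) - n + 1, n)]
        stats = [v for v in stats if v]      # drop constant windows and zero ranges
        if stats:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(stats) / len(stats)))
    if len(log_n) < 2:
        raise ValueError("series too short for the requested window sizes")
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den

# Hypothetical series: a steady trend (persistent) and a flip-flop (anti-persistent).
trend = [float(t) for t in range(64)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(64)]
h_trend = hurst_exponent(trend)       # close to 1
h_alt = hurst_exponent(alternating)   # well below 0.5
```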