This study provides an in-depth comparative evaluation of landslide susceptibility using two distinct spatial units, slope units (SUs) and hydrological response units (HRUs), within Goesan County, South Korea. Leveraging the extreme gradient boosting (XGB) algorithm combined with Shapley Additive Explanations (SHAP), this work assesses the precision and clarity with which each unit predicts areas vulnerable to landslides. SUs are delineated from geomorphological features such as ridges and valleys, emphasizing slope stability and landslide triggers. Conversely, HRUs are established from a variety of hydrological factors, including land cover, soil type, and slope gradient, to capture the region's dynamic water processes. The methodological framework includes the systematic gathering, preparation, and analysis of data, ranging from historical landslide occurrences to topographical and environmental variables such as elevation, slope angle, and curvature. The XGB algorithm used to construct the Landslide Susceptibility Model (LSM) was combined with SHAP for model interpretation, and the results were evaluated using random cross-validation (RCV) to ensure accuracy and reliability. To ensure optimal model performance, the XGB hyperparameters were tuned using differential evolution, considering only multicollinearity-free variables. The results show that both SUs and HRUs are effective for LSM, but their effectiveness varies with landscape characteristics. The XGB algorithm demonstrates strong predictive power, and SHAP enhances model transparency regarding the influential variables involved. This work underscores the importance of selecting assessment units tailored to specific landscape characteristics for accurate LSM. The integration of advanced machine learning techniques with interpretative tools offers a robust framework for landslide susceptibility assessment, improving both predictive capability and model interpretability. Future research should integrate broader datasets and explore hybrid analytical models to strengthen the generalizability of these findings across varied geographical settings.
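The additive attribution that SHAP layers on top of the XGB model rests on the Shapley-value axioms, in particular efficiency: the per-feature attributions sum exactly to the gap between the model's prediction and its baseline. The sketch below computes exact Shapley values for a hypothetical three-feature toy model (not the paper's XGB model, whose data and features are not reproduced here); real SHAP libraries approximate this combinatorial computation efficiently for tree ensembles.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values: feature i takes x[i] when 'present' and
    baseline[i] when 'absent'. Feasible only for a handful of features,
    since it enumerates all coalitions."""
    n = len(x)

    def v(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                S = set(S)
                # Shapley weight for a coalition of size |S|
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (v(S | {i}) - v(S))
    return phi

# Toy model with an interaction term, mimicking a tree-ensemble's nonlinearity.
f = lambda z: 2 * z[0] + z[1] * z[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
# Efficiency property: attributions sum to f(x) - f(baseline).
assert abs(sum(phi) - (f(x) - f(base))) < 1e-9
```

Note how the interaction term is split evenly between the two interacting features (phi[1] == phi[2]), while the linear term is attributed entirely to feature 0.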
Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the "black-box" nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to a prediction. However, these approaches fail to answer a critical question: how can we ensure that the structure–property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure over molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. The framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To address the challenge of enhancing the interpretability of network attack prediction models, this paper proposes a method that combines the Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling the rapid and accurate identification and prediction of potential network attack types and thereby facilitating timely defense measures. The model achieved an accuracy of 0.977, a precision of 0.985, a recall of 0.975, and an F1-score of 0.979, outperforming other models in the domain of network attack prediction. SHAP is utilized to analyze the model's black-box decision-making process, providing interpretability by quantifying each feature's contribution to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model exhibits superior accuracy and outstanding predictive capability, while the SHAP-based interpretability analysis significantly improves the model's transparency and interpretability.
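The four reported metrics are all derived from a binary confusion matrix. The sketch below shows the standard definitions; the counts are illustrative placeholders (the paper does not publish its confusion matrix), chosen so the resulting figures land near the reported values.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts:
    true/false positives (tp/fp) and false/true negatives (fn/tn)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)                     # of predicted attacks, how many were real
    recall = tp / (tp + fn)                        # of real attacks, how many were caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only -- not taken from the paper.
acc, prec, rec, f1 = classification_metrics(tp=975, fp=15, fn=25, tn=985)
```

With these placeholder counts the metrics come out near the paper's reported 0.977/0.985/0.975/0.979, which shows how tightly the four figures constrain one another.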
Deep learning models have become a core technological tool in medical image analysis. However, these models often lack transparency in their decision-making processes, leading to challenges of trust and interpretability in clinical applications. To address this issue, explainable artificial intelligence (XAI) techniques have been applied to medical image analysis. While showing promising potential, XAI also brings significant ethical risks in practice, most notably the problem of spurious explanations. Such explanations may raise further concerns regarding patient privacy, data security, and the attribution of decision-making authority in medical contexts. This paper analyzes the application of XAI methods, particularly saliency maps, in medical image interpretation, identifies the underlying causes of spurious explanations, and proposes possible mitigation strategies. The aim is to contribute to the responsible and sustainable integration of explainable AI into clinical practice.
Heart disease remains a leading cause of mortality worldwide, emphasizing the urgent need for reliable and interpretable predictive models to support early diagnosis and timely intervention. However, existing Deep Learning (DL) approaches often face several limitations, including inefficient feature extraction, class imbalance, suboptimal classification performance, and limited interpretability, which collectively hinder their deployment in clinical settings. To address these challenges, we propose a novel DL framework for heart disease prediction that integrates a comprehensive preprocessing pipeline with an advanced classification architecture. The preprocessing stage involves label encoding and feature scaling. To address the class imbalance inherent in the Personal Key Indicators of Heart Disease dataset, the localized random affine shadowsampling technique is employed, which enhances minority-class representation while minimizing overfitting. At the core of the framework lies the Deep Residual Network (DeepResNet), which employs hierarchical residual transformations to facilitate efficient feature extraction and capture complex, non-linear relationships in the data. Experimental results demonstrate that the proposed model significantly outperforms existing techniques, achieving improvements of 3.26% in accuracy, 3.16% in area under the receiver operating characteristic curve, 1.09% in recall, and 1.07% in F1-score. Furthermore, robustness is validated using 10-fold cross-validation, confirming the model's generalizability across diverse data distributions. Moreover, model interpretability is ensured through the integration of Shapley additive explanations and local interpretable model-agnostic explanations, offering valuable insights into the contribution of individual features to model predictions. Overall, the proposed DL framework presents a robust, interpretable, and clinically applicable solution for heart disease prediction.
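The class-imbalance step generates synthetic minority samples as affine combinations of existing ones. The sketch below is a simplified stand-in for that idea, drawing random convex combinations (affine weights summing to 1) of k minority points; it is not the exact published shadowsampling algorithm, whose neighborhood construction is more involved.

```python
import random

def affine_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples as random convex combinations of
    k existing minority samples -- a simplified sketch in the spirit of
    localized random affine shadowsampling, not the published algorithm."""
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic = []
    for _ in range(n_new):
        picks = rng.sample(minority, k)
        w = [rng.random() for _ in range(k)]
        s = sum(w)
        w = [wi / s for wi in w]  # normalize so the weights are convex
        synthetic.append([sum(w[j] * picks[j][d] for j in range(k))
                          for d in range(dim)])
    return synthetic

# Tiny 2-D minority class; synthetic points stay inside its convex hull.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = affine_oversample(minority, n_new=5)
assert all(0.0 <= v <= 1.0 for pt in new for v in pt)
```

Keeping the weights convex is what bounds the synthetic points to the minority class's convex hull, which is one way such methods limit the overfitting that naive duplication causes.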
Inverse design of advanced materials represents a pivotal challenge in materials science. Leveraging the latent space of Variational Autoencoders (VAEs) for material optimization has emerged as a significant advancement in the field of material inverse design. However, VAEs are inherently prone to generating blurred images, posing challenges for precise inverse design and microstructure manufacturing. While increasing the dimensionality of the VAE latent space can mitigate reconstruction blurriness to some extent, it simultaneously imposes a substantial burden on target optimization due to an excessively large search space. To address these limitations, this study adopts a Variational Autoencoder guided Conditional Diffusion Generative Model (VAE-CDGM) framework integrated with Bayesian optimization to achieve the inverse design of composite materials with targeted mechanical properties. The VAE-CDGM model combines the strengths of VAEs and Denoising Diffusion Probabilistic Models (DDPM), enabling the generation of high-quality, sharp images while preserving a manipulable latent space. To accommodate varying dimensional requirements of the latent space, two optimization strategies are proposed. When the latent space dimensionality is excessively high, SHapley Additive exPlanations (SHAP) sensitivity analysis is employed to identify critical latent features for optimization within a reduced subspace. Conversely, direct optimization is performed in the low-dimensional latent space of VAE-CDGM when the dimensionality is modest. The results demonstrate that both strategies accurately achieve the targeted design of composite materials while circumventing the blurred-reconstruction flaws of VAEs, offering a novel pathway for the precise design of advanced materials.
Arid mountain ecosystems are highly sensitive to hydrothermal stress and land use intensification, yet where net primary productivity (NPP) degradation is likely to persist, and what drives it, remain unclear in the Tianshan Mountains of Northwest China. We integrated multi-source remote sensing with the Carnegie–Ames–Stanford Approach (CASA) model to estimate NPP during 2000–2020, assessed trend persistence using the Hurst exponent, and identified key drivers and nonlinear thresholds with Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP). Total NPP averaged 55.74 Tg C/a and ranged from 48.07 to 65.91 Tg C/a from 2000 to 2020, while regional mean NPP rose from 138.97 to 160.69 g C/(m²·a). Land use transfer analysis showed that grassland expanded mainly at the expense of unutilized land and that cropland increased overall. Although NPP increased across 64.11% of the region during 2000–2020, persistence analysis suggested that 53.93% of the Tianshan Mountains was prone to continued NPP decline, including 36.41% with significant projected decline and 17.52% with weak projected decline; these areas formed degradation hotspots concentrated in the central and northern Tianshan Mountains. In contrast, potential improvement was limited (strong persistent improvement: 4.97%; strong anti-persistent improvement: 0.36%). Driver attribution indicated that land use dominated NPP variability (mean absolute SHAP value = 29.54%), followed by precipitation (16.03%) and temperature (11.05%). SHAP dependence analyses showed that precipitation effects stabilized at 300.00–400.00 mm, and temperature exhibited an inverted U-shaped response with a peak near 0.00°C. These findings indicate that persistent degradation risk arises from hydrothermal constraints interacting with land use conversion, highlighting the need for threshold-informed, spatially targeted management to sustain carbon sequestration in arid mountain ecosystems.
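Trend persistence via the Hurst exponent can be estimated with classical rescaled-range (R/S) analysis: H near 0.5 indicates no persistence, H above 0.5 a trend likely to continue, H below 0.5 anti-persistence. The sketch below is a generic R/S estimator run on a synthetic white-noise series, not the paper's exact NPP pipeline, whose preprocessing is not specified in the abstract.

```python
import math
import random

def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis:
    the slope of log(R/S) against log(window size)."""
    n = len(series)
    log_w, log_rs = [], []
    w = min_window
    while w <= n // 2:
        vals = []
        for start in range(0, n - w + 1, w):
            chunk = series[start:start + w]
            mean = sum(chunk) / w
            dev = [x - mean for x in chunk]
            # Range of the cumulative deviation from the window mean
            cum, s = [], 0.0
            for d in dev:
                s += d
                cum.append(s)
            r = max(cum) - min(cum)
            sd = math.sqrt(sum(d * d for d in dev) / w)
            if sd > 0:
                vals.append(r / sd)
        log_w.append(math.log(w))
        log_rs.append(math.log(sum(vals) / len(vals)))
        w *= 2
    # Least-squares slope of log(R/S) vs log(window) is the Hurst estimate.
    m = len(log_w)
    sx, sy = sum(log_w), sum(log_rs)
    sxx = sum(x * x for x in log_w)
    sxy = sum(x * y for x, y in zip(log_w, log_rs))
    return (m * sxy - sx * sy) / (m * sxx - sx * sx)

rng = random.Random(42)
noise = [rng.gauss(0, 1) for _ in range(1024)]
H = hurst_rs(noise)  # white noise: expect H in the vicinity of 0.5
```

In a per-pixel application like the paper's, this estimate would be computed on each pixel's NPP time series and thresholded to classify persistent decline versus persistent improvement.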
This study developed a risk prediction model for thyroid dysfunction in tumor patients treated with PD-1 inhibitors, analyzed the associated risk factors, and designed a monitoring and early-warning system. Clinical data of 1,225 tumor patients treated with PD-1 inhibitors at the Affiliated Tumor Hospital of Guangxi Medical University from 2020 to 2023 were collected, covering 63 variables including demographic characteristics, medical history, and laboratory tests. Four traditional machine learning models were compared using the top 10/20/30/40/50/60 variables ranked by correlation. Model performance was evaluated using the F1-score, sensitivity, accuracy, precision, specificity, and area under the curve (AUC), and Shapley Additive Explanation (SHAP) was used to visually interpret the models. The top 10 variables correlated with thyroid-stimulating hormone were, in order: hydroxybutyrate dehydrogenase, lactate dehydrogenase, absolute lymphocyte count, aspartate aminotransferase, calcium ion, alkaline phosphatase, gamma-glutamyl transpeptidase, absolute monocyte count, red cell distribution width SD, and cholinesterase. A risk prediction model for thyroid dysfunction in tumor patients using PD-1 inhibitors was established, and its predictions were explained at both the global and local levels.
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00222536).
Funding: Supported by the National Natural Science Foundation of China Project (No. 62302540); see https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Funding: Funded by the Ongoing Research Funding Program, project number ORF-2025-648, King Saud University, Riyadh, Saudi Arabia.
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023E01006, 2024TSYCCX0004).