Heart disease remains a leading cause of mortality worldwide, emphasizing the urgent need for reliable and interpretable predictive models to support early diagnosis and timely intervention. However, existing Deep Learning (DL) approaches often face several limitations, including inefficient feature extraction, class imbalance, suboptimal classification performance, and limited interpretability, which collectively hinder their deployment in clinical settings. To address these challenges, we propose a novel DL framework for heart disease prediction that integrates a comprehensive preprocessing pipeline with an advanced classification architecture. The preprocessing stage involves label encoding and feature scaling. To address the class imbalance inherent in the Personal Key Indicators of Heart Disease dataset, the localized random affine shadowsampling technique is employed, which enhances minority-class representation while minimizing overfitting. At the core of the framework lies the Deep Residual Network (DeepResNet), which employs hierarchical residual transformations to facilitate efficient feature extraction and capture complex, non-linear relationships in the data. Experimental results demonstrate that the proposed model significantly outperforms existing techniques, achieving improvements of 3.26% in accuracy, 3.16% in area under the receiver operating characteristic curve, 1.09% in recall, and 1.07% in F1-score. Furthermore, robustness is validated using 10-fold cross-validation, confirming the model's generalizability across diverse data distributions. Moreover, model interpretability is ensured through the integration of Shapley additive explanations and local interpretable model-agnostic explanations, offering valuable insights into the contribution of individual features to model predictions. Overall, the proposed DL framework presents a robust, interpretable, and clinically applicable solution for heart disease prediction.
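The localized random affine shadowsampling step can be illustrated with a simplified sketch: synthetic minority samples are drawn as random affine (convex) combinations of a minority point and its nearest minority neighbours. This is an illustrative approximation of that family of oversampling methods, not the paper's implementation; the toy data and parameters are made up.

```python
import numpy as np

def oversample_minority(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority samples as convex (affine) combinations
    of a randomly chosen minority point and its k nearest minority
    neighbours -- a simplified sketch, not the paper's exact method."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # k nearest neighbours of the chosen point (index 0 is the point itself)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]
        pts = np.vstack([X_min[i], X_min[nbrs]])
        # random affine weights that sum to 1 (Dirichlet draw)
        w = rng.dirichlet(np.ones(len(pts)))
        synthetic.append(w @ pts)
    return np.array(synthetic)

# Toy imbalance: 10 minority samples expanded by 90 synthetic ones
rng = np.random.default_rng(0)
X_min = rng.normal(loc=2.0, size=(10, 4))
X_new = oversample_minority(X_min, n_new=90, rng=1)
print(X_new.shape)  # (90, 4)
```

Because every synthetic point is a convex combination of real minority points, it stays inside their local neighbourhood, which limits the risk of generating unrealistic samples.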
Inverse design of advanced materials represents a pivotal challenge in materials science. Leveraging the latent space of Variational Autoencoders (VAEs) for material optimization has emerged as a significant advancement in the field of material inverse design. However, VAEs are inherently prone to generating blurred images, posing challenges for precise inverse design and microstructure manufacturing. While increasing the dimensionality of the VAE latent space can mitigate reconstruction blurriness to some extent, it simultaneously imposes a substantial burden on target optimization due to an excessively large search space. To address these limitations, this study adopts a Variational Autoencoder guided Conditional Diffusion Generative Model (VAE-CDGM) framework integrated with Bayesian optimization to achieve the inverse design of composite materials with targeted mechanical properties. The VAE-CDGM model synergizes the strengths of VAEs and Denoising Diffusion Probabilistic Models (DDPM), enabling the generation of high-quality, sharp images while preserving a manipulable latent space. To accommodate varying dimensional requirements of the latent space, two optimization strategies are proposed. When the latent space dimensionality is excessively high, SHapley Additive exPlanations (SHAP) sensitivity analysis is employed to identify critical latent features for optimization within a reduced subspace. Conversely, direct optimization is performed in the low-dimensional latent space of VAE-CDGM when the dimensionality is modest. The results demonstrate that both strategies accurately achieve the targeted design of composite materials while circumventing the blurred-reconstruction flaws of VAEs, which offers a novel pathway for the precise design of advanced materials.
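The latent-space inverse-design idea can be sketched as follows: search a low-dimensional latent vector so that the decoded sample hits a target property. The `decode` and `property_of` functions below are hypothetical stand-ins for the VAE-CDGM decoder and a property surrogate, and plain random search stands in for the Bayesian optimization used in the study.

```python
import numpy as np

# Stand-in "decoder": a fixed random projection followed by tanh, mapping a
# 4-dimensional latent vector to an 8-dimensional "microstructure" vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))

def decode(z):
    return np.tanh(z @ W)

def property_of(x):
    # Stand-in scalar "mechanical property" of the decoded sample
    return float(np.sum(x ** 2))

target = 2.0
best_z = rng.normal(size=4)
best_err = abs(property_of(decode(best_z)) - target)
for _ in range(500):
    # Local random perturbation; Bayesian optimization would replace this loop
    z = best_z + 0.3 * rng.normal(size=4)
    err = abs(property_of(decode(z)) - target)
    if err < best_err:
        best_z, best_err = z, err
print(round(best_err, 3))
```

The same loop structure applies whether one searches the full latent space or, as in the high-dimensional strategy, only a SHAP-selected subset of latent coordinates.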
Arid mountain ecosystems are highly sensitive to hydrothermal stress and land use intensification, yet where net primary productivity (NPP) degradation is likely to persist and what drives it remain unclear in the Tianshan Mountains of Northwest China. We integrated multi-source remote sensing with the Carnegie–Ames–Stanford Approach (CASA) model to estimate NPP during 2000–2020, assessed trend persistence using the Hurst exponent, and identified key drivers and nonlinear thresholds with Extreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP). Total NPP averaged 55.74 Tg C/a and ranged from 48.07 to 65.91 Tg C/a from 2000 to 2020, while regional mean NPP rose from 138.97 to 160.69 g C/(m²·a). Land use transfer analysis showed that grassland expanded mainly at the expense of unutilized land and that cropland increased overall. Although NPP increased across 64.11% of the region during 2000–2020, persistence analysis suggested that 53.93% of the Tianshan Mountains was prone to continued NPP decline, including 36.41% with significant projected decline and 17.52% with weak projected decline; these areas formed degradation hotspots concentrated in the central and northern Tianshan Mountains. In contrast, potential improvement was limited (strong persistent improvement: 4.97%; strong anti-persistent improvement: 0.36%). Driver attribution indicated that land use dominated NPP variability (mean absolute SHAP value = 29.54%), followed by precipitation (16.03%) and temperature (11.05%). SHAP dependence analyses showed that precipitation effects stabilized at 300.00–400.00 mm, and temperature exhibited an inverted U-shaped response with a peak near 0.00°C. These findings indicated that persistent degradation risk arose from hydrothermal constraints interacting with land use conversion, highlighting the need for threshold-informed, spatially targeted management to sustain carbon sequestration in arid mountain ecosystems.
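Trend persistence via the Hurst exponent is commonly estimated by rescaled-range (R/S) analysis; a minimal numpy sketch follows. The window sizes and the white-noise demo are illustrative choices, not the study's configuration.

```python
import numpy as np

def hurst_rs(series, window_sizes=(8, 16, 32, 64)):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis: the
    slope of log(R/S) against log(window size). H > 0.5 suggests a
    persistent trend, H < 0.5 an anti-persistent (mean-reverting) one."""
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        rs_vals = []
        for start in range(0, len(series) - n + 1, n):
            chunk = series[start:start + n]
            dev = chunk - chunk.mean()      # deviations from the window mean
            z = np.cumsum(dev)              # cumulative deviation profile
            r = z.max() - z.min()           # range of the profile
            s = chunk.std()                 # window standard deviation
            if s > 0:
                rs_vals.append(r / s)
        log_n.append(np.log(n))
        log_rs.append(np.log(np.mean(rs_vals)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope

rng = np.random.default_rng(42)
h_noise = hurst_rs(rng.normal(size=4096))   # uncorrelated noise: H near 0.5
print(round(h_noise, 2))
```

Applied per pixel to an NPP time series, the sign of the recent trend combined with H above or below 0.5 yields the persistence classes used in the abstract (e.g., persistent decline vs. anti-persistent improvement).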
This study focuses on hydrological processes in the Qinghai Lake basin of China. Based on multi-year meteorological and hydrological records, we developed a hybrid model coupling the conceptual hydrological model FLEX (Flux Exchange) with a Gated Recurrent Unit (GRU) network to simulate and forecast daily runoff of the Buha River, the basin's largest tributary. Three strategies were adopted to improve simulation accuracy: the differential evolution adaptive algorithm DREAM(zs) was introduced to invert hydrological parameters and optimize the FLEX model; variational mode decomposition (VMD) was used to extract information and features from the runoff series; and the sparrow search algorithm (SSA) was applied to optimize the parameters of the deep learning GRU. The FLEX simulation results, together with the meteorological data, were fed into the neural network as inputs, yielding the FLEX-VMD-SSA-GRU hybrid model. The influence and contribution of different meteorological input conditions on the simulation results were also examined: based on seven major meteorological variables, 14 input scenarios of increasing size were set up. Finally, SHAP was applied to the deep learning results, revealing the contribution and importance of the meteorological variables to the long-term runoff trend.
Sinter is the core raw material for blast furnaces. Flue pressure, which is an important state parameter, affects sinter quality. In this paper, flue pressure prediction and optimization were studied based on the Shapley additive explanation (SHAP) to predict the flue pressure and take targeted adjustment measures. First, the sintering process data were collected and processed. A flue pressure prediction model was then constructed after comparing different feature selection methods and model algorithms, using SHAP + extremely randomized trees (ET). The prediction accuracy of the model within the error range of ±0.25 kPa was 92.63%. SHAP analysis was employed to improve the interpretability of the prediction model. The effects of various sintering operation parameters on flue pressure, the relationship between the numerical range of key operation parameters and flue pressure, the effect of operation parameter combinations on flue pressure, and the prediction process of the flue pressure prediction model on a single sample were analyzed. A flue pressure optimization module was also constructed and analyzed; when the prediction satisfied the judgment conditions, the recommended operating parameter combination was then pushed. The flue pressure was increased by 5.87% during the verification process, achieving a good optimization effect.
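The reported tolerance metric, the fraction of predictions within ±0.25 kPa of the measured flue pressure, is straightforward to compute alongside an extremely randomized trees (ET) regressor. Everything below (data, features, hyperparameters) is an illustrative assumption, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

# Synthetic stand-in for sintering process data: 4 "operation parameters"
# driving a roughly linear "flue pressure" with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 4))
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.05 * rng.normal(size=800)

# Fit ET on the first 600 samples, evaluate on the remaining 200
model = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X[:600], y[:600])
pred = model.predict(X[600:])

# Fraction of test predictions within the ±0.25 tolerance band
tolerance_acc = np.mean(np.abs(pred - y[600:]) <= 0.25)
print(f"within ±0.25: {tolerance_acc:.2%}")
```

A tolerance-band accuracy like this is often more actionable for operators than RMSE, since it maps directly to "how often is the prediction close enough to act on".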
BACKGROUND: Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM: To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS: This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, a total of 166 patients were collected to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS: Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model. CONCLUSION: The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
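The AUC values quoted above follow the rank (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores above a randomly chosen negative one. A minimal numpy sketch, with toy labels and scores:

```python
import numpy as np

def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney formulation: the
    probability that a random positive outscores a random negative,
    counting ties as half a win."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise comparison makes explicit why AUC is insensitive to class imbalance and to any monotone rescaling of the model's scores.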
Mechanical properties are critical to the quality of hot-rolled steel pipe products. Accurately understanding the relationship between rolling parameters and mechanical properties is crucial for effective prediction and control. To address this, an industrial big data platform was developed to collect and process multi-source heterogeneous data from the entire production process, providing a complete dataset for mechanical property prediction. The adaptive bandwidth kernel density estimation (ABKDE) method was proposed to adjust bandwidth dynamically based on data density. Combining long short-term memory neural networks with ABKDE offers robust prediction interval capabilities for mechanical properties. The proposed method was deployed in a large-scale steel plant, where it demonstrated superior prediction interval performance compared to lower upper bound estimation, mean variance estimation, and extreme learning machine-adaptive bandwidth kernel density estimation, achieving a prediction interval normalized average width of 0.37, a prediction interval coverage probability of 0.94, and the lowest coverage width-based criterion of 1.35. Notably, Shapley additive explanations significantly improved the proposed model's credibility by providing a clear analysis of feature impacts.
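The three interval metrics named above (coverage probability, normalized average width, and the coverage-width-based criterion) can be computed as below. Note that CWC has several formulations in the literature; this is one common variant with assumed penalty constants, not necessarily the paper's exact definition.

```python
import numpy as np

def interval_metrics(y, lower, upper, mu=0.90, eta=50.0):
    """Prediction-interval quality metrics: PICP (coverage probability),
    PINAW (average width normalized by the target range), and one common
    CWC variant that penalizes coverage below the nominal level mu."""
    y, lower, upper = map(np.asarray, (y, lower, upper))
    picp = np.mean((y >= lower) & (y <= upper))
    pinaw = np.mean(upper - lower) / (y.max() - y.min())
    penalty = np.exp(-eta * (picp - mu)) if picp < mu else 0.0
    cwc = pinaw * (1.0 + penalty)
    return picp, pinaw, cwc

# Toy example: unit-width intervals that cover every observation
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
lo, hi = y - 0.5, y + 0.5
picp, pinaw, cwc = interval_metrics(y, lo, hi)
print(picp, pinaw, cwc)  # 1.0 0.25 0.25
```

The exponential penalty encodes the trade-off the abstract reports: narrow intervals only score well on CWC if coverage stays at or above the nominal level.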
This study presents an enhanced convolutional neural network (CNN) model integrated with Explainable Artificial Intelligence (XAI) techniques for accurate prediction and interpretation of wheat crop diseases. The aim is to streamline the detection process while offering transparent insights into the model's decision-making to support effective disease management. To evaluate the model, a dataset was collected from wheat fields in Kotli, Azad Kashmir, Pakistan, and tested across multiple data splits. The proposed model demonstrates improved stability, faster convergence, and higher classification accuracy. The results show significant improvements in prediction accuracy and stability compared to prior works, achieving up to 100% accuracy in certain configurations. In addition, XAI methods such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) were employed to explain the model's predictions, highlighting the most influential features contributing to classification decisions. The combined use of CNN and XAI offers a dual benefit: strong predictive performance and clear interpretability of outcomes, which is especially critical in real-world agricultural applications. These findings underscore the potential of integrating deep learning models with XAI to advance automated plant disease detection. The study offers a precise, reliable, and interpretable solution for improving wheat production and promoting agricultural sustainability. Future extensions of this work may include scaling the dataset across broader regions and incorporating additional modalities such as environmental data to enhance model robustness and generalization.
Machine learning (ML) models are widely used for predicting undrained shear strength (USS), but interpretability has been a limitation in various studies. Therefore, this study introduced Shapley additive explanations (SHAP) to clarify the contribution of each input feature in USS prediction. Three ML models, artificial neural network (ANN), extreme gradient boosting (XGBoost), and random forest (RF), were employed, with accuracy evaluated using mean squared error, mean absolute error, and coefficient of determination (R²). The RF achieved the highest performance with an R² of 0.82. SHAP analysis identified pre-consolidation stress as a key contributor to USS prediction. SHAP dependence plots reveal that the ANN captures smoother, linear feature-output relationships, while the RF handles complex, non-linear interactions more effectively. This suggests a non-linear relationship between USS and input features, with RF outperforming ANN. These findings highlight SHAP's role in enhancing interpretability and promoting transparency and reliability in ML predictions for geotechnical applications.
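The three accuracy metrics used above can be computed directly; a minimal numpy sketch on toy numbers:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Mean squared error, mean absolute error, and the coefficient of
    determination R² (one minus residual over total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2

mse, mae, r2 = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(mse, mae, r2)
```

Reporting all three together is useful because MSE emphasizes large errors, MAE weighs all errors equally, and R² normalizes against the variance of the target.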
In this study, we used the Kolmogorov-Arnold networks (KAN) model, based on the Kolmogorov-Arnold representation theorem, for a comprehensive and fair evaluation. We compare its performance with four other powerful classification models across three datasets: a simple slope binary classification dataset, an imbalanced rockburst dataset, and a highly discrete liquefaction dataset. First, a thorough review of machine-learning algorithms for geohazard assessment was conducted. Subsequently, three datasets were collected from real engineering practices, and their data structures were visualized. Bayesian optimization was then used to adjust the parameters of all models across all datasets. To ensure model interpretability, a global sensitivity analysis based on Sobol indices was performed, establishing an interpretable visual analysis of the model's decision-making process. For a fair evaluation, various metrics and repeated stratified 10-fold cross-validation were employed to comprehensively analyze the predictive results of the models. The results indicate that the KAN model, based on the RBF kernel, not only achieves the expected performance on the binary classification dataset but also performs well on imbalanced and highly discrete datasets, significantly surpassing other commonly used classification models. This demonstrates the broad application potential of the KAN model in geotechnical engineering.
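The evaluation protocol, repeated stratified 10-fold cross-validation, can be sketched with scikit-learn. Stratification keeps the class ratio in every fold, which matters for an imbalanced rockburst-style dataset; the toy labels below are illustrative stand-ins.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

# Toy imbalanced dataset: 30 negatives and 10 positives, 5 dummy features
y = np.array([0] * 30 + [1] * 10)
X = np.random.default_rng(0).normal(size=(40, 5))

# 10 stratified folds, repeated 3 times with different shuffles
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
ratios = []
for train_idx, test_idx in cv.split(X, y):
    ratios.append(y[test_idx].mean())   # minority fraction in each test fold

print(len(ratios))                      # 30 evaluations: 10 folds x 3 repeats
print(min(ratios), max(ratios))         # every fold keeps the 25% minority share
```

Averaging metrics over the repeated splits reduces the variance that a single 10-fold partition would leave, which is what makes the comparison across five models "fair" in the abstract's sense.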
The Titanic sank 113 years ago, on April 14-15, after hitting an iceberg, with human error likely causing the ship to wander into those dangerous waters. Today, autonomous systems built on AI can help ships avoid such accidents. But could such a system explain to the captain why it was controlling the ship in a certain way?
Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the "black-box" nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to the prediction. However, these approaches fail to answer a critical question: how can we ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure on molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
BACKGROUND: Parastomal hernia (PSH) is a common and challenging complication following preventive ostomy in rectal cancer patients, lacking accurate tools for early risk prediction. AIM: To explore the application of machine learning algorithms in predicting the occurrence of PSH in patients undergoing preventive ostomy after rectal cancer resection, providing valuable support for clinical decision-making. METHODS: A retrospective analysis was conducted on the clinical data of 579 patients who underwent rectal cancer resection with preventive ostomy at Tongji Hospital, Huazhong University of Science and Technology, between January 2015 and June 2023. Various machine learning models were constructed and trained using preoperative and intraoperative clinical variables to assess their predictive performance for PSH risk. SHapley Additive exPlanations (SHAP) were used to analyze the importance of features in the models. RESULTS: A total of 579 patients were included, with 31 (5.3%) developing PSH. Among the machine learning models, the random forest (RF) model showed the best performance. In the test set, the RF model achieved an area under the curve of 0.900, sensitivity of 0.900, and specificity of 0.725. SHAP analysis revealed that tumor distance from the anal verge, body mass index, and preoperative hypertension were the key factors influencing the occurrence of PSH. CONCLUSION: Machine learning, particularly the RF model, demonstrates high accuracy and reliability in predicting PSH after preventive ostomy in rectal cancer patients. This technology supports personalized risk assessment and postoperative management, showing significant potential for clinical application. An online predictive platform based on the RF model (https://yangsu2023.shinyapps.io/parastomal_hernia/) has been developed to assist in early screening and intervention for high-risk patients, further enhancing postoperative management and improving patients' quality of life.
Foam concrete is widely used in engineering due to its light weight and high porosity. Its compressive strength, a key performance indicator, is influenced by multiple factors, showing nonlinear variation. As compressive strength tests for foam concrete take a long time, a fast and accurate prediction method is needed. In recent years, machine learning has become a powerful tool for predicting the compressive strength of cement-based materials. However, existing studies often use a limited number of input parameters, and the prediction accuracy of machine learning models under the influence of multiple parameters and nonlinearity remains unclear. This study selects foam concrete density, water-to-cement ratio (W/C), supplementary cementitious material replacement rate (SCM), fine aggregate to binder ratio (FA/Binder), superplasticizer content (SP), and age of the concrete (Age) as input parameters, with compressive strength as the output. Five different machine learning models were compared, and sensitivity analysis, based on Shapley Additive Explanations (SHAP), was used to assess the contribution of each input parameter. The results show that Gaussian Process Regression (GPR) outperforms the other models, with R², RMSE, MAE, and MAPE values of 0.95, 1.6, 0.81, and 0.2, respectively. This is because GPR, optimized through Bayesian methods, better fits complex nonlinear relationships, especially when considering a large number of input parameters. Sensitivity analysis indicates that the influence of input parameters on compressive strength decreases in the following order: foam concrete density, W/C, Age, FA/Binder, SP, and SCM.
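A minimal Gaussian Process Regression sketch on a synthetic strength-versus-density curve, using scikit-learn. The kernel, units, and data are illustrative assumptions, not the study's Bayesian-tuned six-input setup; GPR's appeal here is that it returns an uncertainty estimate alongside each prediction.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Synthetic one-input stand-in: compressive strength rising nonlinearly
# with foam concrete density (kg/m^3), plus small measurement noise.
rng = np.random.default_rng(0)
density = np.linspace(400.0, 1600.0, 25).reshape(-1, 1)
strength = 2e-5 * (density.ravel() - 300.0) ** 2 + 0.1 * rng.normal(size=25)

gpr = GaussianProcessRegressor(
    kernel=ConstantKernel(1.0) * RBF(length_scale=200.0),
    alpha=0.01,            # assumed observation-noise variance
    normalize_y=True,
).fit(density, strength)

# Point prediction plus predictive standard deviation at 1000 kg/m^3
pred, std = gpr.predict(np.array([[1000.0]]), return_std=True)
print(round(float(pred[0]), 2), round(float(std[0]), 2))
```

The predictive standard deviation is what distinguishes GPR from the other four models compared in the study: it flags regions of the mix-design space where the training data are sparse.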
BACKGROUND: Despite the promising prospects of utilizing artificial intelligence and machine learning (ML) for comprehensive disease analysis, few models constructed have been applied in clinical practice due to their complexity and the lack of reasonable explanations. In contrast to previous studies with small sample sizes and limited model interpretability, we developed a transparent eXtreme Gradient Boosting (XGBoost)-based model supported by multi-center data, using patients' basic information and clinical indicators to forecast the occurrence of anastomotic leakage (AL) after rectal cancer resection surgery. The model demonstrated robust predictive performance and identified clinically relevant thresholds, which may assist physicians in optimizing perioperative management. AIM: To develop an interpretable ML model for accurately predicting the occurrence probability of AL after rectal cancer resection and define our clinical alert values for serum calcium ions. METHODS: Patients who underwent anterior resection of the rectum for rectal carcinoma at the Department of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, and Shaanxi Provincial People's Hospital were retrospectively collected from January 2011 to December 2021. Ten ML models were integrated to analyze the data and develop the predictive models. Receiver operating characteristic (ROC) curves, calibration curves, decision curve analysis, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were used to evaluate model performance. We employed the SHapley Additive exPlanations (SHAP) algorithm to explain the feature importance of the optimal model. RESULTS: A total of ten features were integrated to construct the predictive model and identify the optimal model. XGBoost was considered the best-performing model with an area under the ROC curve (AUC) of 0.984 (95% confidence interval: 0.972-0.996) in the test set (accuracy: 0.925; sensitivity: 0.92; specificity: 0.927). Furthermore, the model achieved an AUC of 0.703 in external validation. The interpretable SHAP algorithm revealed that the serum calcium ion level was the crucial factor influencing the predictions of the model. CONCLUSION: A superior predictive model, leveraging clinical data, has been crafted by employing the most effective XGBoost from a selection of ten algorithms. This model, by predicting the occurrence of AL in patients after rectal cancer resection, has identified the significant role of serum calcium ion levels, providing guidance for clinical practice. The integration of SHAP provides a clear interpretation of the model's predictions.
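The SHAP attributions used throughout these abstracts approximate exact Shapley values; for a tiny model they can be computed by brute-force coalition enumeration. The linear "risk score" and zero baseline below are hypothetical, chosen so the additivity property (attributions sum to the prediction minus the baseline prediction) is easy to verify.

```python
import numpy as np
from itertools import combinations
from math import factorial

def model(x):
    # Hypothetical 3-feature linear risk score, not any paper's model
    return 2.0 * x[0] + 1.0 * x[1] - 0.5 * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions; features
    absent from a coalition are set to the baseline value (a common,
    simplifying convention for handling 'missing' features)."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                x_s = baseline.copy()          # coalition S only
                for j in S:
                    x_s[j] = x[j]
                x_si = x_s.copy()              # coalition S plus feature i
                x_si[i] = x[i]
                phi[i] += w * (f(x_si) - f(x_s))
    return phi

x, base = np.array([1.0, 2.0, 3.0]), np.zeros(3)
phi = shapley_values(model, x, base)
print(phi, model(x) - model(base))  # attributions sum to the output gap
```

Libraries like TreeExplainer compute the same quantity efficiently for tree ensembles; the enumeration above is exponential in the number of features and is only practical as a didactic check.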
Machine learning-based Debris Flow Susceptibility Mapping (DFSM) has emerged as an effective approach for assessing debris flow likelihood, yet its application faces three critical challenges: insufficient reliability of training samples caused by biased negative sampling, opaque decision-making mechanisms in models, and subjective susceptibility mapping methods that lack quantitative evaluation criteria. This study focuses on the Yalong River basin. By integrating high-resolution remote sensing interpretation and field surveys, we established a refined sample database that includes 1,736 debris flow gullies. To address spatial bias in traditional random negative sampling, we developed a semi-supervised optimization strategy based on iterative confidence screening. Comparative experiments with four tree-based models (XGBoost, CatBoost, LGBM, and Random Forest) reveal that the optimized sampling strategy improved overall model performance by 8%-12%, with XGBoost achieving the highest accuracy (AUC = 0.882) and RF performing the lowest (AUC = 0.820). SHAP-based global-local interpretability analysis (applicable to all tree models) identifies elevation and short-duration rainfall as dominant controlling factors. Furthermore, among the tested tree-based models, XGBoost optimized with semi-supervised sampling demonstrates the highest reliability in debris flow susceptibility mapping (DFSM), achieving a comprehensive accuracy of 83.64% due to its optimal generalization-stability equilibrium.
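The iterative confidence-screening idea for negative samples can be sketched as follows: randomly sampled "non-debris-flow" points may secretly be susceptible sites, so negatives that a model scores as likely positive are dropped and the model is refit. The synthetic data, the out-of-bag probability trick, and the 0.7 threshold are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 200 confirmed debris-flow gullies, and 200 random
# "negatives" of which 20 actually sit in susceptible terrain.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=1.0, size=(200, 4))
X_neg = np.vstack([rng.normal(loc=-1.0, size=(180, 4)),   # safe terrain
                   rng.normal(loc=1.0, size=(20, 4))])    # suspect negatives

for _ in range(3):                                 # screening rounds
    X = np.vstack([X_pos, X_neg])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
    clf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                 random_state=0).fit(X, y)
    # Out-of-bag probabilities avoid the model simply memorizing its
    # own training labels when scoring the negatives.
    p_pos = clf.oob_decision_function_[len(X_pos):, 1]
    keep = p_pos < 0.7                             # confidence threshold
    if keep.all():
        break
    X_neg = X_neg[keep]

print(len(X_neg))   # suspect negatives screened out, safe ones retained
```

The surviving negatives then form a cleaner training set for the final susceptibility models, which is the mechanism behind the 8%-12% performance gain the abstract reports.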
Funding: funded by the Ongoing Research Funding Program for Project number (ORF-2025-648), King Saud University, Riyadh, Saudi Arabia.
Funding: supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2023E01006, 2024TSYCCX0004).
文摘Arid mountain ecosystems are highly sensitive to hydrothermal stress and land use intensification,yet where net primary productivity(NPP)degradation is likely to persist and what drives it remain unclear in the Tianshan Mountains of Northwest China.We integrated multi-source remote sensing with the Carnegie–Ames–Stanford Approach(CASA)model to estimate NPP during 2000–2020,assessed trend persistence using the Hurst exponent,and identified key drivers and nonlinear thresholds with Extreme Gradient Boosting(XGBoost)and SHapley Additive exPlanations(SHAP).Total NPP averaged 55.74 Tg C/a and ranged from 48.07 to 65.91 Tg C/a from 2000 to 2020,while regional mean NPP rose from 138.97 to 160.69 g C/(m^(2)·a).Land use transfer analysis showed that grassland expanded mainly at the expense of unutilized land and that cropland increased overall.Although NPP increased across 64.11%of the region during 2000–2020,persistence analysis suggested that 53.93%of the Tianshan Mountains was prone to continued NPP decline,including 36.41%with significant projected decline and 17.52%with weak projected decline;these areas formed degradation hotspots concentrated in the central and northern Tianshan Mountains.In contrast,potential improvement was limited(strong persistent improvement:4.97%;strong anti-persistent improvement:0.36%).Driver attribution indicated that land use dominated NPP variability(mean absolute SHAP value=29.54%),followed by precipitation(16.03%)and temperature(11.05%).SHAP dependence analyses showed that precipitation effects stabilized at 300.00–400.00 mm,and temperature exhibited an inverted U-shaped response with a peak near 0.00°C.These findings indicated that persistent degradation risk arose from hydrothermal constraints interacting with land use conversion,highlighting the need for threshold-informed,spatially targeted management to sustain carbon sequestration in arid mountain ecosystems.
Abstract: This study focuses on the hydrological processes of the Qinghai Lake Basin in China. Based on multi-year meteorological and hydrological data, we developed a hybrid model coupling the conceptual hydrological model FLEX (Flux Exchange) with a Gated Recurrent Unit (GRU) to simulate and forecast daily runoff of the Buha River, the largest tributary in the basin. Three strategies were adopted to improve simulation accuracy: the adaptive differential evolution algorithm DREAM(zs) was introduced to invert hydrological parameters and optimize the FLEX model; variational mode decomposition (VMD) was used to extract information and features from the runoff data; and the sparrow search algorithm (SSA) was applied to optimize the parameters of the deep learning GRU. The FLEX simulation results, together with the meteorological data, were fed into the neural network, yielding the FLEX-VMD-SSA-GRU hybrid model. The influence and contribution of different meteorological inputs to the simulation results were also examined: based on seven principal meteorological variables, 14 input scenarios were constructed, from few variables to many. Finally, SHAP was used to analyze the deep learning results, revealing the contribution and importance of the meteorological variables to long-term runoff trends.
Funding: Supported by the General Program of the National Natural Science Foundation of China (No. 52274326), the China Baowu Low Carbon Metallurgy Innovation Foundation (No. BWLCF202109), and the Seventh Batch of the Ten Thousand Talents Plan of China (No. ZX20220553).
Abstract: Sinter is the core raw material for blast furnaces. Flue pressure, an important state parameter, affects sinter quality. In this paper, flue pressure prediction and optimization were studied based on Shapley additive explanations (SHAP) to predict the flue pressure and take targeted adjustment measures. First, the sintering process data were collected and processed. A flue pressure prediction model was then constructed after comparing different feature selection methods and model algorithms, using SHAP plus extremely randomized trees (ET). The prediction accuracy of the model within the error range of ±0.25 kPa was 92.63%. SHAP analysis was employed to improve the interpretability of the prediction model. The effects of various sintering operation parameters on flue pressure, the relationship between the numerical range of key operation parameters and flue pressure, the effect of operation parameter combinations on flue pressure, and the prediction process of the flue pressure prediction model on a single sample were analyzed. A flue pressure optimization module was also constructed; when a prediction satisfied the judgment conditions, the recommended operating parameter combination was pushed to operators. The flue pressure was increased by 5.87% during the verification process, achieving a good optimization effect.
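SHAP, used throughout these abstracts, attributes a prediction to features so that the attributions sum exactly to the difference between the model output and a baseline. A brute-force sketch of exact Shapley values on a hypothetical three-feature scoring function (invented for illustration, not the paper's sintering model) shows this additivity; real libraries approximate the same quantity efficiently for large models.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating all feature coalitions.
    value_fn(subset) returns the model output when only the features in
    `subset` (a frozenset of names) are present. Feasible only for a
    handful of features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# toy 'model': an additive score with one interaction term
def value_fn(present):
    score = 0.0
    if "temp" in present:
        score += 2.0
    if "pressure" in present:
        score += 1.0
    if "temp" in present and "speed" in present:
        score += 0.5  # interaction credit is split between temp and speed
    return score

phi = shapley_values(["temp", "pressure", "speed"], value_fn)
# additivity: contributions sum to f(all features) - f(no features) = 3.5
print(phi, sum(phi.values()))
```

Here the 0.5 interaction term is shared equally (0.25 each) between "temp" and "speed", which is exactly the fair-credit property that makes SHAP rankings interpretable.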
Abstract: BACKGROUND Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, a total of 166 patients were collected to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanations (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model. CONCLUSION The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
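The AUC values reported in abstracts like the one above are equivalent to the Mann-Whitney statistic: the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count half). A minimal sketch with hypothetical recurrence scores (not the study's data):

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve computed as the fraction of
    positive/negative pairs in which the positive outranks the negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))

# hypothetical model scores for six patients (label 1 = polyp recurred)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_from_scores(scores, labels)
print(auc)
```

An AUC of 0.5 would mean the scores rank cases no better than chance; values such as the 0.909-0.963 reported above mean positives are almost always ranked above negatives.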
Funding: Supported by the National Key Research and Development Plan (Grant No. 2023YFB3712400) and the National Key Research and Development Plan (Grant No. 2020YFB1713600).
Abstract: Mechanical properties are critical to the quality of hot-rolled steel pipe products. Accurately understanding the relationship between rolling parameters and mechanical properties is crucial for effective prediction and control. To address this, an industrial big data platform was developed to collect and process multi-source heterogeneous data from the entire production process, providing a complete dataset for mechanical property prediction. The adaptive bandwidth kernel density estimation (ABKDE) method was proposed to adjust bandwidth dynamically based on data density. Combining long short-term memory neural networks with ABKDE offers robust prediction interval capabilities for mechanical properties. The proposed method was deployed in a large-scale steel plant and demonstrated superior prediction interval performance compared to lower upper bound estimation, mean variance estimation, and extreme learning machine-adaptive bandwidth kernel density estimation, achieving a prediction interval normalized average width of 0.37, a prediction interval coverage probability of 0.94, and the lowest coverage width-based criterion of 1.35. Notably, Shapley additive explanations (SHAP)-based analysis significantly improved the proposed model's credibility by providing a clear account of feature impacts.
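The two interval metrics quoted above, coverage probability (PICP) and normalized average width (PINAW), can be computed directly from predicted bounds: coverage is the fraction of observations falling inside their interval, and width is averaged and divided by the observed range. The observations and intervals below are invented for illustration, not the plant data.

```python
def interval_metrics(y_true, lower, upper):
    """Prediction Interval Coverage Probability (PICP) and Prediction
    Interval Normalized Average Width (PINAW) for a set of intervals."""
    n = len(y_true)
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    picp = covered / n
    y_range = max(y_true) - min(y_true)  # normalization constant
    pinaw = sum(hi - lo for lo, hi in zip(lower, upper)) / (n * y_range)
    return picp, pinaw

# hypothetical tensile-strength observations and predicted intervals (MPa)
y_true = [500, 510, 495, 520, 505]
lower  = [490, 505, 498, 510, 495]
upper  = [510, 520, 505, 525, 515]
picp, pinaw = interval_metrics(y_true, lower, upper)
print(picp, pinaw)
```

The trade-off the abstract reports is visible here: wider intervals raise PICP toward 1.0 but also inflate PINAW, so a good interval predictor keeps coverage high while intervals stay narrow.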
Abstract: This study presents an enhanced convolutional neural network (CNN) model integrated with Explainable Artificial Intelligence (XAI) techniques for accurate prediction and interpretation of wheat crop diseases. The aim is to streamline the detection process while offering transparent insights into the model's decision-making to support effective disease management. To evaluate the model, a dataset was collected from wheat fields in Kotli, Azad Kashmir, Pakistan, and tested across multiple data splits. The proposed model demonstrates improved stability, faster convergence, and higher classification accuracy. The results show significant improvements in prediction accuracy and stability compared to prior works, achieving up to 100% accuracy in certain configurations. In addition, XAI methods such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) were employed to explain the model's predictions, highlighting the most influential features contributing to classification decisions. The combined use of CNN and XAI offers a dual benefit: strong predictive performance and clear interpretability of outcomes, which is especially critical in real-world agricultural applications. These findings underscore the potential of integrating deep learning models with XAI to advance automated plant disease detection. The study offers a precise, reliable, and interpretable solution for improving wheat production and promoting agricultural sustainability. Future extensions of this work may include scaling the dataset across broader regions and incorporating additional modalities, such as environmental data, to enhance model robustness and generalization.
Funding: The authors thank Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, for supporting this study.
Abstract: Machine learning (ML) models are widely used for predicting undrained shear strength (USS), but interpretability has been a limitation in various studies. Therefore, this study introduced Shapley additive explanations (SHAP) to clarify the contribution of each input feature in USS prediction. Three ML models, artificial neural network (ANN), extreme gradient boosting (XGBoost), and random forest (RF), were employed, with accuracy evaluated using mean squared error, mean absolute error, and the coefficient of determination (R²). The RF achieved the highest performance with an R² of 0.82. SHAP analysis identified pre-consolidation stress as a key contributor to USS prediction. SHAP dependence plots reveal that the ANN captures smoother, linear feature-output relationships, while the RF handles complex, non-linear interactions more effectively. This suggests a non-linear relationship between USS and the input features, with RF outperforming ANN. These findings highlight SHAP's role in enhancing interpretability and promoting transparency and reliability in ML predictions for geotechnical applications.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42107214 and 42477157).
Abstract: In this study, we used the Kolmogorov-Arnold networks (KAN) model, based on the Kolmogorov-Arnold representation theorem, for a comprehensive and fair evaluation. We compare its performance with four other powerful classification models across three datasets: a simple slope binary classification dataset, an imbalanced rockburst dataset, and a highly discrete liquefaction dataset. First, a thorough review of machine-learning algorithms for geohazard assessment was conducted. Subsequently, three datasets were collected from real engineering practice, and their data structures were visualized. Bayesian optimization was then used to tune the parameters of all models across all datasets. To ensure model interpretability, a global sensitivity analysis based on Sobol indices was performed, establishing an interpretable visual analysis of the models' decision-making processes. For a fair evaluation, various metrics and repeated stratified 10-fold cross-validation were employed to comprehensively analyze the predictive results of the models. The results indicate that the KAN model, based on the RBF kernel, not only achieves the expected performance on the binary classification dataset but also performs well on the imbalanced and highly discrete datasets, significantly surpassing other commonly used classification models. This demonstrates the broad application potential of the KAN model in geotechnical engineering.
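Stratified k-fold cross-validation, used above for the imbalanced rockburst dataset, keeps class proportions roughly constant in every fold, so no fold is left without minority-class samples. A stdlib sketch of stratified fold assignment with toy labels (illustrative only; libraries provide production implementations):

```python
import random

def stratified_kfold_indices(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold CV:
    each class's sample indices are shuffled and dealt round-robin into
    k folds, preserving class proportions in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)  # round-robin deal within each class
    for t in range(k):
        test_idx = sorted(folds[t])
        train_idx = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train_idx, test_idx

# imbalanced toy labels: 8 negatives, 2 positives;
# with k=2 each test fold keeps exactly one positive
labels = [0] * 8 + [1] * 2
for train_idx, test_idx in stratified_kfold_indices(labels, k=2):
    print(sum(labels[i] for i in test_idx), len(test_idx))
```

Repeating this procedure with different seeds ("repeated stratified k-fold") averages out the luck of any single split, which is why the study reports metrics over repeated runs.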
Abstract: The Titanic sank 113 years ago, on April 14-15, after hitting an iceberg, with human error likely causing the ship to wander into those dangerous waters. Today, autonomous systems built on AI can help ships avoid such accidents. But could such a system explain to the captain why it was controlling the ship in a certain way?
Abstract: Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the "black-box" nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to a prediction. However, these approaches fail to answer a critical question: how can we ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure over molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
Abstract: BACKGROUND Parastomal hernia (PSH) is a common and challenging complication following preventive ostomy in rectal cancer patients, and accurate tools for early risk prediction are lacking. AIM To explore the application of machine learning algorithms in predicting the occurrence of PSH in patients undergoing preventive ostomy after rectal cancer resection, providing valuable support for clinical decision-making. METHODS A retrospective analysis was conducted on the clinical data of 579 patients who underwent rectal cancer resection with preventive ostomy at Tongji Hospital, Huazhong University of Science and Technology, between January 2015 and June 2023. Various machine learning models were constructed and trained using preoperative and intraoperative clinical variables to assess their predictive performance for PSH risk. SHapley Additive exPlanations (SHAP) were used to analyze the importance of features in the models. RESULTS A total of 579 patients were included, with 31 (5.3%) developing PSH. Among the machine learning models, the random forest (RF) model showed the best performance. In the test set, the RF model achieved an area under the curve of 0.900, a sensitivity of 0.900, and a specificity of 0.725. SHAP analysis revealed that tumor distance from the anal verge, body mass index, and preoperative hypertension were the key factors influencing the occurrence of PSH. CONCLUSION Machine learning, particularly the RF model, demonstrates high accuracy and reliability in predicting PSH after preventive ostomy in rectal cancer patients. This technology supports personalized risk assessment and postoperative management, showing significant potential for clinical application. An online predictive platform based on the RF model (https://yangsu2023.shinyapps.io/parastomal_hernia/) has been developed to assist in early screening and intervention for high-risk patients, further enhancing postoperative management and improving patients' quality of life.
Funding: Supported by the Postgraduate Innovation Program of Chongqing University of Science and Technology (Grant No. YKJCX2420605), the Research Foundation of Chongqing University of Science and Technology (Grant No. ckrc20241225), the Opening Projects of the State Key Laboratory of Solid Waste Reuse for Building Materials (Grant No. SWR-2021-005), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202401510).
Abstract: Foam concrete is widely used in engineering due to its light weight and high porosity. Its compressive strength, a key performance indicator, is influenced by multiple factors and varies nonlinearly. Because compressive strength tests for foam concrete take a long time, a fast and accurate prediction method is needed. In recent years, machine learning has become a powerful tool for predicting the compressive strength of cement-based materials. However, existing studies often use a limited number of input parameters, and the prediction accuracy of machine learning models under the influence of multiple parameters and nonlinearity remains unclear. This study selects foam concrete density, water-to-cement ratio (W/C), supplementary cementitious material replacement rate (SCM), fine aggregate-to-binder ratio (FA/Binder), superplasticizer content (SP), and age of the concrete (Age) as input parameters, with compressive strength as the output. Five different machine learning models were compared, and a sensitivity analysis based on Shapley Additive Explanations (SHAP) was used to assess the contribution of each input parameter. The results show that Gaussian Process Regression (GPR) outperforms the other models, with R², RMSE, MAE, and MAPE values of 0.95, 1.6, 0.81, and 0.2, respectively. This is because GPR, optimized through Bayesian methods, better fits complex nonlinear relationships, especially with a large number of input parameters. The sensitivity analysis indicates that the influence of the input parameters on compressive strength decreases in the following order: foam concrete density, W/C, Age, FA/Binder, SP, and SCM.
Funding: Supported by the National Natural Science Foundation of China (No. 82172781) and the Shaanxi Health Scientific Research Innovation Team Project (No. 2024TD-06).
Abstract: BACKGROUND Despite the promising prospects of utilizing artificial intelligence and machine learning (ML) for comprehensive disease analysis, few of the models constructed have been applied in clinical practice, owing to their complexity and the lack of reasonable explanations. In contrast to previous studies with small sample sizes and limited model interpretability, we developed a transparent eXtreme Gradient Boosting (XGBoost)-based model, supported by multi-center data, that uses patients' basic information and clinical indicators to forecast the occurrence of anastomotic leakage (AL) after rectal cancer resection surgery. The model demonstrated robust predictive performance and identified clinically relevant thresholds, which may assist physicians in optimizing perioperative management. AIM To develop an interpretable ML model for accurately predicting the occurrence probability of AL after rectal cancer resection and to define clinical alert values for serum calcium ions. METHODS Patients who underwent anterior resection of the rectum for rectal carcinoma at the Department of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, and at Shaanxi Provincial People's Hospital were retrospectively collected from January 2011 to December 2021. Ten ML models were integrated to analyze the data and develop the predictive models. Receiver operating characteristic (ROC) curves, calibration curves, decision curve analysis, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were used to evaluate model performance. We employed the SHapley Additive exPlanations (SHAP) algorithm to explain the feature importance of the optimal model. RESULTS A total of ten features were integrated to construct the predictive model and identify the optimal model. XGBoost was considered the best-performing model, with an area under the ROC curve (AUC) of 0.984 (95% confidence interval: 0.972-0.996) in the test set (accuracy: 0.925; sensitivity: 0.92; specificity: 0.927). Furthermore, the model achieved an AUC of 0.703 in external validation. The interpretable SHAP algorithm revealed that the serum calcium ion level was the crucial factor influencing the predictions of the model. CONCLUSION A superior predictive model, leveraging clinical data, has been crafted by employing the most effective model, XGBoost, from a selection of ten algorithms. This model, by predicting the occurrence of AL in patients after rectal cancer resection, has identified the significant role of serum calcium ion levels, providing guidance for clinical practice. The integration of SHAP provides a clear interpretation of the model's predictions.
Funding: Funded by the Second Tibetan Plateau Scientific Expedition and Research, Ministry of Science and Technology (Project No. 2019QZKK0902), the West Light Foundation of the Chinese Academy of Sciences (Project No. E3R2120), and the Research Programme of the Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (Project No. IMHE-ZDRW-01).
Abstract: Machine learning-based Debris Flow Susceptibility Mapping (DFSM) has emerged as an effective approach for assessing debris flow likelihood, yet its application faces three critical challenges: insufficient reliability of training samples caused by biased negative sampling, opaque decision-making mechanisms in models, and subjective susceptibility mapping methods that lack quantitative evaluation criteria. This study focuses on the Yalong River basin. By integrating high-resolution remote sensing interpretation and field surveys, we established a refined sample database that includes 1,736 debris flow gullies. To address spatial bias in traditional random negative sampling, we developed a semi-supervised optimization strategy based on iterative confidence screening. Comparative experiments with four tree-based models (XGBoost, CatBoost, LGBM, and Random Forest) reveal that the optimized sampling strategy improved overall model performance by 8%-12%, with XGBoost achieving the highest accuracy (AUC = 0.882) and RF the lowest (AUC = 0.820). SHAP-based global-local interpretability analysis (applicable to all tree models) identifies elevation and short-duration rainfall as the dominant controlling factors. Furthermore, among the tested tree-based models, XGBoost optimized with semi-supervised sampling demonstrates the highest reliability in DFSM, achieving a comprehensive accuracy of 83.64% owing to its optimal generalization-stability equilibrium.