Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00222536).
Abstract: This study provides an in-depth comparative evaluation of landslide susceptibility using two distinct spatial units, slope units (SUs) and hydrological response units (HRUs), within Goesan County, South Korea. Leveraging the capabilities of the extreme gradient boosting (XGB) algorithm combined with Shapley Additive Explanations (SHAP), this work assesses the precision and clarity with which each unit predicts areas vulnerable to landslides. SUs are delineated from geomorphological features such as ridges and valleys, focusing on slope stability and landslide triggers. Conversely, HRUs are established based on a variety of hydrological factors, including land cover, soil type and slope gradients, to encapsulate the dynamic water processes of the region. The methodological framework includes the systematic gathering, preparation and analysis of data, ranging from historical landslide occurrences to topographical and environmental variables such as elevation, slope angle and land curvature. The XGB algorithm used to construct the Landslide Susceptibility Model (LSM) was combined with SHAP for model interpretation, and the results were evaluated using Random Cross-validation (RCV) to ensure accuracy and reliability. To ensure optimal model performance, the XGB algorithm's hyperparameters were tuned using Differential Evolution, considering multicollinearity-free variables. The results show that SUs and HRUs are both effective for LSM, but their effectiveness varies depending on landscape characteristics. The XGB algorithm demonstrates strong predictive power, and SHAP enhances model transparency regarding the influential variables involved. This work underscores the importance of selecting appropriate assessment units tailored to specific landscape characteristics for accurate LSM. The integration of advanced machine learning techniques with interpretative tools offers a robust framework for landslide susceptibility assessment, improving both predictive capabilities and model interpretability. Future research should integrate broader data sets and explore hybrid analytical models to strengthen the generalizability of these findings across varied geographical settings.
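As a rough illustration of the modeling step this abstract describes, the sketch below trains an XGBoost classifier on hypothetical conditioning factors and inspects it with SHAP; the feature names, data, and hyperparameters are placeholders rather than the study's actual setup, and the Differential Evolution tuning is only noted in a comment.

```python
# Minimal XGBoost + SHAP workflow on synthetic mapping-unit features.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical conditioning factors for each mapping unit (SU or HRU)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "elevation": rng.uniform(100, 900, 1000),
    "slope_angle": rng.uniform(0, 60, 1000),
    "curvature": rng.normal(0, 1, 1000),
    "land_cover": rng.integers(0, 5, 1000),
})
y = (X["slope_angle"] + rng.normal(0, 10, 1000) > 35).astype(int)  # stand-in landslide label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Hyperparameters would normally be tuned (e.g., with differential evolution); fixed here for brevity
model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                          subsample=0.8, eval_metric="logloss")
model.fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# SHAP values show how each factor pushes the susceptibility score up or down
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
shap.summary_plot(shap_values, X_te, show=False)
```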
Funding: Supported by the National Natural Science Foundation of China Project (No. 62302540); please visit their website at https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Abstract: The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges and issues remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To address the challenge of enhancing the interpretability of network attack prediction models, this paper proposes a method that combines the Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling the rapid and accurate identification and prediction of potential network attack types and thereby facilitating the implementation of timely defense measures. The model achieved an accuracy of 0.977, precision of 0.985, recall of 0.975, and an F1 score of 0.979, demonstrating better performance than other models in the domain of network attack prediction. SHAP is utilized to analyze the black-box decision-making process of the model, providing interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the network attack prediction model based on LGBM exhibits superior accuracy and outstanding predictive capabilities. Moreover, the SHAP-based interpretability analysis significantly improves the model's transparency and interpretability.
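A compact sketch of the LGBM-plus-SHAP pattern with the same four evaluation metrics is given below; the synthetic features stand in for the network indicators and are not the paper's dataset.

```python
# LightGBM classifier with accuracy/precision/recall/F1 evaluation and SHAP attribution.
import numpy as np
import shap
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LGBMClassifier(n_estimators=400, learning_rate=0.05, num_leaves=63)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("f1       :", f1_score(y_te, pred))

# SHAP quantifies each indicator's contribution to a given attack prediction
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)
```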
Abstract: Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the "black-box" nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to the prediction. However, these approaches fail to answer a critical question: how can we ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure on molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. Our method demonstrates its effectiveness through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
Funding: Support provided by The Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023), and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is usually ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e., the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e., S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
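The forward/backward selection step can be illustrated with scikit-learn's SequentialFeatureSelector, as in the hedged sketch below; the column names are hypothetical shield-operation parameters, and Boruta itself would come from a separate third-party package (e.g., BorutaPy).

```python
# Sequential forward/backward feature selection on synthetic tunneling features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(1)
cols = ["cover_depth", "face_pressure", "thrust", "torque", "advance_rate",
        "grout_volume", "soil_modulus", "water_table"]
X = pd.DataFrame(rng.normal(size=(300, len(cols))), columns=cols)
y = 2.0 * X["face_pressure"] - 1.5 * X["cover_depth"] + rng.normal(scale=0.5, size=300)  # stand-in S_max

base = RandomForestRegressor(n_estimators=200, random_state=0)

sfs = SequentialFeatureSelector(base, n_features_to_select=4, direction="forward", cv=5)
sbs = SequentialFeatureSelector(base, n_features_to_select=4, direction="backward", cv=5)
sfs.fit(X, y)
sbs.fit(X, y)

print("SFS keeps:", list(X.columns[sfs.get_support()]))
print("SBS keeps:", list(X.columns[sbs.get_support()]))
# The Boruta algorithm follows the same fit/transform pattern but compares each
# feature against randomly permuted "shadow" features to decide relevance.
```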
Abstract: This paper takes a microanalytic perspective on the speech and gestures used by one teacher of ESL (English as a Second Language) in an intensive English program classroom. Videotaped excerpts from her intermediate-level grammar course were transcribed to represent the speech, gesture and other non-verbal behavior that accompanied unplanned explanations of vocabulary that arose during three focus-on-form lessons. The gesture classification system of McNeill (1992), which delineates different types of hand movements (iconics, metaphorics, deictics, beats), was used to understand the role the gestures played in these explanations. Results suggest that gestures and other non-verbal behavior are forms of input to classroom second language learners that must be considered a salient factor in classroom-based SLA (Second Language Acquisition) research.
Abstract: In this paper I examine the following claims by William Eaton in his monograph Boyle on Fire: (i) that Boyle's religious convictions led him to believe that the world was not completely explicable, and this shows that there is a shortcoming in the power of mechanical explanations; (ii) that mechanical explanations offer only sufficient, not necessary, explanations, and this too was taken by Boyle to be a limit in the explanatory power of mechanical explanations; (iii) that the mature Boyle thought that there could be more intelligible explanatory models than mechanism; and (iv) that what Boyle says at any point in his career is incompatible with the statement of Maria Boas-Hall, i.e., that the mechanical hypothesis can explicate all natural phenomena. Since all four of these claims are part of Eaton's developmental argument, my rejection of them will not only show how the particular developmental story Eaton diagnoses is inaccurate, but will also explain what limits there actually are in Boyle's account of the intelligibility of mechanical explanations. My account will also show why important philosophers like Locke and Leibniz should be interested in Boyle's philosophical work.
Funding: Supported by the National Key R&D Program of China (Technology and application of wind power/photovoltaic power prediction for promoting renewable energy consumption) under Grant 2018YFB0904200.
Abstract: Wind power forecasting (WPF) is important for the safe, stable, and reliable integration of new energy technologies into power systems. Machine learning (ML) algorithms have recently attracted increasing attention in the field of WPF. However, opaque decisions and the lack of trustworthiness of black-box models for WPF could cause scheduling risks. This study develops a method for identifying risky models in practical applications and avoiding the associated risks. First, a local interpretable model-agnostic explanations algorithm is introduced and improved for WPF model analysis. On that basis, a novel index is presented to quantify the level at which neural networks or other black-box models can trust the features involved in training. Then, by revealing the operational mechanism for local samples, the human interpretability of the black-box model is examined under different accuracies, time horizons, and seasons. This interpretability provides a basis for several technical routes for WPF from the viewpoint of the forecasting model. Moreover, further improvements in the accuracy of WPF are explored by evaluating the possibility of using interpretable ML models that employ multi-horizon global trust modeling and multi-season interpretable feature selection methods. Experimental results from a wind farm in China show that error can be robustly reduced.
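A minimal local-explanation sketch in the spirit of the LIME step described here is shown below, assuming the standard lime package API; the wind-farm features and the neural-network forecaster are placeholders rather than the study's improved algorithm.

```python
# LIME for a regression forecaster: local per-feature weights for one forecast.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
feature_names = ["wind_speed", "wind_dir", "temperature", "pressure", "humidity"]
X = rng.normal(size=(2000, len(feature_names)))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=2000)  # stand-in power output

model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names, mode="regression")
local_exp = explainer.explain_instance(X[0], model.predict, num_features=5)
print(local_exp.as_list())  # local weights of each feature for this single forecast
# Aggregating such local weights across many samples is one way to build the kind
# of feature-trust index the study describes.
```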
Abstract: Generative Artificial Intelligence (GAI) refers to a class of AI systems capable of creating novel, coherent, and contextually relevant content (such as text, images, audio, and video) based on patterns learned from extensive training datasets. The public release and rapid refinement of large language models (LLMs) like ChatGPT have accelerated the adoption of GAI across various medical specialties, offering new tools for education, clinical simulation, and research. Dermatology training, which relies heavily on visual pattern recognition and requires extensive exposure to diverse morphological presentations, faces persistent challenges such as uneven distribution of educational resources, limited patient exposure for rare conditions, and variability in teaching quality. Exploring the integration of GAI into pedagogical frameworks offers innovative approaches to address these challenges, potentially enhancing the quality, standardization, scalability, and accessibility of dermatology education. This comprehensive review examines the core concepts and technical foundations of GAI, highlights its specific applications within dermatology teaching and learning, including simulated case generation, personalized learning pathways, and academic support, and discusses the current limitations, practical challenges, and ethical considerations surrounding its use. The aim is to provide a balanced perspective on the significant potential of GAI for transforming dermatology education and to offer evidence-based insights to guide future exploration, implementation, and policy development.
Funding: The National Natural Science Foundation of China (Grant Nos. 52090083 and 52378405) and the Key Technology R&D Plan of Yunnan Provincial Department of Science and Technology (Grant No. 202303AA080003) provided financial support.
Abstract: Contemporary demands necessitate the swift and accurate detection of cracks in critical infrastructures, including tunnels and pavements. This study proposed a transfer learning-based encoder-decoder method with visual explanations for infrastructure crack segmentation. Firstly, a vast dataset containing 7089 images was developed, comprising diverse conditions: simple and complex crack patterns as well as clean and rough backgrounds. Secondly, leveraging transfer learning, an encoder-decoder model with visual explanations was formulated, utilizing varied pre-trained convolutional neural networks (CNNs) as the encoder. Visual explanations were achieved through gradient-weighted class activation mapping (Grad-CAM) to interpret the CNN segmentation model. Thirdly, accuracy, complexity (computation and model), and memory usage were used to assess CNN feasibility in practical engineering. Model performance was gauged via prediction and visual explanation. The investigation encompassed hyperparameters, data augmentation, deep learning from scratch vs. transfer learning, segmentation model architectures, segmentation model encoders, and encoder pre-training strategies. Results underscored transfer learning's potency in enhancing CNN accuracy for crack segmentation, surpassing deep learning from scratch. Notably, encoder classification accuracy bore no significant correlation with CNN segmentation accuracy. Among all tested models, UNet-EfficientNet_B7 excelled in crack segmentation, harmonizing accuracy, complexity, memory usage, prediction, and visual explanation.
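The transfer-learning encoder-decoder idea can be sketched with the third-party segmentation_models_pytorch package, which exposes pre-trained encoders such as EfficientNet-B7; the snippet below is a schematic training step under that assumption, not the authors' code, and Grad-CAM is only indicated in a comment.

```python
# UNet with a pre-trained EfficientNet-B7 encoder: one schematic training step.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="efficientnet-b7",   # pre-trained encoder (transfer learning)
    encoder_weights="imagenet",       # ImageNet initialisation
    in_channels=3,                    # RGB crack images
    classes=1,                        # binary mask: crack vs. background
)

images = torch.rand(2, 3, 512, 512)                    # dummy batch
masks = (torch.rand(2, 1, 512, 512) > 0.5).float()     # dummy ground-truth masks

loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

logits = model(images)
loss = loss_fn(logits, masks)
loss.backward()
optimizer.step()
# Grad-CAM-style visual explanations can then be computed from the gradients of
# the encoder's last convolutional block with respect to the predicted mask.
```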
Abstract: This study presents an enhanced convolutional neural network (CNN) model integrated with Explainable Artificial Intelligence (XAI) techniques for accurate prediction and interpretation of wheat crop diseases. The aim is to streamline the detection process while offering transparent insights into the model's decision-making to support effective disease management. To evaluate the model, a dataset was collected from wheat fields in Kotli, Azad Kashmir, Pakistan, and tested across multiple data splits. The proposed model demonstrates improved stability, faster convergence, and higher classification accuracy. The results show significant improvements in prediction accuracy and stability compared to prior works, achieving up to 100% accuracy in certain configurations. In addition, XAI methods such as Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) were employed to explain the model's predictions, highlighting the most influential features contributing to classification decisions. The combined use of CNN and XAI offers a dual benefit: strong predictive performance and clear interpretability of outcomes, which is especially critical in real-world agricultural applications. These findings underscore the potential of integrating deep learning models with XAI to advance automated plant disease detection. The study offers a precise, reliable, and interpretable solution for improving wheat production and promoting agricultural sustainability. Future extensions of this work may include scaling the dataset across broader regions and incorporating additional modalities such as environmental data to enhance model robustness and generalization.
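A compact CNN classifier of the general kind described here could be defined as follows; the number of classes, image size, and architecture are assumptions for illustration, and LIME/SHAP would be applied to the trained model afterwards.

```python
# Small Keras CNN for multi-class leaf-image classification (illustrative only).
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 4  # e.g., healthy plus three disease classes (hypothetical labels)
model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# LIME (lime_image) and SHAP (e.g., GradientExplainer) can then be applied to
# model.predict to highlight the leaf regions driving each class decision.
```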
Funding: Supported by the National Key Research and Development Plan (Grant No. 2023YFB3712400) and the National Key Research and Development Plan (Grant No. 2020YFB1713600).
Abstract: Mechanical properties are critical to the quality of hot-rolled steel pipe products. Accurately understanding the relationship between rolling parameters and mechanical properties is crucial for effective prediction and control. To address this, an industrial big data platform was developed to collect and process multi-source heterogeneous data from the entire production process, providing a complete dataset for mechanical property prediction. An adaptive bandwidth kernel density estimation (ABKDE) method was proposed to adjust the bandwidth dynamically based on data density. Combining long short-term memory neural networks with ABKDE offers robust prediction interval capabilities for mechanical properties. The proposed method was deployed in a large-scale steel plant and demonstrated superior prediction interval performance compared to lower upper bound estimation, mean variance estimation, and extreme learning machine-adaptive bandwidth kernel density estimation, achieving a prediction interval normalized average width of 0.37, a prediction interval coverage probability of 0.94, and the lowest coverage width-based criterion of 1.35. Notably, explanations based on Shapley additive explanations significantly improved the proposed model's credibility by providing a clear analysis of feature impacts.
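The interval-quality metrics quoted above (coverage probability, normalized average width, and the coverage-width criterion) can be computed directly from predicted bounds. The sketch below uses dummy data; the CWC expression is one common formulation and may differ in detail from the one used in the paper.

```python
# PICP, PINAW, and a common coverage-width criterion (CWC) on dummy interval bounds.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(500, 30, 200)               # stand-in measured mechanical property values
lower = y - rng.uniform(10, 25, 200)       # stand-in predicted lower bounds
upper = y + rng.uniform(10, 25, 200)       # stand-in predicted upper bounds
lower[:5] = y[:5] + 1.0                    # make a few intervals miss on purpose

picp = np.mean((y >= lower) & (y <= upper))              # prediction interval coverage probability
pinaw = np.mean(upper - lower) / (y.max() - y.min())     # normalized average width

mu, eta = 0.95, 50.0                       # nominal coverage and penalty factor (assumed values)
gamma = 1.0 if picp < mu else 0.0
cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))  # coverage-width criterion

print(f"PICP={picp:.3f}  PINAW={pinaw:.3f}  CWC={cwc:.3f}")
```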
Abstract: The study by Huang et al., published in the World Journal of Gastroenterology, advances intrahepatic cholangiocarcinoma (ICC) management by developing a machine-learning model to predict textbook outcomes (TO) based on preoperative factors. By analyzing data from 376 patients across four Chinese medical centers, the researchers identified key variables influencing TO, including Child-Pugh classification, Eastern Cooperative Oncology Group score, hepatitis B status, and tumor size. The model, created using logistic regression and the extreme gradient boosting algorithm, demonstrated high predictive accuracy, with area under the curve values of 0.8825 for internal validation and 0.8346 for external validation. The integration of the Shapley additive explanation technique enhances the interpretability of the model, which is crucial for clinical decision-making. This research highlights the potential of machine learning to improve surgical planning and patient outcomes in ICC, opening possibilities for personalized treatment approaches based on individual patient characteristics and risk factors.
Funding: Funded by the Second Tibetan Plateau Scientific Expedition and Research, Ministry of Science and Technology (Project No. 2019QZKK0902), the West Light Foundation of the Chinese Academy of Sciences (Project No. E3R2120), and the Research Programme of the Institute of Mountain Hazards and Environment, Chinese Academy of Sciences (Project No. IMHE-ZDRW-01).
Abstract: Machine learning-based Debris Flow Susceptibility Mapping (DFSM) has emerged as an effective approach for assessing debris flow likelihood, yet its application faces three critical challenges: insufficient reliability of training samples caused by biased negative sampling, opaque decision-making mechanisms in models, and subjective susceptibility mapping methods that lack quantitative evaluation criteria. This study focuses on the Yalong River basin. By integrating high-resolution remote sensing interpretation and field surveys, we established a refined sample database that includes 1,736 debris flow gullies. To address spatial bias in traditional random negative sampling, we developed a semi-supervised optimization strategy based on iterative confidence screening. Comparative experiments with four tree-based models (XGBoost, CatBoost, LGBM, and Random Forest) reveal that the optimized sampling strategy improved overall model performance by 8%-12%, with XGBoost achieving the highest accuracy (AUC = 0.882) and RF performing the lowest (AUC = 0.820). SHAP-based global-local interpretability analysis (applicable to all tree models) identifies elevation and short-duration rainfall as dominant controlling factors. Furthermore, among the tested tree-based models, XGBoost optimized with semi-supervised sampling demonstrates the highest reliability in DFSM, achieving a comprehensive accuracy of 83.64% due to its optimal generalization-stability equilibrium.
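The semi-supervised negative-sampling idea can be sketched as an iterative loop: train on positives plus provisional negatives, score all candidate units, and keep only low-confidence (likely non-event) candidates as the next round's negatives. The thresholds, iteration count, and data below are illustrative assumptions, not the study's settings.

```python
# Iterative confidence screening of negative samples (schematic, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y_true = make_classification(n_samples=3000, n_features=12, weights=[0.8, 0.2],
                                random_state=0)
pos_idx = np.where(y_true == 1)[0]                 # mapped debris-flow gullies (positives)
cand_idx = np.where(y_true == 0)[0]                # unlabeled candidate units

rng = np.random.default_rng(0)
neg_idx = rng.choice(cand_idx, size=len(pos_idx), replace=False)  # initial random negatives

for it in range(3):
    X_tr = np.vstack([X[pos_idx], X[neg_idx]])
    y_tr = np.hstack([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss").fit(X_tr, y_tr)

    # Screen candidates: keep only those the current model is confident are non-events
    p = clf.predict_proba(X[cand_idx])[:, 1]
    confident = cand_idx[p < 0.2]
    neg_idx = rng.choice(confident, size=min(len(pos_idx), len(confident)), replace=False)
    print(f"iteration {it}: {len(confident)} confident negatives retained")
```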
Funding: Supported by Longyan City Science and Technology Plan Project, No. 2024 LYF17067.
Abstract: BACKGROUND: Female depression is a prevalent and increasingly recognized mental health issue. Due to cultural and social factors, many female patients still face challenges in diagnosis and treatment, and traditional assessment methods often fail to identify high-risk individuals accurately. This highlights the necessity of developing more precise predictive tools. Utilizing machine learning (ML) algorithms to construct predictive models may overcome the limitations of traditional methods, providing more comprehensive support for women's mental health. AIM: To construct an ML-nomogram hybrid model that translates multivariate risk predictors of female depressive symptoms into actionable clinical scoring thresholds, optimizing predictive accuracy and interpretability for healthcare applications. METHODS: We analyzed data from 7609 female participants aged 18 to 85 years from the Guangdong Provincial Sleep and Psychosomatic Health Survey. Sixteen variables, including anxiety symptoms, insomnia, chronic diseases, exercise habits, and age, were selected based on prior literature and comprehensively incorporated into ML models to maximize predictive information utilization. Three ML algorithms, extreme gradient boosting, support vector machine, and light gradient boosting machine, were employed to construct predictive models. Model performance was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). Feature importance was interpreted using SHapley Additive exPlanations (SHAP), with ablation studies validating the impact of the top five SHAP-derived features on predictive performance, and a nomogram was constructed based on these prioritized predictors. Clinical utility was assessed through decision curve analysis. RESULTS: The prevalence of depressive symptoms was 6.8% in the sample. The evaluation of predictive models revealed that the light gradient boosting machine achieved a top-performing AUC of 0.867, placing it ahead of extreme gradient boosting (AUC = 0.862) and the support vector machine (AUC = 0.849). SHAP analysis identified insomnia, anxiety symptoms, age, chronic disease, and exercise as the top five predictors. The nomogram based on these features demonstrated excellent discrimination (AUC = 0.910) and calibration, with significant net benefits in decision curve analysis compared to baseline strategies. The model effectively stratifies depressive symptom risk, facilitating personalized and quantitative assessments in clinical settings. We also developed an interactive digital version of the nomogram to facilitate its application in clinical practice. CONCLUSION: The ML-based model effectively predicts depressive symptoms in women, identifying insomnia, anxiety symptoms, age, chronic diseases, and exercise as key predictors, offering a practical tool for early detection and intervention.
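The decision curve analysis used above reduces to a simple calculation: at a threshold probability pt, net benefit = TP/N - (FP/N) * pt/(1 - pt), compared against the treat-all and treat-none strategies. The sketch below computes it for a synthetic model; the simulated prevalence mirrors the reported 6.8%, but the predicted probabilities are placeholders.

```python
# Net-benefit calculation for decision curve analysis on synthetic predictions.
import numpy as np

def net_benefit(y, prob, pt):
    pred = prob >= pt
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    n = len(y)
    return tp / n - fp / n * (pt / (1 - pt))

rng = np.random.default_rng(4)
y = rng.binomial(1, 0.068, 5000)                                   # ~6.8% prevalence, as reported
prob = np.clip(0.05 + 0.6 * y + rng.normal(0, 0.15, 5000), 0.001, 0.999)  # stand-in model output

for pt in (0.05, 0.10, 0.20):
    nb_model = net_benefit(y, prob, pt)
    nb_all = net_benefit(y, np.ones_like(prob), pt)                # treat everyone
    print(f"pt={pt:.2f}  model NB={nb_model:.4f}  treat-all NB={nb_all:.4f}  treat-none NB=0")
```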
Funding: Supported by the Postgraduate Innovation Program of Chongqing University of Science and Technology (Grant No. YKJCX2420605), the Research Foundation of Chongqing University of Science and Technology (Grant No. ckrc20241225), the Opening Projects of the State Key Laboratory of Solid Waste Reuse for Building Materials (Grant No. SWR-2021-005), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202401510).
Abstract: Foam concrete is widely used in engineering due to its light weight and high porosity. Its compressive strength, a key performance indicator, is influenced by multiple factors and varies nonlinearly. As compressive strength tests for foam concrete take a long time, a fast and accurate prediction method is needed. In recent years, machine learning has become a powerful tool for predicting the compressive strength of cement-based materials. However, existing studies often use a limited number of input parameters, and the prediction accuracy of machine learning models under the influence of multiple parameters and nonlinearity remains unclear. This study selects foam concrete density, water-to-cement ratio (W/C), supplementary cementitious material replacement rate (SCM), fine aggregate to binder ratio (FA/Binder), superplasticizer content (SP), and age of the concrete (Age) as input parameters, with compressive strength as the output. Five different machine learning models were compared, and sensitivity analysis based on Shapley Additive Explanations (SHAP) was used to assess the contribution of each input parameter. The results show that Gaussian Process Regression (GPR) outperforms the other models, with R², RMSE, MAE, and MAPE values of 0.95, 1.6, 0.81, and 0.2, respectively. This is because GPR, optimized through Bayesian methods, better fits complex nonlinear relationships, especially when a large number of input parameters is considered. Sensitivity analysis indicates that the influence of the input parameters on compressive strength decreases in the following order: foam concrete density, W/C, Age, FA/Binder, SP, and SCM.
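A minimal Gaussian Process Regression baseline reporting the same four error metrics might look like the following; the mix-design values are synthetic placeholders rather than the study's dataset, and the kernel choice is an assumption.

```python
# Gaussian Process Regression on synthetic foam-concrete mix parameters, with R2/RMSE/MAE/MAPE.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import (r2_score, mean_squared_error,
                             mean_absolute_error, mean_absolute_percentage_error)

rng = np.random.default_rng(5)
# columns: density, W/C, SCM, FA/Binder, SP, age  (stand-in mix parameters)
X = rng.uniform([400, 0.3, 0.0, 0.0, 0.0, 3], [1800, 0.7, 0.5, 1.0, 0.02, 90], size=(400, 6))
y = 5 + 0.02 * X[:, 0] - 10 * X[:, 1] + 0.1 * X[:, 5] + rng.normal(0, 1.5, 400)  # stand-in strength

X = StandardScaler().fit_transform(X)               # put all mix parameters on a common scale
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True, alpha=1e-2)
gpr.fit(X_tr, y_tr)
pred = gpr.predict(X_te)

print("R2  :", r2_score(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
print("MAE :", mean_absolute_error(y_te, pred))
print("MAPE:", mean_absolute_percentage_error(y_te, pred))
```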
Abstract: BACKGROUND: Severe dengue in children with critical complications has been associated with high mortality rates, varying from approximately 1% to over 20%. To date, there is a lack of data on machine-learning-based algorithms for predicting the risk of in-hospital mortality in children with dengue shock syndrome (DSS). AIM: To develop machine-learning models to estimate the risk of death in hospitalized children with DSS. METHODS: This single-center retrospective study was conducted at tertiary Children's Hospital No. 2 in Viet Nam, between 2013 and 2022. The primary outcome was the in-hospital mortality rate in children with DSS admitted to the pediatric intensive care unit (PICU). Nine significant features were predetermined for further analysis using machine learning models. An oversampling method was used to enhance the model performance. Supervised models, including logistic regression, Naïve Bayes, Random Forest (RF), K-nearest neighbors, Decision Tree and Extreme Gradient Boosting (XGBoost), were employed to develop predictive models. Shapley Additive Explanations were used to determine the degree of contribution of the features. RESULTS: In total, 1278 PICU-admitted children with complete data were included in the analysis. The median patient age was 8.1 years (interquartile range: 5.4-10.7). Thirty-nine patients (3%) died. The RF and XGBoost models demonstrated the highest performance. The Shapley Additive Explanations model revealed that the most important predictive features included younger age, female sex, presence of underlying diseases, severe transaminitis, severe bleeding, low platelet counts requiring platelet transfusion, elevated levels of international normalized ratio, blood lactate and serum creatinine, a large volume of resuscitation fluid, and a high vasoactive inotropic score (>30). CONCLUSION: We developed robust machine learning-based models to estimate the risk of death in hospitalized children with DSS. The study findings are applicable to the design of management schemes to enhance survival outcomes of patients with DSS.
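The oversampling-plus-ensemble pattern described here can be sketched as follows, assuming the third-party imbalanced-learn package for SMOTE; the clinical features, class balance, and model settings are illustrative only.

```python
# SMOTE oversampling of a rare outcome followed by a Random Forest classifier.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Rare positive class (~3% mortality), mimicking the imbalance reported above
X, y = make_classification(n_samples=1278, n_features=9, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample the minority class
clf = RandomForestClassifier(n_estimators=400, random_state=0).fit(X_res, y_res)

print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
# A SHAP TreeExplainer can then rank the nine predictors by their contribution to risk.
```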
Funding: Supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. RS-2023-00218176) and the Soonchunhyang University Research Fund.
Abstract: Improving early diagnosis of autism spectrum disorder (ASD) in children increasingly relies on predictive models that are reliable and accessible to non-experts. This study aims to develop such models using Python-based tools to improve ASD diagnosis in clinical settings. We performed exploratory data analysis to ensure data quality and identify key patterns in pediatric ASD data. We selected the categorical boosting (CatBoost) algorithm to effectively handle the large number of categorical variables. We used the PyCaret automated machine learning (AutoML) tool to make the models user-friendly for clinicians without extensive machine learning expertise. In addition, we applied Shapley additive explanations (SHAP), an explainable artificial intelligence (XAI) technique, to improve the interpretability of the models. Models developed using CatBoost and other AI algorithms showed high accuracy in diagnosing ASD in children. SHAP provided clear insights into the influence of each variable on diagnostic outcomes, making model decisions transparent and understandable to healthcare professionals. By integrating robust machine learning methods with user-friendly tools such as PyCaret and leveraging XAI techniques such as SHAP, this study contributes to the development of reliable, interpretable, and accessible diagnostic tools for ASD. These advances hold great promise for supporting informed decision-making in clinical settings, ultimately improving early identification and intervention strategies for ASD in the pediatric population. However, the study is limited by the dataset's demographic imbalance and the lack of external clinical validation, which should be addressed in future research.
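A minimal sketch of the PyCaret-plus-CatBoost workflow named in this abstract is shown below; the screening dataframe is a fabricated placeholder, and the calls reflect PyCaret's standard classification API rather than the authors' actual pipeline (CatBoost must be installed for the "catboost" estimator id to be available).

```python
# PyCaret classification setup with a CatBoost model and a SHAP-based interpretation plot.
import numpy as np
import pandas as pd
from pycaret.classification import setup, create_model, interpret_model

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "A1_score": rng.integers(0, 2, 500),                 # hypothetical questionnaire items
    "A2_score": rng.integers(0, 2, 500),
    "age": rng.integers(2, 12, 500),
    "gender": rng.choice(["m", "f"], 500),                # categorical columns handled natively
    "family_history": rng.choice(["yes", "no"], 500),
})
df["asd"] = ((df["A1_score"] + df["A2_score"] > 1) | (df["family_history"] == "yes")).astype(int)

exp = setup(data=df, target="asd", session_id=42)         # preprocessing and train/test split
cat = create_model("catboost")                            # CatBoost classifier with cross-validation
interpret_model(cat)                                      # SHAP summary plot of feature contributions
```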