BACKGROUND Echinococcosis, caused by Echinococcus parasites, includes alveolar echinococcosis (AE), the most lethal form, primarily affecting the liver with a 90% mortality rate without prompt treatment. While radical surgery combined with antiparasitic therapy is ideal, many patients present late, missing the opportunity for hepatectomy. Ex vivo liver resection and autotransplantation (ELRA) offers hope for such patients. Traditional surgical decision-making, relying on clinical experience, is prone to bias. Machine learning can enhance decision-making by identifying key factors influencing surgical choices. This study innovatively employs multiple machine learning methods, integrating various feature selection techniques and SHapley Additive exPlanations (SHAP) interpretive analysis, to deeply explore the key decision factors influencing surgical strategies. AIM To determine the key preoperative factors influencing surgical decision-making in hepatic AE (HAE) using machine learning. METHODS This was a retrospective cohort study at the First Affiliated Hospital of Xinjiang Medical University (July 2010 to August 2024). There were 710 HAE patients (545 hepatectomy and 165 ELRA) with complete clinical data. Data included demographics, laboratory indicators, imaging, and pathology. Feature selection was performed using recursive feature elimination, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator regression, with the intersection of these methods yielding 10 critical features. Eleven machine learning algorithms were compared, with eXtreme Gradient Boosting (XGBoost) optimized using Bayesian optimization. Model interpretability was assessed using SHAP analysis. RESULTS The XGBoost model achieved an area under the curve of 0.935 in the training set and 0.734 in the validation set. The optimal threshold (0.28) yielded sensitivity of 93.6% and specificity of 90.9%. SHAP analysis identified type of vascular invasion as the most important feature, followed by platelet count and prothrombin time. Lesions invading the hepatic vein, inferior vena cava, or multiple vessels significantly increased the likelihood of ELRA. Calibration curves showed good agreement between predicted and observed probabilities (0.2-0.7 range). The model demonstrated high net clinical benefit in decision curve analysis, with accuracy of 0.837, recall of 0.745, and an F1 score of 0.788. CONCLUSION Vascular invasion is the dominant factor influencing the choice of surgical approach in HAE. Machine learning models, particularly XGBoost, can provide transparent and data-driven support for personalized decision-making.
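As an illustration of the pipeline this abstract describes, the sketch below intersects three feature selectors and then fits and explains a gradient-boosted classifier. It assumes standard scikit-learn, xgboost, and shap APIs; the synthetic data, feature names, and the use of mutual information as a stand-in for true minimum redundancy maximum relevance are illustrative assumptions, not the study's actual variables or settings.

```python
# Sketch: intersect three feature selectors, fit XGBoost, explain with SHAP (placeholders throughout).
import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LassoCV, LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, n_redundant=5, random_state=0)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(20)])

# 1) Recursive feature elimination.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
rfe_feats = set(X.columns[rfe.support_])

# 2) mRMR stand-in: rank features by mutual information with the label (a simplification).
mi = mutual_info_classif(X, y, random_state=0)
mi_feats = set(X.columns[np.argsort(mi)[-10:]])

# 3) LASSO on the binary target (a simplification): keep features with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_feats = set(X.columns[np.abs(lasso.coef_) > 1e-6])

selected = sorted(rfe_feats & mi_feats & lasso_feats)   # intersection of the three selectors

model = xgb.XGBClassifier(n_estimators=300, learning_rate=0.05, eval_metric="logloss")
model.fit(X[selected], y)

explainer = shap.TreeExplainer(model)                    # SHAP values for the fitted tree model
shap_values = explainer.shap_values(X[selected])
shap.summary_plot(shap_values, X[selected], show=False)  # global feature-importance view
```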
With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to efficiently utilize this data while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
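A minimal sketch of the weighted-aggregation idea in point (3): client updates are blended using weights derived from an explanation-quality score and validation accuracy. The function name, the scalar SHAP-derived scores, and the toy tensors are hypothetical placeholders; the framework's actual scoring, privacy-budget allocation, and DP noise mechanics are not reproduced here.

```python
import numpy as np

def aggregate(client_weights, shap_scores, accuracies, alpha=0.5):
    """FedAvg-style aggregation weighted by explanation quality and accuracy.

    client_weights : list of lists of np.ndarray (one list of layer tensors per client)
    shap_scores    : per-client explanation-quality scores (placeholder scalars)
    accuracies     : per-client validation accuracies
    alpha          : trade-off between explanation quality and accuracy
    """
    s = np.asarray(shap_scores, dtype=float)
    a = np.asarray(accuracies, dtype=float)
    # Normalise each criterion, then blend into a single aggregation weight per client.
    w = alpha * s / s.sum() + (1 - alpha) * a / a.sum()
    n_layers = len(client_weights[0])
    return [sum(w[k] * client_weights[k][layer] for k in range(len(w)))
            for layer in range(n_layers)]

# Toy usage: three clients, two "layers" each.
clients = [[np.ones((2, 2)) * k, np.ones(2) * k] for k in range(1, 4)]
global_model = aggregate(clients, shap_scores=[0.2, 0.5, 0.3], accuracies=[0.80, 0.85, 0.90])
```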
Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully combat spam attacks. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we utilize pre-processing techniques to obtain clean and noise-free data and address the class imbalance problem by leveraging text augmentation techniques. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant model, RoBERTa, obtained a high accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. Additionally, we evaluate the performance of traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
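The abstract does not give its training configuration, so the following is a generic fine-tuning sketch for a RoBERTa spam classifier, assuming the Hugging Face transformers and datasets APIs; the two example messages, hyperparameters, and output directory are placeholders.

```python
# Generic RoBERTa fine-tuning sketch for binary spam classification (placeholders throughout).
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = ["Free entry in 2 a wkly comp to win cash!", "Are we meeting at 5pm?"]  # stand-in SMS texts
labels = [1, 0]                                                                 # 1 = spam, 0 = ham
ds = Dataset.from_dict({"text": texts, "label": labels})

tok = AutoTokenizer.from_pretrained("roberta-base")
ds = ds.map(lambda b: tok(b["text"], truncation=True, padding="max_length", max_length=64),
            batched=True)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
args = TrainingArguments(output_dir="spam-roberta", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=ds).train()
```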
Deep learning (DL) has become a crucial technique for predicting the El Niño-Southern Oscillation (ENSO) and evaluating its predictability. While various DL-based models have been developed for ENSO predictions, many fail to capture the coherent multivariate evolution within the coupled ocean-atmosphere system of the tropical Pacific. To address this three-dimensional (3D) limitation and represent ENSO-related ocean-atmosphere interactions more accurately, a novel 3D multivariate prediction model was proposed based on a Transformer architecture, which incorporates a spatiotemporal self-attention mechanism. This model, named 3D-Geoformer, offers several advantages, enabling accurate ENSO predictions up to one and a half years in advance. Furthermore, an integrated gradient method was introduced into the model to identify the sources of predictability for sea surface temperature (SST) variability in the eastern equatorial Pacific. Results reveal that the 3D-Geoformer effectively captures ENSO-related precursors during the evolution of ENSO events, particularly the thermocline feedback processes and ocean temperature anomaly pathways on and off the equator. By extending DL-based ENSO predictions from one-dimensional Niño time series to 3D multivariate fields, the 3D-Geoformer represents a significant advancement in ENSO prediction. This study provides details of the model formulation, analysis procedures, sensitivity experiments, and illustrative examples, offering practical guidance for the application of the model in ENSO research.
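The integrated-gradient attribution mentioned here can be written compactly. The sketch below assumes a generic PyTorch model that maps an input field to a scalar output (for example, an SST index); it is not the 3D-Geoformer itself, and the zero baseline and step count are illustrative choices.

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Approximate integrated gradients of a scalar-output model with respect to input x."""
    if baseline is None:
        baseline = torch.zeros_like(x)          # common choice: all-zero (climatology-like) baseline
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)   # straight-line path from baseline to input
    path.requires_grad_(True)
    out = model(path).sum()
    grads = torch.autograd.grad(out, path)[0]   # gradients at every point on the path
    return (x - baseline) * grads.mean(dim=0)   # Riemann approximation of the path integral

# Toy usage with a stand-in "model" producing one scalar per sample.
toy_model = lambda z: (z ** 2).flatten(1).sum(dim=1)
attributions = integrated_gradients(toy_model, torch.randn(4, 8))
```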
In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable progress. Accurate predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce overtreatment. However, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and transparency. This paper proposes a novel association-rule-based, feature-integrated machine learning model which shows better classification and prediction accuracy than present state-of-the-art models. Our study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction models. In the proposed method, the association-rule-based feature integration framework identifies frequently occurring attribute combinations in the dataset. The original dataset is used in training machine learning models, and further used in generating SHAP values from these models. In the next phase, the dataset is integrated with the dominant feature sets identified through association-rule-based analysis. This new integrated dataset is used in re-training the machine learning models. The new SHAP values generated from these models help in validating the contributions of feature sets in predicting malignancy. Conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making systems. In this study, the SHAP values are introduced along with association-rule-based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the predictions. The study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer and provides a validation framework for explainability. The proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUROC) are also higher than those of the baseline models. The results of the proposed model help us identify the dominant feature sets that impact thyroid cancer classification and prediction. The features {calcification} and {shape} consistently emerged as the top-ranked features associated with thyroid malignancy, in both association-rule-based interestingness metric values and SHAP methods. The paper highlights the potential of rule-based integrated models with SHAP in bridging the gap between machine learning predictions and the interpretability of those predictions, which is required for real-world medical applications.
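A minimal sketch of the association-rule-based feature integration step, assuming the mlxtend apriori API: frequent attribute combinations are mined from binarized features and appended as composite conjunction features before re-training. The three binary columns are placeholders, not the study's dataset.

```python
# Sketch: mine frequent attribute combinations and append them as composite features.
import pandas as pd
from mlxtend.frequent_patterns import apriori

df = pd.DataFrame({"calcification":   [1, 1, 0, 1, 0],
                   "irregular_shape": [1, 1, 0, 0, 0],
                   "hypoechoic":      [0, 1, 1, 1, 0]}).astype(bool)

frequent = apriori(df, min_support=0.4, use_colnames=True)   # frequent itemsets with their support

# Turn each frequent itemset of size >= 2 into a new conjunction feature for re-training.
for items in frequent.loc[frequent["itemsets"].apply(len) >= 2, "itemsets"]:
    name = "_AND_".join(sorted(items))
    df[name] = df[list(items)].all(axis=1)

print(df.columns.tolist())   # original attributes plus the integrated feature-set columns
```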
In a recent study published in Nature Medicine, Wang, Shao, and colleagues successfully addressed two critical issues of lung cancer (LC) screening with low-dose computed tomography (LDCT), whose widespread implementation, despite its capacity to decrease LC mortality, remains challenging: (1) the difficulty in accurately distinguishing malignant nodules from the far more common benign nodules detected on LDCT, and (2) the insufficient coverage of LC screening in resource-limited areas.1 To perform nodule risk stratification, Wang et al. developed and validated a multi-step, multidimensional artificial intelligence (AI)-based system (Fig. 1) and introduced a data-driven Chinese Lung Nodules Reporting and Data System (C-Lung-RADS).1 A Lung-RADS system was developed in the US to stratify lung nodules into categories of increasing risk of LC and to provide corresponding management recommendations.
As the cornerstone of the safe operation of energy systems, short-term voltage stability (STVS) has been assessed effectively with the advance of artificial intelligence (AI). However, the black-box models of traditional AI barely identify what the specific key factors in power systems are and how they influence STVS, thus providing limited practical information for engineers in on-site dispatch centers. Enlightened by the latest explainable artificial intelligence (XAI) techniques, this paper aims to unveil the mechanism underlying the complex STVS problem. First, the ground truth for STVS is established via qualitative analysis. Based on this, an explainability score is then devised to measure the trustworthiness of different XAI techniques, among which Local Interpretable Model-agnostic Explanations (LIME) exhibits the best performance in this study. Finally, a sequential approach is proposed to extend the local interpretation of LIME to a broader scope, which is applied to enhance STVS performance through distribution system load shedding before a fault occurs, serving as an example to demonstrate the application merits of the explored mechanism. Numerical results on a modified IEEE system demonstrate that this finding facilitates the identification of the most suitable XAI technique for STVS, while also providing an interpretable mechanism for STVS, offering accessible guidance for stability-aware dispatch.
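A minimal sketch of the kind of local LIME explanation used in this line of work, assuming the lime package and scikit-learn; the stand-in classifier, synthetic operating points, and feature names are illustrative, not the paper's power-system model or data.

```python
# Sketch: a local LIME explanation for one operating point of a tabular stability classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                      # stand-in operating conditions
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)       # stand-in stable / unstable label
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, feature_names=["load_margin", "reactive_reserve", "fault_duration", "voltage_dip"],
    class_names=["unstable", "stable"], discretize_continuous=True)

exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
print(exp.as_list())    # (feature condition, local weight) pairs for this one instance
```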
Background: Liver disease (LD) significantly impacts global health, requiring accurate diagnostic methods. This study aims to develop an automated system for LD prediction using machine learning (ML) and explainable artificial intelligence (XAI), enhancing diagnostic precision and interpretability. Methods: This research systematically analyzes two distinct datasets encompassing liver health indicators. A combination of preprocessing techniques, including feature optimization methods such as Forward Feature Selection (FFS), Backward Feature Selection (BFS), and Recursive Feature Elimination (RFE), is applied to enhance data quality. ML models, namely Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), K-nearest neighbors (KNN), Decision Trees (DT), and a novel Tree Selection and Stacking Ensemble-based RF (TSRF), are then assessed on the datasets to diagnose LD. Finally, the best model is selected based on cross-validation and evaluation through performance metrics such as accuracy, precision, and specificity, and efficient XAI methods convey the selected model's interpretability. Findings: The analysis reveals TSRF as the most effective model, achieving a peak accuracy of 99.92% on Dataset-1 without feature optimization and 88.88% on Dataset-2 with RFE optimization. XAI techniques, including SHAP and LIME plots, highlight key features influencing model predictions, providing insights into the reasoning behind classification outcomes. Interpretation: The findings highlight TSRF's potential in improving LD diagnosis, using XAI to enhance transparency and trust in ML models. Despite high accuracy and interpretability, limitations such as dataset bias and lack of clinical validation remain. Future work will focus on integrating advanced XAI, diversifying datasets, and applying the approach in clinical settings for reliable diagnostics.
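The paper's TSRF design is not fully specified in this abstract, so the sketch below shows only a generic scikit-learn stacking ensemble over the listed base learners with a random forest meta-learner; the breast-cancer dataset is a stand-in for the liver datasets.

```python
# Generic stacking sketch over the listed base learners (not the paper's exact TSRF design).
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer   # stand-in tabular data, not a liver dataset

X, y = load_breast_cancer(return_X_y=True)
base = [("svm", SVC(probability=True)), ("nb", GaussianNB()),
        ("knn", KNeighborsClassifier()), ("dt", DecisionTreeClassifier(random_state=0))]
stack = StackingClassifier(estimators=base,
                           final_estimator=RandomForestClassifier(random_state=0), cv=5)
print(cross_val_score(stack, X, y, cv=5, scoring="accuracy").mean())
```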
Graph neural networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations to the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and connections between these aspects to help achieve trustworthiness.
Electric Load Forecasting (ELF) is the central instrument for planning and controlling demand response programs, electricity trading, and consumption optimization. Due to the increasing automation of these processes, meaningful and transparent forecasts become more and more important, while at the same time the complexity of the machine learning models and architectures used increases. Because there is an increasing interest in interpretable and explainable load forecasting methods, this work conducts a literature review to present already-applied approaches regarding explainability and interpretability for load forecasts using machine learning. Based on extensive literature research covering eight publication portals, recurring modeling approaches, trends, and modeling techniques are identified and clustered by properties to achieve more interpretable and explainable load forecasts. The results on interpretability show an increase in the use of probabilistic models, methods for time series decomposition, and fuzzy logic, in addition to classically interpretable models. Dominant explainable approaches are feature importance and attention mechanisms. The discussion shows that a lot of knowledge from the related field of time series forecasting still needs to be adapted to the problems in ELF. Compared to other applications of explainable and interpretable methods, such as clustering, there are currently relatively few research results, but with an increasing trend.
Advanced machine learning (ML) algorithms have outperformed traditional approaches in various forecasting applications, especially electricity price forecasting (EPF). However, the prediction accuracy of ML reduces substantially if the input data is not similar to the data seen by the model during training. This is often observed in EPF problems when market dynamics change owing to a rise in fuel prices, an increase in renewable penetration, a change in operational policies, etc. While the dip in model accuracy for unseen data is a cause for concern, what is more challenging is not knowing when the ML model will respond in such a manner. Such uncertainty makes power market participants, like bidding agents and retailers, vulnerable to substantial financial loss caused by the prediction errors of EPF models. Therefore, it becomes essential to identify whether or not the model prediction at a given instance is trustworthy. In this light, this paper proposes a trust algorithm for EPF users based on explainable artificial intelligence techniques. The suggested algorithm generates trust scores that reflect the model's prediction quality for each new input. These scores are formulated in two stages: in the first stage, a coarse version of the score is formed using correlations of local and global explanations, and in the second stage, the score is fine-tuned further by the Shapley additive explanations values of different features. Such score-based explanations are more straightforward than feature-based visual explanations for EPF users like asset managers and traders. Datasets from Italy's and ERCOT's electricity markets validate the efficacy of the proposed algorithm. Results show that the algorithm has more than 85% accuracy in identifying good predictions when the data distribution is similar to the training dataset. In the case of distribution shift, the algorithm shows the same accuracy level in identifying bad predictions.
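A rough sketch of the coarse trust-score stage: the agreement between a local SHAP attribution for one new input and the model's global feature importances, mapped to [0, 1]. The synthetic data, the Spearman-correlation formulation, and the omission of the fine-tuning stage are simplifying assumptions, not the paper's exact construction.

```python
# Sketch of a coarse trust score: agreement between local and global explanations.
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 6))                                  # stand-in market features
y = 40 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 1, 800)     # stand-in electricity price
model = GradientBoostingRegressor(random_state=0).fit(X, y)

global_importance = model.feature_importances_                  # global explanation
explainer = shap.TreeExplainer(model)
local_shap = explainer.shap_values(X[:1])[0]                    # local explanation for one new input

rho, _ = spearmanr(np.abs(local_shap), global_importance)
coarse_trust = (rho + 1) / 2                                    # map correlation to [0, 1]
print(f"coarse trust score: {coarse_trust:.2f}")
```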
We conducted a study to evaluate the potential and robustness of gradient boosting algorithms in rock burst assessment, established a variational autoencoder (VAE) to address the imbalanced rock burst dataset, and proposed a multilevel explainable artificial intelligence (XAI) approach tailored for tree-based ensemble learning. We collected 537 records from real-world rock burst cases and selected four critical features contributing to rock burst occurrences. Initially, we employed data visualization to gain insight into the data's structure and performed correlation analysis to explore the data distribution and feature relationships. Then, we set up a VAE model to generate samples for the minority class due to the imbalanced class distribution. In conjunction with the VAE, we compared and evaluated six state-of-the-art ensemble models, including gradient boosting algorithms and the classical logistic regression model, for rock burst prediction. The results indicated that gradient boosting algorithms outperformed the classical single models, and the VAE-classifier combinations outperformed the original classifiers, with the VAE-NGBoost model yielding the most favorable results. Compared to other resampling methods combined with NGBoost for imbalanced datasets, such as the synthetic minority oversampling technique (SMOTE), SMOTE-edited nearest neighbours (SMOTE-ENN), and SMOTE-Tomek links (SMOTE-Tomek), the VAE-NGBoost model yielded the best performance. Finally, we developed a multilevel XAI model using feature sensitivity analysis, Tree Shapley Additive exPlanations (Tree SHAP), and Anchor to provide an in-depth exploration of the decision-making mechanics of VAE-NGBoost, further enhancing the accountability of tree-based ensemble models in predicting rock burst occurrences.
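A compact sketch of the VAE-based oversampling step for the minority class, assuming PyTorch; the architecture sizes, training length, and the four stand-in features are placeholders rather than the study's configuration.

```python
import torch
import torch.nn as nn

class TabularVAE(nn.Module):
    def __init__(self, n_features=4, latent=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU())
        self.mu = nn.Linear(16, latent)
        self.logvar = nn.Linear(16, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterisation trick
        return self.dec(z), mu, logvar

x_minority = torch.randn(60, 4)        # stand-in minority-class samples (already standardised)
vae = TabularVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for _ in range(500):
    recon, mu, logvar = vae(x_minority)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = nn.functional.mse_loss(recon, x_minority) + kl
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                  # draw synthetic minority records from the prior
    synthetic = vae.dec(torch.randn(100, 2))
# `synthetic` would be appended to the minority class before training NGBoost or another booster.
```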
As the use of blockchain for digital payments continues to rise, it becomes susceptible to various malicious attacks. Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating explainable artificial intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The SHapley Additive exPlanations (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we introduce an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances true positive rate (TPR) and receiver operating characteristic-area under curve (ROC-AUC) scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and false positive rate (FPR) scores.
Cardiovascular disease (CVD) remains a leading global health challenge due to its high mortality rate and the complexity of early diagnosis, driven by risk factors such as hypertension, high cholesterol, and irregular pulse rates. Traditional diagnostic methods often struggle with the nuanced interplay of these risk factors, making early detection difficult. In this research, we propose a novel artificial intelligence-enabled (AI-enabled) framework for CVD risk prediction that integrates machine learning (ML) with eXplainable AI (XAI) to provide both high-accuracy predictions and transparent, interpretable insights. Compared to existing studies that typically focus on either optimizing ML performance or using XAI separately for local or global explanations, our approach uniquely combines both local and global interpretability using Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). This dual integration enhances the interpretability of the model and helps clinicians comprehensively understand not just what the model predicts but also why those predictions are made, by identifying the contribution of different risk factors, which is crucial for transparent and informed decision-making in healthcare. The framework uses ML techniques such as K-nearest neighbors (KNN), gradient boosting, random forest, and decision tree, trained on a cardiovascular dataset. Additionally, the integration of LIME and SHAP provides patient-specific insights alongside global trends, ensuring that clinicians receive comprehensive and actionable information. Our experimental results achieve 98% accuracy with the random forest model, with precision, recall, and F1-scores of 97%, 98%, and 98%, respectively. The innovative combination of SHAP and LIME sets a new benchmark in CVD prediction by integrating advanced ML accuracy with robust interpretability, filling a critical gap in existing approaches. This framework paves the way for more explainable and transparent decision-making in healthcare, ensuring that the model is not only accurate but also trustworthy and actionable for clinicians.
Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs), for downstream applications which need "bare earth" elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, which is especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in terms of the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
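A minimal sketch of the SHAP dependence-plot inspection described here, assuming scikit-learn and shap; the regressor, the three terrain-style features, and the synthetic vertical-bias target are placeholders, not the study's 160 candidate variables.

```python
# Sketch: SHAP dependence plot for one explanatory variable of a DEM-error regressor.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({"tree_height": rng.uniform(0, 40, 2000),
                  "canopy_cover": rng.uniform(0, 1, 2000),
                  "slope": rng.uniform(0, 45, 2000)})
y = 0.6 * X["tree_height"] * X["canopy_cover"] + rng.normal(0, 1, 2000)   # stand-in vertical bias

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# How does the learned correction depend on canopy cover, coloured by an interacting variable?
shap.dependence_plot("canopy_cover", shap_values, X,
                     interaction_index="tree_height", show=False)
```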
Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in artificial intelligence-driven decision-making. The framework employs an Autoencoder for feature reduction, a Convolutional Neural Network for pattern recognition, and a Long Short-Term Memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, Particle Swarm Optimization is used to refine hyperparameters for optimal performance and mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate an impressive accuracy of up to 99.92% and 99.45% across different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
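The deep Autoencoder-CNN-LSTM stack itself is not reproduced here; the sketch below only illustrates the particle swarm optimization step over two hyperparameters, with a stand-in objective playing the role of "train the model and return its validation error".

```python
# Minimal particle swarm optimisation sketch for two hyperparameters (illustrative objective).
import numpy as np

def validation_error(params):                 # stand-in for "train model, return validation loss"
    lr, units = params
    return (np.log10(lr) + 3) ** 2 + ((units - 64) / 32) ** 2

rng = np.random.default_rng(0)
low, high = np.array([1e-4, 16]), np.array([1e-1, 256])
pos = rng.uniform(low, high, size=(20, 2))    # 20 particles, 2 dimensions (learning rate, units)
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([validation_error(p) for p in pos])
gbest = pbest[pbest_val.argmin()]

for _ in range(50):
    r1, r2 = rng.random((2, 20, 1))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, low, high)
    vals = np.array([validation_error(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()]

print("best hyperparameters (lr, units):", gbest)
```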
In this study, a machine learning-based predictive model was developed for the Musa petti Wind Farm in Sri Lanka to address the need for localized forecasting solutions. Using data on wind speed, air temperature, nacelle position, and actual power, lagged features were generated to capture temporal dependencies. Among 24 evaluated models, the ensemble bagging approach achieved the best performance, with R² values of 0.89 at 0 min and 0.75 at 60 min. Shapley Additive exPlanations (SHAP) analysis revealed that while wind speed is the primary driver for short-term predictions, air temperature and nacelle position become more influential at longer forecasting horizons. These findings underscore the reliability of short-term predictions and the potential benefits of integrating hybrid AI and probabilistic models for extended forecasts. Our work contributes a robust and explainable framework to support Sri Lanka's renewable energy transition, and future research will focus on real-time deployment and uncertainty quantification.
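A minimal sketch of the lagged-feature construction and bagging ensemble described here, using pandas and scikit-learn; the SCADA-style column names, lag choices, 10-minute resolution, and 60-minute-ahead target are illustrative assumptions.

```python
# Sketch: build lagged features from a SCADA-style time series and fit a bagging ensemble.
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({"wind_speed": rng.uniform(2, 14, 2000),
                   "air_temp": rng.uniform(24, 34, 2000),
                   "nacelle_pos": rng.uniform(0, 360, 2000)})
df["power"] = 0.5 * df["wind_speed"] ** 3 / 100 + rng.normal(0, 0.2, 2000)

for col in ["wind_speed", "air_temp", "nacelle_pos", "power"]:
    for lag in (1, 2, 3, 6):                      # assumed 10-minute resolution, up to 1 h of history
        df[f"{col}_lag{lag}"] = df[col].shift(lag)

df["target_60min"] = df["power"].shift(-6)        # value six steps (60 minutes) ahead
df = df.dropna()

X, y = df.filter(like="_lag"), df["target_60min"]
model = BaggingRegressor(n_estimators=100, random_state=0).fit(X.iloc[:-200], y.iloc[:-200])
print("R^2 on held-out tail:", model.score(X.iloc[-200:], y.iloc[-200:]))
```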
Research on the application of machine learning (ML) models to landslide susceptibility assessment has gained popularity in recent years, with a focus primarily on topographic factors derived from digital elevation models (DEMs). However, few studies have focused on the explanatory effects of these factors on different models, i.e. whether DEM-based factors affect different models in the same way. This study investigated whether different ML models could yield consistent interpretations of DEM-based factors using explanatory algorithms. Six ML models, including a support vector machine, a neural network, extreme gradient boosting, a random forest, linear regression, and K-nearest neighbors, were trained and evaluated on five geospatial datasets derived from different DEMs. Each dataset contained eight DEM-based and six non-DEM-based factors from 8,912 landslide samples. Model performance was assessed using accuracy, precision, recall rate, F1-score, kappa coefficient, and receiver operating characteristic curves. Explanatory analyses, including Shapley additive explanations and partial dependence plots, were also employed to investigate the effects of topographic factors on landslide susceptibility. The results indicate that DEM-based factors consistently influenced the different ML models across the datasets. Furthermore, tree-based models outperformed the other models in almost all datasets, while the most suitable DEMs were obtained from Copernicus and TanDEM-X. In addition, a concave surface without potholes on steep slopes provides ideal topographic conditions for landslide formation in the study area. This study can benefit the wider landslide research community by clarifying how topographic factors affect ML models.
Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the Shapley Additive Explanations (SHAP) method, offering a hierarchical, global-to-local perspective on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geo-environmental factors to assess the susceptibility of post-fire gully erosion using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting the decision-making process behind environmental management and hazard mitigation.
The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible since a large MOF structure database is available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO2 cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to a high accuracy of up to 97% when the 75% quantile of the training set is used as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analyses to provide a degree of physical understanding. 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day using the model, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among the reported MOFs, in good agreement with the prediction.
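A minimal sketch of quantile-threshold screening of the kind described: candidates whose target exceeds the 75% quantile are labeled promising, a classifier is trained on descriptors, and one descriptor's partial dependence is inspected. The descriptor names and synthetic yield proxy are placeholders, not the study's mechanism-derived features.

```python
# Sketch: label "high-performing" candidates by the 75% quantile of a target, then screen.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
X = pd.DataFrame({"open_metal_site": rng.integers(0, 2, 3000),
                  "pore_diameter": rng.uniform(3, 20, 3000),
                  "lewis_acidity": rng.uniform(0, 1, 3000)})
yield_proxy = 0.4 * X["open_metal_site"] + 0.5 * X["lewis_acidity"] + rng.normal(0, 0.1, 3000)

label = (yield_proxy >= yield_proxy.quantile(0.75)).astype(int)   # top quartile = promising catalyst
clf = GradientBoostingClassifier(random_state=0).fit(X, label)

# Screen a hypothetical candidate and inspect one descriptor's partial dependence.
candidate = pd.DataFrame({"open_metal_site": [1], "pore_diameter": [8.5], "lewis_acidity": [0.7]})
print("predicted promising:", bool(clf.predict(candidate)[0]))
PartialDependenceDisplay.from_estimator(clf, X, ["lewis_acidity"])
```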
Funding (hepatic alveolar echinococcosis surgical decision-making study): Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, No. 2022D01D17, and the State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, No. SKL-HIDCA-2024-2.
Funding (3D-Geoformer ENSO prediction study): Supported by the Laoshan Laboratory (No. LSKJ202202402), the National Natural Science Foundation of China (No. 42030410), the Startup Foundation for Introducing Talent of Nanjing University of Information Science & Technology, the Jiangsu Innovation Research Group (No. JSSCTD 202346), the China National Postdoctoral Program for Innovative Talents (No. BX20240169), and the China Postdoctoral Science Foundation (No. 2141062400101).
Funding (commentary on LDCT lung cancer screening and C-Lung-RADS): Funding from the European Union-NextGenerationEU through the Italian Ministry of University and Research under PNRR-M4C2-I1.3, Project PE_00000019 "HEAL ITALIA", to Stefano Diciotti, CUP J33C22002920006.
Funding (short-term voltage stability explainability study): Supported in part by the National Natural Science Foundation of China under Grant U23B6008, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515240075, and in part by the Italian Ministry of University and Research, Project NEST, Code PE0000021, CUP J33C22002890007.
Funding (trustworthy graph neural networks survey): National Science Foundation (NSF), USA (No. IIS-1909702); Army Research Office (ARO), USA (No. W911NF21-1-0198); Department of Homeland Security (DHS) CINA, USA (No. E205949D).
Funding (explainable electric load forecasting review): Supported by the German Federal Ministry of Economic Affairs and Climate Action (BMWK) through the project "FlexGUIde" (grant number 03EI6065D).
文摘Electric Load Forecasting(ELF)is the central instrument for planning and controlling demand response programs,electricity trading,and consumption optimization.Due to the increasing automation of these processes,meaningful and transparent forecasts become more and more important.Still,at the same time,the complexity of the used machine learning models and architectures increases.Because there is an increasing interest in interpretable and explainable load forecasting methods,this work conducts a literature review to present already applied approaches regarding explainability and interpretability for load forecasts using Machine Learning.Based on extensive literature research covering eight publication portals,recurring modeling approaches,trends,and modeling techniques are identified and clustered by properties to achieve more interpretable and explainable load forecasts.The results on interpretability show an increase in the use of probabilistic models,methods for time series decomposition and the use of fuzzy logic in addition to classically interpretable models.Dominant explainable approaches are Feature Importance and Attention mechanisms.The discussion shows that a lot of knowledge from the related field of time series forecasting still needs to be adapted to the problems in ELF.Compared to other applications of explainable and interpretable methods such as clustering,there are currently relatively few research results,but with an increasing trend.
Abstract: Advanced machine learning (ML) algorithms have outperformed traditional approaches in various forecasting applications, especially electricity price forecasting (EPF). However, the prediction accuracy of ML drops substantially if the input data are not similar to those seen by the model during training. This is often observed in EPF problems when market dynamics change owing to a rise in fuel prices, an increase in renewable penetration, a change in operational policies, etc. While the dip in model accuracy for unseen data is a cause for concern, what is more challenging is not knowing when the ML model will respond in such a manner. Such uncertainty makes power market participants, such as bidding agents and retailers, vulnerable to substantial financial loss caused by the prediction errors of EPF models. Therefore, it becomes essential to identify whether or not the model prediction at a given instance is trustworthy. In this light, this paper proposes a trust algorithm for EPF users based on explainable artificial intelligence techniques. The suggested algorithm generates trust scores that reflect the model's prediction quality for each new input. These scores are formulated in two stages: in the first stage, a coarse version of the score is formed using correlations of local and global explanations, and in the second stage, the score is fine-tuned further by the Shapley additive explanations values of different features. Such score-based explanations are more straightforward than feature-based visual explanations for EPF users such as asset managers and traders. Datasets from Italy's and ERCOT's electricity markets validate the efficacy of the proposed algorithm. Results show that the algorithm has more than 85% accuracy in identifying good predictions when the data distribution is similar to the training dataset. In the case of a distribution shift, the algorithm shows the same accuracy level in identifying bad predictions.
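A hedged sketch of one plausible reading of the correlation stage: compare each instance's local SHAP attribution pattern with the global (training-set) attribution pattern. The regression model, synthetic data, and the exact scoring formula are assumptions for illustration, not the paper's algorithm.

```python
# Coarse trust score: correlation between local and global SHAP attributions.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=800, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
explainer = shap.TreeExplainer(model)

# Global explanation: mean absolute SHAP value per feature on training data.
global_imp = np.abs(explainer.shap_values(X_tr)).mean(axis=0)

# Local explanation for each new instance, correlated against the global one.
local_shap = np.abs(explainer.shap_values(X_te))
trust = np.array([np.corrcoef(row, global_imp)[0, 1] for row in local_shap])

# High correlation suggests the prediction relies on familiar attribution
# patterns; low correlation flags it for closer inspection.
print("first five trust scores:", np.round(trust[:5], 3))
```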
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42107214 and 52130905).
Abstract: We conducted a study to evaluate the potential and robustness of gradient boosting algorithms in rock burst assessment, established a variational autoencoder (VAE) to address the imbalanced rock burst dataset, and proposed a multilevel explainable artificial intelligence (XAI) approach tailored for tree-based ensemble learning. We collected 537 records from real-world rock burst cases and selected four critical features contributing to rock burst occurrences. Initially, we employed data visualization to gain insight into the data's structure and performed correlation analysis to explore the data distribution and feature relationships. We then set up a VAE model to generate samples for the minority class, given the imbalanced class distribution. In conjunction with the VAE, we compared and evaluated six state-of-the-art ensemble models, including gradient boosting algorithms and the classical logistic regression model, for rock burst prediction. The results indicated that gradient boosting algorithms outperformed the classical single models, and the VAE-classifier outperformed the original classifier, with the VAE-NGBoost model yielding the most favorable results. Compared with other resampling methods combined with NGBoost for imbalanced datasets, such as the synthetic minority oversampling technique (SMOTE), SMOTE-edited nearest neighbours (SMOTE-ENN), and SMOTE-Tomek links (SMOTE-Tomek), the VAE-NGBoost model yielded the best performance. Finally, we developed a multilevel XAI model using feature sensitivity analysis, Tree Shapley Additive exPlanations (Tree SHAP), and Anchor to provide an in-depth exploration of the decision-making mechanics of VAE-NGBoost, further enhancing the accountability of tree-based ensemble models in predicting rock burst occurrences.
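A compact sketch of the resampling idea: train a small VAE on the minority class only, sample synthetic minority rows, and fit a gradient boosting classifier on the augmented data. The architecture sizes, epoch count, and the use of sklearn's GradientBoostingClassifier in place of NGBoost are simplifying assumptions.

```python
# VAE-based minority oversampling before gradient boosting (illustrative only).
import numpy as np
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X_min = torch.tensor(X[y == 1], dtype=torch.float32)

latent = 2
enc = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, latent * 2))
dec = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(), nn.Linear(16, 4))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)

for _ in range(300):                       # short full-batch training on minority rows
    stats = enc(X_min)
    mu, logvar = stats[:, :latent], stats[:, latent:]
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
    recon = dec(z)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = nn.functional.mse_loss(recon, X_min, reduction="sum") + kl
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                      # sample synthetic minority examples
    X_syn = dec(torch.randn(len(X_min), latent)).numpy()

X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(len(X_syn))])
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
print("balanced-fit training accuracy:", round(clf.score(X_bal, y_bal), 3))
```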
Abstract: As the use of blockchain for digital payments continues to rise, it becomes susceptible to various malicious attacks. Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating explainable artificial intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The Shapley additive explanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we introduce an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances true positive rate (TPR) and receiver operating characteristic-area under curve (ROC-AUC) scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and false positive rate (FPR) scores.
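A minimal sketch of the general recipe described here: balance the classes by under-sampling the majority class, then evaluate a voting ensemble of tree-based models on TPR, FPR, and ROC-AUC. The simple random under-sampler below is a placeholder for the proposed XGBCLUS algorithm, and the synthetic data stands in for the Bitcoin transaction dataset.

```python
# Under-sampling + soft-voting tree ensemble for rare-class detection (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, VotingClassifier,
                              GradientBoostingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random under-sampling of the majority class (placeholder for XGBCLUS).
rng = np.random.default_rng(0)
maj, mino = np.where(y_tr == 0)[0], np.where(y_tr == 1)[0]
keep = rng.choice(maj, size=len(mino), replace=False)
idx = np.concatenate([keep, mino])
X_bal, y_bal = X_tr[idx], y_tr[idx]

ensemble = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0))],
    voting="soft",
).fit(X_bal, y_bal)

proba = ensemble.predict_proba(X_te)[:, 1]
tn, fp, fn, tp = confusion_matrix(y_te, proba > 0.5).ravel()
print(f"TPR={tp/(tp+fn):.3f}  FPR={fp/(fp+tn):.3f}  "
      f"ROC-AUC={roc_auc_score(y_te, proba):.3f}")
```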
Funding: Funded by Researchers Supporting Project Number (RSPD2025R947), King Saud University, Riyadh, Saudi Arabia.
Abstract: Cardiovascular disease (CVD) remains a leading global health challenge due to its high mortality rate and the complexity of early diagnosis, driven by risk factors such as hypertension, high cholesterol, and irregular pulse rates. Traditional diagnostic methods often struggle with the nuanced interplay of these risk factors, making early detection difficult. In this research, we propose a novel artificial intelligence-enabled (AI-enabled) framework for CVD risk prediction that integrates machine learning (ML) with eXplainable AI (XAI) to provide both high-accuracy predictions and transparent, interpretable insights. Compared to existing studies that typically focus on either optimizing ML performance or using XAI separately for local or global explanations, our approach uniquely combines both local and global interpretability using Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). This dual integration enhances the interpretability of the model and helps clinicians comprehensively understand not just what the model predicts but also why those predictions are made, by identifying the contribution of different risk factors, which is crucial for transparent and informed decision-making in healthcare. The framework uses ML techniques such as K-nearest neighbors (KNN), gradient boosting, random forest, and decision tree, trained on a cardiovascular dataset. Additionally, the integration of LIME and SHAP provides patient-specific insights alongside global trends, ensuring that clinicians receive comprehensive and actionable information. Our experimental results achieve 98% accuracy with the random forest model, with precision, recall, and F1-scores of 97%, 98%, and 98%, respectively. The innovative combination of SHAP and LIME sets a new benchmark in CVD prediction by integrating advanced ML accuracy with robust interpretability, filling a critical gap in existing approaches. This framework paves the way for more explainable and transparent decision-making in healthcare, ensuring that the model is not only accurate but also trustworthy and actionable for clinicians.
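A hedged sketch of pairing a local LIME explanation with SHAP attributions for the same prediction, in the spirit of the dual local/global approach described above. The stand-in dataset and the handling of SHAP output shapes are illustrative assumptions, not the authors' exact pipeline.

```python
# Local (LIME) plus local/global (SHAP) explanations for one RF prediction.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Local, model-agnostic view of one patient-like record via LIME.
lime_exp = LimeTabularExplainer(X_tr, feature_names=list(data.feature_names),
                                class_names=list(data.target_names),
                                mode="classification")
print(lime_exp.explain_instance(X_te[0], model.predict_proba,
                                num_features=5).as_list())

# SHAP attributions for the test set plus a global importance ranking.
sv = shap.TreeExplainer(model).shap_values(X_te)
sv = sv[1] if isinstance(sv, list) else sv[..., 1]   # positive-class attributions
top = np.argsort(np.abs(sv).mean(axis=0))[::-1][:5]
print("global top-5 features:", [data.feature_names[i] for i in top])
```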
Abstract: Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs) for downstream applications which need "bare earth" elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, which is especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in terms of the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
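An illustrative sketch of the "one specialist model per land-cover class" idea with a SHAP dependence plot for a single model. The variable names (slope, canopy_height, built_fraction), the synthetic bias signal, and the three-class land-cover grouping are assumptions for illustration, not the study's 160-candidate setup.

```python
# Per-land-cover bias-correction models with a SHAP dependence plot (sketch).
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 3000
df = pd.DataFrame({
    "slope": rng.uniform(0, 40, n),
    "canopy_height": rng.gamma(2.0, 5.0, n),
    "built_fraction": rng.uniform(0, 1, n),
    "landcover": rng.integers(0, 3, n),          # 0 forest, 1 urban, 2 open
})
# Synthetic vertical bias: driven by canopy in forest, by buildings in urban areas.
df["bias"] = np.where(df.landcover == 0, 0.6 * df.canopy_height,
                      np.where(df.landcover == 1, 8 * df.built_fraction, 0.0))
df["bias"] += rng.normal(0, 0.5, n)

features = ["slope", "canopy_height", "built_fraction"]
models = {}
for lc, part in df.groupby("landcover"):          # one specialist model per class
    models[lc] = GradientBoostingRegressor(random_state=0).fit(part[features],
                                                                part["bias"])

# SHAP dependence plot for the forest model: learned correction vs canopy height.
forest = df[df.landcover == 0]
sv = shap.TreeExplainer(models[0]).shap_values(forest[features])
shap.dependence_plot("canopy_height", sv, forest[features], show=False)
```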
Abstract: Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in artificial intelligence driven decision-making. The framework employs an Autoencoder for feature reduction, a Convolutional Neural Network for pattern recognition, and a Long Short-Term Memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, Particle Swarm Optimization is used to refine hyperparameters for optimal performance and mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate an impressive accuracy of up to 99.92% and 99.45% across different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that the decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
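A minimal Keras sketch of the kind of hybrid architecture described above: a dense encoding stage for feature reduction, a Conv1D stage for local patterns, and an LSTM stage for temporal context over sensor windows. Layer sizes, window length, the synthetic data, and the omission of the PSO tuning loop are simplifying assumptions, not the paper's exact configuration.

```python
# Encoder -> Conv1D -> LSTM failure classifier on windowed sensor data (sketch).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

window, n_sensors = 32, 12                       # assumed sliding-window shape
X = np.random.rand(512, window, n_sensors).astype("float32")
y = np.random.randint(0, 2, size=(512,))         # 1 = imminent failure (synthetic)

model = keras.Sequential([
    layers.Input(shape=(window, n_sensors)),
    layers.TimeDistributed(layers.Dense(6, activation="relu")),  # encoder-like reduction
    layers.Conv1D(32, kernel_size=3, activation="relu"),         # local pattern extraction
    layers.MaxPooling1D(2),
    layers.LSTM(32),                                             # temporal dependencies
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
model.summary()
```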
Abstract: In this study, a machine learning-based predictive model was developed for the Musa petti Wind Farm in Sri Lanka to address the need for localized forecasting solutions. Using data on wind speed, air temperature, nacelle position, and actual power, lagged features were generated to capture temporal dependencies. Among 24 evaluated models, the ensemble bagging approach achieved the best performance, with R² values of 0.89 at 0 min and 0.75 at 60 min. Shapley Additive exPlanations (SHAP) analysis revealed that while wind speed is the primary driver for short-term predictions, air temperature and nacelle position become more influential at longer forecasting horizons. These findings underscore the reliability of short-term predictions and the potential benefits of integrating hybrid AI and probabilistic models for extended forecasts. Our work contributes a robust and explainable framework to support Sri Lanka's renewable energy transition, and future research will focus on real-time deployment and uncertainty quantification.
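A sketch of lagged-feature construction and a bagging ensemble for a single forecast horizon, mirroring the setup described above. The column names, lag depth, toy power curve, and the 60-minute horizon mapping are illustrative assumptions, not the wind farm's actual SCADA data.

```python
# Lagged features + bagging regressor for a 60-minute-ahead power forecast (sketch).
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "wind_speed": rng.gamma(3.0, 2.5, n),
    "air_temp": 25 + rng.normal(0, 3, n),
    "nacelle_pos": rng.uniform(0, 360, n),
})
df["power"] = 0.3 * df.wind_speed ** 2 + rng.normal(0, 2, n)   # toy power curve

horizon, lags = 6, 3                # e.g. 6 x 10-min steps = 60 min ahead (assumed)
for col in ["wind_speed", "air_temp", "nacelle_pos", "power"]:
    for k in range(1, lags + 1):
        df[f"{col}_lag{k}"] = df[col].shift(k)
df["target"] = df["power"].shift(-horizon)
df = df.dropna()

X = df.filter(like="_lag")
y = df["target"]
split = int(0.8 * len(df))          # time-ordered train/test split
model = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                         random_state=0).fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])
print("R^2 at +60 min:", round(r2_score(y.iloc[split:], pred), 3))
```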
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3003205); the Chengdu University of Technology Postgraduate Innovative Cultivation Program (Grant No. 10800-000510-01-022); the Sichuan Science and Technology Program (Grant No. 2025ZNSFSC1206); and the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project (Grant No. SKLGP2023Z026).
Abstract: Research on the application of machine learning (ML) models to landslide susceptibility assessments has gained popularity in recent years, with a focus primarily on topographic factors derived from digital elevation models (DEMs). However, few studies have focused on the explanatory effects of these factors on different models, i.e. whether DEM-based factors affect different models in the same way. This study investigated whether different ML models could yield consistent interpretations of DEM-based factors using explanatory algorithms. Six ML models, including a support vector machine, a neural network, extreme gradient boosting, a random forest, linear regression, and K-nearest neighbors, were trained and evaluated on five geospatial datasets derived from different DEMs. Each dataset contained eight DEM-based and six non-DEM-based factors from 8912 landslide samples. Model performance was assessed using accuracy, precision, recall rate, F1-score, kappa coefficient, and receiver operating characteristic curves. Explanatory analyses, including Shapley additive explanations and partial dependence plots, were also employed to investigate the effects of topographic factors on landslide susceptibility. The results indicate that DEM-based factors consistently influenced different ML models across the datasets. Furthermore, tree-based models outperformed the other models in almost all datasets, while the most suitable DEMs were obtained from Copernicus and TanDEM-X. In addition, concave surfaces without potholes on steep slopes are ideal topographic conditions for landslide formation in the study area. This study can benefit the wider landslide research community by clarifying how topographic factors affect ML models.
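A minimal sketch of checking whether two different models attribute importance to the same terrain-like factors, via mean |SHAP| rankings and a rank correlation. The synthetic data, assumed factor names, and the two chosen models are placeholders for illustration, not the study's datasets or full six-model comparison.

```python
# Cross-model consistency of SHAP-based factor importance (sketch).
import numpy as np
import shap
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           random_state=0)
names = ["slope", "aspect", "curvature", "elevation",
         "twi", "relief", "ndvi", "rainfall"]          # assumed factor names

def mean_abs_shap(model):
    """Fit a model and return mean absolute SHAP value per feature."""
    sv = shap.TreeExplainer(model.fit(X, y)).shap_values(X)
    sv = sv[1] if isinstance(sv, list) else (sv[..., 1] if sv.ndim == 3 else sv)
    return np.abs(sv).mean(axis=0)

imp_rf = mean_abs_shap(RandomForestClassifier(n_estimators=200, random_state=0))
imp_gb = mean_abs_shap(GradientBoostingClassifier(random_state=0))

rho, _ = spearmanr(imp_rf, imp_gb)
print("RF ranking :", [names[i] for i in np.argsort(imp_rf)[::-1]])
print("GB ranking :", [names[i] for i in np.argsort(imp_gb)[::-1]])
print(f"Spearman rank correlation between models: {rho:.2f}")
```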
Funding: The National Natural Science Foundation of China (42377170, 42407212); the National Funded Postdoctoral Researcher Program (GZB20230606); the Postdoctoral Research Foundation of China (2024M752679); the Sichuan Natural Science Foundation (2025ZNSFSC1205); the National Key R&D Program of China (2022YFC3005704); and the Sichuan Province Science and Technology Support Program (2024NSFSC0100).
Abstract: Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the Shapley Additive Explanations (SHAP) method, offering a hierarchical perspective, from global to local, on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geoenvironmental factors to assess the susceptibility of post-fire gully erosion using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting the decision-making process behind environmental management and hazard mitigation.
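A sketch of the reduce-and-retrain idea: rank candidate factors by mean |SHAP| value, keep the top ones, and check that a reduced-variable model keeps a comparable AUC. The synthetic data, the single random-forest model, and the top-8 cutoff are illustrative assumptions rather than the study's four-model workflow.

```python
# SHAP-based factor reduction with an AUC comparison (sketch).
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=21, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
sv = shap.TreeExplainer(full).shap_values(X_tr)
sv = sv[1] if isinstance(sv, list) else (sv[..., 1] if sv.ndim == 3 else sv)

top8 = np.argsort(np.abs(sv).mean(axis=0))[::-1][:8]   # dominant conditioning factors
reduced = RandomForestClassifier(n_estimators=300, random_state=0).fit(
    X_tr[:, top8], y_tr)

auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])
auc_red = roc_auc_score(y_te, reduced.predict_proba(X_te[:, top8])[:, 1])
print(f"AUC full (21 factors): {auc_full:.3f}   AUC reduced (8 factors): {auc_red:.3f}")
```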
Funding: Financial support from the National Key Research and Development Program of China (2021YFB3501501); the National Natural Science Foundation of China (Nos. 22225803, 22038001, 22108007 and 22278011); the Beijing Natural Science Foundation (No. Z230023); and the Beijing Science and Technology Commission (No. Z211100004321001).
Abstract: The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible because a large MOF structure database is available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO_(2) cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to high accuracy of up to 97% when the 75% quantile of the training set is used as the classification criterion. The feature contribution was further evaluated with SHAP and PDP analysis to provide a degree of physical understanding. Using the model, 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top performance experimentally among reported MOFs, in good agreement with the prediction.
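A sketch of labelling by a quantile threshold and screening candidates with a classifier plus a partial dependence view, as outlined above. The descriptor names, the toy "yield" signal, and the random-forest choice are hypothetical placeholders; the actual model used mechanism-informed descriptors.

```python
# Quantile-threshold labelling, classification screening, and a PDP (sketch).
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "pore_diameter": rng.uniform(3, 30, n),          # assumed descriptors
    "open_metal_sites": rng.integers(0, 5, n),
    "surface_area": rng.uniform(500, 6000, n),
})
# Toy "catalytic yield" used only to define labels for this sketch.
yield_ = (0.4 * df.open_metal_sites + 0.0002 * df.surface_area
          + rng.normal(0, 0.3, n))
labels = (yield_ >= yield_.quantile(0.75)).astype(int)   # 75% quantile criterion

X_tr, X_te, y_tr, y_te = train_test_split(df, labels, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))

# Partial dependence of the predicted class probability on one descriptor.
PartialDependenceDisplay.from_estimator(clf, X_te, ["open_metal_sites"])
```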