With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to efficiently utilize this data while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
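The layer-wise budget allocation described above can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the importance scores, layer names, and the inverse scaling of Gaussian noise with epsilon are assumptions chosen for clarity.

```python
import random

def allocate_layer_budgets(layer_importance, total_epsilon):
    """Split a total privacy budget across layers in proportion to importance.

    More important layers receive a larger epsilon share, so the noise added
    to their gradients is smaller and they remain more accurate.
    """
    total = sum(layer_importance.values())
    return {name: total_epsilon * imp / total
            for name, imp in layer_importance.items()}

def add_noise(gradients, epsilon, sensitivity=1.0):
    """Perturb a gradient vector with Gaussian noise scaled by sensitivity/epsilon."""
    scale = sensitivity / epsilon
    return [g + random.gauss(0.0, scale) for g in gradients]

# Hypothetical per-layer importance scores (e.g., mean |SHAP| per layer).
importance = {"input": 0.2, "hidden": 0.5, "output": 0.3}
budgets = allocate_layer_budgets(importance, total_epsilon=1.0)
noisy = {layer: add_noise([0.1, -0.2], eps) for layer, eps in budgets.items()}
```

The same allocation function could be reused across training rounds with a per-round budget derived from gradient effectiveness.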
In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable progress. Accurate predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce overtreatment. However, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and transparency. This paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present state-of-the-art models. Our study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction models. In the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the dataset. The original dataset is used in training machine learning models, and further used in generating SHAP values from these models. In the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based analysis. This new integrated dataset is used in re-training the machine learning models. The new SHAP values generated from these models help in validating the contributions of feature sets in predicting malignancy. Conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making systems. In this study, SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the predictions. The study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of explainability. The proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUROC) are also higher than the baseline models. The results of the proposed model help us identify the dominant feature sets that impact thyroid cancer classification and prediction. The features {calcification} and {shape} consistently emerged as the top-ranked features associated with thyroid malignancy, in both association-rule based interestingness metric values and SHAP methods. The paper highlights the potential of rule-based integrated models with SHAP in bridging the gap between machine learning predictions and the interpretability of these predictions required for real-world medical applications.
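The first step of the framework, finding frequently occurring attribute combinations, can be sketched with a simple support-counting routine. The attribute names and the support threshold below are illustrative assumptions; the paper's actual mining procedure and interestingness metrics may differ.

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support, max_len=2):
    """Count attribute combinations and keep those meeting a support threshold."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}

# Toy records of ultrasound findings (hypothetical attribute names).
records = [
    {"calcification", "shape"},
    {"calcification", "shape", "margin"},
    {"calcification"},
    {"shape", "margin"},
]
support = frequent_itemsets(records, min_support=0.5)
```

Surviving itemsets such as `("calcification", "shape")` would then be integrated into the dataset as composite features before re-training.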
In a recent study published in Nature Medicine, Wang, Shao, and colleagues successfully addressed two critical issues of lung cancer (LC) screening with low-dose computed tomography (LDCT), whose widespread implementation, despite its capacity to decrease LC mortality, remains challenging: (1) the difficulty in accurately distinguishing malignant nodules from the far more common benign nodules detected on LDCT, and (2) the insufficient coverage of LC screening in resource-limited areas [1]. To perform nodule risk stratification, Wang et al. developed and validated a multi-step, multidimensional artificial intelligence (AI)-based system (Fig. 1) and introduced a data-driven Chinese Lung Nodules Reporting and Data System (C-Lung-RADS) [1]. The Lung-RADS system was developed in the US to stratify lung nodules into categories of increasing risk of LC and to provide corresponding management recommendations.
Background: Liver disease (LD) significantly impacts global health, requiring accurate diagnostic methods. This study aims to develop an automated system for LD prediction using machine learning (ML) and explainable artificial intelligence (XAI), enhancing diagnostic precision and interpretability. Methods: This research systematically analyzes two distinct datasets encompassing liver health indicators. A combination of preprocessing techniques, including feature optimization methods such as Forward Feature Selection (FFS), Backward Feature Selection (BFS), and Recursive Feature Elimination (RFE), is applied to enhance data quality. After that, ML models, namely Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Trees (DT), and a novel Tree Selection and Stacking Ensemble-based RF (TSRF), are assessed on the datasets to diagnose LD. Finally, the best model is selected using cross-validation and evaluation with performance metrics such as accuracy, precision, and specificity, and efficient XAI methods convey the final model's interpretability. Findings: The analysis reveals TSRF as the most effective model, achieving a peak accuracy of 99.92% on Dataset-1 without feature optimization and 88.88% on Dataset-2 with RFE optimization. XAI techniques, including SHAP and LIME plots, highlight key features influencing model predictions, providing insights into the reasoning behind classification outcomes. Interpretation: The findings highlight TSRF's potential in improving LD diagnosis, using XAI to enhance transparency and trust in ML models. Despite high accuracy and interpretability, limitations such as dataset bias and lack of clinical validation remain. Future work focuses on integrating advanced XAI, diversifying datasets, and applying the approach in clinical settings for reliable diagnostics.
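The backward feature selection step can be sketched as a greedy loop that repeatedly drops the feature whose removal hurts a validation score the least. The scorer and feature names below are toy assumptions standing in for a real cross-validated model.

```python
def backward_feature_selection(features, score_fn, min_features=1):
    """Greedily drop the feature whose removal degrades the score least."""
    selected = list(features)
    best_score = score_fn(selected)
    while len(selected) > min_features:
        candidates = [(score_fn([f for f in selected if f != drop]), drop)
                      for drop in selected]
        score, drop = max(candidates)
        if score < best_score:
            break  # every removal degrades the score; stop
        best_score = score
        selected.remove(drop)
    return selected, best_score

# Toy scorer: pretend only "alt" and "bilirubin" carry signal (hypothetical),
# with a small penalty per retained feature to favour compact subsets.
def toy_score(subset):
    useful = {"alt", "bilirubin"}
    return len(useful & set(subset)) - 0.1 * len(subset)

selected, score = backward_feature_selection(
    ["age", "alt", "bilirubin", "albumin"], toy_score)
```

Forward selection works symmetrically, starting from an empty set and greedily adding the most helpful feature.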
BACKGROUND Echinococcosis, caused by Echinococcus parasites, includes alveolar echinococcosis (AE), the most lethal form, primarily affecting the liver with a 90% mortality rate without prompt treatment. While radical surgery combined with antiparasitic therapy is ideal, many patients present late, missing hepatectomy opportunities. Ex vivo liver resection and autotransplantation (ELRA) offers hope for such patients. Traditional surgical decision-making, relying on clinical experience, is prone to bias. Machine learning can enhance decision-making by identifying key factors influencing surgical choices. This study innovatively employs multiple machine learning methods by integrating various feature selection techniques and SHapley Additive exPlanations (SHAP) interpretive analysis to deeply explore the key decision factors influencing surgical strategies. AIM To determine the key preoperative factors influencing surgical decision-making in hepatic AE (HAE) using machine learning. METHODS This was a retrospective cohort study at the First Affiliated Hospital of Xinjiang Medical University (July 2010 to August 2024). There were 710 HAE patients (545 hepatectomy and 165 ELRA) with complete clinical data. Data included demographics, laboratory indicators, imaging, and pathology. Feature selection was performed using recursive feature elimination, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator regression, with the intersection of these methods yielding 10 critical features. Eleven machine learning algorithms were compared, with eXtreme Gradient Boosting (XGBoost) optimized using Bayesian optimization. Model interpretability was assessed using SHAP analysis. RESULTS The XGBoost model achieved an area under the curve of 0.935 in the training set and 0.734 in the validation set. The optimal threshold (0.28) yielded a sensitivity of 93.6% and a specificity of 90.9%. SHAP analysis identified type of vascular invasion as the most important feature, followed by platelet count and prothrombin time. Lesions invading the hepatic vein, inferior vena cava, or multiple vessels significantly increased the likelihood of ELRA. Calibration curves showed good agreement between predicted and observed probabilities (0.2-0.7 range). The model demonstrated high net clinical benefit in decision curve analysis, with an accuracy of 0.837, recall of 0.745, and F1 score of 0.788. CONCLUSION Vascular invasion is the dominant factor influencing the choice of surgical approach in HAE. Machine learning models, particularly XGBoost, can provide transparent and data-driven support for personalized decision-making.
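Taking the intersection of the three feature-selection methods, as described above, amounts to a simple set operation. The feature names below are hypothetical placeholders; the study's actual 10-feature intersection is not reproduced here.

```python
def consensus_features(*selector_outputs):
    """Keep only the features chosen by every selection method."""
    sets = [set(s) for s in selector_outputs]
    return sorted(set.intersection(*sets))

# Hypothetical outputs of the three selectors (RFE, mRMR, LASSO).
rfe   = ["vascular_invasion", "platelets", "pt", "age", "lesion_size"]
mrmr  = ["vascular_invasion", "platelets", "pt", "albumin"]
lasso = ["vascular_invasion", "pt", "platelets", "bilirubin"]
critical = consensus_features(rfe, mrmr, lasso)
```

Requiring agreement across methods trades recall for robustness: a feature survives only if it is informative under three different selection criteria.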
Graph neural networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are widely used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations on the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between these aspects that help achieve trustworthiness.
Electric Load Forecasting (ELF) is the central instrument for planning and controlling demand response programs, electricity trading, and consumption optimization. Due to the increasing automation of these processes, meaningful and transparent forecasts become more and more important, while at the same time the complexity of the machine learning models and architectures used increases. Because there is increasing interest in interpretable and explainable load forecasting methods, this work conducts a literature review to present already-applied approaches regarding explainability and interpretability for load forecasts using machine learning. Based on extensive literature research covering eight publication portals, recurring modeling approaches, trends, and modeling techniques are identified and clustered by properties that achieve more interpretable and explainable load forecasts. The results on interpretability show an increase in the use of probabilistic models, methods for time series decomposition, and fuzzy logic, in addition to classically interpretable models. The dominant explainability approaches are feature importance and attention mechanisms. The discussion shows that much knowledge from the related field of time series forecasting still needs to be adapted to the problems of ELF. Compared to other applications of explainable and interpretable methods, such as clustering, there are currently relatively few research results, but with an increasing trend.
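Feature importance, one of the dominant explainability approaches the review identifies, is often computed by permutation: shuffle one input column and measure the drop in model quality. A minimal sketch with a toy classifier (the model and data are assumptions, not from any surveyed paper):

```python
import random

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Drop in accuracy after shuffling one feature column ~ its importance."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(model(r) == t for r, t in zip(rows, y)) / len(y)

    base = accuracy(X)
    drops = []
    for _ in range(n_repeats):
        col = [row[feature_idx] for row in X]
        rng.shuffle(col)
        shuffled = [row[:feature_idx] + [v] + row[feature_idx + 1:]
                    for row, v in zip(X, col)]
        drops.append(base - accuracy(shuffled))
    return sum(drops) / n_repeats

# Toy load-level classifier that only looks at feature 0 (hypothetical).
model = lambda row: 1 if row[0] > 0.5 else 0
X = [[0.9, 5.0], [0.1, 5.0], [0.8, 4.0], [0.2, 4.0]] * 5
y = [1, 0, 1, 0] * 5
imp0 = permutation_importance(model, X, y, 0)
imp1 = permutation_importance(model, X, y, 1)
```

Shuffling the unused second feature leaves accuracy untouched, so its importance is zero, while the decisive first feature shows a clear drop.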
Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully combat spam attacks. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we utilize pre-processing techniques to obtain clean and noise-free data and address the class imbalance problem by leveraging text augmentation techniques. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant, RoBERTa, achieved a high accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. Additionally, we evaluate the performance of traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
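Text augmentation for class balance can be sketched by oversampling the minority class with lightly perturbed copies of its messages. The adjacent-word-swap augmentation below is one simple assumed technique; the paper's augmentation methods are not specified here.

```python
import random

def augment(text, rng):
    """Create a light variant of a message by swapping two adjacent words."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

def balance_by_augmentation(texts, labels, minority_label, seed=0):
    """Oversample the minority class with augmented copies until classes match."""
    rng = random.Random(seed)
    minority = [t for t, l in zip(texts, labels) if l == minority_label]
    majority_n = sum(1 for l in labels if l != minority_label)
    out_t, out_l = list(texts), list(labels)
    while sum(1 for l in out_l if l == minority_label) < majority_n:
        out_t.append(augment(rng.choice(minority), rng))
        out_l.append(minority_label)
    return out_t, out_l

texts = ["win a free prize now", "are we still on for lunch",
         "call me later", "meeting moved to three"]
labels = ["spam", "ham", "ham", "ham"]
bt, bl = balance_by_augmentation(texts, labels, "spam")
```

The balanced corpus then feeds the fine-tuning step, so the model sees both classes in equal proportion.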
We conducted a study to evaluate the potential and robustness of gradient boosting algorithms in rock burst assessment, established a variational autoencoder (VAE) to address the imbalanced rock burst dataset, and proposed a multilevel explainable artificial intelligence (XAI) framework tailored for tree-based ensemble learning. We collected 537 records of real-world rock bursts and selected four critical features contributing to rock burst occurrences. Initially, we employed data visualization to gain insight into the data's structure and performed correlation analysis to explore the data distribution and feature relationships. Then, we set up a VAE model to generate samples for the minority class, given the imbalanced class distribution. In conjunction with the VAE, we compared and evaluated six state-of-the-art ensemble models, including gradient boosting algorithms and the classical logistic regression model, for rock burst prediction. The results indicated that gradient boosting algorithms outperformed the classical single models, and the VAE-classifier outperformed the original classifier, with the VAE-NGBoost model yielding the most favorable results. Compared to other resampling methods combined with NGBoost for imbalanced datasets, such as the synthetic minority oversampling technique (SMOTE), SMOTE-edited nearest neighbours (SMOTE-ENN), and SMOTE-Tomek links (SMOTE-Tomek), the VAE-NGBoost model yielded the best performance. Finally, we developed a multilevel XAI model using feature sensitivity analysis, Tree Shapley Additive exPlanations (Tree SHAP), and Anchor to provide an in-depth exploration of the decision-making mechanics of VAE-NGBoost, further enhancing the accountability of tree-based ensemble models in predicting rock burst occurrences.
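The minority-class oversampling the study compares against can be sketched with SMOTE-style linear interpolation between pairs of minority samples. This is a stand-in for the paper's VAE sampler, and the feature values are invented toy records.

```python
import random

def interpolate_minority(samples, n_new, seed=0):
    """Generate synthetic minority samples on segments between random pairs,
    a SMOTE-style stand-in for a learned (e.g. VAE) sampler."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        lam = rng.random()
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Toy minority-class rock burst records (hypothetical feature columns).
minority = [[55.0, 120.0, 0.8, 5.1],
            [60.0, 110.0, 0.9, 4.7],
            [52.0, 130.0, 0.7, 5.5]]
new_points = interpolate_minority(minority, n_new=4)
```

A VAE replaces the straight-line interpolation with samples drawn from a learned latent distribution, which can capture curved regions of the minority class that SMOTE misses.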
As the use of blockchain for digital payments continues to rise, it becomes susceptible to various malicious attacks. Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating explainable artificial intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The SHapley Additive exPlanations (SHAP) method is employed to measure the contribution of each feature and is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we introduce an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that: (i) XGBCLUS enhances true positive rate (TPR) and receiver operating characteristic-area under curve (ROC-AUC) scores compared to state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and false positive rate (FPR) scores.
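The simplest baseline the proposed XGBCLUS algorithm is compared against, random under-sampling, can be sketched as follows. This is the generic baseline only, not XGBCLUS itself, which selects majority-class rows more carefully.

```python
import random

def undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = min(len(v) for v in by_class.values())
    out = []
    for lab, members in by_class.items():
        for row in rng.sample(members, target):
            out.append((row, lab))
    rng.shuffle(out)
    return out

# Toy imbalanced transaction set: 8 licit vs 2 illicit records.
rows = [[i] for i in range(10)]
labels = ["licit"] * 8 + ["illicit"] * 2
balanced = undersample(rows, labels)
```

Random under-sampling discards potentially informative majority rows, which is exactly the weakness that informed-selection variants aim to fix.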
Cardiovascular disease (CVD) remains a leading global health challenge due to its high mortality rate and the complexity of early diagnosis, driven by risk factors such as hypertension, high cholesterol, and irregular pulse rates. Traditional diagnostic methods often struggle with the nuanced interplay of these risk factors, making early detection difficult. In this research, we propose a novel artificial-intelligence-enabled (AI-enabled) framework for CVD risk prediction that integrates machine learning (ML) with eXplainable AI (XAI) to provide both high-accuracy predictions and transparent, interpretable insights. Compared to existing studies that typically focus on either optimizing ML performance or using XAI separately for local or global explanations, our approach uniquely combines both local and global interpretability using Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). This dual integration enhances the interpretability of the model and enables clinicians to understand not just what the model predicts but also why, by identifying the contribution of different risk factors, which is crucial for transparent and informed decision-making in healthcare. The framework uses ML techniques such as K-nearest neighbors (KNN), gradient boosting, random forest, and decision tree, trained on a cardiovascular dataset. Additionally, the integration of LIME and SHAP provides patient-specific insights alongside global trends, ensuring that clinicians receive comprehensive and actionable information. Our experimental results achieve 98% accuracy with the random forest model, with precision, recall, and F1-scores of 97%, 98%, and 98%, respectively. The innovative combination of SHAP and LIME sets a new benchmark in CVD prediction by integrating advanced ML accuracy with robust interpretability, filling a critical gap in existing approaches. This framework paves the way for more explainable and transparent decision-making in healthcare, ensuring that the model is not only accurate but also trustworthy and actionable for clinicians.
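The idea behind local, patient-specific explanations can be sketched by perturbing one risk factor at a time and recording the change in predicted risk. This is a minimal LIME-flavoured illustration, not the full LIME algorithm, and the linear risk function and factor names are assumptions.

```python
def local_attribution(predict, instance, deltas):
    """Perturb one feature at a time and record the change in predicted risk."""
    base = predict(instance)
    effects = {}
    for name, delta in deltas.items():
        perturbed = dict(instance)
        perturbed[name] += delta
        effects[name] = predict(perturbed) - base
    return effects

# Hypothetical linear risk score over two factors.
def risk(p):
    return 0.004 * p["systolic_bp"] + 0.002 * p["cholesterol"]

patient = {"systolic_bp": 150.0, "cholesterol": 220.0}
effects = local_attribution(risk, patient,
                            {"systolic_bp": 10.0, "cholesterol": 10.0})
```

For this patient, an equal nudge to each factor moves the risk score twice as much through blood pressure as through cholesterol, which is the kind of per-patient contrast a clinician would read off a LIME or SHAP plot.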
Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in AI-driven decision-making. The framework employs an autoencoder for feature reduction, a convolutional neural network for pattern recognition, and a long short-term memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, particle swarm optimization is used to refine hyperparameters for optimal performance and to mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate an impressive accuracy of up to 99.92% and 99.45% across different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
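Particle swarm optimization over hyperparameters can be sketched in a few lines: each particle is a candidate hyperparameter vector attracted to its own best and the swarm's best. The quadratic stand-in objective (pretending validation loss is minimised at learning rate 0.1, dropout 0.3) and the PSO coefficients are assumptions for illustration.

```python
import random

def pso(objective, bounds, n_particles=12, n_iters=40, seed=0):
    """Minimal particle swarm optimization over box-bounded hyperparameters."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for a validation-loss evaluation.
loss = lambda p: (p[0] - 0.1) ** 2 + (p[1] - 0.3) ** 2
best, best_val = pso(loss, [(0.001, 1.0), (0.0, 0.9)])
```

In the real framework each objective call would train and validate the autoencoder-CNN-LSTM pipeline, so the particle count and iteration budget trade search quality against compute cost.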
In this study, a machine learning-based predictive model was developed for the Musa petti Wind Farm in Sri Lanka to address the need for localized forecasting solutions. Using data on wind speed, air temperature, nacelle position, and actual power, lagged features were generated to capture temporal dependencies. Among 24 evaluated models, the ensemble bagging approach achieved the best performance, with R² values of 0.89 at 0 min and 0.75 at 60 min. Shapley Additive exPlanations (SHAP) analysis revealed that while wind speed is the primary driver for short-term predictions, air temperature and nacelle position become more influential at longer forecasting horizons. These findings underscore the reliability of short-term predictions and the potential benefits of integrating hybrid AI and probabilistic models for extended forecasts. Our work contributes a robust and explainable framework to support Sri Lanka's renewable energy transition, and future research will focus on real-time deployment and uncertainty quantification.
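Generating lagged features from a time series, as described above, can be sketched as follows; the wind-speed values and lag choices are illustrative, not the farm's data.

```python
def make_lagged(series, lags):
    """Build feature rows of lagged values so a model at time t sees t-1, t-2, ..."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y

# Toy wind-speed series (m/s) sampled at a fixed interval.
wind_speed = [5.0, 5.5, 6.0, 6.2, 5.8, 6.1]
X, y = make_lagged(wind_speed, lags=[1, 2, 3])
```

The same construction applies per variable (wind speed, air temperature, nacelle position), with the columns concatenated into one feature row per timestamp.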
The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible since a large MOF structure database is available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO₂ cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, leading to accuracy of up to 97% with the 75% quantile of the training set as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analysis to provide physical understanding. 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day using the model, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among the reported MOFs, in good agreement with the prediction.
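Using the 75% quantile of the training set as the classification criterion amounts to binarising a continuous performance value at that threshold. A minimal nearest-rank sketch (the yield values are invented, and the real study may use a different quantile estimator):

```python
def quantile_labels(values, q=0.75):
    """Label samples at or above the q-quantile as positive (high-performing)."""
    ordered = sorted(values)
    idx = min(int(q * len(ordered)), len(ordered) - 1)  # nearest-rank quantile
    threshold = ordered[idx]
    return threshold, [1 if v >= threshold else 0 for v in values]

# Toy catalytic-performance values for eight hypothetical structures.
yields = [10.0, 35.0, 60.0, 80.0, 95.0, 20.0, 70.0, 50.0]
thr, labels = quantile_labels(yields)
```

The classifier is then trained on these binary labels, so "97% accuracy" means 97% agreement with this quantile-derived split, not a regression error.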
The increasing use of cloud-based devices has brought cybersecurity and unwanted network traffic to a critical point. Cloud environments pose significant challenges in maintaining privacy and security. Global approaches, such as intrusion detection systems (IDS), have been developed to tackle these issues. However, most conventional IDS models struggle with unseen cyberattacks and complex high-dimensional data. This paper introduces a novel distributed, explainable, and heterogeneous transformer-based intrusion detection system, named INTRUMER, which offers balanced accuracy, reliability, and security in cloud settings through multiple modules working together within it. The traffic captured from cloud devices is first passed to the TC&TM module, in which the Falcon Optimization Algorithm optimizes the feature selection process and a Naive Bayes algorithm performs feature classification. The selected features are classified further and forwarded to the Heterogeneous Attention Transformer (HAT) module. In this module, the contextual interactions of the network traffic are taken into account to classify it as normal or malicious. The classified results are further analyzed by the Explainable Prevention Module (XPM) to ensure trustworthiness by providing interpretable decisions. With the explanations from the classifier, emergency alarms are transmitted to nearby IDS modules, servers, and underlying cloud devices to enhance preventive measures. Extensive experiments on the benchmark IDS datasets CICIDS 2017, Honeypots, and NSL-KDD were conducted to demonstrate the efficiency of the INTRUMER model in detecting different types of network traffic with high accuracy. The proposed model outperforms state-of-the-art approaches, obtaining better performance metrics: 98.7% accuracy, 97.5% precision, 96.3% recall, and 97.8% F1-score. These results validate the robustness and effectiveness of INTRUMER in securing diverse cloud environments against sophisticated cyber threats.
Advanced machine learning (ML) algorithms have outperformed traditional approaches in various forecasting applications, especially electricity price forecasting (EPF). However, the prediction accuracy of ML reduces substantially if the input data is not similar to the data seen by the model during training. This is often observed in EPF problems when market dynamics change owing to a rise in fuel prices, an increase in renewable penetration, a change in operational policies, etc. While the dip in model accuracy for unseen data is a cause for concern, what is more challenging is not knowing when the ML model will respond in such a manner. Such uncertainty makes power market participants, like bidding agents and retailers, vulnerable to substantial financial losses caused by the prediction errors of EPF models. Therefore, it becomes essential to identify whether or not the model prediction at a given instance is trustworthy. In this light, this paper proposes a trust algorithm for EPF users based on explainable artificial intelligence techniques. The suggested algorithm generates trust scores that reflect the model's prediction quality for each new input. These scores are formulated in two stages: in the first stage, a coarse version of the score is formed using correlations of local and global explanations, and in the second stage, the score is fine-tuned further by the Shapley additive explanations values of different features. Such score-based explanations are more straightforward than feature-based visual explanations for EPF users like asset managers and traders. Datasets from Italy's and ERCOT's electricity markets validate the efficacy of the proposed algorithm. Results show that the algorithm has more than 85% accuracy in identifying good predictions when the data distribution is similar to the training dataset. In the case of a distribution shift, the algorithm shows the same accuracy level in identifying bad predictions.
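The first stage of the trust score, correlating local and global explanations, can be given a schematic reading: if a prediction's local attributions agree with the attributions the model typically produces, the prediction is more likely in-distribution. The attribution vectors, the rectified correlation, and the multiplicative SHAP-based refinement below are all assumptions; the paper's exact formulas differ.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length attribution vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def trust_score(local_attr, global_attr, shap_weight):
    """Coarse score from local/global agreement, refined by a SHAP-based weight."""
    coarse = max(0.0, pearson(local_attr, global_attr))
    return coarse * shap_weight

local_expl = [0.8, 0.1, 0.05, 0.05]   # hypothetical attributions for one input
global_expl = [0.7, 0.15, 0.1, 0.05]  # hypothetical average attributions
score = trust_score(local_expl, global_expl, shap_weight=0.9)
```

A score near 1 would flag a prediction as trustworthy; under distribution shift, local attributions typically decorrelate from the global pattern and the score drops.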
In the field of precision healthcare, where accurate decision-making is paramount, this study underscores the indispensability of eXplainable Artificial Intelligence (XAI) in the context of epilepsy management within the Internet of Medical Things (IoMT). The methodology entails meticulous preprocessing, involving the application of a band-pass filter and epoch segmentation to optimize the quality of electroencephalograph (EEG) data. The subsequent extraction of statistical features facilitates the differentiation between seizure and non-seizure patterns. The classification phase integrates Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest classifiers. Notably, SVM attains an accuracy of 97.26%, excelling in precision, recall, specificity, and F1 score for identifying seizure and non-seizure instances. Conversely, KNN achieves an accuracy of 72.69%, accompanied by certain trade-offs. The Random Forest classifier stands out with a remarkable accuracy of 99.89%, coupled with exceptional precision (99.73%), recall (100%), specificity (99.80%), and F1 score (99.86%), surpassing both SVM and KNN. XAI techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), enhance the system's transparency. This combination of machine learning and XAI not only improves the reliability and accuracy of the seizure detection system but also enhances trust and interpretability. Healthcare professionals can leverage the identified important features and their dependencies to gain deeper insights into the decision-making process, aiding informed diagnosis and treatment decisions for patients with epilepsy.
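Epoch segmentation and statistical feature extraction from an EEG trace can be sketched as follows. The sample values, epoch length, and the particular descriptors (mean, standard deviation, peak-to-peak amplitude) are assumptions for illustration; the study's exact feature set is not reproduced.

```python
import statistics

def segment_epochs(signal, epoch_len):
    """Split a 1-D signal into fixed-length epochs, dropping any ragged tail."""
    return [signal[i:i + epoch_len]
            for i in range(0, len(signal) - epoch_len + 1, epoch_len)]

def epoch_features(epoch):
    """Simple statistical descriptors per epoch."""
    return {"mean": statistics.fmean(epoch),
            "std": statistics.pstdev(epoch),
            "ptp": max(epoch) - min(epoch)}  # peak-to-peak amplitude

eeg = [0.1, 0.3, -0.2, 0.5, 2.0, -1.5, 1.8, -2.1]  # toy filtered samples
epochs = segment_epochs(eeg, epoch_len=4)
feats = [epoch_features(e) for e in epochs]
```

Feature rows like these, one per epoch, are what the SVM, KNN, and Random Forest classifiers consume; a seizure-like high-amplitude epoch shows a much larger spread than a quiet one.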
Incorporation of explainability features in decision-making web-based systems is considered a primary concern for enhancing accountability, transparency, and trust in the community. Multi-domain sentiment analysis is a significant web-based system where the explainability feature is essential for achieving user satisfaction. Conventional design methodologies such as object-oriented design methodology (OODM) have been proposed for web-based application development, facilitating code reuse, quantification, and security at the design level. However, OODM does not provide the feature of explainability in web-based decision-making systems. X-OODM modifies OODM with added explainable models to introduce the explainability feature for such systems. This research introduces an explainable model leveraging X-OODM for designing transparent applications for multi-domain sentiment analysis. The proposed design is evaluated using the design quality metrics defined for the evaluation of the X-OODM explainable model under user context. The design quality metrics transferability, simulatability, informativeness, and decomposability were introduced one after another over time into the evaluation of the X-OODM user context. Auxiliary metrics of accessibility and algorithmic transparency were added to increase the degree of explainability of the design. The study results reveal that introducing such explainability parameters with X-OODM appropriately increases system transparency, trustworthiness, and user understanding. The experimental results validate the enhancement of decision-making for multi-domain sentiment analysis with explainability integrated at the design level. Future work can extend this direction by applying the proposed X-OODM framework over different datasets and sentiment analysis applications to further scrutinize its effectiveness in real-world scenarios.
This study introduces an innovative computational framework leveraging the transformer architecture to address a critical challenge in chemical process engineering: predicting and optimizing light olefin yields in industrial methanol-to-olefins (MTO) processes. Our approach integrates advanced machine learning techniques with chemical engineering principles to tackle the complexities of non-stationary, highly volatile production data in large-scale chemical manufacturing. The framework employs the maximal information coefficient (MIC) algorithm to analyze and select the significant variables from MTO process parameters, forming a robust dataset for model development. We implement a transformer-based time series forecasting model, enhanced through positional encoding and hyperparameter optimization, significantly improving predictive accuracy for ethylene and propylene yields. The model's interpretability is augmented by applying SHapley Additive exPlanations (SHAP) to quantify and visualize the impact of reaction control variables on olefin yields, providing valuable insights for process optimization. Experimental results demonstrate that our model outperforms traditional statistical and machine learning methods in accuracy and interpretability, effectively handling the nonlinear, non-stationary, high-volatility, and long-sequence data challenges in olefin yield prediction. This research contributes to chemical engineering by providing a novel computerized methodology for solving complex production optimization problems in the chemical industry, offering significant potential for enhancing decision-making in MTO production control and fostering the intelligent transformation of manufacturing processes.
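The positional-encoding enhancement mentioned above can be illustrated with the standard sinusoidal scheme from the original transformer; whether the paper uses exactly this formulation is an assumption, and the sequence length and embedding width below are arbitrary:

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # Classic sinusoidal positional encoding: even dimensions get a sine,
    # odd dimensions a cosine, at geometrically spaced frequencies, so each
    # time step receives a unique, smoothly varying position signature.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = sinusoidal_pe(96, 64)  # e.g. 96 time steps, 64-dim embedding (assumed)
print(pe.shape)             # (96, 64)
```

The encoding is simply added to the input embeddings before the attention layers, giving the model access to temporal order.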
Self-Explaining Autonomous Systems (SEAS) have emerged as a strategic frontier within Artificial Intelligence (AI), responding to growing demands for transparency and interpretability in autonomous decision-making. This study presents a comprehensive bibliometric analysis of SEAS research published between 2020 and February 2025, drawing upon 1380 documents indexed in Scopus. The analysis applies co-citation mapping, keyword co-occurrence, and author collaboration networks using VOSviewer, MASHA, and Python to examine scientific production, intellectual structure, and global collaboration patterns. The results indicate a sustained annual growth rate of 41.38%, with an h-index of 57 and an average of 21.97 citations per document. A normalized citation rate was computed to address temporal bias, enabling balanced evaluation across publication cohorts. Thematic analysis reveals four consolidated research fronts: interpretability in machine learning, explainability in deep neural networks, transparency in generative models, and optimization strategies in autonomous control. Author co-citation analysis identifies four distinct research communities, and keyword evolution shows growing interdisciplinary links with medicine, cybersecurity, and industrial automation. The United States leads in scientific output and citation impact at the geographical level, while countries such as India and China show high productivity with varied influence. However, international collaboration remains limited at 7.39%, reflecting a fragmented research landscape. As discussed in this study, SEAS research is expanding rapidly yet remains epistemologically dispersed, with uneven integration of ethical and human-centered perspectives. This work offers a structured and data-driven perspective on SEAS development, highlights key contributors and thematic trends, and outlines critical directions for advancing responsible and transparent autonomous systems.
Abstract: With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to efficiently utilize this data while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
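The weighted aggregation in innovation (3) combines a SHAP-derived quality signal with model accuracy, but the abstract does not give the exact combination rule; the product-of-scores weighting below is an assumption, sketched as a FedAvg-style aggregation in NumPy:

```python
import numpy as np

def weighted_aggregate(client_params, quality_scores):
    # FedAvg-style aggregation where each client's weight comes from a
    # combined quality score; the paper's actual score design may differ.
    w = np.asarray(quality_scores, dtype=float)
    w = w / w.sum()  # normalize weights to sum to 1
    return [sum(w[i] * p[k] for i, p in enumerate(client_params))
            for k in range(len(client_params[0]))]

# Three toy clients, each holding a single 2x2 weight matrix.
clients = [[np.full((2, 2), v)] for v in (1.0, 2.0, 3.0)]
# Assumed score form: shap_quality * validation_accuracy per client.
scores = [0.9 * 0.8, 0.7 * 0.9, 0.95 * 0.85]
agg = weighted_aggregate(clients, scores)
print(agg[0])
```

Clients whose local models explain well and predict well thus contribute more to the shared global model.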
Abstract: In the era of advanced machine learning techniques, the development of accurate predictive models for complex medical conditions, such as thyroid cancer, has shown remarkable progress. Accurate predictive models for thyroid cancer enhance early detection, improve resource allocation, and reduce overtreatment. However, the widespread adoption of these models in clinical practice demands predictive performance along with interpretability and transparency. This paper proposes a novel association-rule based feature-integrated machine learning model which shows better classification and prediction accuracy than present state-of-the-art models. Our study also focuses on the application of SHapley Additive exPlanations (SHAP) values as a powerful tool for explaining thyroid cancer prediction models. In the proposed method, the association-rule based feature integration framework identifies frequently occurring attribute combinations in the dataset. The original dataset is used in training machine learning models, and further used in generating SHAP values from these models. In the next phase, the dataset is integrated with the dominant feature sets identified through association-rule based analysis. This new integrated dataset is used in re-training the machine learning models. The new SHAP values generated from these models help in validating the contributions of feature sets in predicting malignancy. Conventional machine learning models lack interpretability, which can hinder their integration into clinical decision-making systems. In this study, the SHAP values are introduced along with association-rule based feature integration as a comprehensive framework for understanding the contributions of feature sets in modelling the predictions. The study discusses the importance of reliable predictive models for early diagnosis of thyroid cancer, and a validation framework of explainability. The proposed model shows an accuracy of 93.48%. Performance metrics such as precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUROC) are also higher than those of the baseline models. The results of the proposed model help us identify the dominant feature sets that impact thyroid cancer classification and prediction. The features {calcification} and {shape} consistently emerged as the top-ranked features associated with thyroid malignancy, in both association-rule based interestingness metric values and SHAP methods. The paper highlights the potential of rule-based integrated models with SHAP in bridging the gap between machine learning predictions and the interpretability of those predictions, which is required for real-world medical applications.
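The association-rule step above mines frequently co-occurring attribute combinations. A minimal, library-free sketch of that idea (support-thresholded pair counting, a simplification of Apriori; the records and threshold are made-up illustrations, not the paper's data):

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    # Count co-occurring attribute pairs and keep those whose support
    # (fraction of records containing the pair) meets the threshold.
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

# Hypothetical nodule-attribute records (values invented for illustration).
records = [
    {"calcification", "shape", "solid"},
    {"calcification", "shape"},
    {"shape", "solid"},
    {"calcification", "shape", "margin"},
]
print(frequent_pairs(records, 0.5))
```

The surviving itemsets would then be appended to the feature table before the models are re-trained and re-explained with SHAP.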
Funding: European Union-NextGenerationEU through the Italian Ministry of University and Research under PNRR-M4C2-I1.3 Project PE_00000019 "HEAL ITALIA" to Stefano Diciotti (CUP J33C22002920006).
Abstract: In a recent study published in Nature Medicine, Wang, Shao, and colleagues successfully addressed two critical issues of lung cancer (LC) screening with low-dose computed tomography (LDCT), whose widespread implementation, despite its capacity to decrease LC mortality, remains challenging: (1) the difficulty in accurately distinguishing malignant nodules from the far more common benign nodules detected on LDCT, and (2) the insufficient coverage of LC screening in resource-limited areas [1]. To perform nodule risk stratification, Wang et al. developed and validated a multi-step, multidimensional artificial intelligence (AI)-based system (Fig. 1) and introduced a data-driven Chinese Lung Nodules Reporting and Data System (C-Lung-RADS) [1]. The Lung-RADS system was developed in the US to stratify lung nodules into categories of increasing risk of LC and to provide corresponding management recommendations.
Abstract: Background: Liver disease (LD) significantly impacts global health, requiring accurate diagnostic methods. This study aims to develop an automated system for LD prediction using machine learning (ML) and explainable artificial intelligence (XAI), enhancing diagnostic precision and interpretability. Methods: This research systematically analyzes two distinct datasets encompassing liver health indicators. A combination of preprocessing techniques, including feature optimization methods such as Forward Feature Selection (FFS), Backward Feature Selection (BFS), and Recursive Feature Elimination (RFE), is applied to enhance data quality. After that, ML models, namely Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), K-nearest neighbors (KNN), Decision Trees (DT), and a novel Tree Selection and Stacking Ensemble-based RF (TSRF), are assessed on the datasets to diagnose LD. Finally, the ultimate model is selected through cross-validation and evaluation with performance metrics such as accuracy, precision, and specificity, and efficient XAI methods convey the selected model's interpretability. Findings: The analysis reveals TSRF as the most effective model, achieving a peak accuracy of 99.92% on Dataset-1 without feature optimization and 88.88% on Dataset-2 with RFE optimization. XAI techniques, including SHAP and LIME plots, highlight key features influencing model predictions, providing insights into the reasoning behind classification outcomes. Interpretation: The findings highlight TSRF's potential in improving LD diagnosis, using XAI to enhance transparency and trust in ML models. Despite high accuracy and interpretability, limitations such as dataset bias and lack of clinical validation remain. Future work focuses on integrating advanced XAI, diversifying datasets, and applying the approach in clinical settings for reliable diagnostics.
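Of the feature-optimization methods named above, RFE is directly available in scikit-learn; a minimal sketch on synthetic data standing in for a liver-health table (the feature counts are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a liver-health dataset: 10 features, 4 informative.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# RFE repeatedly fits the estimator and drops the weakest feature each
# round until only the requested number remain.
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5).fit(X, y)
print(selector.support_.sum())  # 5 features kept
```

FFS and BFS follow the same wrapper idea in opposite directions (growing vs. shrinking the feature set).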
Funding: Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2022D01D17); State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia (No. SKL-HIDCA-2024-2).
Abstract: BACKGROUND: Echinococcosis, caused by Echinococcus parasites, includes alveolar echinococcosis (AE), the most lethal form, primarily affecting the liver with a 90% mortality rate without prompt treatment. While radical surgery combined with antiparasitic therapy is ideal, many patients present late, missing the opportunity for hepatectomy. Ex vivo liver resection and autotransplantation (ELRA) offers hope for such patients. Traditional surgical decision-making, relying on clinical experience, is prone to bias. Machine learning can enhance decision-making by identifying key factors influencing surgical choices. This study innovatively employs multiple machine learning methods, integrating various feature selection techniques and SHapley Additive exPlanations (SHAP) interpretive analysis to deeply explore the key decision factors influencing surgical strategies. AIM: To determine the key preoperative factors influencing surgical decision-making in hepatic AE (HAE) using machine learning. METHODS: This was a retrospective cohort study at the First Affiliated Hospital of Xinjiang Medical University (July 2010 to August 2024). There were 710 HAE patients (545 hepatectomy and 165 ELRA) with complete clinical data. Data included demographics, laboratory indicators, imaging, and pathology. Feature selection was performed using recursive feature elimination, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator regression, with the intersection of these methods yielding 10 critical features. Eleven machine learning algorithms were compared, with eXtreme Gradient Boosting (XGBoost) optimized using Bayesian optimization. Model interpretability was assessed using SHAP analysis. RESULTS: The XGBoost model achieved an area under the curve of 0.935 in the training set and 0.734 in the validation set. The optimal threshold (0.28) yielded a sensitivity of 93.6% and a specificity of 90.9%. SHAP analysis identified the type of vascular invasion as the most important feature, followed by platelet count and prothrombin time. Lesions invading the hepatic vein, inferior vena cava, or multiple vessels significantly increased the likelihood of ELRA. Calibration curves showed good agreement between predicted and observed probabilities (0.2-0.7 range). The model demonstrated high net clinical benefit in decision curve analysis, with an accuracy of 0.837, recall of 0.745, and F1 score of 0.788. CONCLUSION: Vascular invasion is the dominant factor influencing the choice of surgical approach in HAE. Machine learning models, particularly XGBoost, can provide transparent and data-driven support for personalized decision-making.
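The "optimal threshold" trading off sensitivity and specificity, as reported above, is commonly derived by maximizing Youden's J over the ROC curve; whether the paper used exactly this criterion is an assumption, and the labels and scores below are toy values:

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, y_prob):
    # Pick the probability cut-off maximizing sensitivity + specificity - 1
    # (Youden's J), a standard way to choose an "optimal threshold".
    fpr, tpr, thr = roc_curve(y_true, y_prob)
    return thr[np.argmax(tpr - fpr)]

y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.35, 0.6, 0.3, 0.55, 0.8, 0.9])
t = youden_threshold(y_true, y_prob)
print(t)
```

Predictions at or above the chosen cut-off would then be classed as ELRA candidates rather than hepatectomy.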
Funding: National Science Foundation (NSF), USA (No. IIS-1909702); Army Research Office (ARO), USA (No. W911NF21-1-0198); Department of Homeland Security (DHS) CINA, USA (No. E205949D).
Abstract: Graph neural networks (GNNs) have made rapid developments in recent years. Due to their great ability in modeling graph-structured data, GNNs are vastly used in various applications, including high-stakes scenarios such as financial analysis, traffic prediction, and drug discovery. Despite their great potential in benefiting humans in the real world, recent studies show that GNNs can leak private information, are vulnerable to adversarial attacks, can inherit and magnify societal bias from training data, and lack interpretability, all of which risk causing unintentional harm to users and society. For example, existing works demonstrate that attackers can fool GNNs into giving the outcome they desire with unnoticeable perturbations of the training graph. GNNs trained on social networks may embed discrimination in their decision process, strengthening undesirable societal bias. Consequently, trustworthy GNNs in various aspects are emerging to prevent harm from GNN models and increase users' trust in GNNs. In this paper, we give a comprehensive survey of GNNs in the computational aspects of privacy, robustness, fairness, and explainability. For each aspect, we give a taxonomy of the related methods and formulate general frameworks for the multiple categories of trustworthy GNNs. We also discuss future research directions for each aspect and the connections between these aspects that help achieve trustworthiness.
Funding: German Federal Ministry of Economic Affairs and Climate Action (BMWK) through the project "FlexGUIde" (grant number 03EI6065D).
Abstract: Electric Load Forecasting (ELF) is the central instrument for planning and controlling demand response programs, electricity trading, and consumption optimization. Due to the increasing automation of these processes, meaningful and transparent forecasts become more and more important, while at the same time the complexity of the machine learning models and architectures in use increases. Because there is growing interest in interpretable and explainable load forecasting methods, this work conducts a literature review of approaches already applied for explainability and interpretability in machine-learning-based load forecasting. Based on extensive literature research covering eight publication portals, recurring modeling approaches, trends, and modeling techniques are identified and clustered by the properties that yield more interpretable and explainable load forecasts. The results on interpretability show an increase in the use of probabilistic models, methods for time series decomposition, and fuzzy logic, in addition to classically interpretable models. Dominant explainable approaches are feature importance and attention mechanisms. The discussion shows that much knowledge from the related field of time series forecasting still needs to be adapted to the problems of ELF. Compared with other application areas of explainable and interpretable methods, such as clustering, there are currently relatively few research results, but with an increasing trend.
Abstract: Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully combat spam attacks. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we utilize pre-processing techniques to obtain clean and noise-free data and address the class imbalance problem by leveraging text augmentation techniques. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant model RoBERTa obtained a high accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. Additionally, we evaluate the performance of traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
Funding: National Natural Science Foundation of China (Grant Nos. 42107214 and 52130905).
Abstract: We conducted a study to evaluate the potential and robustness of gradient boosting algorithms in rock burst assessment, established a variational autoencoder (VAE) to address the imbalanced rock burst dataset, and proposed a multilevel explainable artificial intelligence (XAI) approach tailored for tree-based ensemble learning. We collected 537 data points from real-world rock burst records and selected four critical features contributing to rock burst occurrences. Initially, we employed data visualization to gain insight into the data's structure and performed correlation analysis to explore the data distribution and feature relationships. Then, we set up a VAE model to generate samples for the minority class due to the imbalanced class distribution. In conjunction with the VAE, we compared and evaluated six state-of-the-art ensemble models, including gradient boosting algorithms and the classical logistic regression model, for rock burst prediction. The results indicated that gradient boosting algorithms outperformed the classical single models, and the VAE-based classifiers outperformed the original classifiers, with the VAE-NGBoost model yielding the most favorable results. Compared with other resampling methods combined with NGBoost for imbalanced datasets, such as the synthetic minority oversampling technique (SMOTE), SMOTE with edited nearest neighbours (SMOTE-ENN), and SMOTE with Tomek links (SMOTE-Tomek), the VAE-NGBoost model yielded the best performance. Finally, we developed a multilevel XAI model using feature sensitivity analysis, Tree Shapley Additive exPlanations (Tree SHAP), and Anchor to provide an in-depth exploration of the decision-making mechanics of VAE-NGBoost, further enhancing the accountability of tree-based ensemble models in predicting rock burst occurrences.
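The SMOTE baseline that the VAE is compared against rests on a simple idea: synthesize minority samples by interpolating between existing ones. A library-free NumPy sketch of that classic mechanism (the paper's VAE generator is more expressive; data here is random for illustration):

```python
import numpy as np

def interpolate_minority(X_min, n_new, k=3, seed=None):
    # SMOTE-style oversampling: each synthetic point lies on the segment
    # between a minority sample and one of its k nearest minority neighbours.
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()             # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_min = rng.normal(size=(10, 4))  # 10 minority samples, 4 features (toy)
synth = interpolate_minority(X_min, n_new=15, seed=1)
print(synth.shape)                # (15, 4)
```

A VAE instead learns the minority distribution and samples from its latent space, which can produce more varied, less collinear synthetic points.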
Abstract: As the use of blockchain for digital payments continues to rise, it becomes susceptible to various malicious attacks. Successfully detecting anomalies within blockchain transactions is essential for bolstering trust in digital payments. However, the task of anomaly detection in blockchain transaction data is challenging due to the infrequent occurrence of illicit transactions. Although several studies have been conducted in the field, a limitation persists: the lack of explanations for the model's predictions. This study seeks to overcome this limitation by integrating explainable artificial intelligence (XAI) techniques and anomaly rules into tree-based ensemble classifiers for detecting anomalous Bitcoin transactions. The SHapley Additive exPlanation (SHAP) method is employed to measure the contribution of each feature, and it is compatible with ensemble models. Moreover, we present rules for interpreting whether a Bitcoin transaction is anomalous or not. Additionally, we introduce an under-sampling algorithm named XGBCLUS, designed to balance anomalous and non-anomalous transaction data. This algorithm is compared against other commonly used under-sampling and over-sampling techniques. Finally, the outcomes of various tree-based single classifiers are compared with those of stacking and voting ensemble classifiers. Our experimental results demonstrate that (i) XGBCLUS enhances true positive rate (TPR) and receiver operating characteristic-area under curve (ROC-AUC) scores compared with state-of-the-art under-sampling and over-sampling techniques, and (ii) our proposed ensemble classifiers outperform traditional single tree-based machine learning classifiers in terms of accuracy, TPR, and false positive rate (FPR) scores.
Funding: Researchers Supporting Project Number (RSPD2025R947), King Saud University, Riyadh, Saudi Arabia.
Abstract: Cardiovascular disease (CVD) remains a leading global health challenge due to its high mortality rate and the complexity of early diagnosis, driven by risk factors such as hypertension, high cholesterol, and irregular pulse rates. Traditional diagnostic methods often struggle with the nuanced interplay of these risk factors, making early detection difficult. In this research, we propose a novel artificial intelligence-enabled (AI-enabled) framework for CVD risk prediction that integrates machine learning (ML) with eXplainable AI (XAI) to provide both high-accuracy predictions and transparent, interpretable insights. Compared with existing studies that typically focus on either optimizing ML performance or using XAI separately for local or global explanations, our approach uniquely combines both local and global interpretability using Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). This dual integration enhances the interpretability of the model and helps clinicians comprehensively understand not just what the model predicts but also why those predictions are made, by identifying the contribution of different risk factors, which is crucial for transparent and informed decision-making in healthcare. The framework uses ML techniques such as K-nearest neighbors (KNN), gradient boosting, random forest, and decision trees, trained on a cardiovascular dataset. Additionally, the integration of LIME and SHAP provides patient-specific insights alongside global trends, ensuring that clinicians receive comprehensive and actionable information. Our experimental results achieve 98% accuracy with the random forest model, with precision, recall, and F1-scores of 97%, 98%, and 98%, respectively. The innovative combination of SHAP and LIME sets a new benchmark in CVD prediction by integrating advanced ML accuracy with robust interpretability, filling a critical gap in existing approaches. This framework paves the way for more explainable and transparent decision-making in healthcare, ensuring that the model is not only accurate but also trustworthy and actionable for clinicians.
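SHAP and LIME each require their own packages; as a dependency-light stand-in that illustrates the same model-agnostic attribution idea at the global level, scikit-learn's permutation importance shuffles one feature at a time and measures the score drop (the synthetic data below merely stands in for a cardiovascular table):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a cardiovascular risk dataset (6 features).
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)

# Shuffle each feature in turn on held-out data and record the accuracy
# drop: a global, model-agnostic importance signal. SHAP refines this to
# additive per-prediction attributions; LIME fits local surrogates.
imp = permutation_importance(rf, Xte, yte, n_repeats=10, random_state=0)
print(imp.importances_mean.round(3))
```

The paper's dual LIME + SHAP setup adds the patient-specific (local) layer that a purely global score like this cannot provide.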
Abstract: Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in artificial-intelligence-driven decision-making. The framework employs an Autoencoder for feature reduction, a Convolutional Neural Network for pattern recognition, and a Long Short-Term Memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, Particle Swarm Optimization is used to refine hyperparameters for optimal performance and to mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate an impressive accuracy of up to 99.92% and 99.45% across different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that its decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
Abstract: In this study, a machine learning-based predictive model was developed for the Musa petti Wind Farm in Sri Lanka to address the need for localized forecasting solutions. Using data on wind speed, air temperature, nacelle position, and actual power, lagged features were generated to capture temporal dependencies. Among 24 evaluated models, the ensemble bagging approach achieved the best performance, with R² values of 0.89 at 0 min and 0.75 at 60 min. SHapley Additive exPlanations (SHAP) analysis revealed that while wind speed is the primary driver for short-term predictions, air temperature and nacelle position become more influential at longer forecasting horizons. These findings underscore the reliability of short-term predictions and the potential benefits of integrating hybrid AI and probabilistic models for extended forecasts. Our work contributes a robust and explainable framework to support Sri Lanka's renewable energy transition, and future research will focus on real-time deployment and uncertainty quantification.
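The lagged-feature construction described above is a one-liner per lag with pandas; the readings and the choice of three lags below are toy assumptions:

```python
import pandas as pd

# Toy wind-speed readings standing in for the farm's sensor stream.
s = pd.Series([5.1, 5.3, 5.0, 5.6, 5.9, 6.1], name="wind_speed")

# Lagged copies let a tabular model see the recent past at each row.
df = pd.DataFrame({"wind_speed": s})
for lag in (1, 2, 3):
    df[f"wind_speed_lag{lag}"] = s.shift(lag)
df = df.dropna()  # the first rows lack full history
print(df)
```

The same shift pattern would be repeated for air temperature, nacelle position, and power before training the bagging ensemble.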
Funding: National Key Research and Development Program of China (2021YFB3501501); National Natural Science Foundation of China (Nos. 22225803, 22038001, 22108007, and 22278011); Beijing Natural Science Foundation (No. Z230023); Beijing Science and Technology Commission (No. Z211100004321001).
Abstract: The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible since a large MOF structure database is available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO₂ cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to an accuracy of up to 97% with the 75% quantile of the training set as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analysis to provide a degree of physical understanding. 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day using the model, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top performance experimentally among the reported MOFs, in good agreement with the prediction.
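Using a training-set quantile as the classification criterion, as described above, simply means labeling the top quartile of performance scores as positive; a sketch with invented scores (the actual catalytic descriptor is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.gamma(2.0, 10.0, size=1000)  # toy "catalytic performance" values

# The 75% quantile of the training set becomes the binary cut-off:
# everything at or above it is labeled a high-performing catalyst.
threshold = np.quantile(scores, 0.75)
labels = (scores >= threshold).astype(int)
print(labels.mean())  # ~0.25 of samples are positive by construction
```

A classifier trained on such labels then predicts "likely top-quartile" for unseen MOF structures, which is how 12,415 hypothetical candidates can be triaged in a day.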
Abstract: The increasing use of cloud-based devices has reached a critical point for cybersecurity and unwanted network traffic. Cloud environments pose significant challenges in maintaining privacy and security. Global approaches, such as IDS, have been developed to tackle these issues. However, most conventional Intrusion Detection System (IDS) models struggle with unseen cyberattacks and complex high-dimensional data. This paper introduces a novel distributed, explainable, and heterogeneous transformer-based intrusion detection system, named INTRUMER, which offers balanced accuracy, reliability, and security in cloud settings through multiple modules working together. The traffic captured from cloud devices is first passed to the TC&TM module, in which the Falcon Optimization Algorithm optimizes the feature selection process and the Naive Bayes algorithm performs the classification of features. The selected features are classified further and forwarded to the Heterogeneous Attention Transformer (HAT) module. In this module, the contextual interactions of the network traffic are taken into account to classify it as normal or malicious. The classified results are further analyzed by the Explainable Prevention Module (XPM) to ensure trustworthiness by providing interpretable decisions. With the explanations from the classifier, emergency alarms are transmitted to nearby IDS modules, servers, and underlying cloud devices to enhance preventive measures. Extensive experiments on the benchmark IDS datasets CICIDS 2017, Honeypots, and NSL-KDD were conducted to demonstrate the efficiency of the INTRUMER model in detecting different types of network traffic with high accuracy. The proposed model outperforms state-of-the-art approaches, obtaining better performance metrics: 98.7% accuracy, 97.5% precision, 96.3% recall, and 97.8% F1-score. Such results validate the robustness and effectiveness of INTRUMER in securing diverse cloud environments against sophisticated cyber threats.
Abstract: Advanced machine learning (ML) algorithms have outperformed traditional approaches in various forecasting applications, especially electricity price forecasting (EPF). However, the prediction accuracy of ML models drops substantially when the input data are not similar to those seen during training. This is often observed in EPF when market dynamics change owing to a rise in fuel prices, an increase in renewable penetration, a change in operational policies, etc. While the dip in model accuracy on unseen data is a cause for concern, what is more challenging is not knowing when the ML model will respond in such a manner. Such uncertainty makes power market participants, like bidding agents and retailers, vulnerable to substantial financial loss caused by the prediction errors of EPF models. It therefore becomes essential to identify whether or not the model prediction at a given instance is trustworthy. In this light, this paper proposes a trust algorithm for EPF users based on explainable artificial intelligence techniques. The suggested algorithm generates trust scores that reflect the model's prediction quality for each new input. These scores are formulated in two stages: in the first stage, a coarse version of the score is formed using correlations of local and global explanations, and in the second stage, the score is fine-tuned further using the Shapley additive explanations values of different features. Such score-based explanations are more straightforward than feature-based visual explanations for EPF users like asset managers and traders. Datasets from Italy's and ERCOT's electricity markets validate the efficacy of the proposed algorithm. Results show that the algorithm is more than 85% accurate in identifying good predictions when the data distribution is similar to the training dataset. In the case of a distribution shift, the algorithm shows the same accuracy level in identifying bad predictions.
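The two-stage scoring idea can be caricatured in a few lines: a coarse score from how well an instance's local attribution correlates with the model's global feature importance, then a fine-tuning adjustment driven by the attribution values themselves. Both stages below are hypothetical simplifications of the paper's actual formulation (the dominance penalty in particular is an invented stand-in for the SHAP-based refinement).

```python
import numpy as np

def trust_score(local_expl, global_expl):
    """Toy two-stage trust score in [0, 1].

    local_expl:  per-feature attribution for one prediction (e.g. SHAP values)
    global_expl: model-wide feature importance on the same features
    """
    # Stage 1 (coarse): correlation between local and global explanations,
    # mapped from [-1, 1] to [0, 1].
    r = np.corrcoef(local_expl, global_expl)[0, 1]
    coarse = (r + 1.0) / 2.0

    # Stage 2 (fine-tune, hypothetical): damp the score when a single
    # feature dominates the local attribution, which often signals an
    # input far from the training distribution.
    dominance = np.max(np.abs(local_expl)) / (np.sum(np.abs(local_expl)) + 1e-12)
    return coarse * (1.0 - 0.5 * dominance)
```

A score near 1 would mark a prediction as trustworthy for a trader; a low score flags it for manual review, which matches the abstract's framing of score-based rather than visual explanations.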
Abstract: In the field of precision healthcare, where accurate decision-making is paramount, this study underscores the indispensability of eXplainable Artificial Intelligence (XAI) in the context of epilepsy management within the Internet of Medical Things (IoMT). The methodology entails meticulous preprocessing, involving the application of a band-pass filter and epoch segmentation to optimize the quality of electroencephalograph (EEG) data. The subsequent extraction of statistical features facilitates the differentiation between seizure and non-seizure patterns. The classification phase integrates Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest classifiers. Notably, SVM attains an accuracy of 97.26%, excelling in precision, recall, specificity, and F1 score for identifying seizure and non-seizure instances. Conversely, KNN achieves an accuracy of 72.69%, accompanied by certain trade-offs. The Random Forest classifier stands out with a remarkable accuracy of 99.89%, coupled with exceptional precision (99.73%), recall (100%), specificity (99.80%), and F1 score (99.86%), surpassing both the SVM and KNN performances. XAI techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), enhance the system's transparency. This combination of machine learning and XAI not only improves the reliability and accuracy of the seizure detection system but also enhances trust and interpretability. Healthcare professionals can leverage the identified important features and their dependencies to gain deeper insights into the decision-making process, aiding informed diagnosis and treatment decisions for patients with epilepsy.
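The preprocessing pipeline above (band-pass filtering, epoch segmentation, statistical feature extraction) can be sketched as follows. The FFT-based filter, the 0.5–40 Hz band, the 1 s epoch length, and the chosen statistics are all assumptions for illustration; the paper's exact filter design and feature set are not specified in the abstract.

```python
import numpy as np

def bandpass_fft(signal, fs, low=0.5, high=40.0):
    """Crude FFT-based band-pass filter: zero out spectral bins
    outside [low, high] Hz (a stand-in for e.g. a Butterworth design)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def epoch_features(signal, fs, epoch_s=1.0):
    """Split the signal into fixed-length epochs and extract simple
    per-epoch statistical features (mean, std, min, max)."""
    n = int(fs * epoch_s)
    epochs = signal[: len(signal) // n * n].reshape(-1, n)
    return np.column_stack([epochs.mean(axis=1), epochs.std(axis=1),
                            epochs.min(axis=1), epochs.max(axis=1)])
```

The resulting per-epoch feature matrix is what a classifier such as SVM, KNN, or Random Forest would be trained on, with one row per epoch and one label (seizure / non-seizure) per row.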
Funding: Supported by the Deanship of Research and Graduate Studies at Ajman University under Projects 2024-IRG-ENiT-36 and 2024-IRG-ENIT-29.
Abstract: Incorporating explainability features into web-based decision-making systems is considered a primary concern for enhancing accountability, transparency, and trust in the community. Multi-domain sentiment analysis is a significant web-based application where the explainability feature is essential for achieving user satisfaction. Conventional design methodologies such as the object-oriented design methodology (OODM) have been proposed for web-based application development, facilitating code reuse, quantification, and security at the design level. However, OODM does not provide explainability in web-based decision-making systems. X-OODM modifies OODM with added explainable models to introduce explainability for such systems. This research introduces an explainable model leveraging X-OODM for designing transparent multi-domain sentiment analysis applications. The proposed design is evaluated using the design quality metrics defined for the X-OODM explainable model under user context. The design quality metrics of transferability, simulatability, informativeness, and decomposability were introduced one after another into the evaluation of the X-OODM user context. Auxiliary metrics of accessibility and algorithmic transparency were added to increase the degree of explainability of the design. The results reveal that introducing such explainability parameters with X-OODM appropriately increases system transparency, trustworthiness, and user understanding. The experimental results validate the enhancement of decision-making for multi-domain sentiment analysis through design-level integration of explainability. Future work can extend this study by applying the proposed X-OODM framework to different datasets and sentiment analysis applications to further scrutinize its effectiveness in real-world scenarios.
Funding: Supported by the Humanities and Social Sciences Foundation of the Ministry of Education (22YJC910011), the China Postdoctoral Science Foundation (2023M733444), and the Key Research and Development Program in Artificial Intelligence of Liaoning Province (2023JH26/10200012).
Abstract: This study introduces an innovative computational framework leveraging the transformer architecture to address a critical challenge in chemical process engineering: predicting and optimizing light-olefin yields in industrial methanol-to-olefins (MTO) processes. Our approach integrates advanced machine learning techniques with chemical engineering principles to tackle the complexities of non-stationary, highly volatile production data in large-scale chemical manufacturing. The framework employs the maximal information coefficient (MIC) algorithm to analyze and select the significant variables from MTO process parameters, forming a robust dataset for model development. We implement a transformer-based time-series forecasting model, enhanced through positional encoding and hyperparameter optimization, significantly improving predictive accuracy for ethylene and propylene yields. The model's interpretability is augmented by applying SHapley Additive exPlanations (SHAP) to quantify and visualize the impact of reaction control variables on olefin yields, providing valuable insights for process optimization. Experimental results demonstrate that our model outperforms traditional statistical and machine learning methods in accuracy and interpretability, effectively handling the nonlinear, non-stationary, high-volatility, and long-sequence data challenges in olefin yield prediction. This research contributes to chemical engineering by providing a novel computational methodology for solving complex production optimization problems in the chemical industry, offering significant potential for enhancing decision-making in MTO production control and fostering the intelligent transformation of manufacturing processes.
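The variable-selection step above screens process parameters by nonlinear dependence on the target yield. The sketch below substitutes scikit-learn's mutual information estimator for the MIC algorithm (MIC itself requires a dedicated library), on synthetic data where only the first two "process variables" actually drive the yield; all names and relationships are illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(1)

# 400 synthetic operating points, 5 candidate process variables;
# the yield depends nonlinearly on variables 0 and 1 only.
X = rng.normal(size=(400, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.normal(size=400)

# Stand-in for MIC: mutual information also captures nonlinear
# dependence, so it ranks variables similarly for this purpose.
mi = mutual_info_regression(X, y, random_state=0)
selected = np.argsort(mi)[::-1][:2]  # keep the top-2 variables
```

The selected variables would then form the input channels of the downstream transformer forecaster; the point of a dependence-based (rather than correlation-based) criterion is precisely that relationships like `sin(x)` or `x**2` have near-zero linear correlation with the target.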
Funding: Partially funded by the Programa Nacional de Becas y Crédito Educativo of Peru and the Universitat de València, Spain.
Abstract: Self-Explaining Autonomous Systems (SEAS) have emerged as a strategic frontier within Artificial Intelligence (AI), responding to growing demands for transparency and interpretability in autonomous decision-making. This study presents a comprehensive bibliometric analysis of SEAS research published between 2020 and February 2025, drawing upon 1380 documents indexed in Scopus. The analysis applies co-citation mapping, keyword co-occurrence, and author collaboration networks using VOSviewer, MASHA, and Python to examine scientific production, intellectual structure, and global collaboration patterns. The results indicate a sustained annual growth rate of 41.38%, with an h-index of 57 and an average of 21.97 citations per document. A normalized citation rate was computed to address temporal bias, enabling balanced evaluation across publication cohorts. Thematic analysis reveals four consolidated research fronts: interpretability in machine learning, explainability in deep neural networks, transparency in generative models, and optimization strategies in autonomous control. Author co-citation analysis identifies four distinct research communities, and keyword evolution shows growing interdisciplinary links with medicine, cybersecurity, and industrial automation. Geographically, the United States leads in scientific output and citation impact, while countries such as India and China show high productivity with varied influence. However, international collaboration remains limited at 7.39%, reflecting a fragmented research landscape. As discussed in this study, SEAS research is expanding rapidly yet remains epistemologically dispersed, with uneven integration of ethical and human-centered perspectives. This work offers a structured, data-driven perspective on SEAS development, highlights key contributors and thematic trends, and outlines critical directions for advancing responsible and transparent autonomous systems.