Network attacks have become a critical issue in the internet security domain. Artificial intelligence technology-based detection methodologies have attracted attention; however, recent studies have struggled to adapt to changing attack patterns and complex network environments. In addition, it is difficult to explain the detection results logically using artificial intelligence. We propose a method for classifying network attacks using graph models to explain the detection results. First, we reconstruct the network packet data into a graph structure. We then use a graph model to predict network attacks via edge classification. To explain the prediction results, we observe numerical changes obtained by randomly masking neighbors and calculating their importance, allowing us to extract significant subgraphs. Our experiments on six public datasets demonstrate superior performance, with an average F1-score of 0.960 and accuracy of 0.964, outperforming traditional machine learning and other graph models. The visual representation of the extracted subgraphs highlights the neighboring nodes that have the greatest impact on the results, thus explaining the detection. In conclusion, this study demonstrates that graph-based models are suitable for network attack detection in complex environments, and the importance of graph neighbors can be calculated to efficiently analyze the results. This approach can contribute to real-world network security analyses and provide a new direction for the field.
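The masking-based neighbor-importance idea described in this abstract can be sketched as a Monte-Carlo occlusion procedure. Everything below (the `predict` callback, the adjacency matrix, the zero-masking) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def neighbor_importance(predict, features, adjacency, edge, n_trials=200, seed=0):
    """Randomly mask neighbors of an edge's endpoints and average the
    resulting drop in the edge-prediction score per neighbor."""
    rng = np.random.default_rng(seed)
    u, v = edge
    nbrs = sorted((set(np.flatnonzero(adjacency[u])) |
                   set(np.flatnonzero(adjacency[v]))) - {u, v})
    base = predict(features, adjacency, edge)
    drops = {n: [] for n in nbrs}
    for _ in range(n_trials):
        mask = rng.random(len(nbrs)) < 0.5          # random subset to occlude
        masked = features.copy()
        for n, m in zip(nbrs, mask):
            if m:
                masked[n] = 0.0                     # zero out the neighbor's features
        score = predict(masked, adjacency, edge)
        for n, m in zip(nbrs, mask):
            if m:
                drops[n].append(base - score)       # score drop co-attributed to n
    return {n: float(np.mean(d)) if d else 0.0 for n, d in drops.items()}
```

Neighbors whose masking produces the largest average drop form the explanatory subgraph.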
Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER's Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
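The stacking-ensemble evaluation mentioned for the SEER cohort can be illustrated with a minimal sketch on synthetic data; the features, base learners, and dataset here are placeholders for the reviewed pipeline, not a reproduction of it:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for tabular clinical features and a binary mortality outcome.
X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stack two heterogeneous base learners under a logistic meta-learner.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, stack.predict_proba(X_te)[:, 1])  # AUROC, as reported
```

On real cohorts the same `roc_auc_score` call yields the area-under-the-curve figures the review compares.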
Self-Explaining Autonomous Systems (SEAS) have emerged as a strategic frontier within Artificial Intelligence (AI), responding to growing demands for transparency and interpretability in autonomous decision-making. This study presents a comprehensive bibliometric analysis of SEAS research published between 2020 and February 2025, drawing upon 1380 documents indexed in Scopus. The analysis applies co-citation mapping, keyword co-occurrence, and author collaboration networks using VOSviewer, MASHA, and Python to examine scientific production, intellectual structure, and global collaboration patterns. The results indicate a sustained annual growth rate of 41.38%, with an h-index of 57 and an average of 21.97 citations per document. A normalized citation rate was computed to address temporal bias, enabling balanced evaluation across publication cohorts. Thematic analysis reveals four consolidated research fronts: interpretability in machine learning, explainability in deep neural networks, transparency in generative models, and optimization strategies in autonomous control. Author co-citation analysis identifies four distinct research communities, and keyword evolution shows growing interdisciplinary links with medicine, cybersecurity, and industrial automation. The United States leads in scientific output and citation impact at the geographical level, while countries such as India and China show high productivity with varied influence. However, international collaboration remains limited at 7.39%, reflecting a fragmented research landscape. As discussed in this study, SEAS research is expanding rapidly yet remains epistemologically dispersed, with uneven integration of ethical and human-centered perspectives. This work offers a structured and data-driven perspective on SEAS development, highlights key contributors and thematic trends, and outlines critical directions for advancing responsible and transparent autonomous systems.
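The normalized citation rate used above to correct temporal bias is commonly computed as each document's citations divided by the mean citation count of its publication-year cohort. A small sketch of that idea (the cohort definition is an assumption; the study's exact normalization may differ):

```python
from collections import defaultdict

def normalized_citations(papers):
    """papers: list of (year, citations) pairs. Returns each paper's
    citations divided by the mean of its publication-year cohort."""
    by_year = defaultdict(list)
    for year, cites in papers:
        by_year[year].append(cites)
    year_mean = {y: sum(c) / len(c) for y, c in by_year.items()}
    return [cites / year_mean[year] if year_mean[year] else 0.0
            for year, cites in papers]
```

A 2024 paper with few absolute citations can then score as high as an older, more-cited one, making cohorts comparable.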
Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the Shapley Additive Explanations (SHAP) method, offering a hierarchical, global-to-local perspective on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geo-environmental factors to assess the susceptibility of post-fire gully erosion using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting decision-making for environmental management and hazard mitigation.
The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible since large MOF structure databases are available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO₂ cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to a high accuracy of up to 97% with the 75% quantile of the training set as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analyses to provide a degree of physical understanding. 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day using the model, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among the reported MOFs, in good agreement with the prediction.
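The classification criterion described here (labeling the top quartile of the training set as high-performing) can be sketched as follows; the random descriptors and yield score are placeholders for the mechanism-derived MOF features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                     # stand-in mechanistic descriptors
yield_score = X[:, 0] * 2 + X[:, 1] + rng.normal(scale=0.3, size=500)

threshold = np.quantile(yield_score, 0.75)        # 75% quantile of the training set
labels = (yield_score >= threshold).astype(int)   # 1 = potentially efficient catalyst

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
train_acc = clf.score(X, labels)
```

Once trained, the classifier can be applied to thousands of hypothetical structures far faster than simulating each one.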
The increasing use of cloud-based devices has brought cybersecurity and unwanted network traffic to a critical point. Cloud environments pose significant challenges in maintaining privacy and security. Global approaches, such as IDS, have been developed to tackle these issues. However, most conventional Intrusion Detection System (IDS) models struggle with unseen cyberattacks and complex high-dimensional data. This paper introduces a novel distributed, explainable, and heterogeneous transformer-based intrusion detection system, named INTRUMER, which offers balanced accuracy, reliability, and security in cloud settings through multiple modules working together. The traffic captured from cloud devices is first passed to the TC&TM module, in which the Falcon Optimization Algorithm optimizes the feature selection process and the Naïve Bayes algorithm performs feature classification. The selected features are forwarded to the Heterogeneous Attention Transformer (HAT) module, where the contextual interactions of the network traffic are taken into account to classify it as normal or malicious. The classified results are further analyzed by the Explainable Prevention Module (XPM) to ensure trustworthiness by providing interpretable decisions. With the explanations from the classifier, emergency alarms are transmitted to nearby IDS modules, servers, and underlying cloud devices to enhance preventive measures. Extensive experiments on the benchmark IDS datasets CICIDS 2017, Honeypots, and NSL-KDD demonstrate the efficiency of the INTRUMER model in detecting different types of network traffic with high accuracy. The proposed model outperforms state-of-the-art approaches, obtaining better performance metrics: 98.7% accuracy, 97.5% precision, 96.3% recall, and 97.8% F1-score. Such results validate the robustness and effectiveness of INTRUMER in securing diverse cloud environments against sophisticated cyber threats.
Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs) for downstream applications which need "bare earth" elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in terms of the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
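The ensemble-of-specialists arrangement suggested above (one correction model per land-cover class, rather than one global model) can be sketched like this. The data, the three cover classes, and the linear learners are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1200
cover = rng.integers(0, 3, size=n)                 # land-cover class per pixel
X = rng.normal(size=(n, 4))                        # stand-in predictor stack
slopes = np.array([1.0, -2.0, 0.5])                # bias behaves differently per class
bias = slopes[cover] * X[:, 0] + 0.1 * rng.normal(size=n)  # vertical DEM bias

# One specialist model per land-cover class, plus a single global model.
specialists = {c: LinearRegression().fit(X[cover == c], bias[cover == c])
               for c in range(3)}
global_model = LinearRegression().fit(X, bias)

pred_spec = np.empty(n)
for c, model in specialists.items():               # route pixels to their specialist
    pred_spec[cover == c] = model.predict(X[cover == c])
pred_glob = global_model.predict(X)

mse_spec = float(np.mean((pred_spec - bias) ** 2))
mse_glob = float(np.mean((pred_glob - bias) ** 2))
```

When the bias-predictor relationship genuinely differs by land cover, the routed specialists beat the single global fit, which is the paper's argument for an ensemble.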
Incorporation of explainability features in decision-making web-based systems is considered a primary concern for enhancing accountability, transparency, and trust in the community. Multi-domain sentiment analysis is a significant web-based system where the explainability feature is essential for achieving user satisfaction. Conventional design methodologies such as the object-oriented design methodology (OODM) have been proposed for web-based application development, facilitating code reuse, quantification, and security at the design level. However, OODM does not provide the feature of explainability in web-based decision-making systems. X-OODM modifies OODM with added explainable models to introduce the explainability feature for such systems. This research introduces an explainable model leveraging X-OODM for designing transparent applications for multi-domain sentiment analysis. The proposed design is evaluated using the design quality metrics defined for the evaluation of the X-OODM explainable model under user context. The design quality metrics of transferability, simulatability, informativeness, and decomposability were applied successively in evaluating the X-OODM user context. Auxiliary metrics of accessibility and algorithmic transparency were added to increase the degree of explainability of the design. The study results reveal that introducing such explainability parameters with X-OODM appropriately increases system transparency, trustworthiness, and user understanding. The experimental results validate the enhancement of decision-making for multi-domain sentiment analysis with explainability integrated at the design level. Future work can extend this study by applying the proposed X-OODM framework over different datasets and sentiment analysis applications to further scrutinize its effectiveness in real-world scenarios.
In the field of precision healthcare, where accurate decision-making is paramount, this study underscores the indispensability of eXplainable Artificial Intelligence (XAI) in the context of epilepsy management within the Internet of Medical Things (IoMT). The methodology entails meticulous preprocessing, involving the application of a band-pass filter and epoch segmentation to optimize the quality of electroencephalograph (EEG) data. The subsequent extraction of statistical features facilitates the differentiation between seizure and non-seizure patterns. The classification phase integrates Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest classifiers. Notably, SVM attains an accuracy of 97.26%, excelling in precision, recall, specificity, and F1 score for identifying seizure and non-seizure instances. Conversely, KNN achieves an accuracy of 72.69%, accompanied by certain trade-offs. The Random Forest classifier stands out with a remarkable accuracy of 99.89%, coupled with exceptional precision (99.73%), recall (100%), specificity (99.80%), and F1 score (99.86%), surpassing both SVM and KNN. XAI techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanation (SHAP), enhance the system's transparency. This combination of machine learning and XAI not only improves the reliability and accuracy of the seizure detection system but also enhances trust and interpretability. Healthcare professionals can leverage the identified important features and their dependencies to gain deeper insights into the decision-making process, aiding informed diagnosis and treatment decisions for patients with epilepsy.
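The preprocessing described (band-pass filtering followed by epoch segmentation) might look like this with SciPy. The 0.5-30 Hz band, 256 Hz sampling rate, and 1-second epochs are assumptions for illustration, not the paper's exact settings:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 256                                           # assumed sampling rate (Hz)
t = np.arange(0, 4, 1 / fs)                        # 4 s of synthetic "EEG"
raw = np.sin(2 * np.pi * 2 * t) + 0.8 * np.sin(2 * np.pi * 50 * t)  # 2 Hz + mains noise

b, a = butter(4, [0.5, 30.0], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, raw)                     # zero-phase band-pass 0.5-30 Hz

epoch_len = fs                                     # 1-second epochs
n_epochs = filtered.size // epoch_len
epochs = filtered[:n_epochs * epoch_len].reshape(n_epochs, epoch_len)
```

Statistical features (mean, variance, kurtosis, etc.) would then be computed per row of `epochs` before classification.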
Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in artificial intelligence-driven decision-making. The framework employs an Autoencoder for feature reduction, a Convolutional Neural Network for pattern recognition, and a Long Short-Term Memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, Particle Swarm Optimization is used to refine hyperparameters for optimal performance and to mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate impressive accuracies of up to 99.92% and 99.45% across different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that its decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
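Particle Swarm Optimization, used above for hyperparameter refinement, can be sketched in a few lines. Minimizing a toy quadratic stands in for tuning the network; the inertia and acceleration constants are conventional PSO defaults, not the paper's settings:

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, seed=0,
                 w=0.7, c1=1.5, c2=1.5, lo=-5.0, hi=5.0):
    """Canonical PSO: particles track personal bests and the swarm best."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest, pbest_val = pos.copy(), np.array([f(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(f(g))

# Toy stand-in for a hyperparameter loss surface with optimum at (1.2, 1.2, 1.2).
best, best_val = pso_minimize(lambda p: float(np.sum((p - 1.2) ** 2)), dim=3)
```

In the framework, `f` would evaluate validation loss for a candidate hyperparameter vector instead of a closed-form function.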
Predicting the health status of stroke patients at different stages of the disease is a critical clinical task. The onset and development of stroke are affected by an array of factors, encompassing genetic predisposition, environmental exposure, unhealthy lifestyle habits, and existing medical conditions. Although existing machine learning-based methods for predicting stroke patients' health status have made significant progress, limitations remain in terms of prediction accuracy, model explainability, and system optimization. This paper proposes a multi-task learning approach based on Explainable Artificial Intelligence (XAI) for predicting the health status of stroke patients. First, we design a comprehensive multi-task learning framework that exploits the correlation among the tasks of predicting various health status indicators, enabling the parallel prediction of multiple health indicators. Second, we develop a multi-task Area Under Curve (AUC) optimization algorithm based on adaptive low-rank representation, which removes irrelevant information from the model structure to enhance the performance of multi-task AUC optimization. Additionally, the model's explainability is analyzed through stability analysis of SHAP values. Experimental results demonstrate that our approach outperforms comparison algorithms in the key prognostic metrics of F1 score and efficiency.
With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to efficiently utilize this data while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers the SHAP value and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
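The DP mechanism underlying such a framework is typically per-client gradient clipping followed by calibrated Gaussian noise before aggregation. A minimal sketch, where the clip norm and noise multiplier are illustrative rather than the paper's adaptively allocated budgets:

```python
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, sigma=0.8, rng=None):
    """Clip a gradient to L2 norm `clip_norm`, then add Gaussian noise
    with std sigma * clip_norm (the Gaussian mechanism)."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(scale=sigma * clip_norm, size=grad.shape)
    return clipped + noise

grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2])]   # two clients' gradients
sanitized = [dp_sanitize(g, rng=np.random.default_rng(i))
             for i, g in enumerate(grads)]
aggregate = np.mean(sanitized, axis=0)                   # server-side averaging
```

The framework's contribution is then deciding, per round and per layer, how large `sigma` (the spent privacy budget) should be.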
Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to successfully combat spam attacks. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we utilize pre-processing techniques to obtain clean and noise-free data, and address the class imbalance problem by leveraging text augmentation techniques. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant, RoBERTa, obtained a high accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. Additionally, we evaluate the performance of traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
Skin cancer is the most prevalent cancer globally, primarily due to extensive exposure to Ultraviolet (UV) radiation. Early identification of skin cancer enhances the likelihood of effective treatment, as delays may lead to severe tumor advancement. This study proposes a novel hybrid deep learning strategy to address the complex issue of skin cancer diagnosis, with an architecture that integrates a Vision Transformer, a bespoke convolutional neural network (CNN), and an Xception module. The model was evaluated using two benchmark datasets, HAM10000 and Skin Cancer ISIC. On HAM10000, it achieves a precision of 95.46%, an accuracy of 96.74%, a recall of 96.27%, a specificity of 96.00% and an F1-score of 95.86%. On the Skin Cancer ISIC dataset, it obtains an accuracy of 93.19%, a precision of 93.25%, a recall of 92.80%, a specificity of 92.89% and an F1-score of 93.19%. The findings demonstrate that the proposed model is robust and trustworthy for skin lesion classification. In addition, the use of Explainable AI techniques, such as Grad-CAM visualizations, helps highlight the lesion areas most influential in the model's decisions.
In the evolving landscape of cyber threats, phishing attacks pose significant challenges, particularly through deceptive webpages designed to extract sensitive information under the guise of legitimacy. Conventional and machine learning (ML)-based detection systems struggle to detect phishing websites owing to their constantly changing tactics. Furthermore, newer phishing websites exhibit subtle and expertly concealed indicators that are not readily detectable. Hence, effective detection depends on identifying the most critical features. Traditional feature selection (FS) methods often fail to enhance ML model performance and instead decrease it. To combat these issues, we propose an innovative method using explainable AI (XAI) to enhance FS in ML models and improve the identification of phishing websites. Specifically, we employ SHapley Additive exPlanations (SHAP) for a global perspective and aggregated Local Interpretable Model-Agnostic Explanations (LIME) to determine specific localized patterns. The proposed SHAP- and LIME-aggregated FS (SLA-FS) framework pinpoints the most informative features, enabling more precise, swift, and adaptable phishing detection. Applying this approach to an up-to-date web phishing dataset, we evaluate the performance of three ML models before and after FS to assess their effectiveness. Our findings reveal that random forest (RF), with an accuracy of 97.41%, and XGBoost (XGB), at 97.21%, significantly benefit from the SLA-FS framework, while k-nearest neighbors lags behind. Our framework increases the accuracy of RF and XGB by 0.65% and 0.41%, respectively, outperforming traditional filter or wrapper methods and any prior methods evaluated on this dataset, showcasing its potential.
BACKGROUND Echinococcosis, caused by Echinococcus parasites, includes alveolar echinococcosis (AE), the most lethal form, primarily affecting the liver with a 90% mortality rate without prompt treatment. While radical surgery combined with antiparasitic therapy is ideal, many patients present late, missing the opportunity for hepatectomy. Ex vivo liver resection and autotransplantation (ELRA) offers hope for such patients. Traditional surgical decision-making, relying on clinical experience, is prone to bias. Machine learning can enhance decision-making by identifying key factors influencing surgical choices. This study innovatively employs multiple machine learning methods, integrating various feature selection techniques and SHapley Additive exPlanations (SHAP) interpretive analysis, to deeply explore the key decision factors influencing surgical strategies. AIM To determine the key preoperative factors influencing surgical decision-making in hepatic AE (HAE) using machine learning. METHODS This was a retrospective cohort study at the First Affiliated Hospital of Xinjiang Medical University (July 2010 to August 2024). There were 710 HAE patients (545 hepatectomy and 165 ELRA) with complete clinical data. Data included demographics, laboratory indicators, imaging, and pathology. Feature selection was performed using recursive feature elimination, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator (LASSO) regression, with the intersection of these methods yielding 10 critical features. Eleven machine learning algorithms were compared, with eXtreme Gradient Boosting (XGBoost) optimized using Bayesian optimization. Model interpretability was assessed using SHAP analysis. RESULTS The XGBoost model achieved an area under the curve of 0.935 in the training set and 0.734 in the validation set. The optimal threshold (0.28) yielded a sensitivity of 93.6% and a specificity of 90.9%. SHAP analysis identified type of vascular invasion as the most important feature, followed by platelet count and prothrombin time. Lesions invading the hepatic vein, inferior vena cava, or multiple vessels significantly increased the likelihood of ELRA. Calibration curves showed good agreement between predicted and observed probabilities (in the 0.2-0.7 range). The model demonstrated high net clinical benefit in decision curve analysis, with an accuracy of 0.837, recall of 0.745, and F1 score of 0.788. CONCLUSION Vascular invasion is the dominant factor influencing the choice of surgical approach in HAE. Machine learning models, particularly XGBoost, can provide transparent, data-driven support for personalized decision-making.
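The three-way feature-selection intersection described in METHODS can be sketched generically. Here mutual-information ranking stands in for mRMR, a synthetic dataset replaces the clinical cohort, and all parameter values are illustrative:

```python
from functools import partial

import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))                      # 8 candidate preoperative features
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=400) > 0).astype(int)

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)  # LASSO-style
mi = SelectKBest(partial(mutual_info_classif, random_state=0), k=3).fit(X, y)

sel_rfe = set(np.flatnonzero(rfe.support_))
sel_l1 = set(np.flatnonzero(l1.coef_[0] != 0))
sel_mi = set(np.flatnonzero(mi.get_support()))

critical = sel_rfe & sel_l1 & sel_mi               # features all three methods keep
```

Only features retained by every selector survive, which is how the study narrowed its candidates to 10 critical features.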
Retaining walls are utilized to support the earth and prevent the soil from spreading at its natural slope angle where there are differences in ground surface elevation. As the need for retaining structures increases, the use of retaining walls is increasing. Retaining walls, which stabilize differences in ground levels, should be economical while meeting adverse site conditions. A considerable proportion of retaining walls are made from steel-reinforced concrete, whose components can make construction costly. For this reason, the optimum cost should be targeted in the design of retaining walls. This study presents an artificial neural network (ANN) model developed to predict the optimum dimensions of a retaining wall using soil properties, material properties, and external loading conditions. The dataset utilized to train the ANN model is generated with the Flower Pollination Algorithm. The target variables in the dataset are the length of the heel (y1), length of the toe (y2), thickness of the stem at the top (y3), thickness of the stem at the bottom (y4), foundation base thickness (y5) and cost (y6), and these are estimated by the ANN model from the height of the wall (x1), material unit weight (x2), wall friction angle (x3), surcharge load (x4), concrete cost per m³ (x5), steel cost per ton (x6) and the soil class (x7). The model is formulated and trained as a multi-output regression model, as all outputs are numeric and continuous. The training and evaluation of the model result in high prediction performance (R² > 0.99). In addition, the impacts of different input features on the model's predictions are revealed using the SHapley Additive exPlanations (SHAP) algorithm. The study demonstrates that when trained with a large dataset, ANN models perform very well, predicting the optimal cost with high accuracy.
Accurate prediction of pure component physicochemical properties is crucial for process integration, multiscale modelling, and optimization. In this work, an enhanced framework for pure component property prediction using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding relationships to automatically generate features. The supervised machine learning model random forest is applied for feature ranking and pooling. The adjusted R^(2) is introduced to penalize the inclusion of additional features, providing an assessment of the true contribution of features. The prediction results for normal boiling point (T_(b)), liquid molar volume (L_(mv)), critical temperature (T_(c)) and critical pressure (P_(c)) obtained using Artificial Neural Network and Gaussian Process Regression models confirm the accuracy of the molecular representation method. Comparison with GC-based models shows that the root-mean-square error on the test set can be reduced by up to 83.8%. To enhance the interpretability of the model, a feature analysis method based on Shapley values is employed to determine the contribution of each feature to the property predictions. The results indicate that using the feature pooling method reduces the number of features from 13316 to 100 without compromising model accuracy. The feature analysis results for T_(b), L_(mv), T_(c), and P_(c) confirm that different molecular properties are influenced by different structural features, aligning with mechanistic interpretations. In conclusion, the proposed framework is demonstrated to be feasible and provides a solid foundation for mixture component reconstruction and process integration modelling.
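The adjusted R^(2) mentioned above penalizes feature count via the standard formula R²_adj = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of features. A minimal sketch with invented data:

```python
# Illustrative only: adjusted R-squared penalizes adding features that do
# not improve the fit. The target/prediction values are made up.

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y_true, y_pred, n_features):
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]
r2 = r_squared(y_true, y_pred)
# Same predictions, but claiming more features lowers the adjusted score
r2_adj_1 = adjusted_r_squared(y_true, y_pred, n_features=1)
r2_adj_3 = adjusted_r_squared(y_true, y_pred, n_features=3)
```

This monotone penalty is why the framework can compare feature pools of very different sizes (13316 vs. 100) on a common footing.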
The relationship between the neighborhood environment and well-being is attracting increasing attention from researchers and policymakers, as the goal of development has shifted from the economy to well-being. However, the existing literature predominantly adopts a utilitarian approach, understanding well-being as people's feelings about their lives and viewing the neighborhood environment as a set of resources that benefit well-being. The Capability Approach, a novel approach that conceptualizes well-being as the freedom to do or to be and regards the environment as conversion factors that influence well-being, can offer a new lens by incorporating human development into these topics. This paper proposes an alternative theoretical framework: well-being is conceptualized and measured by capability; the neighborhood environment affects well-being by providing spatial services, functioning as environmental conversion factors, and serving as social conversion factors. We conducted a case study of Changshu City in eastern China, utilizing multiple data sources and applying explainable artificial intelligence (XAI), namely eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP). Our findings highlight the significance of viewing the neighborhood environment as a set of conversion factors, as this provides more explanatory power than viewing it as a provider of spatial services. Compared with conventional research based on a linear-relationship assumption, our results demonstrate that the effects of the neighborhood environment on well-being are non-linear, characterized by threshold effects and interaction effects. These insights are crucial for informing urban planning and public policy. This research enriches our understanding of well-being, the neighborhood environment, and their relationship, and provides empirical evidence for the core concept of conversion factors in the capability approach.
Diabetic Retinopathy (DR) is a critical disorder that affects the retina; owing to the constant rise in diabetes, it remains a major cause of blindness across the world. Early detection and timely treatment are essential to mitigate the effects of DR, such as retinal damage and vision impairment. Several conventional approaches have been proposed to detect DR early and accurately, but they are limited by data imbalance, interpretability, overfitting, convergence time, and other issues. To address these drawbacks and improve the accuracy of DR detection, a distributed Explainable Convolutional Neural network-enabled Light Gradient Boosting Machine (DE-ExLNN) is proposed in this research. The model combines an explainable Convolutional Neural Network (CNN) and a Light Gradient Boosting Machine (LightGBM), achieving highly accurate outcomes in DR detection. LightGBM serves as the detection model, and the inclusion of an explainable CNN addresses issues that conventional CNN classifiers could not resolve. A custom dataset was created for this research, containing both fundus and OCTA images collected from a real-time environment, providing more accurate results compared with standard conventional DR datasets. The custom dataset demonstrates notable accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC) scores, underscoring the effectiveness of this approach. Evaluations against other standard datasets achieved an accuracy of 93.94%, sensitivity of 93.90%, specificity of 93.99%, and MCC of 93.88% for fundus images. For OCTA images, the results were an accuracy of 95.30%, sensitivity of 95.50%, specificity of 95.09%, and MCC of 95%. The results prove that the combination of an explainable CNN and LightGBM outperforms other methods. The inclusion of distributed learning enhances the model's efficiency by reducing time consumption and complexity while facilitating feature extraction.
Funding: supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) support program (IITP-2025-RS-2023-00259497) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation); by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Republic of Korea government (MSIT) (No. IITP-2025-RS-2023-00254129, Graduate School of Metaverse Convergence, Sungkyunkwan University); and by the Basic Science Research Program of the National Research Foundation (NRF) funded by the Republic of Korea government (MSIT) (No. RS-2024-00346737).
Abstract: Network attacks have become a critical issue in the internet security domain. Artificial intelligence technology-based detection methodologies have attracted attention; however, recent studies have struggled to adapt to changing attack patterns and complex network environments. In addition, it is difficult to explain detection results logically using artificial intelligence. We propose a method for classifying network attacks using graph models to explain the detection results. First, we reconstruct the network packet data into a graph structure. We then use a graph model to predict network attacks using edge classification. To explain the prediction results, we observe numerical changes when neighbors are randomly masked and calculate the importance of each neighbor, allowing us to extract significant subgraphs. Our experiments on six public datasets demonstrate superior performance, with an average F1-score of 0.960 and accuracy of 0.964, outperforming traditional machine learning and other graph models. The visual representation of the extracted subgraphs highlights the neighboring nodes that have the greatest impact on the results, thus explaining the detection. In conclusion, this study demonstrates that graph-based models are suitable for network attack detection in complex environments, and that the importance of graph neighbors can be calculated to analyze the results efficiently. This approach can contribute to real-world network security analyses and provide a new direction for the field.
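The neighbor-masking idea above can be sketched in miniature: mask one neighbor at a time, re-score, and rank neighbors by how much the output moves. Everything below (the toy graph, node features, and mean-based scoring function) is invented for illustration; the paper uses a trained graph model in place of the toy scorer.

```python
# Hedged sketch of masking-based neighbor importance. The "model" here is a
# stand-in that scores an edge as the mean feature of its unmasked neighbors.

def toy_edge_score(node_feats, neighbors, masked=None):
    """Toy scorer: mean feature value over neighbors, skipping `masked`."""
    active = [n for n in neighbors if n != masked]
    if not active:
        return 0.0
    return sum(node_feats[n] for n in active) / len(active)

def neighbor_importance(node_feats, neighbors):
    """Importance of a neighbor = |full score - score with that neighbor masked|."""
    full = toy_edge_score(node_feats, neighbors)
    return {n: abs(full - toy_edge_score(node_feats, neighbors, masked=n))
            for n in neighbors}

node_feats = {"a": 0.9, "b": 0.1, "c": 0.2}   # hypothetical node features
neighbors = ["a", "b", "c"]
imp = neighbor_importance(node_feats, neighbors)
top = max(imp, key=imp.get)   # the neighbor that moves the score most
```

Ranking neighbors by this score and keeping the top ones is what yields an explanatory subgraph around the classified edge.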
Abstract: Hepatocellular carcinoma (HCC) remains a leading cause of cancer-related mortality globally, necessitating advanced diagnostic tools to improve early detection and personalized targeted therapy. This review synthesizes evidence on explainable ensemble learning approaches for HCC classification, emphasizing their integration with clinical workflows and multi-omics data. A systematic analysis [including datasets such as The Cancer Genome Atlas, Gene Expression Omnibus, and the Surveillance, Epidemiology, and End Results (SEER) datasets] revealed that explainable ensemble learning models achieve high diagnostic accuracy by combining clinical features, serum biomarkers such as alpha-fetoprotein, imaging features such as computed tomography and magnetic resonance imaging, and genomic data. For instance, SHapley Additive exPlanations (SHAP)-based random forests trained on NCBI GSE14520 microarray data (n = 445) achieved 96.53% accuracy, while stacking ensembles applied to the SEER program data (n = 1897) demonstrated an area under the receiver operating characteristic curve of 0.779 for mortality prediction. Despite promising results, challenges persist, including the computational costs of SHAP and local interpretable model-agnostic explanations analyses (e.g., TreeSHAP requiring distributed computing for metabolomics datasets) and dataset biases (e.g., SEER's Western population dominance limiting generalizability). Future research must address inter-cohort heterogeneity, standardize explainability metrics, and prioritize lightweight surrogate models for resource-limited settings. This review presents the potential of explainable ensemble learning frameworks to bridge the gap between predictive accuracy and clinical interpretability, though rigorous validation in independent, multi-center cohorts is critical for real-world deployment.
Funding: partially funded by the Programa Nacional de Becas y Crédito Educativo of Peru and the Universitat de València, Spain.
Abstract: Self-Explaining Autonomous Systems (SEAS) have emerged as a strategic frontier within Artificial Intelligence (AI), responding to growing demands for transparency and interpretability in autonomous decision-making. This study presents a comprehensive bibliometric analysis of SEAS research published between 2020 and February 2025, drawing upon 1380 documents indexed in Scopus. The analysis applies co-citation mapping, keyword co-occurrence, and author collaboration networks using VOSviewer, MASHA, and Python to examine scientific production, intellectual structure, and global collaboration patterns. The results indicate a sustained annual growth rate of 41.38%, with an h-index of 57 and an average of 21.97 citations per document. A normalized citation rate was computed to address temporal bias, enabling balanced evaluation across publication cohorts. Thematic analysis reveals four consolidated research fronts: interpretability in machine learning, explainability in deep neural networks, transparency in generative models, and optimization strategies in autonomous control. Author co-citation analysis identifies four distinct research communities, and keyword evolution shows growing interdisciplinary links with medicine, cybersecurity, and industrial automation. Geographically, the United States leads in scientific output and citation impact, while countries such as India and China show high productivity with varied influence. However, international collaboration remains limited at 7.39%, reflecting a fragmented research landscape. As discussed in this study, SEAS research is expanding rapidly yet remains epistemologically dispersed, with uneven integration of ethical and human-centered perspectives. This work offers a structured and data-driven perspective on SEAS development, highlights key contributors and thematic trends, and outlines critical directions for advancing responsible and transparent autonomous systems.
Funding: the National Natural Science Foundation of China (42377170, 42407212); the National Funded Postdoctoral Researcher Program (GZB20230606); the Postdoctoral Research Foundation of China (2024M752679); the Sichuan Natural Science Foundation (2025ZNSFSC1205); the National Key R&D Program of China (2022YFC3005704); and the Sichuan Province Science and Technology Support Program (2024NSFSC0100).
Abstract: Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the Shapley Additive Explanations (SHAP) method, offering a hierarchical, global-to-local perspective on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geo-environmental factors to assess the susceptibility of post-fire gully erosion using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting the decision-making process behind environmental management and hazard mitigation.
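The AUC values cited above have a simple probabilistic reading: the chance that a randomly chosen positive cell is scored higher than a randomly chosen negative one. A minimal pairwise sketch, with invented scores and labels:

```python
# Illustrative only: AUC as the rank statistic over positive/negative pairs,
# counting ties as half a win. Data below are made up.

def auc(y_true, y_score):
    """Pairwise AUC: P(score of a positive > score of a negative)."""
    pairs = 0
    wins = 0.0
    for t_i, s_i in zip(y_true, y_score):
        if t_i != 1:
            continue
        for t_j, s_j in zip(y_true, y_score):
            if t_j != 0:
                continue
            pairs += 1
            if s_i > s_j:
                wins += 1.0
            elif s_i == s_j:
                wins += 0.5
    return wins / pairs

y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.8, 0.6, 0.55, 0.3, 0.4, 0.2]
a = auc(y_true, y_score)
```

This O(n²) form is fine for illustration; production code uses the equivalent rank-sum formulation.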
Funding: financial support from the National Key Research and Development Program of China (2021YFB3501501); the National Natural Science Foundation of China (Nos. 22225803, 22038001, 22108007 and 22278011); the Beijing Natural Science Foundation (No. Z230023); and the Beijing Science and Technology Commission (No. Z211100004321001).
Abstract: The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible since a large MOF structure database is available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO_(2) cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to accuracy of up to 97% with the 75% quantile of the training set as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analyses to provide a degree of physical understanding. Using the model, 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 °C and 1 bar within one day, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among the reported MOFs, in good agreement with the prediction.
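The labeling rule implied above, taking the 75% quantile of the training targets as the cutoff and treating samples above it as high-performing, can be sketched as follows. The yield values are invented, and the quantile uses linear interpolation (one of several common conventions):

```python
# Illustrative only: binary labels from a quantile cutoff on the targets.

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac

def label_by_quantile(values, q=0.75):
    """Label 1 for samples strictly above the q-quantile cutoff, else 0."""
    cut = quantile(values, q)
    return [1 if v > cut else 0 for v in values], cut

yields = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical catalytic yields
labels, cutoff = label_by_quantile(yields)
```

Turning a regression target into a top-quartile classification like this is a common trick when only a ranking of candidates (here, promising catalysts) is needed.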
Abstract: The increasing use of cloud-based devices has brought cybersecurity and unwanted network traffic to a critical point. Cloud environments pose significant challenges in maintaining privacy and security. Global approaches, such as IDS, have been developed to tackle these issues. However, most conventional Intrusion Detection System (IDS) models struggle with unseen cyberattacks and complex high-dimensional data. This paper introduces a novel distributed, explainable, and heterogeneous transformer-based intrusion detection system, named INTRUMER, which offers balanced accuracy, reliability, and security in cloud settings through multiple modules working together. The traffic captured from cloud devices is first passed to the TC&TM module, in which the Falcon Optimization Algorithm optimizes the feature selection process and the Naïve Bayes algorithm performs feature classification. The selected features are classified further and forwarded to the Heterogeneous Attention Transformer (HAT) module. In this module, the contextual interactions of the network traffic are taken into account to classify it as normal or malicious. The classified results are further analyzed by the Explainable Prevention Module (XPM) to ensure trustworthiness by providing interpretable decisions. With the explanations from the classifier, emergency alarms are transmitted to nearby IDS modules, servers, and underlying cloud devices to enhance preventive measures. Extensive experiments on the benchmark IDS datasets CICIDS 2017, Honeypots, and NSL-KDD demonstrate the efficiency of the INTRUMER model in detecting different types of network traffic with high accuracy. The proposed model outperforms state-of-the-art approaches, obtaining better performance metrics: 98.7% accuracy, 97.5% precision, 96.3% recall, and 97.8% F1-score. Such results validate the robustness and effectiveness of INTRUMER in securing diverse cloud environments against sophisticated cyber threats.
Abstract: Machine learning models are increasingly used to correct the vertical biases (mainly due to vegetation and buildings) in global Digital Elevation Models (DEMs) for downstream applications that need "bare earth" elevations. The predictive accuracy of these models has improved significantly as more flexible model architectures are developed and new explanatory datasets produced, leading to the recent release of three model-corrected DEMs (FABDEM, DiluviumDEM and FathomDEM). However, there has been relatively little focus so far on explaining or interrogating these models, which is especially important in this context given their downstream impact on many other applications (including natural hazard simulations). In this study we train five separate models (by land cover environment) to correct vertical biases in the Copernicus DEM and then explain them using SHapley Additive exPlanation (SHAP) values. Comparing the models, we find significant variation in the specific input variables selected and their relative importance, suggesting that an ensemble of models (specialising by land cover) is likely preferable to a general model applied everywhere. Visualising the patterns learned by the models (using SHAP dependence plots) provides further insights, building confidence in some cases (where patterns are consistent with domain knowledge and past studies) and highlighting potentially problematic variables in others (such as proxy relationships which may not apply in new application sites). Our results have implications for future DEM error prediction studies, particularly in evaluating a very wide range of potential input variables (160 candidates) drawn from topographic, multispectral, Synthetic Aperture Radar, vegetation, climate and urbanisation datasets.
Funding: support of the Deanship of Research and Graduate Studies at Ajman University under Projects 2024-IRG-ENiT-36 and 2024-IRG-ENIT-29.
Abstract: Incorporating explainability features into web-based decision-making systems is a primary concern for enhancing accountability, transparency, and trust in the community. Multi-domain sentiment analysis is a significant web-based application area where the explainability feature is essential for achieving user satisfaction. Conventional design methodologies such as the object-oriented design methodology (OODM) have been proposed for web-based application development, facilitating code reuse, quantification, and security at the design level. However, OODM does not provide explainability in web-based decision-making systems. X-OODM modifies OODM with added explainable models to introduce explainability for such systems. This research introduces an explainable model leveraging X-OODM for designing transparent applications for multi-domain sentiment analysis. The proposed design is evaluated using the design quality metrics defined for the evaluation of the X-OODM explainable model under user context. The design quality metrics of transferability, simulatability, informativeness, and decomposability were introduced successively into the evaluation of the X-OODM user context. Auxiliary metrics of accessibility and algorithmic transparency were added to increase the degree of explainability of the design. The study results reveal that introducing such explainability parameters with X-OODM appropriately increases system transparency, trustworthiness, and user understanding. The experimental results validate the enhancement of decision-making for multi-domain sentiment analysis with explainability integrated at the design level. Future work can extend this study by applying the proposed X-OODM framework to different datasets and sentiment analysis applications to further scrutinize its effectiveness in real-world scenarios.
Abstract: In the field of precision healthcare, where accurate decision-making is paramount, this study underscores the indispensability of eXplainable Artificial Intelligence (XAI) in the context of epilepsy management within the Internet of Medical Things (IoMT). The methodology entails meticulous preprocessing, involving the application of a band-pass filter and epoch segmentation to optimize the quality of Electroencephalograph (EEG) data. The subsequent extraction of statistical features facilitates the differentiation between seizure and non-seizure patterns. The classification phase integrates Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest classifiers. Notably, SVM attains an accuracy of 97.26%, excelling in precision, recall, specificity, and F1 score for identifying seizure and non-seizure instances. Conversely, KNN achieves an accuracy of 72.69%, accompanied by certain trade-offs. The Random Forest classifier stands out with a remarkable accuracy of 99.89%, coupled with exceptional precision (99.73%), recall (100%), specificity (99.80%), and F1 score (99.86%), surpassing both SVM and KNN. XAI techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanation (SHAP), enhance the system's transparency. This combination of machine learning and XAI not only improves the reliability and accuracy of the seizure detection system but also enhances trust and interpretability. Healthcare professionals can leverage the identified important features and their dependencies to gain deeper insights into the decision-making process, aiding informed diagnosis and treatment decisions for patients with epilepsy.
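The epoch-segmentation and statistical-feature steps described above can be sketched in a few lines. The signal below is synthetic, and the study additionally applies a band-pass filter before segmentation, which is omitted here:

```python
# Illustrative only: split a 1-D signal into fixed-length epochs and extract
# simple per-epoch statistics (mean, variance, peak-to-peak amplitude).

def segment_epochs(signal, epoch_len):
    """Non-overlapping epochs of `epoch_len` samples (tail remainder dropped)."""
    return [signal[i:i + epoch_len]
            for i in range(0, len(signal) - epoch_len + 1, epoch_len)]

def epoch_features(epoch):
    """A small statistical feature vector for one epoch."""
    n = len(epoch)
    mean = sum(epoch) / n
    var = sum((x - mean) ** 2 for x in epoch) / n
    return {"mean": mean, "var": var, "ptp": max(epoch) - min(epoch)}

signal = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0]   # synthetic EEG trace
epochs = segment_epochs(signal, epoch_len=4)
feats = [epoch_features(e) for e in epochs]
```

Per-epoch features like these (variance and peak-to-peak tend to rise during seizure activity) are what the downstream SVM/KNN/Random Forest classifiers consume.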
Abstract: Predictive maintenance plays a crucial role in preventing equipment failures and minimizing operational downtime in modern industries. However, traditional predictive maintenance methods often face challenges in adapting to diverse industrial environments and ensuring the transparency and fairness of their predictions. This paper presents a novel predictive maintenance framework that integrates deep learning and optimization techniques while addressing key ethical considerations, such as transparency, fairness, and explainability, in artificial intelligence driven decision-making. The framework employs an Autoencoder for feature reduction, a Convolutional Neural Network for pattern recognition, and a Long Short-Term Memory network for temporal analysis. To enhance transparency, the decision-making process of the framework is made interpretable, allowing stakeholders to understand and trust the model's predictions. Additionally, Particle Swarm Optimization is used to refine hyperparameters for optimal performance and to mitigate potential biases in the model. Experiments are conducted on multiple datasets from different industrial scenarios, with performance validated using accuracy, precision, recall, F1-score, and training time metrics. The results demonstrate impressive accuracy of up to 99.92% and 99.45% on different datasets, highlighting the framework's effectiveness in enhancing predictive maintenance strategies. Furthermore, the model's explainability ensures that decisions can be audited for fairness and accountability, aligning with ethical standards for critical systems. By addressing transparency and reducing potential biases, this framework contributes to the responsible and trustworthy deployment of artificial intelligence in industrial environments, particularly in safety-critical applications. The results underscore its potential for wide application across various industrial contexts, enhancing both performance and ethical decision-making.
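Particle Swarm Optimization, used above for hyperparameter refinement, is simple enough to sketch in full. The objective below is a stand-in quadratic "validation loss" with an invented optimum (learning rate 0.1, dropout 0.5); the study would instead evaluate the trained network:

```python
# Minimal PSO sketch: particles track personal/global bests and move under
# inertia plus cognitive and social attraction. Coefficients are textbook
# defaults, not values from the paper.
import random

def pso(objective, bounds, n_particles=20, iters=60, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                      # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social
                pos[i][d] = min(max(pos[i][d] + vel[i][d],
                                    bounds[d][0]), bounds[d][1])     # clamp
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical objective: best "learning rate" 0.1, best "dropout" 0.5
loss = lambda p: (p[0] - 0.1) ** 2 + (p[1] - 0.5) ** 2
best, best_val = pso(loss, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

In practice each `objective` call would train and validate a model, so the particle count and iteration budget become the dominant cost knobs.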
Funding: funded by the Excellent Talent Training Funding Project in Dongcheng District, Beijing, under project number 2024-dchrcpyzz-9.
Abstract: Predicting the health status of stroke patients at different stages of the disease is a critical clinical task. The onset and development of stroke are affected by an array of factors, encompassing genetic predisposition, environmental exposure, unhealthy lifestyle habits, and existing medical conditions. Although existing machine learning-based methods for predicting stroke patients' health status have made significant progress, limitations remain in prediction accuracy, model explainability, and system optimization. This paper proposes a multi-task learning approach based on Explainable Artificial Intelligence (XAI) for predicting the health status of stroke patients. First, we design a comprehensive multi-task learning framework that utilizes the correlation among tasks predicting various health status indicators, enabling the parallel prediction of multiple health indicators. Second, we develop a multi-task Area Under Curve (AUC) optimization algorithm based on adaptive low-rank representation, which removes irrelevant information from the model structure to enhance the performance of multi-task AUC optimization. Additionally, the model's explainability is analyzed through stability analysis of SHAP values. Experimental results demonstrate that our approach outperforms comparison algorithms in the key metrics of F1 score and efficiency.
Abstract: With the ongoing digitalization and intelligence of power systems, there is an increasing reliance on large-scale data-driven intelligent technologies for tasks such as scheduling optimization and load forecasting. Nevertheless, power data often contains sensitive information, making it a critical industry challenge to utilize this data efficiently while ensuring privacy. Traditional Federated Learning (FL) methods can mitigate data leakage by training models locally instead of transmitting raw data. Despite this, FL still has privacy concerns, especially gradient leakage, which might expose users' sensitive information. Therefore, integrating Differential Privacy (DP) techniques is essential for stronger privacy protection. Even so, the noise from DP may reduce the performance of federated learning models. To address this challenge, this paper presents an explainability-driven power data privacy federated learning framework. It incorporates DP technology and, based on model explainability, adaptively adjusts privacy budget allocation and model aggregation, thus balancing privacy protection and model performance. The key innovations of this paper are as follows: (1) We propose an explainability-driven power data privacy federated learning framework. (2) We detail a privacy budget allocation strategy: assigning budgets per training round by gradient effectiveness and at model granularity by layer importance. (3) We design a weighted aggregation strategy that considers SHAP values and model accuracy for quality knowledge sharing. (4) Experiments show the proposed framework outperforms traditional methods in balancing privacy protection and model performance in power load forecasting tasks.
Abstract: Short Message Service (SMS) is a widely used and cost-effective communication medium that has unfortunately become a frequent target for unsolicited messages, commonly known as SMS spam. With the rapid adoption of smartphones and increased Internet connectivity, SMS spam has emerged as a prevalent threat. Spammers have recognized the critical role SMS plays in modern communication, making it a prime target for abuse. As cybersecurity threats continue to evolve, the volume of SMS spam has increased substantially in recent years. Moreover, the unstructured format of SMS data creates significant challenges for SMS spam detection, making it more difficult to combat spam attacks successfully. In this paper, we present an optimized and fine-tuned transformer-based language model to address the problem of SMS spam detection. We use a benchmark SMS spam dataset to analyze this spam detection model. Additionally, we utilize pre-processing techniques to obtain clean and noise-free data, and address the class imbalance problem by leveraging text augmentation techniques. The overall experiment showed that our optimized, fine-tuned BERT (Bidirectional Encoder Representations from Transformers) variant, RoBERTa, obtained a high accuracy of 99.84%. To further enhance model transparency, we incorporate Explainable Artificial Intelligence (XAI) techniques that compute positive and negative coefficient scores, offering insight into the model's decision-making process. Additionally, we evaluate the performance of traditional machine learning models as a baseline for comparison. This comprehensive analysis demonstrates the significant impact language models can have on addressing complex text-based challenges within the cybersecurity landscape.
Abstract: Skin cancer is the most prevalent cancer globally, primarily due to extensive exposure to ultraviolet (UV) radiation. Early identification of skin cancer enhances the likelihood of effective treatment, as delays may lead to severe tumor advancement. This study proposes a novel hybrid deep learning strategy to address the complex issue of skin cancer diagnosis, with an architecture that integrates a Vision Transformer, a bespoke convolutional neural network (CNN), and an Xception module. The models were evaluated using two benchmark datasets, HAM10000 and Skin Cancer ISIC. On HAM10000, the model achieves a precision of 95.46%, an accuracy of 96.74%, a recall of 96.27%, a specificity of 96.00%, and an F1-score of 95.86%. On the Skin Cancer ISIC dataset, it obtains an accuracy of 93.19%, a precision of 93.25%, a recall of 92.80%, a specificity of 92.89%, and an F1-score of 93.19%. The findings demonstrate that the proposed model is robust and trustworthy for the classification of skin lesions. In addition, the utilization of Explainable AI techniques, such as Grad-CAM visualizations, assists in highlighting the most significant lesion areas that influence the model's decisions.
Abstract: In the evolving landscape of cyber threats, phishing attacks pose significant challenges, particularly through deceptive webpages designed to extract sensitive information under the guise of legitimacy. Conventional and machine learning (ML)-based detection systems struggle to detect phishing websites owing to their constantly changing tactics. Furthermore, newer phishing websites exhibit subtle and expertly concealed indicators that are not readily detectable. Hence, effective detection depends on identifying the most critical features. Traditional feature selection (FS) methods often fail to enhance ML model performance and may instead decrease it. To combat these issues, we propose an innovative method using explainable AI (XAI) to enhance FS in ML models and improve the identification of phishing websites. Specifically, we employ SHapley Additive exPlanations (SHAP) for a global perspective and aggregated local interpretable model-agnostic explanations (LIME) to determine specific localized patterns. The proposed SHAP and LIME-aggregated FS (SLA-FS) framework pinpoints the most informative features, enabling more precise, swift, and adaptable phishing detection. Applying this approach to an up-to-date web phishing dataset, we evaluate the performance of three ML models before and after FS to assess their effectiveness. Our findings reveal that random forest (RF), with an accuracy of 97.41%, and XGBoost (XGB), at 97.21%, significantly benefit from the SLA-FS framework, while k-nearest neighbors lags. Our framework increases the accuracy of RF and XGB by 0.65% and 0.41%, respectively, outperforming traditional filter or wrapper methods and any prior methods evaluated on this dataset, showcasing its potential.
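The aggregation step at the heart of the SLA-FS idea, combining a global SHAP importance vector with per-instance LIME weights averaged across instances, can be sketched as below. All scores and feature names are invented; the real framework would derive them from fitted SHAP and LIME explainers:

```python
# Hedged sketch: merge global SHAP importances with aggregated |LIME| weights
# and keep the top-k features by the combined score.

def aggregate_lime(local_weights):
    """Average absolute LIME weights across instances, per feature."""
    agg = {}
    for inst in local_weights:
        for feat, w in inst.items():
            agg[feat] = agg.get(feat, 0.0) + abs(w)
    return {f: v / len(local_weights) for f, v in agg.items()}

def sla_fs(shap_global, local_weights, k=2):
    """Rank features by summed SHAP + aggregated-LIME score; return top k."""
    lime_agg = aggregate_lime(local_weights)
    combined = {f: shap_global.get(f, 0.0) + lime_agg.get(f, 0.0)
                for f in set(shap_global) | set(lime_agg)}
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical URL features and explanation scores
shap_global = {"url_len": 0.30, "has_ip": 0.25, "dots": 0.05}
lime_local = [{"url_len": 0.2, "dots": -0.1}, {"has_ip": 0.4, "dots": 0.1}]
selected = sla_fs(shap_global, lime_local, k=2)
```

Taking absolute values of the local weights matters: a feature that strongly pushes predictions in either direction is informative, even if its signed contributions cancel out across instances.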
Funding: Supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, No. 2022D01D17, and the State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, No. SKL-HIDCA-2024-2.
Abstract: BACKGROUND: Echinococcosis, caused by Echinococcus parasites, includes alveolar echinococcosis (AE), the most lethal form, which primarily affects the liver and carries a 90% mortality rate without prompt treatment. While radical surgery combined with antiparasitic therapy is ideal, many patients present late, missing the opportunity for hepatectomy. Ex vivo liver resection and autotransplantation (ELRA) offers hope for such patients. Traditional surgical decision-making, relying on clinical experience, is prone to bias. Machine learning can enhance decision-making by identifying key factors influencing surgical choices. This study innovatively employs multiple machine learning methods, integrating various feature selection techniques and SHapley Additive exPlanations (SHAP) interpretive analysis, to explore the key decision factors influencing surgical strategies. AIM: To determine the key preoperative factors influencing surgical decision-making in hepatic AE (HAE) using machine learning. METHODS: This was a retrospective cohort study at the First Affiliated Hospital of Xinjiang Medical University (July 2010 to August 2024). There were 710 HAE patients (545 hepatectomy and 165 ELRA) with complete clinical data. Data included demographics, laboratory indicators, imaging, and pathology. Feature selection was performed using recursive feature elimination, minimum redundancy maximum relevance, and least absolute shrinkage and selection operator regression, with the intersection of these methods yielding 10 critical features. Eleven machine learning algorithms were compared, with eXtreme Gradient Boosting (XGBoost) optimized using Bayesian optimization. Model interpretability was assessed using SHAP analysis. RESULTS: The XGBoost model achieved an area under the curve of 0.935 in the training set and 0.734 in the validation set. The optimal threshold (0.28) yielded a sensitivity of 93.6% and a specificity of 90.9%. SHAP analysis identified type of vascular invasion as the most important feature, followed by platelet count and prothrombin time. Lesions invading the hepatic vein, inferior vena cava, or multiple vessels significantly increased the likelihood of ELRA. Calibration curves showed good agreement between predicted and observed probabilities (0.2-0.7 range). The model demonstrated high net clinical benefit in decision curve analysis, with an accuracy of 0.837, a recall of 0.745, and an F1 score of 0.788. CONCLUSION: Vascular invasion is the dominant factor influencing the choice of surgical approach in HAE. Machine learning models, particularly XGBoost, can provide transparent, data-driven support for personalized decision-making.
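To make the SHAP attribution idea concrete: each feature's Shapley value is its weighted average marginal contribution across all coalitions of the other features, and the values sum to the prediction minus a baseline. The sketch below computes exact Shapley values by enumeration for a toy three-feature "risk model"; it illustrates the definition only, not the paper's TreeSHAP pipeline, and all names are illustrative.

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction, by enumerating all
    feature coalitions. Absent features take their baseline value.
    Only practical for a handful of features."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in combinations(others, r):
                # Classic Shapley coalition weight |S|!(n-|S|-1)!/n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = baseline.copy()
                with_i[list(S) + [i]] = x[list(S) + [i]]
                without = baseline.copy()
                without[list(S)] = x[list(S)]
                phi[i] += weight * (model(with_i) - model(without))
    return phi

# Toy nonlinear model over three "preoperative features".
model = lambda v: v[0] * v[1] + 2.0 * v[2]
x = np.array([1.0, 2.0, 3.0])
base = np.zeros(3)
phi = shapley_values(model, x, base)
# Efficiency property: contributions sum to f(x) - f(baseline).
print(phi, phi.sum(), model(x) - model(base))
```

The interaction term v0*v1 is split evenly between the two symmetric features, while the additive term is credited entirely to v2, which is the behavior that makes SHAP rankings like "vascular invasion first" interpretable.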
Abstract: Retaining walls are used to support the earth and prevent soil from spreading at its natural slope angle where there are differences in ground surface elevation. As the need for retaining structures grows, the use of retaining walls is increasing. Retaining walls increase the stability of graded levels, are economical, and accommodate adverse site conditions. A considerable proportion of retaining walls are made from steel-reinforced concrete, and their construction can be costly due to its components. For this reason, the optimum cost should be targeted in the design of retaining walls. This study presents an artificial neural network (ANN) model developed to predict the optimum dimensions of a retaining wall from soil properties, material properties, and external loading conditions. The dataset used to train the ANN model is generated with the Flower Pollination Algorithm. The target variables in the dataset are the length of the heel (y1), length of the toe (y2), thickness of the stem at the top (y3), thickness of the stem at the bottom (y4), foundation base thickness (y5), and cost (y6); these are estimated with an ANN model based on the height of the wall (x1), material unit weight (x2), wall friction angle (x3), surcharge load (x4), concrete cost per m³ (x5), steel cost per ton (x6), and soil class (x7). The model is formulated and trained as a multi-output regression model, as all outputs are numeric and continuous. Training and evaluation show high prediction performance (R² > 0.99). In addition, the impact of each input feature on the model's predictions is revealed using the SHapley Additive exPlanations (SHAP) algorithm. The study demonstrates that, when trained with a large dataset, ANN models predict the optimal cost with high performance.
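A multi-output regression setup of the kind described (seven inputs x1..x7, six continuous targets y1..y6) can be sketched with a single network whose output layer has one unit per target. The data below are synthetic stand-ins, and scikit-learn's MLPRegressor stands in for the paper's ANN; nothing here reproduces the actual dataset or architecture.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 7-input / 6-output retaining-wall dataset:
# inputs x1..x7 (wall height, unit weight, ...), outputs y1..y6
# (heel length, toe length, stem thicknesses, base thickness, cost).
rng = np.random.default_rng(0)
X = rng.random((500, 7))
W = rng.standard_normal((7, 6))
Y = X @ W + 0.01 * rng.standard_normal((500, 6))  # 6 continuous targets

# MLPRegressor handles multi-output regression natively: one forward
# pass predicts all six targets at once.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
)
model.fit(X, Y)
pred = model.predict(X[:3])
print(pred.shape)  # (3, 6)
```

Predicting all outputs jointly lets the network share hidden representations across the geometrically coupled targets, rather than fitting six independent models.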
Funding: Support from the China Scholarship Council (CSC) (202406440073).
Abstract: Accurate prediction of pure component physicochemical properties is crucial for process integration, multiscale modelling, and optimization. In this work, an enhanced framework for pure component property prediction using explainable machine learning methods is proposed. In this framework, the molecular representation method based on the connectivity matrix effectively considers atomic bonding relationships to automatically generate features. The supervised machine learning model random forest is applied for feature ranking and pooling. The adjusted R² is introduced to penalize the inclusion of additional features, providing an assessment of the true contribution of features. The prediction results for normal boiling point (Tb), liquid molar volume (Lmv), critical temperature (Tc), and critical pressure (Pc) obtained using Artificial Neural Network and Gaussian Process Regression models confirm the accuracy of the molecular representation method. Comparison with group contribution (GC)-based models shows that the root-mean-square error on the test set can be reduced by up to 83.8%. To enhance the interpretability of the model, a feature analysis method based on Shapley values is employed to determine the contribution of each feature to the property predictions. The results indicate that using the feature pooling method reduces the number of features from 13,316 to 100 without compromising model accuracy. The feature analysis results for Tb, Lmv, Tc, and Pc confirm that different molecular properties are influenced by different structural features, aligning with mechanistic interpretations. In conclusion, the proposed framework is demonstrated to be feasible and provides a solid foundation for mixture component reconstruction and process integration modelling.
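The adjusted R² criterion mentioned above rises only when an added feature improves the fit by more than its degrees-of-freedom cost: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1) for n samples and k features. A minimal greedy pooling sketch under that criterion, with synthetic data and an assumed importance ranking (the paper uses a random forest for ranking; the linear model here is only for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R²: penalizes each extra feature, so it only rises
    when a feature genuinely contributes."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Synthetic data: only features 0 and 1 actually matter.
rng = np.random.default_rng(0)
n = 200
X = rng.standard_normal((n, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(n)

# Greedy pooling: walk the (assumed) importance ranking, adding a
# feature only while adjusted R² keeps improving.
ranking = [0, 1, 2, 3, 4]
best, selected = -np.inf, []
for f in ranking:
    trial = selected + [f]
    r2 = LinearRegression().fit(X[:, trial], y).score(X[:, trial], y)
    adj = adjusted_r2(r2, n, len(trial))
    if adj > best:
        best, selected = adj, trial
print(selected, round(best, 4))
```

Plain R² would accept every candidate (it never decreases on the training fit); the adjusted form is what allows a large feature pool to be cut down without losing accuracy.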
Funding: Under the auspices of the National Natural Science Foundation of China (No. 42271230, 42330510).
Abstract: The relationship between the neighborhood environment and well-being is attracting increasing attention from researchers and policymakers, as the goal of development has shifted from the economy to well-being. However, the existing literature predominantly adopts a utilitarian approach, understanding well-being as people's feelings about their lives and viewing the neighborhood environment as a set of resources that benefit well-being. The Capability Approach, a novel approach that conceptualizes well-being as the freedom to do or to be and regards the environment as conversion factors that influence well-being, can offer a new lens by incorporating human development into these topics. This paper proposes an alternative theoretical framework: well-being is conceptualized and measured by capability, and the neighborhood environment affects well-being by providing spatial services, functioning as environmental conversion factors, and serving as social conversion factors. We conducted a case study of Changshu City in eastern China, utilizing multiple data sources and applying explainable artificial intelligence (XAI), namely eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP). Our findings highlight the significance of viewing the neighborhood environment as a set of conversion factors, as this provides more explanatory power than viewing it as providing spatial services. Compared with conventional research based on a linear-relationship assumption, our results demonstrate that the effects of the neighborhood environment on well-being are non-linear, characterized by threshold effects and interaction effects. These insights are crucial for informing urban planning and public policy. This research enriches our understanding of well-being, the neighborhood environment, and their relationship, and provides empirical evidence for the core concept of conversion factors in the capability approach.
基金funded by the Centre for Advanced Modelling and Geospatial Information Systems(CAMGIS),Faculty of Engineering and IT,University of Technology Sydneysupported by the Research Funding Program,King Saud University,Riyadh,Saudi Arabia,under Project Ongoing Research Funding program(ORF-2025-14).
Abstract: Diabetic retinopathy (DR) is a critical disorder that affects the retina due to the constant rise in diabetes and remains a major cause of blindness worldwide. Early detection and timely treatment are essential to mitigate the effects of DR, such as retinal damage and vision impairment. Several conventional approaches have been proposed to detect DR early and accurately, but they are limited by data imbalance, interpretability, overfitting, convergence time, and other issues. To address these drawbacks and improve DR detection, a distributed Explainable Convolutional Neural network-enabled Light Gradient Boosting Machine (DE-ExLNN) is proposed in this research. The model combines an explainable convolutional neural network (CNN) and a Light Gradient Boosting Machine (LightGBM), achieving highly accurate outcomes in DR detection. LightGBM serves as the detection model, and the inclusion of an explainable CNN addresses issues that conventional CNN classifiers could not resolve. A custom dataset was created for this research, containing both fundus and OCTA images collected from a real-time environment, providing more accurate results than standard conventional DR datasets. The custom dataset yields notable accuracy, sensitivity, specificity, and Matthews correlation coefficient (MCC) scores, underscoring the effectiveness of this approach. Evaluations against other standard datasets achieved an accuracy of 93.94%, a sensitivity of 93.90%, a specificity of 93.99%, and an MCC of 93.88% for fundus images. For OCTA images, the results were an accuracy of 95.30%, a sensitivity of 95.50%, a specificity of 95.09%, and an MCC of 95%. The results prove that the combination of explainable CNN and LightGBM outperforms other methods. The inclusion of distributed learning enhances the model's efficiency by reducing time consumption and complexity while facilitating feature extraction.