The amount of explained variation R2 is an overall measure used to quantify the information in a model and especially how useful the model might be when predicting future observations. Explained variation is useful in guiding model choice for all types of predictive regression models, including linear and generalized linear models and survival analysis. In this work we consider how individual observations in a data set can influence the value of various R2 measures proposed for survival analysis, including local influence to assess mathematically the effect of small changes. We discuss methodologies for assessing influence on Graf et al.'s R2G measure, Harrell's C-index and Nagelkerke's R2N. The ideas are illustrated on data on 1391 patients diagnosed with Diffuse Large B-cell Lymphoma (DLBCL), a major subtype of Non-Hodgkin's Lymphoma (NHL).
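Harrell's C-index, one of the measures discussed above, counts concordant pairs among all comparable pairs in right-censored data. The following is a minimal illustrative sketch, not the paper's implementation; the toy times, event flags, and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when the subject with the shorter
    observed time experienced the event (events[i] == 1).  The pair is
    concordant when that subject also has the higher predicted risk;
    ties in risk count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

times  = [2, 4, 6, 8]          # observed follow-up times (toy values)
events = [1, 1, 1, 0]          # 1 = event observed, 0 = censored
risks  = [0.9, 0.7, 0.4, 0.1]  # higher risk -> expected shorter survival
print(harrell_c_index(times, events, risks))  # -> 1.0 (perfect ranking)
```

A perfectly anti-ranked risk vector would give C = 0, and random ranking about 0.5, which is why C is read like an AUC for survival models.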
E. Stone in the article “18 Mysteries and Unanswered Questions About Our Solar System. Little Astronomy” wrote: One of the great things about astronomy is that there is still so much out there for us to discover. There are so many unanswered questions and mysteries about the universe. There is always a puzzle to solve and that is part of the beauty. Even in our own neighborhood, the Solar System, there are many questions we still have not been able to answer [1]. In the present paper, we explain the majority of these mysteries and some other unexplained phenomena in the Solar System (SS) within the framework of the developed Hypersphere World-Universe Model (WUM) [2].
The integration of machine learning (ML) into geohazard assessment has instigated a paradigm shift, producing models with a level of predictive accuracy previously considered unattainable. However, the black-box nature of these systems presents a significant barrier, hindering their operational adoption, regulatory approval, and full scientific validation. This paper provides a systematic review and synthesis of the emerging field of explainable artificial intelligence (XAI) as applied to geohazard science (GeoXAI), a domain that aims to resolve the long-standing trade-off between model performance and interpretability. A rigorous synthesis of 87 foundational studies is used to map the intellectual and methodological contours of this rapidly expanding field. The analysis reveals that current research efforts are concentrated predominantly on landslide and flood assessment. Methodologically, tree-based ensembles and deep learning models dominate the literature, with SHapley Additive exPlanations (SHAP) frequently adopted as the principal post-hoc explanation technique. More importantly, the review documents how the role of XAI has shifted: rather than being used solely as a tool for interpreting models after training, it is increasingly integrated into the modeling cycle itself. Recent applications include its use in feature selection, adaptive sampling strategies, and model evaluation. The evidence also shows that GeoXAI extends beyond producing feature rankings: it reveals nonlinear thresholds and interaction effects that generate deeper mechanistic insight into hazard processes. Nevertheless, several key challenges remain unresolved, particularly the need for interpretation stability, the task of reliably distinguishing correlation from causation, and the development of appropriate methods for treating complex spatio-temporal dynamics.
Network attacks have become a critical issue in the internet security domain. Artificial intelligence-based detection methodologies have attracted attention; however, recent studies have struggled to adapt to changing attack patterns and complex network environments. In addition, it is difficult to explain detection results produced by artificial intelligence logically. We propose a method for classifying network attacks using graph models that explains its detection results. First, we reconstruct the network packet data into a graph structure. We then use a graph model to predict network attacks via edge classification. To explain the prediction results, we observe numerical changes under random masking and calculate the importance of neighbors, allowing us to extract significant subgraphs. Our experiments on six public datasets demonstrate superior performance, with an average F1-score of 0.960 and accuracy of 0.964, outperforming traditional machine learning and other graph models. The visual representation of the extracted subgraphs highlights the neighboring nodes that have the greatest impact on the results, thus explaining each detection. In conclusion, this study demonstrates that graph-based models are suitable for network attack detection in complex environments, and that neighbor importance can be calculated to analyze the results efficiently. This approach can contribute to real-world network security analysis and provides a new direction for the field.
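The neighbor-masking idea described above can be sketched with a toy perturbation explainer: the importance of a neighbor is the change in the prediction when that neighbor alone is masked out. This is an illustrative simplification, not the paper's graph model; the node features and the averaging "predictor" are hypothetical.

```python
def edge_score(node_feats, neighbors, mask=None):
    """Toy predictor: score of a target edge = mean feature of the
    neighbor nodes that are not masked out."""
    active = [n for n in neighbors if mask is None or n not in mask]
    if not active:
        return 0.0
    return sum(node_feats[n] for n in active) / len(active)

def neighbor_importance(node_feats, neighbors):
    """Importance of each neighbor = absolute change in the prediction
    when that neighbor alone is masked (perturbation-style explanation)."""
    base = edge_score(node_feats, neighbors)
    return {n: abs(base - edge_score(node_feats, neighbors, mask={n}))
            for n in neighbors}

feats = {0: 0.9, 1: 0.1, 2: 0.2}      # per-node features (hypothetical)
imp = neighbor_importance(feats, [0, 1, 2])
# Node 0's feature deviates most from the mean, so masking it moves the
# prediction the most.
print(max(imp, key=imp.get))  # -> 0
```

Keeping only the neighbors with the largest importance scores yields the kind of significant subgraph the abstract describes.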
Most Convolutional Neural Network (CNN) interpretation techniques visualize only the dominant cues that the model relies on, but there is no guarantee that these represent all the evidence the model uses for classification. This limitation becomes critical when hidden secondary cues, potentially more meaningful than the visualized ones, remain undiscovered. This study introduces CasCAM (Cascaded Class Activation Mapping) to address this fundamental limitation through counterfactual reasoning. By asking “if this dominant cue were absent, what other evidence would the model use?”, CasCAM progressively masks the most salient features and systematically uncovers the hierarchy of classification evidence hidden beneath them. Experimental results demonstrate that CasCAM effectively discovers the full spectrum of reasoning evidence and can be applied universally with nine existing interpretation methods.
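The cascaded mask-and-re-explain loop can be sketched in a few lines. This is a toy abstraction of the idea, not CasCAM itself: the "explainer" here simply scores a feature by its magnitude, and the feature names are hypothetical.

```python
def cascaded_saliency(saliency_fn, x, steps=3):
    """Sketch of cascaded attribution: repeatedly mask the currently
    most salient feature and re-run the explainer, exposing the
    secondary evidence hidden beneath the dominant cue."""
    x = dict(x)
    hierarchy = []
    for _ in range(steps):
        sal = saliency_fn(x)
        if not sal:
            break
        top = max(sal, key=sal.get)
        hierarchy.append(top)
        x[top] = 0.0          # counterfactual: remove the dominant cue
    return hierarchy

# Hypothetical explainer: saliency of a feature = its absolute value.
sal_fn = lambda x: {k: abs(v) for k, v in x.items() if v != 0.0}
order = cascaded_saliency(sal_fn, {"beak": 0.8, "wing": 0.5, "sky": 0.1})
print(order)  # -> ['beak', 'wing', 'sky']
```

Each pass surfaces the next-strongest cue, which is exactly the evidence hierarchy the abstract refers to.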
Objective: Deep learning is increasingly employed in Gastroenterology (GI) endoscopy computer-aided diagnostics for polyp segmentation and multi-class disease detection. Real-world deployment requires high accuracy, therapeutically relevant explanations, strong calibration, domain generalization, and efficiency. Current Convolutional Neural Network (CNN) and transformer models compromise border precision and global context, generate attention maps that fail to align with expert reasoning, deteriorate under cross-center shifts, and exhibit inadequate calibration, diminishing clinical trust. Methods: HMA-DER is a hierarchical multi-attention architecture that uses dilation-enhanced residual blocks and an explainability-aware Cognitive Alignment Score (CAS) regularizer to align attribution maps directly with expert reasoning signals. The framework includes robustness-oriented extensions and an evaluation protocol covering accuracy, macro-averaged F1 score, Area Under the Receiver Operating Characteristic Curve (AUROC), calibration (Expected Calibration Error (ECE), Brier score), explainability (CAS, insertion/deletion AUC), cross-dataset transfer, and throughput. Results: HMA-DER achieves Dice Similarity Coefficient scores of 89.5% and 86.0% on Kvasir-SEG and CVC-ClinicDB, beating the strongest baseline by +1.9 and +1.7 points. It reaches 86.4% and 85.3% macro-F1 and 94.0% and 93.4% AUROC on HyperKvasir and GastroVision, exceeding the baseline by +1.4/+1.6 macro-F1 and +1.2/+1.1 AUROC. An ablation study shows that hierarchical attention contributes the most (+3.0), followed by CAS regularization (+2–3), dilation (+1.5–2.0), and residual connections (+2–3). Cross-dataset validation demonstrates competitive zero-shot transfer (e.g., KS→CVC Dice 82.7%), whereas multi-dataset training narrows the domain gap, yielding an 88.1% primary-metric average. HMA-DER's mixed-precision inference processes 155 images per second while maintaining good calibration. Conclusion: HMA-DER balances accuracy, explainability, robustness, and efficiency for reliable GI computer-aided diagnosis in real-world clinical settings.
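Expected Calibration Error, one of the calibration metrics listed above, bins predictions by confidence and compares each bin's accuracy with its mean confidence. A minimal sketch (illustrative only; the toy confidences are hypothetical):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and
    average |accuracy - mean confidence| weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / total) * abs(acc - avg_conf)
    return ece

# A perfectly calibrated toy model: 80%-confident predictions are right
# 80% of the time, so ECE is 0.
confs   = [0.8] * 5
correct = [1, 1, 1, 1, 0]
print(round(expected_calibration_error(confs, correct), 6))  # -> 0.0
```

An overconfident model (say, 90% confidence but 50% accuracy) would instead score ECE = 0.4, which is the behavior calibration regularizers try to prevent.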
The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly enlarged the attack surface for cyber threats. Early threat detection is both necessary and complex, as these interconnected healthcare settings generate enormous amounts of heterogeneous data. Traditional Intrusion Detection Systems (IDS), which are generally centralized and machine learning-based, often fail to address the rapidly changing nature of cyberattacks and are challenged by ethical concerns related to patient data privacy. Moreover, traditional AI-driven IDS usually struggle to handle large-scale, heterogeneous healthcare data while ensuring data privacy and operational efficiency. To address these issues, emerging technologies such as Big Data Analytics (BDA) and Federated Learning (FL) provide a hybrid framework for scalable, adaptive intrusion detection in IoT-driven healthcare systems. Big data techniques enable the processing of large-scale, high-dimensional healthcare data, and FL can train a model in a decentralized manner without transferring raw data, thereby maintaining privacy between institutions. This research proposes a privacy-preserving Federated Learning-based model that efficiently detects cyber threats in connected healthcare systems while ensuring distributed big data processing, privacy, and compliance with ethical regulations. To strengthen the reliability of the reported findings, the results were validated using cross-dataset testing and 95% confidence intervals derived from bootstrap analysis, confirming consistent performance across heterogeneous healthcare data distributions. The proposed global model achieves a test accuracy of 99.93% ± 0.03 (95% CI) and a miss rate of only 0.07% ± 0.02, representing state-of-the-art performance in privacy-preserving intrusion detection. The proposed FL-driven IDS framework thus offers an efficient, privacy-preserving, and scalable solution for securing next-generation healthcare infrastructure by combining scalability, adaptability, early detection, and ethical data management.
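The decentralized training the abstract describes typically rests on a FedAvg-style aggregation step: the server averages client model parameters weighted by local dataset size, so raw patient data never leaves an institution. A minimal sketch under that assumption (the three "hospitals" and their weights are hypothetical, and real FL adds many rounds, secure aggregation, and local training):

```python
def fed_avg(client_weights, client_sizes):
    """One FedAvg aggregation round: average client model parameters
    weighted by local dataset size; only parameters, not data, move."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Three hospitals (hypothetical) with different local dataset sizes.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 100, 200]
print(fed_avg(weights, sizes))  # -> [3.5, 4.5]
```

The larger third client pulls the global parameters toward its local model, which is the intended size weighting.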
Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time and resource intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters, i.e., Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time, was used to train and test the models. R2, RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that the boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R2 = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently identified CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, aligning with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. Notably, GB, XGB, and CatBoost maintained comparable accuracy with the reduced input sets, while RF was moderately sensitive and KNN improved somewhat owing to the reduced dimensionality. The findings confirm that feature reduction through SHAP enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The suggested hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
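The SHAP rankings used above are estimates of Shapley values, which can be computed exactly for a handful of features by enumerating coalitions. The sketch below does this for a toy additive "UCS model" whose per-feature contributions are hypothetical; for an additive model the Shapley values recover those contributions exactly, which is a useful sanity check on the concept.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, features):
    """Exact Shapley values by enumerating all feature coalitions.
    `predict` maps a frozenset of active feature names to a model
    output; feasible only for a small number of features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        value = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                s = frozenset(coal)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                value += weight * (predict(s | {f}) - predict(s))
        phi[f] = value
    return phi

# Hypothetical additive model: each active feature adds a fixed amount.
contrib = {"CaO": 3.0, "curing_time": 2.0, "dosage": 1.0}
predict = lambda active: sum(contrib[f] for f in active)
phi = shapley_values(predict, list(contrib))
print(phi)  # each Shapley value equals the feature's own contribution
```

Ranking features by these values and dropping the tail is precisely the SHAP-guided feature-reduction workflow the abstract reports.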
The biological stabilization of soil using microbially induced carbonate precipitation (MICP) employs ureolytic bacteria to precipitate calcium carbonate (CaCO3), which binds soil particles, enhancing strength, stiffness, and erosion resistance. The unconfined compressive strength (UCS), a key measure of soil strength, is critical in geotechnical engineering as it directly reflects the mechanical stability of treated soils. This study integrates explainable artificial intelligence (XAI) with geotechnical insights to model the UCS of MICP-treated sands. Using 517 experimental data points and a combination of input variables, including median grain size (D50), coefficient of uniformity (Cu), void ratio (e), urea concentration (Mu), calcium concentration (Mc), optical density (OD) of the bacterial solution, pH, and total injection volume (Vt), five machine learning (ML) models, namely eXtreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), random forest (RF), gene expression programming (GEP), and multivariate adaptive regression splines (MARS), were developed and optimized. The ensemble models (XGBoost, LightGBM, and RF) were optimized using the Chernobyl disaster optimizer (CDO), a recently developed metaheuristic algorithm. Of these, LightGBM-CDO achieved the highest accuracy for UCS prediction. XAI techniques such as feature importance analysis (FIA), SHapley Additive exPlanations (SHAP), and partial dependence plots (PDPs) were also used to investigate the complex non-linear relationships between the input and output variables. The results demonstrate that XAI-driven models can enhance the predictive accuracy and interpretability of MICP processes, offering a sustainable pathway for optimizing geotechnical applications.
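Partial dependence plots, mentioned among the XAI techniques above, are conceptually simple: fix one feature at each grid value across every sample and average the predictions. A minimal sketch (illustrative; the linear toy predictor and data are hypothetical):

```python
def partial_dependence(predict, data, feature_idx, grid):
    """Partial dependence of `predict` on one feature: for each grid
    value v, set that feature to v in every sample and average the
    resulting predictions."""
    pd = []
    for v in grid:
        preds = []
        for row in data:
            x = list(row)
            x[feature_idx] = v   # intervene on the chosen feature
            preds.append(predict(x))
        pd.append(sum(preds) / len(preds))
    return pd

# Toy model: UCS-like output = 2 * x0 + x1 (hypothetical coefficients).
predict = lambda x: 2 * x[0] + x[1]
data = [[5.0, 1.0], [7.0, 3.0]]
curve = partial_dependence(predict, data, feature_idx=0, grid=[0, 1, 2])
print(curve)  # -> [2.0, 4.0, 6.0]
```

For this linear toy the curve has slope 2, the coefficient of the intervened feature; for real models the curve exposes the nonlinear thresholds PDPs are used to reveal.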
Although machine learning models have achieved sufficiently high accuracy in predicting shield position deviations, their “black box” nature makes the prediction mechanisms and decision-making processes opaque, weakening explainability and practicability. This study introduces a novel explainable deep learning framework comprising an Informer model with enhanced attention mechanisms (EAMInfor) and deep learning important features (DeepLIFT), aimed at improving the prediction accuracy of shield position deviations and providing interpretability for the predictive results. The EAMInfor model integrates channel attention, spatial attention, and simple attention modules to improve the Informer model's performance. The framework is tested on four datasets covering different geological conditions from Xiamen metro line 3, China. Results show that the EAMInfor model outperforms the traditional Informer and comparison models. The analysis with the DeepLIFT method indicates that the thrust of the push cylinder and the earth chamber pressure are the most significant features, while the stroke length of the push cylinder is of lower importance. Furthermore, the variation trends in the significance of data points within input sequences differ substantially between single and composite strata. This framework not only improves predictive accuracy but also strengthens the credibility and reliability of the results.
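DeepLIFT attributes an output difference from a reference to the inputs. For the linear case its rule collapses to weight times difference-from-reference, and the contributions sum exactly to the output difference (the "summation to delta" property). A sketch of that linear case only, with hypothetical weights; real DeepLIFT propagates multipliers through nonlinear layers:

```python
def deeplift_linear(weights, x, reference):
    """DeepLIFT contributions for a single linear layer: each input's
    contribution is its weight times its difference-from-reference."""
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]

w = [2.0, -1.0, 0.5]            # hypothetical learned weights
x, ref = [1.0, 1.0, 2.0], [0.0, 0.0, 0.0]
contrib = deeplift_linear(w, x, ref)
out_diff = (sum(wi * xi for wi, xi in zip(w, x))
            - sum(wi * ri for wi, ri in zip(w, ref)))
print(contrib, out_diff)  # contributions sum exactly to out_diff
```

This summation property is what lets feature contributions like "push thrust" and "chamber pressure" be compared on a common scale.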
In the field of intelligent surveillance,weakly supervised video anomaly detection(WSVAD)has garnered widespread attention as a key technology that identifies anomalous events using only video-level labels.Although mu...In the field of intelligent surveillance,weakly supervised video anomaly detection(WSVAD)has garnered widespread attention as a key technology that identifies anomalous events using only video-level labels.Although multiple instance learning(MIL)has dominated the WSVAD for a long time,its reliance solely on video-level labels without semantic grounding hinders a fine-grained understanding of visually similar yet semantically distinct events.In addition,insufficient temporal modeling obscures causal relationships between events,making anomaly decisions reactive rather than reasoning-based.To overcome the limitations above,this paper proposes an adaptive knowledgebased guidance method that integrates external structured knowledge.The approach combines hierarchical category information with learnable prompt vectors.It then constructs continuously updated contextual references within the feature space,enabling fine-grained meaning-based guidance over video content.Building on this,the work introduces an event relation analysis module.This module explicitly models temporal dependencies and causal correlations between video snippets.It constructs an evolving logic chain of anomalous events,revealing the process by which isolated anomalous snippets develop into a complete event.Experiments on multiple benchmark datasets show that the proposed method achieves highly competitive performance,achieving an AUC of 88.19%on UCF-Crime and an AP of 86.49%on XD-Violence.More importantly,the method provides temporal and causal explanations derived from event relationships alongside its detection results.This capability significantly advances WSVAD from a simple binary classification to a new level of interpretable behavior analysis.展开更多
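The MIL baseline the abstract contrasts itself with typically scores a video by pooling its snippet scores, commonly the mean of the top-k snippets, so a video is flagged when even a few snippets look anomalous. A sketch of that common scheme (not this paper's method; the snippet scores are hypothetical):

```python
def video_anomaly_score(snippet_scores, k=3):
    """MIL-style video-level score: mean of the top-k snippet scores.
    With only a video-level label, the most anomalous snippets carry
    the bag's evidence."""
    top = sorted(snippet_scores, reverse=True)[:k]
    return sum(top) / len(top)

# Hypothetical per-snippet anomaly scores from a snippet classifier.
scores = [0.1, 0.9, 0.8, 0.2, 0.7]
print(video_anomaly_score(scores))  # mean of {0.9, 0.8, 0.7}
```

Because this pooling ignores snippet order entirely, it cannot express the temporal or causal chains between snippets, which is the gap the event relation module above targets.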
Predicting the behavior of renewable energy systems requires models capable of generating accurate forecasts from limited historical data, a challenge that becomes especially pronounced when commissioning new facilities where operational records are scarce. This review synthesizes recent progress in data-efficient deep learning approaches for addressing such “cold-start” forecasting problems. It primarily covers three interrelated domains, solar photovoltaic (PV), wind power, and electrical load forecasting, where data scarcity and operational variability are most critical, while also including representative studies on hydropower and carbon emission prediction to provide a broader systems perspective. To this end, we examined trends from over 150 predominantly peer-reviewed studies published between 2019 and mid-2025, highlighting advances in zero-shot and few-shot meta-learning frameworks that enable rapid model adaptation with minimal labeled data. Moreover, transfer learning approaches combined with spatiotemporal graph neural networks have been employed to transfer knowledge from existing energy assets to new, data-sparse environments, effectively capturing hidden dependencies among geographic features, meteorological dynamics, and grid structures. Synthetic data generation has further proven valuable for expanding training samples and mitigating overfitting in cold-start scenarios. In addition, large language models and explainable artificial intelligence (XAI), notably conversational XAI systems, have been used to interpret and communicate complex model behaviors in accessible terms, fostering operator trust from the earliest deployment stages. By consolidating methodological advances, unresolved challenges, and open-source resources, this review provides a coherent overview of deep learning strategies that can shorten the data-sparse ramp-up period of new energy infrastructures and accelerate the transition toward resilient, low-carbon electricity grids.
Automated detection of Motor Imagery (MI) tasks is extremely useful for controlling prosthetic arms and legs in the rehabilitation of stroke patients. MI tasks can be predicted from Electroencephalogram (EEG) signals recorded by placing electrodes on the scalp; however, accurate prediction remains a challenge due to noise incurred during EEG recording, the need to extract a feature vector with high interclass variance, and accurate classification. The proposed method consists of preprocessing, feature extraction, and classification. First, EEG signals are denoised using a bandpass filter followed by Independent Component Analysis (ICA). Multiple channels are combined to form a single surrogate channel. The Short-Time Fourier Transform (STFT) is then applied to convert time-domain EEG signals into the frequency domain. Handcrafted and automated features are extracted from the EEG signals and concatenated into a single feature vector. We propose a customized two-dimensional Convolutional Neural Network (CNN) for automated feature extraction with high interclass variance. Feature selection is performed using Particle Swarm Optimization (PSO) to obtain optimal features. The final feature vector is passed to three different classifiers: Support Vector Machine (SVM), Random Forest (RF), and Long Short-Term Memory (LSTM). The final decision is made using Model-Agnostic Meta-Learning (MAML). The proposed method has been tested on two datasets, PhysioNet and BCI Competition IV-2a, and achieved better results in terms of accuracy and F1 score than existing state-of-the-art methods: an accuracy and F1 score of 96% on the PhysioNet dataset and 95.5% on BCI Competition IV, dataset 2a. We also present SHapley Additive exPlanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM) explainability techniques to enhance model interpretability in a clinical setting.
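The STFT step in the pipeline above slides a window over the signal and takes a DFT of each frame. A deliberately naive O(n²)-per-frame sketch for illustration, not the paper's implementation; the window length, hop size, and toy "EEG" signal are all arbitrary choices:

```python
import cmath

def stft(signal, win_len, hop):
    """Minimal short-time Fourier transform: rectangular window, plain
    DFT per frame; returns one magnitude spectrum per time frame."""
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = signal[start:start + win_len]
        spectrum = [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / win_len)
                        for n in range(win_len))
                    for k in range(win_len)]
        frames.append([abs(c) for c in spectrum])
    return frames

# Toy single-channel trace: a period-4 oscillation (frequency fs/4).
sig = [1.0, 0.0, -1.0, 0.0] * 8
spec = stft(sig, win_len=8, hop=4)       # 8-sample frames, 50% overlap
# With an 8-point DFT, fs/4 lands in bin 2 (and its mirror, bin 6).
print(round(spec[0][2], 3))  # -> 4.0
```

Stacking these per-frame spectra as rows gives exactly the time-frequency image that the 2D CNN in the pipeline consumes.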
Parkinson’s disease remains a major clinical challenge in terms of early detection, especially during its prodromal stage when symptoms are not evident or not distinct. To address this problem, we propose a new deep learning-based approach for detecting Parkinson’s disease during the prodromal stage, before any overt symptoms develop. We used five publicly accessible datasets, including UCI Parkinson’s Voice, Spiral Drawings, PaHaW, NewHandPD, and PPMI, and implemented a dual-stream CNN–BiLSTM architecture with Fisher-weighted feature merging and SHAP-based explanation. The model achieved an accuracy of 98.2%, an F1-score of 0.981, and an AUC of 0.991 on the UCI Voice dataset. Its performance on the remaining datasets was comparable, with up to a 2–7 percentage-point improvement in accuracy over existing strong models such as CNN–RNN–MLP, ILN–GNet, and CASENet. Overall, the findings support the diagnostic promise of micro-tremor assessment and demonstrate that combining temporal and spatial features with scatter-based segmentation in a multi-modal approach can provide an effective and scalable platform for early, interpretable PD screening.
Artificial Intelligence (AI) is changing healthcare by assisting with diagnosis. However, for doctors to trust AI tools, they need to be both accurate and easy to understand. In this study, we created a new machine learning system for the early detection of Autism Spectrum Disorder (ASD) in children. Our main goal was to build a model that is not only good at predicting ASD but also clear in its reasoning. To this end, we combined several different models, including Random Forest, XGBoost, and Neural Networks, into a single, more powerful framework. We used two different types of datasets: (i) a standard behavioral dataset and (ii) a more complex multimodal dataset with images, audio, and physiological information. The datasets were carefully preprocessed for missing values, redundant features, and class imbalance to ensure fair learning. The results outperformed the state of the art: a Regularized Neural Network achieved 97.6% accuracy on the behavioral data and 98.2% on the multimodal data. The other models also performed well, with accuracies consistently above 96%. We additionally used SHAP and LIME on the behavioral dataset for model explainability.
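One common way to fuse models such as Random Forest, XGBoost, and a Neural Network into a single framework is soft voting: average the class-probability vectors and predict the argmax. The abstract does not specify its exact fusion scheme, so this is a generic sketch with hypothetical probabilities:

```python
def soft_vote(prob_lists):
    """Soft-voting ensemble: average per-class probabilities across
    models and predict the class with the highest average."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c])

# Hypothetical [P(no ASD), P(ASD)] outputs from three base models.
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
print(soft_vote(probs))  # -> 1 (two confident models outvote one)
```

Unlike hard majority voting, soft voting lets a very confident model carry more weight, which usually improves calibrated ensembles.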
Reading comprehension: One large dinosaur hid in the thick jungle. With small, hungry eyes he watched a larger dinosaur, about 90 feet long! It was eating grass by a lake. The one in the jungle stood 20 feet tall on his powerful back legs. His two front legs were short, with sharp claws on the feet. His teeth were like long knives.
Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the Shapley Additive Explanations (SHAP) method, offering a hierarchical, global-to-local perspective on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geoenvironmental factors to assess the susceptibility of post-fire gully erosion using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting decision-making in environmental management and hazard mitigation.
Deep learning (DL) has become a crucial technique for predicting the El Niño-Southern Oscillation (ENSO) and evaluating its predictability. While various DL-based models have been developed for ENSO prediction, many fail to capture the coherent multivariate evolution within the coupled ocean-atmosphere system of the tropical Pacific. To address this limitation and represent ENSO-related ocean-atmosphere interactions more accurately, a novel three-dimensional (3D) multivariate prediction model was proposed based on a Transformer architecture, incorporating a spatiotemporal self-attention mechanism. This model, named 3D-Geoformer, offers several advantages, enabling accurate ENSO predictions up to one and a half years in advance. Furthermore, an integrated gradient method was introduced into the model to identify the sources of predictability for sea surface temperature (SST) variability in the eastern equatorial Pacific. Results reveal that the 3D-Geoformer effectively captures ENSO-related precursors during the evolution of ENSO events, particularly thermocline feedback processes and ocean temperature anomaly pathways on and off the equator. By extending DL-based ENSO predictions from one-dimensional Niño time series to 3D multivariate fields, the 3D-Geoformer represents a significant advance in ENSO prediction. This study details the model formulation, analysis procedures, sensitivity experiments, and illustrative examples, offering practical guidance for applying the model in ENSO research.
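Integrated gradients, the attribution method named above, integrates the model's gradient along a straight path from a baseline to the input; by the completeness axiom the attributions sum to f(x) - f(baseline). A sketch with a hypothetical toy predictor standing in for the SST model (illustrative only):

```python
def integrated_gradients(f, grad_f, x, baseline, steps=200):
    """Integrated gradients via a midpoint Riemann sum: attribute
    f(x) - f(baseline) to each input coordinate by integrating the
    gradient along the straight path from baseline to x."""
    n = len(x)
    attr = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint rule
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            attr[i] += (x[i] - baseline[i]) * g[i] / steps
    return attr

# Hypothetical predictor: f(x) = x0^2 + 3*x1, with its exact gradient.
f = lambda x: x[0] ** 2 + 3 * x[1]
grad = lambda x: [2 * x[0], 3.0]
x, base = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(f, grad, x, base)
# Completeness: attributions sum to f(x) - f(baseline) = 7.
print(round(sum(attr), 3))  # -> 7.0
```

The same completeness check is a useful sanity test when attributing 3D-Geoformer-style predictions back to precursor fields.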
The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible because large MOF structure databases are available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO2 cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to accuracy of up to 97% when the 75% quantile of the training set is used as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analysis to provide physical understanding. Using the model, 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated under 100 °C and 1 bar within one day, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among reported MOFs, in good agreement with the prediction.
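Casting the screening as classification against a 75% quantile threshold, as described above, can be sketched directly. This is illustrative only (the yields are hypothetical, and a simple nearest-rank quantile is assumed rather than the paper's exact convention):

```python
def quantile_labels(yields, q=0.75):
    """Label each candidate as promising (1) when its value reaches the
    q-quantile of the training set, else 0, turning a regression target
    into a classification criterion for screening."""
    ranked = sorted(yields)
    # Nearest-rank quantile (a simplification; conventions vary).
    threshold = ranked[min(int(q * len(ranked)), len(ranked) - 1)]
    return threshold, [1 if y >= threshold else 0 for y in yields]

yields = [10, 20, 30, 40, 50, 60, 70, 80]   # hypothetical activities
threshold, labels = quantile_labels(yields)
print(threshold, labels)  # -> 70 [0, 0, 0, 0, 0, 0, 1, 1]
```

The classifier then only has to rank candidates relative to this fixed cutoff, which is far cheaper than predicting absolute catalytic yields.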
文摘The amount of explained variation R2 is an overall measure used to quantify the information in a model and especially how useful the model might be when predicting future observations, explained variation is useful in guiding model choice for all types of predictive regression models, including linear and generalized linear models and survival analysis. In this work we consider how individual observations in a data set can influence the value of various R2 measures proposed for survival analysis including local influence to assess mathematically the effect of small changes. We discuss methodologies for assessing influence on Graf et al.'s R2G measure, Harrell's C-index and Nagelkerke's R2N. The ideas are illustrated on data on 1391 patients diagnosed with Diffuse Large B-cell Lymphoma (DLBCL), a major subtype ofNon-Hodgkin's Lymphoma (NHL).
Abstract: E. Stone, in the article "18 Mysteries and Unanswered Questions About Our Solar System" (Little Astronomy), wrote: "One of the great things about astronomy is that there is still so much out there for us to discover. There are so many unanswered questions and mysteries about the universe. There is always a puzzle to solve, and that is part of its beauty. Even in our own neighborhood, the Solar System, there are many questions we still have not been able to answer" [1]. In the present paper, we explain the majority of these mysteries and some other unexplained phenomena in the Solar System (SS) within the framework of the developed Hypersphere World-Universe Model (WUM) [2].
Abstract: The integration of machine learning (ML) into geohazard assessment has instigated a paradigm shift, producing models with a level of predictive accuracy previously considered unattainable. However, the black-box nature of these systems presents a significant barrier, hindering their operational adoption, regulatory approval, and full scientific validation. This paper provides a systematic review and synthesis of the emerging field of explainable artificial intelligence (XAI) as applied to geohazard science (GeoXAI), a domain that aims to resolve the long-standing trade-off between model performance and interpretability. A rigorous synthesis of 87 foundational studies is used to map the intellectual and methodological contours of this rapidly expanding field. The analysis reveals that current research efforts are concentrated predominantly on landslide and flood assessment. Methodologically, tree-based ensembles and deep learning models dominate the literature, with SHapley Additive exPlanations (SHAP) frequently adopted as the principal post-hoc explanation technique. More importantly, the review documents how the role of XAI has shifted: rather than being used solely to interpret models after training, it is increasingly integrated into the modeling cycle itself. Recent applications include feature selection, adaptive sampling strategies, and model evaluation. The evidence also shows that GeoXAI extends beyond producing feature rankings: it reveals nonlinear thresholds and interaction effects that yield deeper mechanistic insights into hazard processes. Nevertheless, several key challenges remain unresolved, particularly the need for interpretation stability, the difficulty of reliably distinguishing correlation from causation, and the development of appropriate methods for treating complex spatio-temporal dynamics.
Funding: supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ICAN (ICT Challenge and Advanced Network of HRD) support program (IITP-2025-RS-2023-00259497) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation); by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Republic of Korea government (MSIT) (No. IITP-2025-RS-2023-00254129, Graduate School of Metaverse Convergence (Sungkyunkwan University)); and by the Basic Science Research Program of the National Research Foundation (NRF) funded by the Republic of Korea government (MSIT) (No. RS-2024-00346737).
Abstract: Network attacks have become a critical issue in the internet security domain. Detection methodologies based on artificial intelligence have attracted attention; however, recent studies have struggled to adapt to changing attack patterns and complex network environments, and it is difficult to explain detection results logically using artificial intelligence. We propose a method for classifying network attacks using graph models and for explaining the detection results. First, we reconstruct the network packet data into a graph structure. We then use a graph model to predict network attacks via edge classification. To explain the prediction results, we observe numerical changes under random masking and calculate the importance of neighbors, allowing us to extract significant subgraphs. Our experiments on six public datasets demonstrate superior performance, with an average F1-score of 0.960 and accuracy of 0.964, outperforming traditional machine learning and other graph models. The visual representation of the extracted subgraphs highlights the neighboring nodes with the greatest impact on the results, thus explaining the detections. In conclusion, this study demonstrates that graph-based models are suitable for network attack detection in complex environments and that the importance of graph neighbors can be calculated to analyze the results efficiently. This approach can contribute to real-world network security analyses and provides a new direction for the field.
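The neighbor-masking idea in this entry can be sketched in a few lines: mask each neighbor in turn and record how the prediction score changes. The scoring function below is a hypothetical stand-in for the trained graph model (the paper's actual model is a GNN doing edge classification), and the graph and node names are made up for illustration.

```python
def edge_score(graph, node, masked=frozenset()):
    """Stand-in for the model's attack score: the fraction of active
    (non-masked) neighbors flagged as suspicious."""
    nbrs = [n for n in graph[node] if n not in masked]
    if not nbrs:
        return 0.0
    return sum(1 for n in nbrs if n.startswith("bad")) / len(nbrs)

def neighbor_importance(graph, node):
    """Importance of each neighbor = score drop when that neighbor is masked."""
    base = edge_score(graph, node)
    return {n: base - edge_score(graph, node, masked=frozenset({n}))
            for n in graph[node]}

graph = {"host": ["bad1", "bad2", "ok1"]}
imp = neighbor_importance(graph, "host")
# Masking a suspicious neighbor lowers the score (positive importance);
# masking a benign one raises it (negative importance).
```

Neighbors with the largest positive importance form the explanatory subgraph, mirroring the paper's extraction of significant subgraphs from importance scores.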
Funding: supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (RS-2023-00249743).
Abstract: Most Convolutional Neural Network (CNN) interpretation techniques visualize only the dominant cues that the model relies on, with no guarantee that these represent all the evidence the model uses for classification. This limitation becomes critical when hidden secondary cues, potentially more meaningful than the visualized ones, remain undiscovered. This study introduces CasCAM (Cascaded Class Activation Mapping) to address this fundamental limitation through counterfactual reasoning. By asking "if this dominant cue were absent, what other evidence would the model use?", CasCAM progressively masks the most salient features and systematically uncovers the hierarchy of classification evidence hidden beneath them. Experimental results demonstrate that CasCAM effectively discovers the full spectrum of reasoning evidence and can be applied universally with nine existing interpretation methods.
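The counterfactual masking loop can be illustrated with a toy saliency map: repeatedly take the most salient cue, record it, mask it, and re-score. The dictionary of cue weights is hypothetical; the real CasCAM operates on CAM heatmaps of a CNN, not a flat weight table.

```python
def saliency(weights, masked):
    # Stand-in for a CAM pass: masked cues contribute no evidence.
    return {cue: (0.0 if cue in masked else w) for cue, w in weights.items()}

def cascade(weights, rounds):
    """Progressively mask the dominant cue and collect the evidence hierarchy."""
    masked, hierarchy = set(), []
    for _ in range(rounds):
        s = saliency(weights, masked)
        top = max(s, key=s.get)
        if s[top] == 0.0:          # no evidence left to uncover
            break
        hierarchy.append(top)
        masked.add(top)            # counterfactual: "if this cue were absent..."
    return hierarchy

cues = {"beak": 0.9, "wing": 0.5, "water": 0.2}
order = cascade(cues, rounds=3)
# order lists cues from dominant to secondary, the hierarchy CasCAM exposes.
```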
Abstract: Objective: Deep learning is employed increasingly in gastroenterology (GI) endoscopy computer-aided diagnostics for polyp segmentation and multi-class disease detection. Real-world implementation requires high accuracy, therapeutically relevant explanations, strong calibration, domain generalization, and efficiency. Current Convolutional Neural Network (CNN) and transformer models compromise border precision and global context, generate attention maps that fail to align with expert reasoning, deteriorate under cross-center shifts, and exhibit inadequate calibration, diminishing clinical trust. Methods: HMA-DER is a hierarchical multi-attention architecture that uses dilation-enhanced residual blocks and an explainability-aware Cognitive Alignment Score (CAS) regularizer to directly align attribution maps with expert reasoning signals. The framework includes robustness-oriented additions and an evaluation protocol covering accuracy, macro-averaged F1 score, Area Under the Receiver Operating Characteristic Curve (AUROC), calibration (Expected Calibration Error (ECE), Brier Score), explainability (CAS, insertion/deletion AUC), cross-dataset transfer, and throughput. Results: HMA-DER achieves Dice Similarity Coefficient scores of 89.5% and 86.0% on Kvasir-SEG and CVC-ClinicDB, beating the strongest baseline by +1.9 and +1.7 points. It achieves 86.4% and 85.3% macro-F1 and 94.0% and 93.4% AUROC on HyperKvasir and GastroVision, exceeding the baseline by +1.4/+1.6 macro-F1 and +1.2/+1.1 AUROC. The ablation study shows that hierarchical attention contributes the largest gain (+3.0), followed by CAS regularization (+2–3), dilation (+1.5–2.0), and residual connections (+2–3). Cross-dataset validation demonstrates competitive zero-shot transfer (e.g., KS→CVC Dice 82.7%), whereas multi-dataset training diminishes the domain gap, yielding an 88.1% primary-metric average. HMA-DER's mixed-precision inference handles 155 images per second while maintaining calibration. Conclusion: HMA-DER strikes a balance among accuracy, explainability, robustness, and efficiency for reliable GI computer-aided diagnosis in real-world clinical settings.
Abstract: The increasing number of interconnected devices and the incorporation of smart technology into contemporary healthcare systems have significantly enlarged the attack surface for cyber threats. Early detection of threats is both necessary and complex, as these interconnected healthcare settings generate enormous amounts of heterogeneous data. Traditional Intrusion Detection Systems (IDS), which are generally centralized and machine learning-based, often fail to address the rapidly changing nature of cyberattacks and are challenged by ethical concerns related to patient data privacy. Moreover, traditional AI-driven IDS usually face challenges in handling large-scale, heterogeneous healthcare data while ensuring data privacy and operational efficiency. To address these issues, emerging technologies such as Big Data Analytics (BDA) and Federated Learning (FL) provide a hybrid framework for scalable, adaptive intrusion detection in IoT-driven healthcare systems. Big data techniques enable processing of large-scale, high-dimensional healthcare data, and FL can train a model in a decentralized manner without transferring raw data, thereby maintaining privacy between institutions. This research proposes a privacy-preserving Federated Learning-based model that efficiently detects cyber threats in connected healthcare systems while ensuring distributed big data processing, privacy, and compliance with ethical regulations. To strengthen the reliability of the reported findings, the results were validated using cross-dataset testing and 95% confidence intervals derived from bootstrap analysis, confirming consistent performance across heterogeneous healthcare data distributions. The proposed global model achieves a test accuracy of 99.93% ± 0.03 (95% CI) and a miss-rate of only 0.07% ± 0.02, representing state-of-the-art performance in privacy-preserving intrusion detection. The proposed FL-driven IDS framework offers an efficient, privacy-preserving, and scalable solution for securing next-generation healthcare infrastructures by combining adaptability, early detection, and ethical data management.
Abstract: Unconfined Compressive Strength (UCS) is a key parameter for assessing the stability and performance of stabilized soils, yet traditional laboratory testing is both time- and resource-intensive. In this study, an interpretable machine learning approach to UCS prediction is presented, pairing five models (Random Forest (RF), Gradient Boosting (GB), Extreme Gradient Boosting (XGB), CatBoost, and K-Nearest Neighbors (KNN)) with SHapley Additive exPlanations (SHAP) for enhanced interpretability and to guide feature removal. A complete dataset of 12 geotechnical and chemical parameters (Atterberg limits, compaction properties, stabilizer chemistry, dosage, and curing time) was used to train and test the models. R², RMSE, MSE, and MAE were used to assess performance. Initial results with all 12 features indicated that the boosting-based models (GB, XGB, CatBoost) exhibited the highest predictive accuracy (R² = 0.93) with satisfactory generalization on test data, followed by RF and KNN. SHAP analysis consistently identified CaO content, curing time, stabilizer dosage, and compaction parameters as the most important features, aligning with established soil stabilization mechanisms. Models were then re-trained on the top 8 and top 5 SHAP-ranked features. Interestingly, GB, XGB, and CatBoost maintained comparable accuracy with the reduced input sets, while RF was moderately sensitive and KNN improved somewhat owing to the reduced dimensionality. The findings confirm that feature reduction through SHAP enables cost-effective UCS prediction by reducing laboratory test requirements without significant accuracy loss. The proposed hybrid approach offers an explainable, interpretable, and cost-effective tool for geotechnical engineering practice.
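The SHAP-guided feature-reduction loop used here is easy to sketch for the special case of a linear model, where Shapley values have a closed form: for f(x) = b + Σ wᵢxᵢ, the SHAP value of feature i on instance x is wᵢ(xᵢ − x̄ᵢ), and features are ranked by mean |SHAP| over the dataset. The boosting models in the paper would instead need the `shap` package's TreeExplainer; the weights and data rows below are made up.

```python
def linear_shap(w, x, mean):
    # Exact SHAP values for a linear model: w_i * (x_i - E[x_i]).
    return [wi * (xi - mi) for wi, xi, mi in zip(w, x, mean)]

def top_k_features(w, X, k):
    """Rank features by mean |SHAP| over the dataset, keep the top k."""
    means = [sum(col) / len(col) for col in zip(*X)]
    imp = [sum(abs(linear_shap(w, x, means)[i]) for x in X) / len(X)
           for i in range(len(w))]
    return sorted(range(len(w)), key=lambda i: imp[i], reverse=True)[:k]

w = [2.0, 0.1, -1.0]                      # hypothetical fitted weights
X = [[1, 10, 0], [3, 10, 2], [2, 10, 1]]  # toy training rows
keep = top_k_features(w, X, k=2)
# Feature 1 is constant across rows, so its mean |SHAP| is zero and it is dropped.
```

Retraining on `keep` mirrors the paper's top-8 / top-5 retraining step.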
Abstract: The biological stabilization of soil using microbially induced carbonate precipitation (MICP) employs ureolytic bacteria to precipitate calcium carbonate (CaCO3), which binds soil particles, enhancing strength, stiffness, and erosion resistance. The unconfined compressive strength (UCS), a key measure of soil strength, is critical in geotechnical engineering as it directly reflects the mechanical stability of treated soils. This study integrates explainable artificial intelligence (XAI) with geotechnical insights to model the UCS of MICP-treated sands. Using 517 experimental data points and a combination of input variables, including median grain size (D50), coefficient of uniformity (Cu), void ratio (e), urea concentration (Mu), calcium concentration (Mc), optical density (OD) of the bacterial solution, pH, and total injection volume (Vt), five machine learning (ML) models, namely eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), random forest (RF), gene expression programming (GEP), and multivariate adaptive regression splines (MARS), were developed and optimized. The ensemble models (XGBoost, LightGBM, and RF) were optimized using the Chernobyl disaster optimizer (CDO), a recently developed metaheuristic algorithm. Of these, LightGBM-CDO achieved the highest accuracy for UCS prediction. XAI techniques such as feature importance analysis (FIA), SHapley Additive exPlanations (SHAP), and partial dependence plots (PDPs) were also used to investigate the complex non-linear relationships between the input and output variables. The results demonstrate that XAI-driven models can enhance the predictive accuracy and interpretability of MICP processes, offering a sustainable pathway for optimizing geotechnical applications.
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 52378392, 52408356) and the Foal Eagle Program Youth Top-notch Talent Project of Fujian Province, China (Grant No. 00387088).
Abstract: Although machine learning models have achieved sufficiently high accuracy in predicting shield position deviations, their "black box" nature makes the prediction mechanisms and decision-making processes opaque, weakening explainability and practicability. This study introduces a novel explainable deep learning framework comprising an Informer model with enhanced attention mechanisms (EAMInfor) and Deep Learning Important FeaTures (DeepLIFT), aimed at improving the prediction accuracy of shield position deviations and providing interpretability for the predictive results. The EAMInfor model integrates channel attention, spatial attention, and simple attention modules to improve the Informer model's performance. The framework is tested on four datasets, covering different geological conditions, generated from Xiamen metro line 3, China. Results show that the EAMInfor model outperforms the traditional Informer and comparison models. Analysis with the DeepLIFT method indicates that the push thrust of the push cylinder and the earth chamber pressure are the most significant features, while the stroke length of the push cylinder is of lower importance. Furthermore, the variation trends in the significance of data points within input sequences differ substantially between single and composite strata. This framework not only improves predictive accuracy but also strengthens the credibility and reliability of the results.
Abstract: In the field of intelligent surveillance, weakly supervised video anomaly detection (WSVAD) has garnered widespread attention as a key technology that identifies anomalous events using only video-level labels. Although multiple instance learning (MIL) has long dominated WSVAD, its reliance solely on video-level labels without semantic grounding hinders fine-grained understanding of visually similar yet semantically distinct events. In addition, insufficient temporal modeling obscures causal relationships between events, making anomaly decisions reactive rather than reasoning-based. To overcome these limitations, this paper proposes an adaptive knowledge-based guidance method that integrates external structured knowledge. The approach combines hierarchical category information with learnable prompt vectors and constructs continuously updated contextual references within the feature space, enabling fine-grained, meaning-based guidance over video content. Building on this, the work introduces an event relation analysis module that explicitly models temporal dependencies and causal correlations between video snippets. It constructs an evolving logic chain of anomalous events, revealing the process by which isolated anomalous snippets develop into a complete event. Experiments on multiple benchmark datasets show that the proposed method achieves highly competitive performance, with an AUC of 88.19% on UCF-Crime and an AP of 86.49% on XD-Violence. More importantly, the method provides temporal and causal explanations derived from event relationships alongside its detection results. This capability significantly advances WSVAD from simple binary classification to a new level of interpretable behavior analysis.
Abstract: Predicting the behavior of renewable energy systems requires models capable of generating accurate forecasts from limited historical data, a challenge that becomes especially pronounced when commissioning new facilities where operational records are scarce. This review synthesizes recent progress in data-efficient deep learning approaches for such "cold-start" forecasting problems. It primarily covers three interrelated domains, solar photovoltaic (PV), wind power, and electrical load forecasting, where data scarcity and operational variability are most critical, while also including representative studies on hydropower and carbon emission prediction to provide a broader systems perspective. To this end, we examined trends from over 150 predominantly peer-reviewed studies published between 2019 and mid-2025, highlighting advances in zero-shot and few-shot meta-learning frameworks that enable rapid model adaptation with minimal labeled data. Moreover, transfer learning approaches combined with spatiotemporal graph neural networks have been employed to transfer knowledge from existing energy assets to new, data-sparse environments, effectively capturing hidden dependencies among geographic features, meteorological dynamics, and grid structures. Synthetic data generation has further proven valuable for expanding training samples and mitigating overfitting in cold-start scenarios. In addition, large language models and explainable artificial intelligence (XAI), notably conversational XAI systems, have been used to interpret and communicate complex model behaviors in accessible terms, fostering operator trust from the earliest deployment stages. By consolidating methodological advances, unresolved challenges, and open-source resources, this review provides a coherent overview of deep learning strategies that can shorten the data-sparse ramp-up period of new energy infrastructures and accelerate the transition toward resilient, low-carbon electricity grids.
Funding: supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2601).
Abstract: Automated detection of Motor Imagery (MI) tasks is extremely useful for the prosthetic arms and legs of stroke patients during rehabilitation. MI tasks can be predicted from Electroencephalogram (EEG) signals recorded by placing electrodes on the scalp; however, accurate prediction remains a challenge owing to noise incurred during EEG recording, the difficulty of extracting a feature vector with high interclass variance, and the need for accurate classification. The proposed method consists of preprocessing, feature extraction, and classification. First, EEG signals are denoised using a bandpass filter followed by Independent Component Analysis (ICA). Multiple channels are combined to form a single surrogate channel. Short-Time Fourier Transform (STFT) is then applied to convert time-domain EEG signals into the frequency domain. Handcrafted and automated features are extracted from the EEG signals and concatenated to form a single feature vector. We propose a customized two-dimensional Convolutional Neural Network (CNN) for automated feature extraction with high interclass variance. Feature selection is performed using Particle Swarm Optimization (PSO) to obtain optimal features. The final feature vector is passed to three different classifiers: Support Vector Machine (SVM), Random Forest (RF), and Long Short-Term Memory (LSTM). The final decision is made using Model-Agnostic Meta-Learning (MAML). The proposed method has been tested on two datasets, PhysioNet and BCI Competition IV-2a, and achieved better accuracy and F1 scores than existing state-of-the-art methods: 96% accuracy and F1 score on the PhysioNet dataset and 95.5% on BCI Competition IV, dataset 2a. We also present SHapley Additive exPlanations (SHAP) and Gradient-weighted Class Activation Mapping (Grad-CAM) explainable techniques to enhance model interpretability in a clinical setting.
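The STFT stage described in this pipeline (time-domain EEG to frequency domain) can be sketched as a minimal windowed DFT; a real implementation would use `scipy.signal.stft`, and the window and hop sizes below are arbitrary illustrative choices.

```python
import cmath
import math

def stft(signal, win, hop):
    """Minimal STFT: Hann-windowed DFT magnitudes for each frame."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        seg = [signal[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / (win - 1)))
               for n in range(win)]            # Hann window
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win)
                               for n in range(win)))
                       for k in range(win // 2 + 1)])
    return frames

# A sine completing 4 cycles per 32-sample window should peak in frequency bin 4.
sig = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
spec = stft(sig, win=32, hop=32)
```

Each row of `spec` is one time frame; stacking them gives the time-frequency image that the paper's 2D CNN consumes.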
Funding: supported via funding from Prince Sattam bin Abdulaziz University, project number PSAU/2025/03/32440.
Abstract: Parkinson's disease remains a major clinical challenge in terms of early detection, especially during its prodromal stage, when symptoms are not evident or distinct. To address this problem, we propose a new deep learning-based approach for detecting Parkinson's disease before any overt symptoms develop during the prodromal stage. We used five publicly accessible datasets (UCI Parkinson's Voice, Spiral Drawings, PaHaW, NewHandPD, and PPMI) and implemented a dual-stream CNN-BiLSTM architecture with Fisher-weighted feature merging and SHAP-based explanation. The findings show superior performance: 98.2% accuracy, an F1-score of 0.981, and an AUC of 0.991 on the UCI Voice dataset. Performance on the remaining datasets was also comparable, with up to 2–7 percentage points better accuracy than existing strong models such as CNN-RNN-MLP, ILN-GNet, and CASENet. Across the evidence, the findings support the diagnostic promise of micro-tremor assessment and demonstrate that combining temporal and spatial features with scatter-based segmentation in a multi-modal approach can be an effective and scalable platform for an early, interpretable PD screening system.
Funding: the King Salman Center for Disability Research funded this work through Research Group No. KSRG-2024-050.
Abstract: Artificial Intelligence (AI) is changing healthcare by helping with diagnosis. However, for doctors to trust AI tools, they need to be both accurate and easy to understand. In this study, we created a new machine learning system for the early detection of Autism Spectrum Disorder (ASD) in children. Our main goal was to build a model that is not only good at predicting ASD but also clear in its reasoning. To this end, we combined several different models, including Random Forest, XGBoost, and Neural Networks, into a single, more powerful framework. We used two different types of datasets: (i) a standard behavioral dataset and (ii) a more complex multimodal dataset with images, audio, and physiological information. The datasets were carefully preprocessed for missing values, redundant features, and class imbalance to ensure fair learning. The results outperformed the state of the art, with a Regularized Neural Network achieving 97.6% accuracy on the behavioral data and 98.2% on the multimodal data. Other models also performed well, with accuracies consistently above 96%. We also applied SHAP and LIME to the behavioral dataset for model explainability.
Abstract: Reading comprehension. One large dinosaur hid in the thick jungle (热带雨林). With small, hungry eyes he watched a larger dinosaur, about 90 feet long! It was eating grass by a lake. The one in the jungle stood 20 feet tall on his powerful back legs. His two front legs were short, with sharp claws on the feet. His teeth were like long knives.
Funding: the National Natural Science Foundation of China (42377170, 42407212); the National Funded Postdoctoral Researcher Program (GZB20230606); the Postdoctoral Research Foundation of China (2024M752679); the Sichuan Natural Science Foundation (2025ZNSFSC1205); the National Key R&D Program of China (2022YFC3005704); and the Sichuan Province Science and Technology Support Program (2024NSFSC0100).
Abstract: Wildfires significantly disrupt the physical and hydrologic conditions of the environment, leading to vegetation loss and altered surface geo-material properties. These complex dynamics promote post-fire gully erosion, yet the key conditioning factors (e.g., topography, hydrology) remain insufficiently understood. This study proposes a novel artificial intelligence (AI) framework that integrates four machine learning (ML) models with the SHapley Additive exPlanations (SHAP) method, offering a hierarchical, global-to-local perspective on the dominant factors controlling gully distribution in wildfire-affected areas. In a case study of the Xiangjiao catchment, burned on March 28, 2020, in Muli County, Sichuan Province, Southwest China, we derived 21 geo-environmental factors to assess post-fire gully erosion susceptibility using logistic regression (LR), support vector machine (SVM), random forest (RF), and convolutional neural network (CNN) models. SHAP-based model interpretation revealed eight key conditioning factors: topographic position index (TPI), topographic wetness index (TWI), distance to stream, mean annual precipitation, differenced normalized burn ratio (dNBR), land use/cover, soil type, and distance to road. Comparative model evaluation demonstrated that reduced-variable models incorporating these dominant factors achieved accuracy comparable to that of the initial-variable models, with AUC values exceeding 0.868 across all ML algorithms. These findings provide critical insights into gully erosion behavior in wildfire-affected areas, supporting decision-making in environmental management and hazard mitigation.
Funding: supported by the Laoshan Laboratory (No. LSKJ202202402); the National Natural Science Foundation of China (No. 42030410); the Startup Foundation for Introducing Talent of Nanjing University of Information Science & Technology; the Jiangsu Innovation Research Group (No. JSSCTD202346); the China National Postdoctoral Program for Innovative Talents (No. BX20240169); and the China Postdoctoral Science Foundation (No. 2141062400101).
Abstract: Deep learning (DL) has become a crucial technique for predicting the El Niño-Southern Oscillation (ENSO) and evaluating its predictability. While various DL-based models have been developed for ENSO prediction, many fail to capture the coherent multivariate evolution within the coupled ocean-atmosphere system of the tropical Pacific. To address this three-dimensional (3D) limitation and represent ENSO-related ocean-atmosphere interactions more accurately, a novel 3D multivariate prediction model is proposed based on a Transformer architecture incorporating a spatiotemporal self-attention mechanism. This model, named 3D-Geoformer, offers several advantages, enabling accurate ENSO predictions up to one and a half years in advance. Furthermore, an integrated gradient method was introduced into the model to identify the sources of predictability for sea surface temperature (SST) variability in the eastern equatorial Pacific. Results reveal that the 3D-Geoformer effectively captures ENSO-related precursors during the evolution of ENSO events, particularly the thermocline feedback processes and ocean temperature anomaly pathways on and off the equator. By extending DL-based ENSO predictions from one-dimensional Niño time series to 3D multivariate fields, the 3D-Geoformer represents a significant advancement in ENSO prediction. This study provides details of the model formulation, analysis procedures, sensitivity experiments, and illustrative examples, offering practical guidance for applying the model in ENSO research.
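The integrated-gradient attribution mentioned here can be sketched generically: feature i receives (xᵢ − x'ᵢ) times the gradient averaged along the straight path from a baseline x' to the input x, and the attributions satisfy the completeness property (they sum to f(x) − f(x')). The toy model f(x) = x₀·x₁ and its analytic gradient are illustrative only; in the paper the gradients come from the trained 3D-Geoformer.

```python
def integrated_gradients(grad, x, baseline, steps=200):
    """Riemann-sum (midpoint) approximation of integrated gradients
    along the straight path from baseline to x."""
    attr = [0.0] * len(x)
    for k in range(steps):
        a = (k + 0.5) / steps
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(point)
        for i in range(len(x)):
            attr[i] += (x[i] - baseline[i]) * g[i] / steps
    return attr

f = lambda p: p[0] * p[1]
grad = lambda p: [p[1], p[0]]   # analytic gradient of the toy model
attr = integrated_gradients(grad, x=[2.0, 3.0], baseline=[0.0, 0.0])
# Completeness check: attributions sum to f(x) - f(baseline) = 6.
```

Applied per grid point of an SST field, the same recipe yields the spatial predictability maps described in the abstract.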
Funding: financial support from the National Key Research and Development Program of China (2021YFB3501501); the National Natural Science Foundation of China (Nos. 22225803, 22038001, 22108007 and 22278011); the Beijing Natural Science Foundation (No. Z230023); and the Beijing Science and Technology Commission (No. Z211100004321001).
Abstract: The high porosity and tunable chemical functionality of metal-organic frameworks (MOFs) make them a promising catalyst design platform. High-throughput screening of catalytic performance is feasible because large MOF structure databases are available. In this study, we report a machine learning model for high-throughput screening of MOF catalysts for the CO2 cycloaddition reaction. The descriptors for model training were judiciously chosen according to the reaction mechanism, which leads to high accuracy, up to 97%, with the 75% quantile of the training set as the classification criterion. The feature contributions were further evaluated with SHAP and PDP analyses to provide some physical understanding. 12,415 hypothetical MOF structures and 100 reported MOFs were evaluated at 100 ℃ and 1 bar within one day using the model, and 239 potentially efficient catalysts were discovered. Among them, MOF-76(Y) achieved the top experimental performance among reported MOFs, in good agreement with the prediction.