Predicting molecular properties is essential for advancing drug discovery and design. Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to capture the complex structural and relational information inherent in molecular graphs. Despite their effectiveness, the “black-box” nature of GNNs remains a significant obstacle to their widespread adoption in chemistry, as it hinders interpretability and trust. In this context, several explanation methods based on factual reasoning have emerged. These methods aim to interpret the predictions made by GNNs by analyzing the key features contributing to the prediction. However, these approaches fail to answer a critical question: how can one ensure that the structure-property mapping learned by GNNs is consistent with established domain knowledge? In this paper, we propose MMGCF, a novel counterfactual explanation framework designed specifically for GNN-based molecular property prediction. MMGCF constructs a hierarchical tree structure on molecular motifs, enabling the systematic generation of counterfactuals through motif perturbations. This framework identifies causally significant motifs and elucidates their impact on model predictions, offering insights into the relationship between structural modifications and predicted properties. The effectiveness of the method is demonstrated through comprehensive quantitative and qualitative evaluations on four real-world molecular datasets.
Accurate reservoir permeability determination is crucial in hydrocarbon exploration and production. Conventional methods relying on empirical correlations and assumptions often result in high costs, time consumption, inaccuracies, and uncertainties. This study introduces a novel hybrid machine learning approach to predict the permeability of the Wangkwar formation in the Gunya oilfield, Northwestern Uganda. The group method of data handling with differential evolution (GMDH-DE) algorithm was used to predict permeability because it can manage complex, nonlinear relationships between variables, reduces computation time, and optimizes parameters through evolutionary algorithms. Using 1953 samples from the Gunya-1 and Gunya-2 wells for training and 1563 samples from Gunya-3 for testing, the GMDH-DE outperformed the group method of data handling (GMDH) and random forest (RF) in predicting permeability, with higher accuracy and lower computation time. The GMDH-DE achieved an R² of 0.9985, RMSE of 3.157, MAE of 2.366, and ME of 0.001 during training; for testing, the ME, MAE, RMSE, and R² were 1.3508, 12.503, 21.3898, and 0.9534, respectively. Additionally, the GMDH-DE demonstrated a 41% reduction in processing time compared to GMDH and RF. The model was also used to predict the permeability of the Mita Gamma well in the Mandawa basin, Tanzania, which lacks core data. Shapley additive explanations (SHAP) analysis identified thermal neutron porosity (TNPH), effective porosity (PHIE), and spectral gamma-ray (SGR) as the most critical parameters in permeability prediction. Therefore, the GMDH-DE model offers a novel, efficient, and accurate approach for fast permeability prediction, enhancing hydrocarbon exploration and production.
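The four error measures quoted in the abstract above (ME, MAE, RMSE, R²) can be illustrated with a minimal sketch. It uses the usual textbook definitions, which are an assumption here since the abstract does not state the formulas or the sign convention for ME (predicted minus observed is assumed). The function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return ME, MAE, RMSE and R^2 for paired observed and predicted values."""
    n = len(y_true)
    # Assumed sign convention: error = predicted minus observed.
    errors = [p - t for t, p in zip(y_true, y_pred)]
    me = sum(errors) / n                      # mean error (bias)
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    ss_res = sum(e * e for e in errors)       # residual sum of squares
    rmse = math.sqrt(ss_res / n)              # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                # coefficient of determination
    return {"ME": me, "MAE": mae, "RMSE": rmse, "R2": r2}

# Hypothetical permeability values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Reporting ME alongside MAE is useful because a near-zero ME only shows that over- and under-predictions cancel, while MAE and RMSE show how large the individual errors are.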
BACKGROUND Diabetic foot ulcer (DFU) is a serious and destructive complication of diabetes, which has a high amputation rate and carries a huge social burden. Early detection of risk factors and intervention are essential to reduce amputation rates. With the development of artificial intelligence technology, efficient interpretable predictive models can be generated in clinical practice to improve DFU care. AIM To develop and validate an interpretable model for predicting amputation risk in DFU patients. METHODS This retrospective study collected basic data from 599 patients with DFU in Beijing Shijitan Hospital between January 2015 and June 2024. The data set was randomly divided into a training set and a test set with fivefold cross-validation. Three binary-outcome models were built with the eXtreme Gradient Boosting (XGBoost) algorithm, taking risk factors as input to predict amputation probability. Model performance was optimized by tuning the hyperparameters. The predictive performance of the three models was expressed by sensitivity, specificity, positive predictive value, negative predictive value and area under the curve (AUC). The prediction results were visualized with SHapley Additive exPlanation (SHAP). RESULTS A total of 157 (26.2%) patients underwent minor amputation during hospitalization and 50 (8.3%) had major amputation. All three XGBoost models demonstrated good discriminative ability, with AUC values > 0.7. The model for predicting major amputation achieved the highest performance [AUC = 0.977, 95% confidence interval (CI): 0.956-0.998], followed by the minor amputation model (AUC = 0.800, 95% CI: 0.762-0.838) and the non-amputation model (AUC = 0.772, 95% CI: 0.730-0.814). Feature importance ranking of the three models revealed the risk factors for minor and major amputation. Wagner grade 4/5, osteomyelitis, and high C-reactive protein were all considered important predictive variables. CONCLUSION XGBoost effectively predicts diabetic foot amputation risk and provides interpretable insights to support personalized treatment decisions.
This study provides an in-depth comparative evaluation of landslide susceptibility using two distinct spatial units, slope units (SUs) and hydrological response units (HRUs), within Goesan County, South Korea. Leveraging the extreme gradient boosting (XGB) algorithm combined with Shapley Additive Explanations (SHAP), this work assesses the precision and clarity with which each unit predicts areas vulnerable to landslides. SUs capture geomorphological features such as ridges and valleys, with an emphasis on slope stability and landslide triggers. Conversely, HRUs are established from a variety of hydrological factors, including land cover, soil type and slope gradient, to encapsulate the dynamic water processes of the region. The methodological framework includes the systematic gathering, preparation and analysis of data, ranging from historical landslide occurrences to topographical and environmental variables such as elevation, slope angle and land curvature. The XGB algorithm used to construct the Landslide Susceptibility Model (LSM) was combined with SHAP for model interpretation, and the results were evaluated using Random Cross-validation (RCV) to ensure accuracy and reliability. To ensure optimal model performance, the XGB algorithm’s hyperparameters were tuned using Differential Evolution, considering only multicollinearity-free variables. The results show that both SUs and HRUs are effective for LSM, but their effectiveness varies depending on landscape characteristics. The XGB algorithm demonstrates strong predictive power, and SHAP makes the contribution of the influential variables transparent. This work underscores the importance of selecting appropriate assessment units tailored to specific landscape characteristics for accurate LSM. The integration of advanced machine learning techniques with interpretative tools offers a robust framework for landslide susceptibility assessment, improving both predictive capabilities and model interpretability. Future research should integrate broader data sets and explore hybrid analytical models to strengthen the generalizability of these findings across varied geographical settings.
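Several of the abstracts above rely on SHAP, which attributes a prediction to input variables through Shapley values. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values by enumerating feature subsets for a toy scoring function. It does not use the SHAP library or any model from the studies; `shapley_values`, `toy_value`, the feature names and the scores are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a set function `value` defined on subsets of `features`."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1 drawn from the other features
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_value(subset):
    """Hypothetical susceptibility score for a subset of known variables."""
    score = 0.0
    if "slope_angle" in subset:
        score += 3.0
    if "elevation" in subset:
        score += 1.0
    # Interaction term: curvature only matters together with slope angle.
    if "slope_angle" in subset and "curvature" in subset:
        score += 2.0
    return score

phi = shapley_values(["slope_angle", "elevation", "curvature"], toy_value)
print(phi)
```

The attributions sum to the score of the full feature set minus the score of the empty set, and the interaction term is split evenly between the two variables involved. Exact enumeration grows exponentially with the number of features, which is why practical tools use approximations or model-specific algorithms.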
The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges and issues remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To enhance the interpretability of network attack prediction models, this paper proposes a method that combines Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling the rapid and accurate identification and prediction of potential network attack types and thereby facilitating timely defense measures. The model achieved an accuracy of 0.977, precision of 0.985, recall of 0.975, and an F1 score of 0.979, outperforming other models in the domain of network attack prediction. SHAP is utilized to analyze the black-box decision-making process of the model, providing interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model exhibits superior accuracy and outstanding predictive capabilities. Moreover, the SHAP-based interpretability analysis significantly improves the model’s transparency and interpretability.
Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable, and irrelevant input features decrease both the performance and the interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model’s performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method is used to explore how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low face pressure at the top of the shield increases the model’s output.
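Of the four feature-selection techniques named in the abstract above, the Pearson correlation method is the simplest to illustrate. The sketch below keeps features whose absolute correlation with the target reaches a threshold. The threshold value, the function names `pearson` and `select_by_correlation`, and the tiny data set are all hypothetical; the abstract does not state how the study applied the filter.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def select_by_correlation(columns, target, threshold):
    """Keep feature names whose |r| with the target is at least `threshold`."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical feature columns and settlement values for illustration only.
columns = {
    "cover_depth": [1.0, 2.0, 3.0, 4.0, 5.0],
    "face_pressure": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
}
settlement = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = select_by_correlation(columns, settlement, threshold=0.5)
print(selected)
```

A correlation filter only detects linear, one-feature-at-a-time relationships, which is one reason wrapper methods such as SFS and SBS or the Boruta algorithm can select a different feature set.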
Collaborative Filtering (CF) is a leading approach to building recommender systems and has gained considerable development and popularity. A predominant approach to CF is the rating prediction recommender algorithm, which aims to predict a user's rating for items that the user has not yet rated. However, as the number of items and users increases, the data become sparse, and it is difficult to detect latent close relations among items or users for predicting user behavior. In this paper, we enhance the rating prediction approach, substantially improving prediction accuracy, by categorizing items according to movie genres. The probabilities that users are interested in each genre are then computed to integrate the predictions of the genre clusters. A novel probabilistic approach based on sentiment analysis of user reviews is also proposed to give intuitive explanations of why an item is recommended. To test the novel recommendation approach, a new corpus of user reviews on movies obtained from the Internet Movie Database (IMDb) has been generated. Experimental results show that the proposed framework is effective and achieves better prediction performance.
Recently, convolutional neural network (CNN)-based visual inspection has been developed to detect defects on building surfaces automatically. The CNN model demonstrates remarkable accuracy in image data analysis; however, the predicted results carry uncertainty in providing accurate information to users because of the “black box” problem of deep learning models. Therefore, this study proposes a visual explanation method to overcome the uncertainty limitation of CNN-based defect identification. The representative gradient-weighted class activation mapping (Grad-CAM) method is adopted to provide visually explainable information. A visualizing evaluation index is proposed to quantitatively analyze visual representations; this index reflects a rough estimate of the concordance rate between the visualized heat map and the intended defects. In addition, an ablation study adopting three-branch combinations with VGG16 is implemented to identify performance variations by visualizing predicted results. Experiments reveal that the proposed model, combined with hybrid pooling, batch normalization, and multi-attention modules, achieves the best performance with an accuracy of 97.77%, corresponding to an improvement of 2.49% over the baseline model. Consequently, this study demonstrates that reliable results from an automatic defect classification model can be provided to an inspector through the visual representation of the predicted results using CNN models.
The Kirk test has good precision for measuring stray light in optical lithography and is the usual method of measuring stray light. However, Kirk did not provide a theoretical explanation for his simulation model. We attempt to give Kirk's model a theoretical explanation and a modest improvement based on the point spread function model of scattering and the theory of statistical optics. Simulation indicates that the improved model fits Kirk's measurement data better.
Finding an attribute to explain the relationship between a given pair of entities is valuable in many applications. However, many direct solutions fail, owing to low precision caused by heavy dependence on text and low recall caused by evidence scarcity. Thus, we propose a generalization-and-inference framework and implement it to build a system: entity-relationship finder (ERF). Our main idea is to conceptualize entity pairs into proper concept pairs, which serve as intermediate random variables to form the explanation. Although entity conceptualization has been studied, it poses new challenges here: collective optimization for multiple relationship instances, joint optimization for both entities, and aggregation of diluted observations into the head concepts defining the relationship. We propose conceptualization solutions and validate them, as well as the framework, with extensive experiments.
There is a puzzling astrophysical result concerning the latest observation of the absorption profile of the redshifted 21 cm radio line from the early Universe (as described in Bowman et al.). The amplitude of the profile was more than a factor of two greater than the largest predictions. This could mean that the primordial hydrogen gas was much cooler than expected. Some explanations in the literature suggested a possible cooling of baryons either by unspecified dark matter particles or by some exotic dark matter particles with a charge a million times smaller than the electron charge. Other explanations required an additional radio background. In the present paper, we entertain a different possible explanation for this puzzling observational result, based on the alternative kind of hydrogen atoms (AKHA), whose existence was previously demonstrated theoretically as well as by the analysis of atomic experiments. Namely, the AKHA are expected to decouple from the cosmic microwave background (CMB) much earlier (in the course of the expansion of the Universe) than usual hydrogen atoms, so that the AKHA temperature is significantly lower than that of usual hydrogen atoms. This seems to lower the excitation (spin) temperature of the hyperfine doublet (responsible for the 21 cm line) sufficiently to explain the puzzling observational result. This possible explanation appears to be more specific and natural than the previous ones. Further observational studies of the redshifted 21 cm radio line from the early Universe could help to verify which explanation is the most relevant.
[Objective] The research aimed to analyze the interpretation effect of the European numerical prediction on temperature. [Method] Based on the CMSVM regression method, using 850 hPa grid point data of the European numerical prediction from 2003 to 2009 and observed maximum and minimum temperatures at 8 automatic stations in Qingyang City, a temperature prediction model was established, and its operational performance from 2008 to 2010 was tested and evaluated. [Result] The method provided very good guidance for real-time operational temperature prediction. Testing and evaluation found that as the forecast lead time increased, the prediction accuracies of the maximum and minimum temperatures declined. When the temperature anomaly was higher (the actual temperature was higher than the historical mean), the prediction accuracy increased. The influence of the European numerical prediction was larger. [Conclusion] Compared with other methods, the prediction method was convenient to operate, modeling was automatic, running time was short, the system was stable, and prediction accuracy was high. It is suitable for the interpretation of numerical prediction products at meteorological stations.
With large-scale engineering projects being carried out in China, a large number of fossil localities have been discovered and excavated by responsible agencies, but still some important fossils of great value have been removed and smuggled into foreign countries. In the last three years, more than 1345 fossil specimens have been intercepted by Customs in Shenzhen, Shanghai, Tianjin, Beijing and elsewhere, and more than 5000 fossils, most of which are listed as key fossils,
Majorana zero modes in hybrid semiconductor-superconductor nanowires are one of the promising candidates for topological quantum computing. Recently, in nanowires with a superconducting island, the signature of Majorana zero modes has been revealed as a subgap state whose energy oscillates around zero in a magnetic field. This oscillation was interpreted as overlapping Majoranas. However, as the magnetic field increases, the oscillation amplitude either dies away after an overshoot or decays, in sharp contrast to the enhanced oscillations theoretically predicted for Majorana bound states. Several theoretical studies have tried to address this discrepancy, but have been only partially successful. This discrepancy has raised concerns about the conclusive identification of Majorana bound states, and has even endangered the scheme of Majorana qubits based on nanowires.
The flow regimes of a GLCC with a horizontal inlet and a vertical pipe are investigated experimentally, and velocity and pressure drop data labeled with the corresponding flow regimes are collected. Combined with flow regime data for other GLCC positions from the existing literature, the gas and liquid superficial velocities and the pressure drops are used, separately, as inputs to machine learning algorithms that identify the flow regimes. The choice of input data types takes into account the availability of data in practical industrial settings, and the twelve machine learning algorithms are chosen from classical and popular classification algorithms, including typical ensemble models, SVM, KNN, Bayesian models and MLP. The identification results show that gas and liquid superficial velocities are the ideal type of input data for flow regime identification by machine learning. Most of the ensemble models can identify the flow regimes of the GLCC from gas and liquid velocities with an accuracy of 0.99 or more. Pressure drops are not as suitable an input as gas and liquid velocities, and only XGBoost and Bagging Tree can identify the GLCC flow regimes accurately from them. The successes and confusions of each algorithm are analyzed and explained based on the experimental phenomena of the flow regime evolution processes, the flow regime map, and the principles of the algorithms. The applicability and feasibility of each algorithm for GLCC flow regime identification with different types of data are proposed.
Existing explanation methods for Convolutional Neural Networks (CNNs) lack pixel-level visual explanations that yield reliable fine-grained decision features. Since there are inconsistencies between the explanation and the actual behavior of the model being interpreted, we propose a Fine-Grained Visual Explanation for CNNs, namely F-GVE, which produces a fine-grained explanation with higher consistency with the decision of the original model. The exact backward class-specific gradients with respect to the input image are obtained to highlight the object-related pixels the model used to make its prediction. In addition, for better visualization and less noise, F-GVE selects an appropriate threshold to filter the gradient during the calculation, and the explanation map is obtained by element-wise multiplication of the gradient and the input image to show fine-grained classification decision features. Experimental results demonstrate that F-GVE has good visual performance and highlights the importance of fine-grained decision features. Moreover, the faithfulness of the explanation is high, and the method is effective and practical for troubleshooting and debugging detection.
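The two steps described in the abstract above, filtering the gradient with a threshold and multiplying it element-wise with the input, can be sketched on a toy example. This is not the F-GVE implementation: a linear score stands in for the CNN (so the gradient with respect to each pixel is simply its weight), and the magnitude-based filtering rule, the function name `gradient_times_input`, and all values are assumptions made for illustration.

```python
def gradient_times_input(weights, pixels, threshold):
    """Toy gradient-times-input map for a linear class score sum(w_i * x_i)."""
    # For a linear score, d(score)/d(pixel_i) equals weight_i.
    gradient = list(weights)
    # Assumed filtering rule: zero out gradient entries with small magnitude.
    filtered = [g if abs(g) >= threshold else 0.0 for g in gradient]
    # Element-wise product of the filtered gradient and the input.
    return [g * p for g, p in zip(filtered, pixels)]

# Hypothetical weights and pixel intensities for illustration only.
weights = [0.9, -0.05, 0.4, 0.01]
pixels = [0.5, 1.0, 0.25, 0.8]

explanation = gradient_times_input(weights, pixels, threshold=0.1)
print(explanation)
```

In a real CNN the gradient would come from backpropagating the class score to the input image, and the resulting map would have the same shape as the image.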
The DyTiFe₁₁ compound is a ferromagnetic substance. It has a tetragonal body-centered ThMn₁₂-type crystallographic structure. At room temperature, the easy magnetization direction is the c-axis. A spin reorientation begins to appear at about 175 K. The contribution of the Fe sublattice to the magnetocrystalline anisotropy was determined by experiments, and that of the Dy sublattice was obtained from a single-ion model calculation. Results show that the spin reorientation arises from the competition of anisotropy between the Fe and Dy sublattices.
In the letter to the editor, Dr. Comings et al. proposed a potential explanation of our findings that the L allele rather than S allele of 5-HTTLPR was associated with higher anxiety levels and reduced amygdala-prefrontal cortex (PFC) connectivity in Han Chinese[1], which demonstrated an 'allele reversal' in the genetics of the 5-HTTLPR gene in Asians versus Caucasians. The authors alleged that this 'allele reversal' might simply result from maternal age and suggested that we test this on our datasets. Unfortunately,
Funding (GMDH-DE permeability study): supported by the Major National Science and Technology Programs in the “Thirteenth Five-Year” Plan period (Grant No. 2017ZX05032-002-004), the Innovation Team Funding of the Natural Science Foundation of Hubei Province, China (Grant No. 2021CFA031), and the Chinese Scholarship Council (CSC) and the Silk Road Institute through stipend support.
Abstract: BACKGROUND Diabetic foot ulcer (DFU) is a serious and destructive complication of diabetes, which has a high amputation rate and carries a huge social burden. Early detection of risk factors and intervention are essential to reduce amputation rates. With the development of artificial intelligence technology, efficient interpretable predictive models can be generated in clinical practice to improve DFU care. AIM To develop and validate an interpretable model for predicting amputation risk in DFU patients. METHODS This retrospective study collected basic data from 599 patients with DFU in Beijing Shijitan Hospital between January 2015 and June 2024. The data set was randomly divided into a training set and a test set with fivefold cross-validation. Three binary-outcome models were built with the eXtreme Gradient Boosting (XGBoost) algorithm, taking risk factors as input to predict amputation probability. Model performance was optimized by tuning the hyperparameters. The predictive performance of the three models was expressed by sensitivity, specificity, positive predictive value, negative predictive value and area under the curve (AUC). The prediction results were visualized with SHapley Additive exPlanation (SHAP). RESULTS A total of 157 (26.2%) patients underwent minor amputation during hospitalization and 50 (8.3%) had major amputation. All three XGBoost models demonstrated good discriminative ability, with AUC values > 0.7. The model for predicting major amputation achieved the highest performance [AUC = 0.977, 95% confidence interval (CI): 0.956-0.998], followed by the minor amputation model (AUC = 0.800, 95% CI: 0.762-0.838) and the non-amputation model (AUC = 0.772, 95% CI: 0.730-0.814). Feature importance ranking of the three models revealed the risk factors for minor and major amputation. Wagner grade 4/5, osteomyelitis, and high C-reactive protein were all considered important predictive variables. CONCLUSION XGBoost effectively predicts diabetic foot amputation risk and provides interpretable insights to support personalized treatment decisions.
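The evaluation measures named in the abstract above (sensitivity, specificity, positive and negative predictive value, AUC) can be sketched in plain Python. This is a minimal illustration on made-up toy labels and scores, not the study's data or code; the function names `binary_metrics` and `auc` and the 0.5 decision threshold are assumptions chosen for the example.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = amputation, 0 = no amputation.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise AUC here is quadratic in the number of samples, which is fine for a toy example; larger data sets would normally use a rank-based implementation from an established library.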
Funding: Supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (RS-2023-00222536).
Abstract: This study provides an in-depth comparative evaluation of landslide susceptibility using two distinct spatial units, slope units (SUs) and hydrological response units (HRUs), within Goesan County, South Korea. Leveraging the extreme gradient boosting (XGB) algorithm combined with Shapley Additive Explanations (SHAP), this work assesses the precision and clarity with which each unit predicts areas vulnerable to landslides. SUs are delineated from geomorphological features such as ridges and valleys, emphasizing slope stability and landslide triggers. Conversely, HRUs are established from a variety of hydrological factors, including land cover, soil type and slope gradient, to capture the dynamic water processes of the region. The methodological framework includes the systematic gathering, preparation and analysis of data, ranging from historical landslide occurrences to topographical and environmental variables such as elevation, slope angle and land curvature. The XGB algorithm used to construct the Landslide Susceptibility Model (LSM) was combined with SHAP for model interpretation, and the results were evaluated using Random Cross-validation (RCV) to ensure accuracy and reliability. To ensure optimal model performance, the XGB algorithm's hyperparameters were tuned using Differential Evolution, using only variables free of multicollinearity. The results show that both SUs and HRUs are effective for LSM, but their effectiveness varies with landscape characteristics. The XGB algorithm demonstrates strong predictive power, and SHAP makes the contribution of the influential variables transparent. This work underscores the importance of selecting assessment units tailored to specific landscape characteristics for accurate LSM. The integration of advanced machine learning techniques with interpretative tools offers a robust framework for landslide susceptibility assessment, improving both predictive capability and model interpretability. Future research should integrate broader data sets and explore hybrid analytical models to strengthen the generalizability of these findings across varied geographical settings.
Funding: Supported by the National Natural Science Foundation of China Project (No. 62302540); for more information, see https://www.nsfc.gov.cn/ (accessed on 18 June 2024).
Abstract: The methods of network attacks have become increasingly sophisticated, rendering traditional cybersecurity defense mechanisms insufficient to address novel and complex threats effectively. In recent years, artificial intelligence has achieved significant progress in the field of network security. However, many challenges remain, particularly regarding the interpretability of deep learning and ensemble learning algorithms. To enhance the interpretability of network attack prediction models, this paper proposes a method that combines Light Gradient Boosting Machine (LGBM) and SHapley Additive exPlanations (SHAP). LGBM is employed to model anomalous fluctuations in various network indicators, enabling rapid and accurate identification and prediction of potential network attack types and thereby facilitating timely defense measures. The model achieved an accuracy of 0.977, precision of 0.985, recall of 0.975, and an F1 score of 0.979, outperforming other models in the domain of network attack prediction. SHAP is used to analyze the black-box decision-making process of the model, providing interpretability by quantifying the contribution of each feature to the prediction results and elucidating the relationships between features. The experimental results demonstrate that the LGBM-based network attack prediction model exhibits superior accuracy and outstanding predictive capability. Moreover, the SHAP-based interpretability analysis significantly improves the model's transparency.
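The idea of quantifying the contribution of each feature to a prediction, on which SHAP rests, can be illustrated with exact Shapley values on a tiny model. The sketch below is plain Python and enumerates every feature subset, so it only scales to a handful of features; the SHAP library relies on faster model-specific or sampling-based estimators and a background data set rather than the single all-zero baseline assumed here. `toy_model`, `x` and `baseline` are hypothetical and unrelated to the LGBM model in the abstract.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the others stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For this toy model the interaction term is split evenly between the two features involved, and the contributions add up to the difference between the prediction at `x` and at the baseline, which is the additivity property that gives SHAP its name.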
Funding: Support provided by the Science and Technology Development Fund, Macao SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).
Abstract: Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable. Irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is usually ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
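Of the four feature-selection techniques listed in the abstract above, sequential forward selection (SFS) is the simplest to sketch: starting from an empty set, it repeatedly adds the single feature that most improves a score. The plain-Python sketch below is illustrative only. The feature names, the `toy_score` function and the fixed target size `k` are hypothetical; in practice the score would be something like the cross-validated performance of the ML model trained on the candidate subset.

```python
def sequential_forward_selection(features, score, k):
    """Greedily add the feature that maximizes score(subset) until k are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Evaluate each candidate together with the features already selected.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical usefulness of each feature; "noise" carries no information.
GAINS = {"face_pressure": 0.5, "cover_depth": 0.3, "soil_type": 0.2, "noise": 0.0}


def toy_score(subset):
    # Stand-in for a validation score: summed gains minus a small size penalty.
    return sum(GAINS[f] for f in subset) - 0.01 * len(subset)


chosen = sequential_forward_selection(list(GAINS), toy_score, k=2)
```

Sequential backward selection follows the same pattern in reverse, starting from all features and removing the one whose removal hurts the score least.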
Funding: Supported in part by the National Science Foundation of China under Grants No. 61303105 and 61402304; the Humanity & Social Science general project of the Ministry of Education under Grant No. 14YJAZH046; the Beijing Natural Science Foundation under Grant No. 4154065; the Beijing Educational Committee Science and Technology Development Planned under Grant No. KM201410028017; and Academic Degree Graduate Courses group projects.
Abstract: Collaborative Filtering (CF) is a leading approach to building recommender systems and has gained considerable development and popularity. A predominant approach to CF is the rating prediction recommender algorithm, which aims to predict a user's rating for items the user has not yet rated. However, as the number of items and users grows, the data become sparse, and it is difficult to detect latent close relations among items or users for predicting user behavior. In this paper, we enhance the rating prediction approach by categorizing movies according to their genres, leading to a substantial improvement in prediction accuracy. The probabilities that users are interested in the genres are then computed to integrate the predictions of each genre cluster. A novel probabilistic approach based on sentiment analysis of user reviews is also proposed to give intuitive explanations of why an item is recommended. To test the novel recommendation approach, a new corpus of user reviews on movies obtained from the Internet Movies Database (IMDB) has been generated. Experimental results show that the proposed framework is effective and achieves better prediction performance.
Funding: Supported by a Korea Agency for Infrastructure Technology Advancement (KAIA) grant funded by the Ministry of Land, Infrastructure, and Transport (Grant 22CTAP-C163951-02).
Abstract: Recently, convolutional neural network (CNN)-based visual inspection has been developed to detect defects on building surfaces automatically. The CNN model demonstrates remarkable accuracy in image data analysis; however, the predicted results carry uncertainty in providing accurate information to users because of the "black box" problem in deep learning models. Therefore, this study proposes a visual explanation method to overcome the uncertainty limitation of CNN-based defect identification. The representative visual method, gradient-weighted class activation mapping (Grad-CAM), is adopted to provide visually explainable information. A visualizing evaluation index is proposed to quantitatively analyze visual representations; this index reflects a rough estimate of the concordance rate between the visualized heat map and the intended defects. In addition, an ablation study, adopting three-branch combinations with VGG16, is implemented to identify performance variations by visualizing predicted results. Experiments reveal that the proposed model, combined with hybrid pooling, batch normalization, and multi-attention modules, achieves the best performance with an accuracy of 97.77%, corresponding to an improvement of 2.49% compared with the baseline model. Consequently, this study demonstrates that reliable results from an automatic defect classification model can be provided to an inspector through the visual representation of the predicted results using CNN models.
Funding: Supported by the National Basic Research Program of China under Grant No 2007AA01Z333 and the National Special Program of China under Grant No 2009ZX02204-008.
Abstract: The Kirk test has good precision for measuring stray light in optical lithography and is the usual method of measuring stray light. However, Kirk did not provide a theoretical explanation for his simulation model. We attempt to give Kirk's model a theoretical explanation and a small improvement based on the point spread function model of scattering and the theory of statistical optics. Simulation indicates that the improved model fits Kirk's measurement data better.
Funding: The Shanghai Science and Technology Innovation Action Plan (No. 19511120400), the National Key Research and Development Project (No. 2020AAA0109302), and the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).
Abstract: Finding an attribute to explain the relationship between a given pair of entities is valuable in many applications. However, many direct solutions fail, owing to low precision caused by heavy dependence on text and low recall caused by scarce evidence. Thus, we propose a generalization-and-inference framework and implement it to build a system: the entity-relationship finder (ERF). Our main idea is to conceptualize entity pairs into proper concept pairs, which serve as intermediate random variables to form the explanation. Although entity conceptualization has been studied, it poses new challenges here: collective optimization for multiple relationship instances, joint optimization for both entities, and aggregation of diluted observations into the head concepts defining the relationship. We propose conceptualization solutions and validate them, as well as the framework, with extensive experiments.
Abstract: There is a puzzling astrophysical result concerning the latest observation of the absorption profile of the redshifted 21 cm radio line from the early Universe (as described in Bowman et al.). The amplitude of the profile was more than a factor of two greater than the largest predictions. This could mean that the primordial hydrogen gas was much cooler than expected. Some explanations in the literature suggested a possible cooling of baryons either by unspecified dark matter particles or by some exotic dark matter particles with a charge a million times smaller than the electron charge. Other explanations required an additional radio background. In the present paper, we entertain a different possible explanation for this puzzling observational result: it is based on the alternative kind of hydrogen atoms (AKHA), whose existence was previously demonstrated theoretically, as well as by the analysis of atomic experiments. Namely, the AKHA are expected to decouple from the cosmic microwave background (CMB) much earlier (in the course of the Universe's expansion) than usual hydrogen atoms, so that the AKHA temperature is significantly lower than that of usual hydrogen atoms. This seems to lower the excitation (spin) temperature of the hyperfine doublet (responsible for the 21 cm line) sufficiently to explain the puzzling observational result. This possible explanation appears to be more specific and natural than the previous ones. Further observational studies of the redshifted 21 cm radio line from the early Universe could help to verify which explanation is the most relevant.
Abstract: [Objective] The research aimed to analyze the explanation effect of the European numerical prediction on temperature. [Method] Based on the CMSVM regression method, and using 850 hPa grid point data of the European numerical prediction from 2003 to 2009 together with observed maximum and minimum temperatures at 8 automatic stations in Qingyang City, a temperature prediction model was established, and its operational performance from 2008 to 2010 was tested and evaluated. [Result] The method provided very good guidance for real-time operational temperature prediction. Testing and evaluation found that as the forecast lead time lengthened, the prediction accuracies of the maximum and minimum temperatures declined. When the temperature anomaly was higher (actual temperature higher than the historical mean), prediction accuracy increased. The influence of the European numerical prediction was greater. [Conclusion] Compared with other methods, the prediction method was convenient to operate, modeling was automatic, running time was short, the system was stable, and prediction accuracy was high. It is suitable for carrying out explanation work on numerical prediction products at meteorological stations.
Abstract: With large-scale engineering projects being carried out in China, a large number of fossil localities have been discovered and excavated by responsible agencies, but still some important fossils of great value have been removed and smuggled into foreign countries. In the last three years, more than 1345 fossil specimens have been intercepted by Customs in Shenzhen, Shanghai, Tianjin, Beijing and elsewhere, and more than 5000 fossils, most of which are listed as key fossils,
Abstract: Majorana zero modes in hybrid semiconductor-superconductor nanowires are one of the promising candidates for topological quantum computing. Recently, in nanowires with a superconducting island, the signature of Majorana zero modes has been revealed as a subgap state whose energy oscillates around zero in a magnetic field. This oscillation was interpreted as overlapping Majoranas. However, as the magnetic field increases, the oscillation amplitude either dies away after an overshoot or decays, in sharp contrast to the theoretically predicted enhanced oscillations for Majorana bound states. Several theoretical studies have tried to address this discrepancy, but are only partially successful. This discrepancy has raised concerns about the conclusive identification of Majorana bound states, and has even endangered the scheme of Majorana qubits based on the nanowires.
Abstract: The flow regimes of a GLCC with a horizontal inlet and a vertical pipe are investigated experimentally, and velocity and pressure drop data labeled with the corresponding flow regimes are collected. Combined with flow regime data for other GLCC positions from the existing literature, the gas and liquid superficial velocities and the pressure drops are each used as input to machine learning algorithms that identify the flow regimes. The choice of input data types takes into account the availability of data in practical industrial settings, and the twelve machine learning algorithms are chosen from classical and popular classification algorithms, including typical ensemble models, SVM, KNN, Bayesian Model and MLP. The flow regime identification results show that gas and liquid superficial velocities are the ideal type of input data for flow regime identification by machine learning. Most of the ensemble models can identify the flow regimes of the GLCC from gas and liquid velocities with an accuracy of 0.99 or more. Pressure drops are less suitable as input than gas and liquid velocities; with them, only XGBoost and Bagging Tree identify the GLCC flow regimes accurately. The successes and confusions of each algorithm are analyzed and explained based on the experimental phenomena of the flow regime evolution processes, the flow regime map, and the principles of the algorithms. The applicability and feasibility of each algorithm for GLCC flow regime identification with different types of data are proposed.
Funding: This work was partially supported by the Beijing Natural Science Foundation (No. 4222038), by the Open Research Project of the State Key Laboratory of Media Convergence and Communication (Communication University of China), by the National Key RD Program of China (No. 2021YFF0307600), and by the Fundamental Research Funds for the Central Universities.
Abstract: Existing explanation methods for Convolutional Neural Networks (CNNs) lack the pixel-level visual explanations needed to produce reliable fine-grained decision features. Since there are inconsistencies between the explanation and the actual behavior of the model being interpreted, we propose a Fine-Grained Visual Explanation for CNNs, namely F-GVE, which produces a fine-grained explanation with higher consistency with the decision of the original model. The exact backward class-specific gradients with respect to the input image are obtained to highlight the object-related pixels the model used to make its prediction. In addition, for better visualization and less noise, F-GVE selects an appropriate threshold to filter the gradient during the calculation, and the explanation map is obtained by element-wise multiplication of the gradient and the input image to show fine-grained classification decision features. Experimental results demonstrate that F-GVE has good visual performance and highlights the importance of fine-grained decision features. Moreover, the faithfulness of the explanation is high, and the method is effective and practical for troubleshooting and debugging detection.
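The last step described in the abstract above, filtering the gradient with a threshold and multiplying it element-wise with the input image, can be sketched in plain Python. This is only a rough illustration under stated assumptions: the gradient is taken as already computed by backpropagation, the filter is assumed to zero out entries whose magnitude falls below the threshold (the abstract does not say how the threshold is chosen or applied), and the function name `gradient_times_input` and the 2x2 arrays are hypothetical.

```python
def gradient_times_input(gradient, image, threshold):
    """Zero out small-magnitude gradients, then multiply element-wise by the image."""
    return [
        [(g if abs(g) >= threshold else 0.0) * px for g, px in zip(g_row, px_row)]
        for g_row, px_row in zip(gradient, image)
    ]


# Hypothetical 2x2 single-channel example.
gradient = [[0.5, -0.05], [0.2, 0.0]]  # assumed class-specific gradient w.r.t. the input
image = [[1.0, 2.0], [0.5, 3.0]]       # input pixel intensities
explanation = gradient_times_input(gradient, image, threshold=0.1)
```

In a real pipeline the gradient would come from an automatic differentiation framework and the map would have the same height, width and channel layout as the input image.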
Abstract: The DyTiFe11 compound is a ferromagnetic substance. It has a tetragonal body-centered ThMn12-type crystallographic structure. At room temperature, the easy magnetization direction is the c-axis. A spin reorientation begins to appear at about 175 K. The contribution of the Fe sublattice to the magnetocrystalline anisotropy was determined by experiments, and that of the Dy sublattice was obtained by a single-ion model calculation. Results show that the spin reorientation arises from the competition of anisotropy between the Fe and Dy sublattices.
Abstract: In the letter to the editor, Dr. Comings et al. proposed a potential explanation of our findings that the L allele rather than the S allele of 5-HTTLPR was associated with higher anxiety levels and reduced amygdala-prefrontal cortex (PFC) connectivity in Han Chinese [1], which demonstrated an 'allele reversal' in the genetics of the 5-HTTLPR gene in Asians versus Caucasians. The authors alleged that this 'allele reversal' might simply result from maternal age and suggested that we test this on our datasets. Unfortunately,