BACKGROUND Ischemic heart disease (IHD) impacts quality of life and has the highest mortality rate of all cardiovascular diseases globally. AIM To compare variations in single-lead electrocardiogram (ECG) parameters at rest and during physical exertion in individuals with and without IHD, using vasodilator-induced stress computed tomography (CT) myocardial perfusion imaging as the diagnostic reference standard. METHODS This single-center observational study included 80 participants aged ≥ 40 years who gave written informed consent. Both groups, G1 (n = 31) with and G2 (n = 49) without a post-stress-induced myocardial perfusion defect, underwent cardiologist consultation, anthropometric measurements, blood pressure and pulse rate measurement, echocardiography, cardio-ankle vascular index measurement, bicycle ergometry, and a 3-min single-lead ECG recording (Cardio-Qvark) before and immediately after bicycle ergometry, followed by CT myocardial perfusion imaging. LASSO regression with nested cross-validation was used to assess the association between Cardio-Qvark parameters and the presence of a perfusion defect. Statistical processing was performed with R v4.2, Python v3.10, and Statistica 12. RESULTS Bicycle ergometry yielded an area under the receiver operating characteristic curve of 50.7% [95% confidence interval (CI): 0.388-0.625], specificity of 53.1% (95% CI: 0.392-0.673), and sensitivity of 48.4% (95% CI: 0.306-0.657). In contrast, the Cardio-Qvark test performed notably better, with an area under the receiver operating characteristic curve of 67% (95% CI: 0.530-0.801), specificity of 75.5% (95% CI: 0.628-0.88), and sensitivity of 51.6% (95% CI: 0.333-0.695). CONCLUSION Single-lead ECG analyzed with machine learning models showed relatively higher diagnostic accuracy than bicycle ergometry, but the difference was not statistically significant. Further investigations are required to uncover the hidden capabilities of single-lead ECG in IHD diagnosis.
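The sensitivity and specificity figures above can be related to the group sizes with a small calculation. The sketch below is illustrative only: the true/false positive and negative counts are back-calculated from the reported percentages and group sizes (n = 31 and n = 49) and are not stated in the abstract, and the Wilson score interval is an assumed choice, since the abstract does not say how its confidence intervals were computed. The intervals it prints are therefore not expected to match the reported ones exactly.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half

def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts, inferred from the reported percentages (assumption).
cardio_qvark = {"tp": 16, "fn": 15, "tn": 37, "fp": 12}
ergometry = {"tp": 15, "fn": 16, "tn": 26, "fp": 23}

for name, counts in (("Cardio-Qvark", cardio_qvark), ("Bicycle ergometry", ergometry)):
    sens, spec = sens_spec(**counts)
    sens_ci = wilson_interval(counts["tp"], counts["tp"] + counts["fn"])
    spec_ci = wilson_interval(counts["tn"], counts["tn"] + counts["fp"])
    print(f"{name}: sensitivity {sens:.3f} {sens_ci}, specificity {spec:.3f} {spec_ci}")
```

With only 31 positives, one additional correctly classified patient shifts sensitivity by about 3 percentage points, which helps explain the wide intervals reported.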
BACKGROUND Colorectal cancer is a common digestive malignancy, and chemotherapy remains a cornerstone of treatment. Myelosuppression, a frequent hematologic toxicity, poses significant clinical challenges. However, no interpretable machine learning-based nomogram exists to predict chemotherapy-induced myelosuppression in colorectal cancer patients. This study aimed to develop and validate an interpretable clinic-machine learning nomogram integrating clinical predictors with multiple algorithms via a feature mapping algorithm. The model provides accurate risk estimation and clinical interpretability, supporting individualized prevention strategies and optimizing decision-making in patients receiving first-line chemotherapy. AIM To develop and validate an interpretable clinic-machine learning nomogram predicting chemotherapy-induced myelosuppression in colorectal cancer. METHODS This retrospective study enrolled 855 colorectal cancer patients receiving first-line chemotherapy. Data were split into training (n = 612), validation (n = 153), and testing (n = 90) cohorts. Ten predictors were identified through least absolute shrinkage and selection operator, decision tree, random forest, and expert consensus. Ten machine learning algorithms were applied, with performance assessed by area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), calibration, and decision curves. The optimal model was integrated into a clinic-machine learning nomogram via the feature mapping algorithm, which was internally validated for predictive accuracy and clinical utility. RESULTS A total of 855 colorectal cancer patients were enrolled, with 765 cases (April 2020 to December 2023) used for model training and validation, and 90 cases (January 2024 to July 2024) for internal testing. Baseline clinical features did not differ significantly between training and validation cohorts (P > 0.05). Ten predictors were identified through integrated feature selection and expert consensus: age, body surface area, body mass index, tumor position, albumin, carcinoembryonic antigen, carbohydrate antigen (CA) 19-9, CA125, chemotherapy regimen, and chemotherapy cycles. Among the ten machine learning algorithms, extreme gradient boosting achieved the best validation performance (AUC = 0.97, AUPRC = 0.92, sensitivity = 0.79, specificity = 0.92, accuracy = 0.88). Logistic regression confirmed extra trees and random forest as independent predictors, which were incorporated into a clinic-machine learning nomogram. The clinic-machine learning nomogram demonstrated superior discrimination (AUC = 0.96, AUPRC = 0.93, accuracy = 0.90, specificity = 0.95), good calibration, and greater net clinical benefit across a wide probability range (10%-90%). Internal testing further confirmed its robustness and generalizability (AUC = 0.95). CONCLUSION The clinic-machine learning nomogram accurately predicts chemotherapy-induced myelosuppression in colorectal cancer, providing interpretability and clinical utility to support individualized risk assessment and treatment decision-making.
Objective Accurately identifying the key influencing factors of psychological birth trauma in primiparous women is crucial for implementing effective preventive and intervention measures. This study aimed to develop and validate an interpretable machine learning prediction model for identifying these factors. Methods A multicenter cross-sectional study was conducted on primiparous women in four tertiary hospitals in Sichuan Province, southwestern China, from December 2023 to March 2024. The Childbirth Trauma Index was used to assess psychological birth trauma. Data were collected and randomly divided into a training set (80%, n = 289) and a testing set (20%, n = 73). Six machine learning models were trained and tested: Linear Regression, Support Vector Regression, Multilayer Perceptron Regression, eXtreme Gradient Boosting Regression, Random Forest Regression, and Adaptive Boosting Regression. The optimal model was selected based on several performance metrics, and its predictions were interpreted using SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). Results Among the six machine learning models, the Multilayer Perceptron Regression model exhibited the best overall performance in the testing set (MAE = 3.977, MSE = 24.832, R² = 0.507, EVS = 0.524, RMSE = 4.983). In the testing set, the R² and EVS of the Multilayer Perceptron Regression model were 8.3% and 1.2% higher, respectively, than those of the traditional linear regression model, while the MAE, MSE, and RMSE were 0.4%, 7.3%, and 3.7% lower, respectively. The SHAP analysis indicated that intrapartum pain, anxiety, postpartum pain, resilience, and planned pregnancy are the most critical influencing factors of psychological birth trauma in primiparous women. The ALE analysis indicated that higher intrapartum pain, anxiety, and postpartum pain scores are risk factors, while higher resilience scores are protective factors. Conclusions Interpretable machine learning prediction models can identify the key influencing factors of psychological birth trauma in primiparous women. SHAP and ALE analyses based on the Multilayer Perceptron Regression model can help healthcare providers understand the complex decision-making logic within a prediction model. This study provides a scientific basis for the early prevention and personalized intervention of psychological birth trauma in primiparous women.
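The five regression metrics reported above (MAE, MSE, RMSE, R², and explained variance score) are related by simple formulas. The following is a minimal sketch using their standard textbook definitions on a toy example; the data are made up for illustration and the function names are not taken from the study. It also checks one internal consistency of the reported numbers: RMSE is the square root of MSE, and the square root of 24.832 is about 4.983.

```python
import math

def mean(values):
    return sum(values) / len(values)

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R^2 and explained variance score (EVS)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = mean([abs(r) for r in residuals])
    mse = mean([r * r for r in residuals])
    rmse = math.sqrt(mse)
    y_bar = mean(y_true)
    var_y = mean([(t - y_bar) ** 2 for t in y_true])
    r2 = 1 - mse / var_y                      # 1 - SS_res / SS_tot
    r_bar = mean(residuals)
    var_res = mean([(r - r_bar) ** 2 for r in residuals])
    evs = 1 - var_res / var_y                 # ignores a constant bias in the residuals
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "EVS": evs}

# Toy data (illustrative only, not from the study).
metrics = regression_metrics([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(metrics)

# Consistency check on the reported test-set values: RMSE = sqrt(MSE).
print(round(math.sqrt(24.832), 3))
```

EVS differs from R² only when the residuals have a nonzero mean, so the reported gap between EVS (0.524) and R² (0.507) suggests a small systematic offset in the test-set predictions.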
BACKGROUND Pancreatic fistula is the most common complication of pancreatic surgery and can lead to more serious conditions, including bleeding due to visceral vessel erosion and peritonitis. AIM To develop a machine learning (ML) model for postoperative pancreatic fistula and identify significant risk factors for the complication. METHODS A single-center retrospective clinical study was conducted in 150 patients who underwent pancreatoduodenectomy. Logistic regression, random forest, and CatBoost were employed for modeling biochemical leak (symptomless fistula) and fistula grade B/C (clinically significant complication). Performance was estimated by the receiver operating characteristic (ROC) area under the curve (AUC) after 5-fold cross-validation (20% testing and 80% training data). The risk factors were evaluated with the most accurate algorithm, based on the parameter "Importance" (Im) and Kendall correlation, P < 0.05. RESULTS The CatBoost algorithm was the most accurate, with an AUC of 74%-86%. The study provided results of ML-based modeling and algorithm selection for pancreatic fistula prediction and risk factor evaluation. From 14 parameters, we selected the main pre- and intraoperative prognostic factors for all fistulas: tumor vascular invasion (Im = 24.8%), age (Im = 18.6%), and body mass index (Im = 16.4%), AUC = 74%. The ML model showed that biochemical leak, blood and drain amylase levels (Im = 21.6% and 16.4%), and blood leukocytes (Im = 11.2%) were crucial predictors of subsequent fistula B/C, AUC = 86%. Surgical techniques, morphology, and pancreatic duct diameter less than 3 mm were insignificant (Im < 5% and no correlations detected). The results were confirmed by correlation analysis. CONCLUSION This study highlights the key predictors of postoperative pancreatic fistula and establishes a robust ML-based model for individualized risk prediction. These findings contribute to the advancement of personalized perioperative care and may guide targeted preventive strategies.
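The evaluation scheme described above (5-fold cross-validation with 20% of the data held out per fold, scored by ROC AUC) can be sketched without any modeling library. This is only the evaluation scaffold under stated assumptions: the helper names `k_fold_indices` and `roc_auc` are hypothetical, the shuffling seed is arbitrary, and no CatBoost model is fitted here. The AUC is computed by its rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, test) index splits."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# With 150 patients, each fold trains on 120 (80%) and tests on 30 (20%).
splits = k_fold_indices(150, k=5)
print([(len(train), len(test)) for train, test in splits])

# Toy scores (illustrative only): 3 of 4 positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a real analysis a classifier would be fitted on each training split and scored on the matching test split, with the five AUC values then averaged.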
Moisture accumulation within road pavements, particularly in unbound granular materials with or without thin sprayed seals, presents significant challenges in high-rainfall regions such as Queensland. This infiltration often leads to various forms of pavement distress, eventually causing irreversible damage to the pavement structure. The moisture content within pavements is highly dynamic and directly influenced by environmental factors such as precipitation, air temperature, and relative humidity. This variability underscores the importance of monitoring moisture changes using real-time climatic data to assess pavement conditions for operational management, or of incorporating these effects during pavement design based on historical climate data. Consequently, there is an increasing demand for advanced, technology-driven methodologies to predict moisture variations from climatic inputs. Addressing this gap, the present study employs five traditional machine learning (ML) algorithms of varying complexity, K-nearest neighbors (KNN), regression trees, random forest, support vector machines (SVMs), and Gaussian process regression (GPR), to forecast moisture levels within pavement layers over time. Using data collected from an instrumented road in Brisbane, Australia, which include pavement moisture and climatic factors, the study develops predictive models to forecast moisture content at future time steps. The approach incorporates current moisture content, rather than averaged values, along with seasonality (both daily and annual) and key climatic factors to predict next-step moisture. Model performance is evaluated using R², MSE, RMSE, and MAPE. Results show that ML algorithms can reliably predict long-term moisture variations in pavements, provided optimal hyperparameters are selected for each algorithm. The best-performing algorithms include KNN (with the number of neighbours set to 15), medium regression tree, medium random forest, coarse SVM, and simple GPR, with medium random forest outperforming the others. The study also identifies the optimal hyperparameter combinations for each algorithm, offering significant advancements in moisture prediction tools for pavement technology.
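The feature design described above (current moisture, daily and annual seasonality, and climatic inputs used to predict next-step moisture) can be illustrated with a small KNN regressor using 15 neighbours. This is a rough sketch under several assumptions: the hourly series is synthetic rather than the Brisbane data, rainfall stands in for all climatic factors, the sine/cosine encoding of seasonality is an assumed choice that the abstract does not specify, and the features are left unscaled for brevity.

```python
import math
import random

def seasonal_features(hour_index):
    """Encode daily and annual cycles as sine/cosine pairs (assumed encoding)."""
    day_angle = 2 * math.pi * (hour_index % 24) / 24
    year_angle = 2 * math.pi * (hour_index % 8760) / 8760
    return [math.sin(day_angle), math.cos(day_angle),
            math.sin(year_angle), math.cos(year_angle)]

def knn_predict(train_x, train_y, query, k=15):
    """Average the targets of the k training rows closest to the query."""
    ranked = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Synthetic hourly series (illustrative only, not measured data).
rng = random.Random(0)
n_hours = 24 * 60
rain = [max(0.0, rng.gauss(0, 1)) for _ in range(n_hours)]
moisture = [12 + 2 * math.sin(2 * math.pi * h / 8760) + 0.3 * rain[h]
            for h in range(n_hours)]

# Row t holds the state at time t; the target is moisture at time t + 1.
rows = [[moisture[t], rain[t]] + seasonal_features(t) for t in range(n_hours - 1)]
targets = moisture[1:]

split = len(rows) - 100          # keep the last 100 steps out of the training set
prediction = knn_predict(rows[:split], targets[:split], rows[split], k=15)
print(prediction, targets[split])
```

Because KNN averages observed targets, its predictions always stay within the range of moisture values seen in training, which is one reason tree- and kernel-based models may behave differently on extreme events.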
BACKGROUND Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce colorectal cancer mortality. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, 166 patients were enrolled to form a prospective validation set. Feature variables were screened using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95% CI: 0.89-0.92), 0.921 (95% CI: 0.90-0.94), and 0.963 (95% CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the three most important predictors in the model. CONCLUSION The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
Objective: Mild cognitive impairment (MCI) is an age-related neurodegenerative condition whose prevalence increases with age. Within the framework of traditional Chinese medicine, spleen-kidney deficiency syndrome (SKDS) is recognized as the most frequent MCI subtype. Because the onset of MCI is covert and gradual, it is difficult for patients and their families in community settings to distinguish typical aging from pathological changes. There is an urgent need for a preliminary diagnostic tool designed for community-residing older adults with MCI attributed to SKDS (MCI-SKDS). Methods: This investigation enrolled 312 elderly individuals diagnosed with MCI, who were randomly assigned to training and test datasets at a 3:1 ratio. Five machine learning methods, logistic regression (LR), decision tree (DT), naive Bayes (NB), support vector machine (SVM), and gradient boosting (GB), were used to build a diagnostic prediction model for MCI-SKDS. Accuracy, sensitivity, specificity, precision, F1 score, and area under the curve were used to evaluate model performance. Furthermore, the clinical applicability of the model was evaluated through decision curve analysis (DCA). Results: The DT model achieved the best accuracy, precision, specificity, and F1 score, with training-set (test-set) values of 0.904 (0.845), 0.875 (0.795), 0.973 (0.875), and 0.973 (0.875), respectively. The SVM model achieved the best sensitivity among the five models, with a training-set (test-set) value of 0.865 (0.821). The area under the curve of all five models was greater than 0.9 for the training dataset and greater than 0.8 for the test dataset. The DCA of all models showed good clinical application value. The study identified ten indicators that were significant predictors of MCI-SKDS. Conclusion: The risk prediction index derived from machine learning for the MCI-SKDS prediction model is simple and practical; the model demonstrates good predictive value and clinical applicability, and the DT model had the best performance.
The endpoint carbon content in the converter is critical for the quality of steel products, and accurately predicting this parameter is an effective way to reduce alloy consumption and improve smelting efficiency. However, most scholars currently focus on modifying methods to enhance model accuracy, while overlooking the extent to which input parameters influence accuracy. To address this issue, in this study, a prediction model for the endpoint carbon content in the converter was developed using factor analysis (FA) and a support vector machine (SVM) optimized by improved particle swarm optimization (IPSO). Analysis of the factors influencing the endpoint carbon content during the converter smelting process led to the identification of 21 input parameters. Subsequently, FA was used to reduce the dimensionality of the data, and the reduced data were applied to the prediction model. The results demonstrate that the performance of the FA-IPSO-SVM model surpasses several existing methods, such as twin support vector regression and support vector machine. The model achieves hit rates of 89.59%, 96.21%, and 98.74% within error ranges of ±0.01%, ±0.015%, and ±0.02%, respectively. Finally, based on the prediction results obtained by sequentially removing input parameters, the parameters were classified into high influence (5%-7%), medium influence (2%-5%), and low influence (0-2%) categories according to their varying degrees of impact on prediction accuracy. This classification provides a reference for selecting input parameters in future prediction models for endpoint carbon content.
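The hit rate used above is the share of heats whose predicted endpoint carbon content falls within a given absolute error band of the measured value. A minimal sketch of that calculation follows, assuming the tolerance is an absolute difference in percentage points of carbon; the sample values are made up for illustration and the function name `hit_rate` is hypothetical.

```python
def hit_rate(actual, predicted, tolerance):
    """Percentage of predictions whose absolute error is within the tolerance."""
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return 100.0 * hits / len(actual)

# Toy endpoint carbon contents in wt% (illustrative only, not plant data).
actual = [0.040, 0.055, 0.062, 0.048]
predicted = [0.045, 0.068, 0.060, 0.030]

for tol in (0.01, 0.015, 0.02):
    print(f"hit rate within ±{tol}%: {hit_rate(actual, predicted, tol):.1f}%")
```

Because the bands are nested, the hit rate can only stay the same or rise as the tolerance widens, which matches the increasing sequence of 89.59%, 96.21%, and 98.74% reported in the abstract.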
Studies have shown that vascular dysfunction is closely related to the pathogenesis of Alzheimer's disease. The middle temporal gyrus region of the brain is susceptible to pronounced impairment in Alzheimer's disease. Identification of the molecules involved in vascular aberrance of the middle temporal gyrus would support elucidation of the mechanisms underlying Alzheimer's disease and discovery of novel targets for intervention. We carried out single-cell transcriptomic analysis of the middle temporal gyrus in the brains of patients with Alzheimer's disease and healthy controls, revealing obvious changes in vascular function. CellChat analysis of intercellular communication in the middle temporal gyrus showed that the number of cell interactions in this region was decreased in Alzheimer's disease patients, with altered intercellular communication of endothelial cells and pericytes being the most prominent. Differentially expressed genes were also identified. Using the CellChat results, AUCell evaluation of the pathway activity of specific cells showed that the obvious changes in vascular function in the middle temporal gyrus in Alzheimer's disease were directly related to changes in the vascular endothelial growth factor (VEGF) A-VEGF receptor (VEGFR) 2 pathway. AUCell analysis identified subtypes of endothelial cells and pericytes directly related to VEGFA-VEGFR2 pathway activity. Two subtypes of middle temporal gyrus cells showed significant alteration in Alzheimer's disease: endothelial cells with high expression of Erb-B2 receptor tyrosine kinase 4 (ERBB4^(high)) and pericytes with high expression of angiopoietin-like 4 (ANGPTL4^(high)). Finally, combining bulk RNA sequencing data and two machine learning algorithms (least absolute shrinkage and selection operator and random forest), four characteristic Alzheimer's disease feature genes were identified: somatostatin (SST), protein tyrosine phosphatase non-receptor type 3 (PTPN3), glutinase (GL3), and tropomyosin 3 (PTM3). These genes were downregulated in the middle temporal gyrus of patients with Alzheimer's disease and may be used to target the VEGF pathway. Alzheimer's disease mouse models demonstrated consistently altered expression of these genes in the middle temporal gyrus. In conclusion, this study detected changes in intercellular communication between endothelial cells and pericytes in the middle temporal gyrus and identified four novel feature genes related to middle temporal gyrus and vascular functioning in patients with Alzheimer's disease. These findings contribute to a deeper understanding of the molecular mechanisms underlying Alzheimer's disease and present novel treatment targets.
Silicone material extrusion (MEX) is widely used for processing liquids and pastes. Owing to the uneven linewidth and elastic extrusion deformation caused by material accumulation, products may exhibit geometric errors and performance defects, leading to a decline in product quality and a shorter service life. This study proposes a process parameter optimization method that considers the mechanical properties of printed specimens and production costs. To improve the quality of silicone printing samples and reduce production costs, three machine learning models, kernel extreme learning machine (KELM), support vector regression (SVR), and random forest (RF), were developed to predict these three factors. Training data were obtained through a complete factorial experiment. A new dataset was obtained using the Euclidean distance method, which assigns the elimination factor. With parameters tuned by Bayesian optimization, the new dataset was input into the improved double Gaussian extreme learning machine to obtain the improved KELM model. The results showed improved prediction accuracy over SVR and RF. Furthermore, a multi-objective optimization framework was proposed by combining a genetic algorithm with the improved KELM model. The effectiveness and reasonableness of the model algorithm were verified by comparing the optimized results with the experimental results.
Factorization machine (FM) is a prevalent approach to modelling pairwise (second-order) feature interactions when dealing with high-dimensional sparse data. However, on the one hand, FMs fail to capture higher-order feature interactions because of combinatorial expansion. On the other hand, taking into account interactions between every pair of features may introduce noise and degrade prediction accuracy. To solve these problems, we propose a novel approach, the graph factorization machine (GraphFM), which naturally represents features in the graph structure. In particular, we design a mechanism to select beneficial feature interactions and formulate them as edges between features. Then the proposed model, which integrates the interaction function of the FM into the feature aggregation strategy of the graph neural network (GNN), can model arbitrary-order feature interactions on graph-structured features by stacking layers. Experimental results on several real-world datasets demonstrate the rationality and effectiveness of our proposed approach. The code and data are available at https://github.com/CRIPAC-DIG/GraphCTR.
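For readers unfamiliar with the baseline that GraphFM builds on, the standard second-order FM scores an input x as w0 + Σ_i w_i·x_i + Σ_{i<j} ⟨v_i, v_j⟩·x_i·x_j, where each feature i has a latent vector v_i. The sketch below implements that plain FM only, not GraphFM and not the authors' code; the parameter values are arbitrary and chosen for illustration. It shows both the direct pairwise sum and the well-known linear-time reformulation, which give the same result.

```python
def fm_naive(x, w0, w, v):
    """Second-order FM: bias + linear terms + all pairwise interactions, O(k * n^2)."""
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(v[i][f] * v[j][f] for f in range(k)) * x[i] * x[j]
        for i in range(n) for j in range(i + 1, n)
    )
    return linear + pairwise

def fm_fast(x, w0, w, v):
    """Same model in O(k * n): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    n, k = len(x), len(v[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(v[i][f] * x[i] for i in range(n))
        squares = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - squares)
    return linear + pairwise

# Arbitrary illustrative parameters: 3 features, latent dimension 2.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.1, 0.3]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]

print(fm_naive(x, w0, w, v), fm_fast(x, w0, w, v))
```

Both functions still score every feature pair, which is the behaviour the abstract identifies as a possible source of noise; GraphFM instead restricts interactions to selected edges.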
Funding (single-lead ECG study of ischemic heart disease): Supported by Government Assignment, No. 1023022600020-6; RSF Grant, No. 24-15-00549; and the Ministry of Science and Higher Education of the Russian Federation within the Framework of State Support for the Creation and Development of World-Class Research Center, No. 075-15-2022-304.
Funding (colorectal cancer myelosuppression nomogram study): Supported by the Beijing Municipal Natural Science Foundation, No. 7252262; High Level Chinese Medical Hospital Promotion Project, No. HLCMHPP2023085; National Natural Science Foundation of China, No. 82174463; National Administration of Traditional Chinese Medicine, No. ZYYCXTD-C-C202205; and China Academy of Chinese Medical Sciences, No. CI2021A01804 and No. 2022S469.
Funding (psychological birth trauma study): Supported by the Sichuan Province Nursing Scientific Research Project Plan (H23022) and the 2022 Municipal-University Science and Technology Strategic Cooperation Special Fund of Nanchong Science and Technology Bureau (22SXQT0222).
Abstract: Objective: Accurately identifying the key influencing factors of psychological birth trauma in primiparous women is crucial for implementing effective preventive and intervention measures. This study aimed to develop and validate an interpretable machine learning prediction model for identifying the key influencing factors of psychological birth trauma in primiparous women. Methods: A multicenter cross-sectional study was conducted on primiparous women in four tertiary hospitals in Sichuan Province, southwestern China, from December 2023 to March 2024. The Childbirth Trauma Index was used to assess psychological birth trauma in primiparous women. Data were collected and randomly divided into a training set (80%, n=289) and a testing set (20%, n=73). Six machine learning models were trained and tested: Linear Regression, Support Vector Regression, Multilayer Perceptron Regression, eXtreme Gradient Boosting Regression, Random Forest Regression, and Adaptive Boosting Regression. The optimal model was selected based on various performance metrics, and its predictive results were interpreted using SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). Results: Among the six machine learning models, the Multilayer Perceptron Regression model exhibited the best overall performance in the testing set (MAE=3.977, MSE=24.832, R²=0.507, EVS=0.524, RMSE=4.983). In the testing set, the R² and EVS of the Multilayer Perceptron Regression model increased by 8.3% and 1.2%, respectively, compared with the traditional linear regression model, while the MAE, MSE, and RMSE decreased by 0.4%, 7.3%, and 3.7%, respectively. The SHAP analysis indicated that intrapartum pain, anxiety, postpartum pain, resilience, and planned pregnancy are the most critical influencing factors of psychological birth trauma in primiparous women. The ALE analysis indicated that higher intrapartum pain, anxiety, and postpartum pain scores are risk factors, while higher resilience scores are protective factors. Conclusions: Interpretable machine learning prediction models can identify the key influencing factors of psychological birth trauma in primiparous women. SHAP and ALE analyses based on the Multilayer Perceptron Regression model can help healthcare providers understand the complex decision-making logic within a prediction model. This study provides a scientific basis for the early prevention and personalized intervention of psychological birth trauma in primiparous women.
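The five regression metrics reported in the abstract above can be illustrated with a minimal sketch. It assumes the usual definitions (mean absolute error, mean squared error, root mean squared error, coefficient of determination, and explained variance score); the function name and the toy scores are hypothetical and are not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2 and EVS for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # Variance of the targets (population form, divisor n).
    mean_true = sum(y_true) / n
    var_true = sum((t - mean_true) ** 2 for t in y_true) / n
    # Variance of the residuals; unlike MSE it ignores a constant bias.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    r2 = 1.0 - mse / var_true
    evs = 1.0 - var_err / var_true
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "EVS": evs}

# Hypothetical scores for illustration only.
y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 11.0, 16.0, 18.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Under these definitions RMSE is the square root of MSE, which agrees with the reported pair (the square root of 24.832 is about 4.983), and EVS is never smaller than R², which is also consistent with the reported 0.524 and 0.507.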
Abstract: BACKGROUND Pancreatic fistula is the most common complication of pancreatic surgery and can lead to more serious conditions, including bleeding due to visceral vessel erosion and peritonitis. AIM To develop a machine learning (ML) model for postoperative pancreatic fistula and identify significant risk factors of the complication. METHODS A single-center retrospective clinical study included 150 patients who underwent pancreatoduodenectomy. Logistic regression, random forest, and CatBoost were employed for modeling the biochemical leak (symptomless fistula) and fistula grade B/C (clinically significant complication). The performance was estimated by receiver operating characteristic (ROC) area under the curve (AUC) after 5-fold cross-validation (20% testing and 80% training data). The risk factors were evaluated with the most accurate algorithm, based on the parameter “Importance” (Im), and Kendall correlation, P<0.05. RESULTS The CatBoost algorithm was the most accurate, with an AUC of 74%-86%. The study provided results of ML-based modeling and algorithm selection for pancreatic fistula prediction and risk factor evaluation. From 14 parameters, we selected the main pre- and intraoperative prognostic factors for all fistulas: tumor vascular invasion (Im=24.8%), age (Im=18.6%), and body mass index (Im=16.4%), with AUC=74%. The ML model showed that biochemical leak, blood and drain amylase level (Im=21.6% and 16.4%), and blood leukocytes (Im=11.2%) were crucial predictors for subsequent fistula B/C, with AUC=86%. Surgical techniques, morphology, and pancreatic duct diameter less than 3 mm were insignificant (Im<5% and no correlations detected). The results were confirmed by correlation analysis. CONCLUSION This study highlights the key predictors of postoperative pancreatic fistula and establishes a robust ML-based model for individualized risk prediction. These findings contribute to the advancement of personalized perioperative care and may guide targeted preventive strategies.
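As a rough illustration of the 5-fold cross-validation scheme mentioned above (each fold holding out 20% for testing and keeping 80% for training), the following sketch generates the index splits for 150 samples. The helper name, the shuffling seed, and the interleaved fold assignment are assumptions made for illustration; the abstract does not describe how the folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps splits repeatable
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 150 samples matches the cohort size quoted in the abstract.
splits = list(k_fold_indices(150, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Every sample is used for testing exactly once, so the five per-fold AUC values can be averaged into a single estimate.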
Funding: Financial and intellectual support was provided by Queensland University of Technology (QUT), Australia, through its Higher Degree Research Program, which played a crucial role in the successful completion of this research study.
Abstract: Moisture accumulation within road pavements, particularly in unbound granular materials with or without thin sprayed seals, presents significant challenges in high-rainfall regions such as Queensland. This infiltration often leads to various forms of pavement distress, eventually causing irreversible damage to the pavement structure. The moisture content within pavements is highly dynamic and is directly influenced by environmental factors such as precipitation, air temperature, and relative humidity. This variability underscores the importance of monitoring moisture changes using real-time climatic data to assess pavement conditions for operational management, or of incorporating these effects during pavement design based on historical climate data. Consequently, there is an increasing demand for advanced, technology-driven methodologies to predict moisture variations based on climatic inputs. Addressing this gap, the present study employs five traditional machine learning (ML) algorithms, namely K-nearest neighbors (KNN), regression trees, random forest, support vector machines (SVMs), and Gaussian process regression (GPR), to forecast moisture levels within pavement layers over time, with varying algorithm complexities. Using data collected from an instrumented road in Brisbane, Australia, which include pavement moisture and climatic factors, the study develops predictive models to forecast moisture content at future time steps. The approach incorporates current moisture content, rather than averaged values, along with seasonality (both daily and annual) and key climatic factors to predict next-step moisture. Model performance is evaluated using R², MSE, RMSE, and MAPE metrics. Results show that ML algorithms can reliably predict long-term moisture variations in pavements, provided optimal hyperparameters are selected for each algorithm. The best-performing algorithms include KNN (with 15 neighbours), medium regression tree, medium random forest, coarse SVM, and simple GPR, with medium random forest outperforming the others. The study also identifies the optimal hyperparameter combinations for each algorithm, offering significant advancements in moisture prediction tools for pavement technology.
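One way to picture the input described above (current moisture, climatic factors, and daily plus annual seasonality) is the feature row sketched below. The sine/cosine encoding of the two cycles, the function name, the argument names, and the sample values are all assumptions for illustration; the abstract does not state how seasonality was encoded.

```python
import math
from datetime import datetime

def build_features(timestamp, moisture_now, rainfall_mm, air_temp_c, rel_humidity):
    """Assemble one feature row for predicting next-step moisture."""
    hour = timestamp.hour + timestamp.minute / 60.0
    day_of_year = timestamp.timetuple().tm_yday
    return [
        moisture_now,                                   # current moisture content
        rainfall_mm, air_temp_c, rel_humidity,          # climatic factors
        math.sin(2 * math.pi * hour / 24.0),            # daily cycle (assumed encoding)
        math.cos(2 * math.pi * hour / 24.0),
        math.sin(2 * math.pi * day_of_year / 365.25),   # annual cycle (assumed encoding)
        math.cos(2 * math.pi * day_of_year / 365.25),
    ]

# Hypothetical reading, not data from the instrumented road.
row = build_features(datetime(2023, 1, 15, 6, 0), 12.5, 3.2, 24.0, 80.0)
print(row)
```

Encoding each cycle as a sine/cosine pair keeps 23:00 and 00:00 (or 31 December and 1 January) close together in feature space, which matters for distance-based learners such as KNN.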
Abstract: BACKGROUND Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, data from 166 patients were collected to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P<0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95% CI: 0.89-0.92), 0.921 (95% CI: 0.90-0.94), and 0.963 (95% CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model. CONCLUSION The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
Funding: Funded by the National Natural Science Foundation of China (No. 82405530, 81973921 and 72374068) and the Science and Technology Research Project of Hubei Provincial Department of Education (No. B2023098).
Abstract: Objective: As an age-related neurodegenerative disease, mild cognitive impairment (MCI) increases in prevalence with age. Within the framework of traditional Chinese medicine, spleen-kidney deficiency syndrome (SKDS) is recognized as the most frequent MCI subtype. Because the onset of MCI is covert and gradual, it is difficult for patients and their families in community settings to distinguish typical aging from pathological changes. There is an urgent need for a preliminary diagnostic tool designed for community-residing older adults with MCI attributed to SKDS (MCI-SKDS). Methods: This investigation enrolled 312 elderly individuals diagnosed with MCI, who were randomly distributed into training and test datasets at a 3:1 ratio. Five machine learning methods, including logistic regression (LR), decision tree (DT), naive Bayes (NB), support vector machine (SVM), and gradient boosting (GB), were used to build a diagnostic prediction model for MCI-SKDS. Accuracy, sensitivity, specificity, precision, F1 score, and area under the curve were used to evaluate model performance. Furthermore, the clinical applicability of the model was evaluated through decision curve analysis (DCA). Results: The DT model achieved the best accuracy, precision, specificity, and F1 score in the training set (test set), with values of 0.904 (0.845), 0.875 (0.795), 0.973 (0.875), and 0.973 (0.875), respectively. The SVM model achieved the best sensitivity among the five models, with a value of 0.865 (0.821) in the training set (test set). The area under the curve of all five models was greater than 0.9 for the training dataset and greater than 0.8 for the test dataset. The DCA of all models showed good clinical application value. The study identified ten indicators that were significant predictors of MCI-SKDS. Conclusion: The risk prediction index derived from machine learning for the MCI-SKDS prediction model is simple and practical; the model demonstrates good predictive value and clinical applicability, and the DT model had the best performance.
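For readers unfamiliar with the classification metrics listed above, the sketch below derives accuracy, sensitivity, specificity, precision, and F1 score from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    precision = tp / (tp + fp)     # share of positive calls that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```

F1 is the harmonic mean of precision and sensitivity, so it always lies between those two values.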
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174297).
Abstract: The endpoint carbon content in the converter is critical for the quality of steel products, and accurately predicting this parameter is an effective way to reduce alloy consumption and improve smelting efficiency. However, most scholars currently focus on modifying methods to enhance model accuracy, while overlooking the extent to which input parameters influence accuracy. To address this issue, in this study, a prediction model for the endpoint carbon content in the converter was developed using factor analysis (FA) and a support vector machine (SVM) optimized by improved particle swarm optimization (IPSO). Analysis of the factors influencing the endpoint carbon content during the converter smelting process led to the identification of 21 input parameters. Subsequently, FA was used to reduce the dimensionality of the data, and the reduced data were applied to the prediction model. The results demonstrate that the performance of the FA-IPSO-SVM model surpasses several existing methods, such as twin support vector regression and support vector machine. The model achieves hit rates of 89.59%, 96.21%, and 98.74% within error ranges of ±0.01%, ±0.015%, and ±0.02%, respectively. Finally, based on the prediction results obtained by sequentially removing input parameters, the parameters were classified into high influence (5%-7%), medium influence (2%-5%), and low influence (0-2%) categories according to their varying degrees of impact on prediction accuracy. This classification provides a reference for selecting input parameters in future prediction models for endpoint carbon content.
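The hit rates quoted above can be read as the share of predictions whose absolute error falls within a given tolerance. The sketch below assumes that interpretation; the function name and the carbon contents (in mass %) are hypothetical values chosen for illustration.

```python
def hit_rate(y_true, y_pred, tolerance):
    """Fraction of predictions whose absolute error is within the tolerance."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)

# Hypothetical endpoint carbon contents in mass %, not plant data.
measured = [0.050, 0.062, 0.045, 0.080]
predicted = [0.055, 0.080, 0.044, 0.068]

for tol in (0.01, 0.015, 0.02):
    print(tol, hit_rate(measured, predicted, tol))
```

Because a wider tolerance can only admit more predictions, the hit rate is non-decreasing across the three error ranges, as in the reported 89.59%, 96.21%, and 98.74%.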
Funding: Supported by the Natural Science Foundation of Shanxi Province, No. 20210302123299, and the Belt and Road Program of Shanxi Province, No. 110000261420228002 (both to CZ).
Abstract: Studies have shown that vascular dysfunction is closely related to the pathogenesis of Alzheimer's disease. The middle temporal gyrus region of the brain is susceptible to pronounced impairment in Alzheimer's disease. Identification of the molecules involved in vascular aberrance of the middle temporal gyrus would support elucidation of the mechanisms underlying Alzheimer's disease and the discovery of novel targets for intervention. We carried out single-cell transcriptomic analysis of the middle temporal gyrus in the brains of patients with Alzheimer's disease and healthy controls, revealing obvious changes in vascular function. CellChat analysis of intercellular communication in the middle temporal gyrus showed that the number of cell interactions in this region was decreased in Alzheimer's disease patients, with altered intercellular communication of endothelial cells and pericytes being the most prominent. Differentially expressed genes were also identified. Using the CellChat results, AUCell evaluation of the pathway activity of specific cells showed that the obvious changes in vascular function in the middle temporal gyrus in Alzheimer's disease were directly related to changes in the vascular endothelial growth factor (VEGF) A-VEGF receptor (VEGFR) 2 pathway. AUCell analysis identified subtypes of endothelial cells and pericytes directly related to VEGFA-VEGFR2 pathway activity. Two subtypes of middle temporal gyrus cells showed significant alteration in Alzheimer's disease: endothelial cells with high expression of Erb-B2 receptor tyrosine kinase 4 (ERBB4^(high)) and pericytes with high expression of angiopoietin-like 4 (ANGPTL4^(high)). Finally, combining bulk RNA sequencing data and two machine learning algorithms (least absolute shrinkage and selection operator and random forest), four Alzheimer's disease feature genes were identified: somatostatin (SST), protein tyrosine phosphatase non-receptor type 3 (PTPN3), glutinase (GL3), and tropomyosin 3 (PTM3). These genes were downregulated in the middle temporal gyrus of patients with Alzheimer's disease and may be used to target the VEGF pathway. Alzheimer's disease mouse models demonstrated consistently altered expression of these genes in the middle temporal gyrus. In conclusion, this study detected changes in intercellular communication between endothelial cells and pericytes in the middle temporal gyrus and identified four novel feature genes related to middle temporal gyrus and vascular functioning in patients with Alzheimer's disease. These findings contribute to a deeper understanding of the molecular mechanisms underlying Alzheimer's disease and present novel treatment targets.
Funding: Supported by the National Key R&D Program of China (No. 2022YFA1005204l).
Abstract: Silicone material extrusion (MEX) is widely used for processing liquids and pastes. Owing to the uneven linewidth and elastic extrusion deformation caused by material accumulation, products may exhibit geometric errors and performance defects, leading to a decline in product quality and affecting service life. This study proposes a process parameter optimization method that considers the mechanical properties of printed specimens and production costs. To improve the quality of silicone printing samples and reduce production costs, three machine learning models, kernel extreme learning machine (KELM), support vector regression (SVR), and random forest (RF), were developed to predict these three factors. Training data were obtained through a complete factorial experiment. A new dataset was obtained using the Euclidean distance method, which assigns the elimination factor. Parameters were tuned with Bayesian optimization algorithms, and the new dataset was input into the improved double Gaussian extreme learning machine to obtain the improved KELM model. The results showed improved prediction accuracy over SVR and RF. Furthermore, a multi-objective optimization framework was proposed by combining genetic algorithm technology with the improved KELM model. The effectiveness and reasonableness of the model algorithm were verified by comparing the optimized results with the experimental results.
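Abstract: Objective: To investigate the value of placental growth factor (PLGF), soluble fms-like tyrosine kinase-1 (SFLT-1), and glycosylated fibronectin (GLYFN) testing for predicting preeclampsia. Methods: A total of 188 pregnant women attending Wuxi Maternal and Child Health Care Hospital were enrolled and divided into 154 normal pregnant women (control group) and 34 patients with preeclampsia (preeclampsia group). Serum concentrations of PLGF, SFLT-1, and GLYFN at 16-18 weeks of gestation were measured by immunofluorescence assay, marker levels were compared between the preeclampsia and control groups, and receiver operating characteristic (ROC) curves were used to evaluate the predictive performance of the three markers. Results: In the second trimester, the serum PLGF concentration in the preeclampsia group was lower than that in the control group, whereas the SFLT-1 and GLYFN concentrations were both higher; the differences for all three markers were statistically significant (P=0.000 for all three). The areas under the ROC curve (AUC) with 95% confidence intervals were 0.941 (0.907-0.974) for PLGF, 0.881 (0.800-0.962) for SFLT-1, and 0.951 (0.918-0.985) for GLYFN; the AUCs for SFLT-1 combined with GLYFN and for all three markers combined were 0.968 and 0.986, respectively. Conclusion: The levels of PLGF, SFLT-1, and GLYFN differed markedly between the control and preeclampsia groups and have some predictive value for the onset of preeclampsia. SFLT-1 combined with PLGF, SFLT-1 combined with GLYFN, and all three markers combined had higher predictive value for preeclampsia than any single marker.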
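The AUC values above summarize how well a marker separates the two groups. A minimal sketch of one standard way to compute AUC is given below: it equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the scores are hypothetical; this is not the software used in the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical marker scores, not study measurements.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(cases, controls)
print(auc)
```

For a marker that is lower in cases than in controls, as reported for PLGF, the values would be negated (or the two arguments swapped) before computing the AUC so that a higher score still indicates disease.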
Funding: Supported by the National Science Foundation of China (No. 62141608).
Abstract: Factorization machine (FM) is a prevalent approach to modelling pairwise (second-order) feature interactions when dealing with high-dimensional sparse data. However, on the one hand, FMs fail to capture higher-order feature interactions because of combinatorial expansion. On the other hand, taking into account interactions between every pair of features may introduce noise and degrade the prediction accuracy. To solve these problems, we propose a novel approach, the graph factorization machine (GraphFM), which naturally represents features in the graph structure. In particular, we design a mechanism to select beneficial feature interactions and formulate them as edges between features. Then the proposed model, which integrates the interaction function of the FM into the feature aggregation strategy of the graph neural network (GNN), can model arbitrary-order feature interactions on graph-structured features by stacking layers. Experimental results on several real-world datasets demonstrate the rationality and effectiveness of our proposed approach. The code and data are available at https://github.com/CRIPAC-DIG/GraphCTR.
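To make the pairwise interaction term concrete, here is a minimal sketch of the standard second-order FM prediction, not of GraphFM itself. Each feature i has a latent vector V[i], and the interaction weight of features i and j is the dot product of their vectors. The second function uses the well-known algebraic reformulation that avoids the double loop over feature pairs. All names and numbers are illustrative assumptions.

```python
def fm_predict_naive(x, w0, w, V):
    """Second-order FM: bias + linear terms + sum over pairs <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            y += dot * x[i] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same prediction, computed per latent factor in time linear in len(x)."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum of v_if * x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum of (v_if * x_i)^2
        y += 0.5 * (s * s - s_sq)
    return y

# Toy sparse input with three features and two latent factors (hypothetical values).
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.0, 1.0], [3.0, 1.0]]
y_naive = fm_predict_naive(x, w0, w, V)
y_fast = fm_predict_fast(x, w0, w, V)
print(y_naive, y_fast)
```

Both functions weight every feature pair, which is the behaviour the abstract identifies as a possible source of noise; GraphFM instead restricts interactions to selected edges in a feature graph.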