Landslides pose a formidable natural hazard across the Qinghai-Tibet Plateau (QTP), endangering both ecosystems and human life. Identifying the driving factors behind landslides and accurately assessing susceptibility are key to mitigating disaster risk. This study integrated multi-source historical landslide data with 15 predictive factors and used several machine learning models, namely Random Forest (RF), Gradient Boosting Regression Trees (GBRT), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), to generate susceptibility maps. The Shapley additive explanation (SHAP) method was applied to quantify factor importance and explore the factors' nonlinear effects. The results showed that: (1) CatBoost was the best-performing model (CA=0.938, AUC=0.980) in assessing landslide susceptibility, with altitude emerging as the most significant factor, followed by distance to roads and earthquake sites, precipitation, and slope; (2) the SHAP method revealed critical nonlinear thresholds, demonstrating that historical landslides were concentrated at mid-altitudes (1400-4000 m) and decreased markedly above 4000 m, with a parallel reduction in probability beyond 700 m from roads; and (3) landslide-prone areas, comprising 13% of the QTP, were concentrated in the southeastern and northeastern parts of the plateau. By integrating machine learning and SHAP analysis, this study revealed landslide hazard-prone areas and their driving factors, providing insights to support disaster management strategies and sustainable regional planning.
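The factor attributions that SHAP reports are Shapley values. As a minimal sketch of the underlying idea (not the authors' pipeline, which would use a trained CatBoost model and a SHAP library), the exact Shapley value of each factor can be computed by averaging its marginal contribution over all coalitions of the other factors. The function `toy_value` and its numbers are hypothetical stand-ins for a model's expected output given a subset of known factors.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function.

    value_fn takes a frozenset of feature indices and returns a number.
    Cost grows exponentially with n_features, so this is for illustration only.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(coalition):
    """Hypothetical value function: 0 = altitude, 1 = road distance, 2 = slope."""
    v = 0.0
    if 0 in coalition:
        v += 0.4
    if 1 in coalition:
        v += 0.2
    if 0 in coalition and 1 in coalition:
        v += 0.1  # made-up interaction between altitude and road distance
    if 2 in coalition:
        v += 0.1
    return v


phi = shapley_values(toy_value, 3)
```

The attributions sum to the difference between the value of the full coalition and the empty one, and the interaction term is split equally between the two factors involved.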
Objective: Accurately identifying the key influencing factors of psychological birth trauma in primiparous women is crucial for implementing effective preventive and intervention measures. This study aimed to develop and validate an interpretable machine learning prediction model for identifying these factors. Methods: A multicenter cross-sectional study was conducted on primiparous women in four tertiary hospitals in Sichuan Province, southwestern China, from December 2023 to March 2024. The Childbirth Trauma Index was used to assess psychological birth trauma. Data were collected and randomly divided into a training set (80%, n=289) and a testing set (20%, n=73). Six machine learning models were trained and tested: Linear Regression, Support Vector Regression, Multilayer Perceptron Regression, eXtreme Gradient Boosting Regression, Random Forest Regression, and Adaptive Boosting Regression. The optimal model was selected based on various performance metrics, and its predictive results were interpreted using SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). Results: Among the six machine learning models, the Multilayer Perceptron Regression model exhibited the best overall performance in the testing set (MAE=3.977, MSE=24.832, R²=0.507, EVS=0.524, RMSE=4.983). In the testing set, the R² and EVS of the Multilayer Perceptron Regression model increased by 8.3% and 1.2%, respectively, compared with the traditional linear regression model, while the MAE, MSE, and RMSE decreased by 0.4%, 7.3%, and 3.7%, respectively. The SHAP analysis indicated that intrapartum pain, anxiety, postpartum pain, resilience, and planned pregnancy are the most critical influencing factors of psychological birth trauma in primiparous women. The ALE analysis indicated that higher intrapartum pain, anxiety, and postpartum pain scores are risk factors, while higher resilience scores are protective factors. Conclusions: Interpretable machine learning prediction models can identify the key influencing factors of psychological birth trauma in primiparous women. SHAP and ALE analyses based on the Multilayer Perceptron Regression model can help healthcare providers understand the complex decision-making logic within a prediction model. This study provides a scientific basis for the early prevention and personalized intervention of psychological birth trauma in primiparous women.
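The five regression metrics reported above follow standard definitions. The sketch below computes them in plain Python, assuming the usual formulas (EVS is taken as one minus the variance of the residuals over the variance of the targets); the function name and the sample values are illustrative and not taken from the study.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2 and explained variance score (EVS)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    # EVS ignores a constant bias in the residuals, so EVS >= R2.
    mean_err = sum(errors) / n
    var_err = sum((e - mean_err) ** 2 for e in errors) / n
    evs = 1.0 - var_err / (ss_tot / n)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": r2, "EVS": evs}


# Illustrative scores only (not study data).
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 10.0])
```

RMSE is the square root of MSE, which matches the reported pair (4.983² ≈ 24.83).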
The endpoint carbon content in the converter is critical for the quality of steel products, and accurately predicting this parameter is an effective way to reduce alloy consumption and improve smelting efficiency. However, most scholars currently focus on modifying methods to enhance model accuracy, while overlooking the extent to which input parameters influence accuracy. To address this issue, in this study, a prediction model for the endpoint carbon content in the converter was developed using factor analysis (FA) and a support vector machine (SVM) optimized by improved particle swarm optimization (IPSO). Analysis of the factors influencing the endpoint carbon content during the converter smelting process led to the identification of 21 input parameters. Subsequently, FA was used to reduce the dimensionality of the data, and the reduced data were applied to the prediction model. The results demonstrate that the performance of the FA-IPSO-SVM model surpasses several existing methods, such as twin support vector regression and support vector machine. The model achieves hit rates of 89.59%, 96.21%, and 98.74% within error ranges of ±0.01%, ±0.015%, and ±0.02%, respectively. Finally, based on the prediction results obtained by sequentially removing input parameters, the parameters were classified into high influence (5%-7%), medium influence (2%-5%), and low influence (0-2%) categories according to their varying degrees of impact on prediction accuracy. This classification provides a reference for selecting input parameters in future prediction models for endpoint carbon content.
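A hit rate within an error range is the share of heats whose predicted carbon content falls within a given absolute tolerance of the measured value. The sketch below shows that calculation; the function name and the sample carbon contents (in mass percent) are hypothetical and not taken from the study.

```python
def hit_rate(actual, predicted, tolerance):
    """Fraction of predictions within +/- tolerance of the measured value."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical endpoint carbon contents in mass percent.
measured = [0.050, 0.062, 0.041, 0.080]
forecast = [0.055, 0.050, 0.043, 0.062]

rates = {tol: hit_rate(measured, forecast, tol) for tol in (0.01, 0.015, 0.02)}
```

Because each wider tolerance includes every hit of the narrower one, the hit rate can only stay the same or rise as the error range grows, as in the reported 89.59%, 96.21%, and 98.74%.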
BACKGROUND Pancreatic fistula is the most common complication of pancreatic surgery and can lead to more serious conditions, including bleeding due to visceral vessel erosion and peritonitis. AIM To develop a machine learning (ML) model for postoperative pancreatic fistula and identify significant risk factors of the complication. METHODS A single-center retrospective clinical study included 150 patients who underwent pancreatoduodenectomy. Logistic regression, random forest, and CatBoost were employed for modeling the biochemical leak (symptomless fistula) and fistula grade B/C (clinically significant complication). The performance was estimated by receiver operating characteristic (ROC) area under the curve (AUC) after 5-fold cross-validation (20% testing and 80% training data). The risk factors were evaluated with the most accurate algorithm, based on the parameter "Importance" (Im), and Kendall correlation, P<0.05. RESULTS The CatBoost algorithm was the most accurate with an AUC of 74%-86%. The study provided results of ML-based modeling and algorithm selection for pancreatic fistula prediction and risk factor evaluation. From 14 parameters we selected the main pre- and intraoperative prognostic factors of all the fistulas: tumor vascular invasion (Im=24.8%), age (Im=18.6%), and body mass index (Im=16.4%), AUC=74%. The ML model showed that biochemical leak, blood and drain amylase level (Im=21.6% and 16.4%), and blood leukocytes (Im=11.2%) were crucial predictors for subsequent fistula B/C, AUC=86%. Surgical techniques, morphology, and pancreatic duct diameter less than 3 mm were insignificant (Im<5% and no correlations detected). The results were confirmed by correlation analysis. CONCLUSION This study highlights the key predictors of postoperative pancreatic fistula and establishes a robust ML-based model for individualized risk prediction. These findings contribute to the advancement of personalized perioperative care and may guide targeted preventive strategies.
Research on the application of machine learning (ML) models to landslide susceptibility assessments has gained popularity in recent years, with a focus primarily on topographic factors derived from digital elevation models (DEMs). However, few studies have focused on the explanatory effects of these factors on different models, i.e., whether DEM-based factors affect different models in the same way. This study investigated whether different ML models could yield consistent interpretations of DEM-based factors using explanatory algorithms. Six ML models, including a support vector machine, a neural network, extreme gradient boosting, a random forest, linear regression, and K-nearest neighbors, were trained and evaluated on five geospatial datasets derived from different DEMs. Each dataset contained eight DEM-based and six non-DEM-based factors from 8912 landslide samples. Model performance was assessed using accuracy, precision, recall rate, F1-score, kappa coefficient, and receiver operating characteristic curves. Explanatory analyses, including Shapley additive explanations and partial dependence plots, were also employed to investigate the effects of topographic factors on landslide susceptibility. The results indicate that DEM-based factors consistently influenced different ML models across the datasets. Furthermore, tree-based models outperformed the other models in almost all datasets, while the most suitable DEMs were obtained from Copernicus and TanDEM-X. In addition, concave surfaces without potholes on steep slopes are ideal topographic conditions for landslide formation in the study area. This study can benefit the wider landslide research community by clarifying how topographic factors affect ML models.
BACKGROUND Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. An additional 166 patients were enrolled to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P<0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model. CONCLUSION The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
BACKGROUND Intensive care unit-acquired weakness (ICU-AW) is a common complication that significantly impacts the patient's recovery process and can even lead to adverse outcomes. Currently, there is a lack of effective preventive measures. AIM To identify significant risk factors for ICU-AW through iterative machine learning techniques and offer recommendations for its prevention and treatment. METHODS Patients were categorized into ICU-AW and non-ICU-AW groups on the 14th day post-ICU admission. Relevant data from the initial 14 d of ICU stay, such as age, comorbidities, sedative dosage, vasopressor dosage, duration of mechanical ventilation, length of ICU stay, and rehabilitation therapy, were gathered. The relationships between these variables and ICU-AW were examined. Utilizing iterative machine learning techniques, a multilayer perceptron neural network model was developed, and its predictive performance for ICU-AW was assessed using the receiver operating characteristic curve. RESULTS Within the ICU-AW group, age, duration of mechanical ventilation, lorazepam dosage, adrenaline dosage, and length of ICU stay were significantly higher than in the non-ICU-AW group. Additionally, the rates of sepsis, multiple organ dysfunction syndrome, hypoalbuminemia, acute heart failure, respiratory failure, acute kidney injury, anemia, stress-related gastrointestinal bleeding, shock, hypertension, coronary artery disease, malignant tumors, and rehabilitation therapy were significantly higher in the ICU-AW group. The most influential factors contributing to ICU-AW were identified as the length of ICU stay (100.0%) and the duration of mechanical ventilation (54.9%). The neural network model predicted ICU-AW with an area under the curve of 0.941, sensitivity of 92.2%, and specificity of 82.7%. CONCLUSION The main factors influencing ICU-AW are the length of ICU stay and the duration of mechanical ventilation. A primary preventive strategy, when feasible, involves minimizing both ICU stay and mechanical ventilation duration.
Harmonic analysis, the traditional tidal forecasting method, cannot take into account the impact of noncyclical factors, and tidal prediction models based on the BP neural network are easily limited by the amount of data. Taking into account the movement of celestial bodies, and considering that historical data capture tidal characteristics insufficiently when they are affected by nonperiodic weather, a tidal prediction method based on the support vector machine (SVM) is designed, and simulation experiments are carried out using tidal data from the Xiamen, Luchaogang, and Weifang tide gauges individually. The results show that the model satisfactorily predicts tides that are influenced by noncyclical factors. They also show that, compared with the harmonic analysis method and the BP neural network method, the proposed prediction method has faster modeling speed, higher prediction precision, and stronger generalization ability.
This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLA), comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least square (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF), for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 previously existing gully erosion sites were mapped through widespread field investigations, of which 70% (323) and 30% (139) of observations were arbitrarily divided for algorithm calibration and validation. Twelve controlling factors for gully erosion, namely, soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature, were ranked in terms of their importance using each MLA. The MLA were compared using a training dataset for gully erosion and statistical measures such as RMSE (root mean square error), MAE (mean absolute error), and R-squared. Based on the comparisons among MLA, the RF algorithm exhibited the minimum RMSE and MAE and the maximum value of R-squared, and was therefore selected as the best model. The variable importance evaluation using the RF model revealed that distance from rivers had the highest significance in influencing the occurrence of gully erosion, whereas plan curvature had the least importance. According to the GESM generated using RF, most of the study area is predicted to have a low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only a small area is identified to have a high (12.56%) or very high (4.07%) susceptibility. The output of the RF model was validated using the receiver operating characteristic (ROC) curve approach, which returned an area under the curve (AUC) of 0.985, indicating the excellent forecasting ability of the model. The GESM prepared using the RF algorithm can aid decision-makers in targeting remedial actions for minimizing the damage caused by gully erosion.
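The AUC used for validation has a simple rank interpretation: it is the probability that a randomly chosen positive site (here, a gully) receives a higher susceptibility score than a randomly chosen negative site. The sketch below computes it by direct pairwise comparison; the function name, labels, and scores are illustrative, and the quadratic loop is only suitable for small samples.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative labels (1 = gully present) and model scores.
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.6, 0.2])
```

In this toy sample eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.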
To perform landslide susceptibility prediction (LSP), it is important to select appropriate mapping units and landslide-related conditioning factors. The efficient and automatic multi-scale segmentation (MSS) method proposed by the authors promotes the application of slope units. However, LSP modeling based on these slope units has not been performed. Moreover, the heterogeneity of conditioning factors in slope units is neglected, leading to incomplete input variables for LSP modeling. In this study, the slope units extracted by the MSS method are used for LSP modeling, and the heterogeneity of conditioning factors is represented by the internal variations of conditioning factors within each slope unit, using the descriptive statistics of mean, standard deviation, and range. Thus, slope unit-based machine learning models considering internal variations of conditioning factors (variant Slope-machine learning) are proposed. Chongyi County is selected as the case study and is divided into 53,055 slope units. Fifteen original slope unit-based conditioning factors are expanded to 38 slope unit-based conditioning factors by considering their internal variations. Random forest (RF) and multi-layer perceptron (MLP) machine learning models are used to construct variant Slope-RF and Slope-MLP models. Meanwhile, the Slope-RF and Slope-MLP models without considering the internal variations of conditioning factors, and conventional grid unit-based machine learning (Grid-RF and Grid-MLP) models, are built for comparison through LSP performance assessments. Results show that the variant Slope-machine learning models have higher LSP performance than the Slope-machine learning models, and that the LSP results of the variant Slope-machine learning models have stronger directivity and practical applicability than those of the Grid-machine learning models. It is concluded that slope units extracted by the MSS method are appropriate for LSP modeling, and that the heterogeneity of conditioning factors within slope units can more comprehensively reflect the relationships between conditioning factors and landslides. The research results have important reference significance for land use and landslide prevention.
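The expansion of a conditioning factor into mean, standard deviation, and range within each slope unit can be sketched as follows. This is an illustration under stated assumptions: the data layout (a mapping from slope-unit ID to the grid-cell values inside it), the names, and the use of the population standard deviation are all hypothetical choices, since the abstract does not specify them.

```python
from statistics import mean, pstdev


def expand_factor(unit_cells):
    """Summarise one conditioning factor per slope unit.

    unit_cells maps a slope-unit id to the list of grid-cell values that
    fall inside that unit (hypothetical layout).
    """
    return {
        unit_id: {
            "mean": mean(values),
            "std": pstdev(values),  # population std; an assumption
            "range": max(values) - min(values),
        }
        for unit_id, values in unit_cells.items()
    }


# Hypothetical slope angles (degrees) for two slope units.
slope_stats = expand_factor({"u1": [10.0, 20.0, 30.0], "u2": [5.0, 5.0]})
```

Not every factor yields three features in the study (15 factors become 38, not 45), so which statistics apply to which factor is left open here.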
BACKGROUND Ischemic heart disease (IHD) impacts the quality of life and has the highest mortality rate of cardiovascular diseases globally. AIM To compare variations in the parameters of the single-lead electrocardiogram (ECG) during resting conditions and physical exertion in individuals diagnosed with IHD and those without the condition, using vasodilator-induced stress computed tomography (CT) myocardial perfusion imaging as the diagnostic reference standard. METHODS This single-center observational study included 80 participants. The participants were aged ≥40 years and gave written informed consent to participate in the study. Both groups, G1 (n=31) with and G2 (n=49) without a post-stress-induced myocardial perfusion defect, underwent cardiologist consultation, anthropometric measurements, blood pressure and pulse rate measurement, echocardiography, cardio-ankle vascular index measurement, bicycle ergometry, and a 3-min single-lead ECG recording (Cardio-Qvark) before and immediately after bicycle ergometry, followed by CT myocardial perfusion imaging. LASSO regression with nested cross-validation was used to find the association between Cardio-Qvark parameters and the existence of the perfusion defect. Statistical processing was performed with the R programming language v4.2, Python v3.10, and Statistica 12. RESULTS Bicycle ergometry yielded an area under the receiver operating characteristic curve of 50.7% [95% confidence interval (CI): 0.388-0.625], specificity of 53.1% (95%CI: 0.392-0.673), and sensitivity of 48.4% (95%CI: 0.306-0.657). In contrast, the Cardio-Qvark test performed notably better, with an area under the receiver operating characteristic curve of 67% (95%CI: 0.530-0.801), specificity of 75.5% (95%CI: 0.628-0.88), and sensitivity of 51.6% (95%CI: 0.333-0.695). CONCLUSION When analyzed with machine learning models, the single-lead ECG showed relatively higher diagnostic accuracy than bicycle ergometry, but the difference was not statistically significant. Further investigations are required to uncover the hidden capabilities of the single-lead ECG in IHD diagnosis.
Software-Defined Network (SDN) decouples the control plane of network devices from the data plane. While alleviating the problems present in traditional network architectures, it also brings potential security risks, particularly network Denial-of-Service (DoS) attacks. While many research efforts have been devoted to identifying new features for DoS attack detection, detection methods are less accurate in detecting DoS attacks against client hosts due to the high stealth of such attacks. To solve this problem, a new method of DoS attack detection based on the Deep Factorization Machine (DeepFM) is proposed for SDN. First, we select the Growth Rate of Max Matched Packets (GRMMP) in SDN as the detection feature. Then, the DeepFM algorithm is used to extract features from flow rules and classify them into dense and discrete features to detect DoS attacks. After training, the model can be used to infer whether the SDN is under DoS attack, and a DeepFM-based detection method for DoS attacks against client hosts is implemented. Simulation results show that our method can effectively detect DoS attacks in SDN. Compared with the K-Nearest Neighbor (K-NN), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest models, our proposed method achieves higher accuracy, precision, and F1 values.
In the live broadcast process, eye movement characteristics can reflect people's attention to the product. However, existing research on interest degree prediction models does not consider eye movement characteristics. In order to obtain users' interest in the product more effectively, we consider the key eye movement indicators. We first collect eye movement characteristics based on the self-developed data processing algorithm fast discriminative model prediction for tracking (FDIMP), and then we add data dimensions to the original data set through information filling. In addition, we apply the deep factorization machine (DeepFM) architecture to simultaneously learn the combination of low-level and high-level features. In order to effectively learn important features and emphasize relatively important features, the multi-head attention mechanism is applied in the interest model. The experimental results on the public data set Criteo show that, compared with the original DeepFM algorithm, the area under the curve (AUC) value was improved by up to 9.32%.
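The low-level feature combinations in DeepFM come from its factorization machine (FM) component, whose second-order term sums the products of feature pairs weighted by the inner product of their embedding vectors. The sketch below shows that term in two equivalent forms, the direct pairwise sum and the standard linear-time rearrangement. It covers only the FM interaction term, not the deep network or the attention mechanism, and the inputs are made-up numbers.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j]."""
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity in O(n * k): 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


# Made-up feature vector and 2-dimensional embeddings.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
naive, fast = fm_pairwise_naive(x, v), fm_pairwise_fast(x, v)
```

Both functions return the same value; the second avoids the quadratic loop over feature pairs, which matters for sparse, high-dimensional inputs such as the Criteo data.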
BACKGROUND Colorectal cancer is a common digestive malignancy, and chemotherapy remains a cornerstone of treatment. Myelosuppression, a frequent hematologic toxicity, poses significant clinical challenges. However, no interpretable machine learning-based nomogram exists to predict chemotherapy-induced myelosuppression in colorectal cancer patients. This study aimed to develop and validate an interpretable clinic-machine learning nomogram integrating clinical predictors with multiple algorithms via a feature mapping algorithm. The model provides accurate risk estimation and clinical interpretability, supporting individualized prevention strategies and optimizing decision-making in patients receiving first-line chemotherapy. AIM To develop and validate an interpretable clinic-machine learning nomogram predicting chemotherapy-induced myelosuppression in colorectal cancer. METHODS This retrospective study enrolled 855 colorectal cancer patients receiving first-line chemotherapy. Data were split into training (n=612), validation (n=153), and testing (n=90) cohorts. Ten predictors were identified through least absolute shrinkage and selection operator, decision tree, random forest, and expert consensus. Ten machine learning algorithms were applied, with performance assessed by area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), calibration, and decision curves. The optimal model was integrated into a clinic-machine learning nomogram via the feature mapping algorithm, which was internally validated for predictive accuracy and clinical utility. RESULTS A total of 855 colorectal cancer patients were enrolled, with 765 cases (April 2020 to December 2023) used for model training and validation, and 90 cases (January 2024 to July 2024) for internal testing. Baseline clinical features did not differ significantly between training and validation cohorts (P>0.05). Ten predictors were identified through integrated feature selection and expert consensus, including age, body surface area, body mass index, tumor position, albumin, carcinoembryonic antigen, carbohydrate antigen (CA) 19-9, CA125, chemotherapy regimen, and chemotherapy cycles. Among ten machine learning algorithms, extreme gradient boosting achieved the best validation performance (AUC=0.97, AUPRC=0.92, sensitivity=0.79, specificity=0.92, accuracy=0.88). Logistic regression confirmed extra trees and random forest as independent predictors, which were incorporated into a clinic-machine learning nomogram. The clinic-machine learning nomogram demonstrated superior discrimination (AUC=0.96, AUPRC=0.93, accuracy=0.90, specificity=0.95), good calibration, and greater net clinical benefit across a wide probability range (10%-90%). Internal testing further confirmed its robustness and generalizability (AUC=0.95). CONCLUSION The clinic-machine learning nomogram accurately predicts chemotherapy-induced myelosuppression in colorectal cancer, providing interpretability and clinical utility to support individualized risk assessment and treatment decision-making.
Permeability is one of the main oil reservoir characteristics. It affects potential oil production, well-completion technologies, the choice of enhanced oil recovery methods, and more. The methods used to determine and predict reservoir permeability have serious shortcomings. This article aims to refine and adapt machine learning techniques using historical data from hydrocarbon field development to evaluate and predict parameters such as the skin factor and the permeability of the remote reservoir zone. The article analyzes data from 4045 well tests in oil fields in the Perm Krai (Russia). The performance of different machine learning (ML) algorithms in predicting well permeability is evaluated. Three different real datasets are used to train more than 20 machine learning regressors, whose hyperparameters are optimized using Bayesian Optimization (BO). The resulting models demonstrate significantly better predictive performance than traditional methods, and the best ML model found is one that had never before been applied to this problem. The permeability prediction model is characterized by a high adjusted R² value of 0.799. A promising approach is the integration of machine learning methods and the use of pressure recovery curves to estimate permeability in real time. The work is unique in its approach to predicting pressure recovery curves during well operation without stopping wells, providing primary data for interpretation. These innovations can improve the accuracy of permeability forecasts and reduce the well downtime associated with traditional well-testing procedures. The proposed methods pave the way for more efficient and cost-effective reservoir development, ultimately supporting better decision-making and resource optimization in oil production.
Moisture accumulation within road pavements, particularly in unbound granular materials with or without thin sprayed seals, presents significant challenges in high-rainfall regions such as Queensland. This infiltration often leads to various forms of pavement distress, eventually causing irreversible damage to the pavement structure. The moisture content within pavements is highly dynamic and is directly influenced by environmental factors such as precipitation, air temperature, and relative humidity. This variability underscores the importance of monitoring moisture changes using real-time climatic data to assess pavement conditions for operational management, or of incorporating these effects during pavement design based on historical climate data. Consequently, there is an increasing demand for advanced, technology-driven methodologies to predict moisture variations based on climatic inputs. Addressing this gap, the present study employs five traditional machine learning (ML) algorithms of varying complexity, namely K-nearest neighbors (KNN), regression trees, random forest, support vector machines (SVMs), and Gaussian process regression (GPR), to forecast moisture levels within pavement layers over time. Using data collected from an instrumented road in Brisbane, Australia, which includes pavement moisture and climatic factors, the study develops predictive models to forecast moisture content at future time steps. The approach incorporates current moisture content, rather than averaged values, along with seasonality (both daily and annual) and key climatic factors to predict next-step moisture. Model performance is evaluated using R², MSE, RMSE, and MAPE metrics. Results show that ML algorithms can reliably predict long-term moisture variations in pavements, provided optimal hyperparameters are selected for each algorithm. The best-performing algorithms include KNN (with 15 neighbours), medium regression tree, medium random forest, coarse SVM, and simple GPR, with medium random forest outperforming the others. The study also identifies the optimal hyperparameter combinations for each algorithm, offering significant advancements in moisture prediction tools for pavement technology.
Objective: Mild cognitive impairment (MCI) is an age-related neurodegenerative disease whose prevalence increases with age. Within the framework of traditional Chinese medicine, spleen-kidney deficiency syndrome (SKDS) is recognized as the most frequent MCI subtype. Because the onset of MCI is covert and gradual, patients and their families in community settings find it difficult to distinguish typical aging from pathological changes. There is an urgent need for a preliminary diagnostic tool designed for community-residing older adults with MCI attributed to SKDS (MCI-SKDS). Methods: This investigation enrolled 312 elderly individuals diagnosed with MCI, who were randomly assigned to training and test datasets at a 3:1 ratio. Five machine learning methods, namely logistic regression (LR), decision tree (DT), naive Bayes (NB), support vector machine (SVM), and gradient boosting (GB), were used to build a diagnostic prediction model for MCI-SKDS. Accuracy, sensitivity, specificity, precision, F1 score, and area under the curve were used to evaluate model performance. Furthermore, the clinical applicability of the model was evaluated through decision curve analysis (DCA). Results: The DT model had the best accuracy, precision, specificity, and F1 score in the training set (test set), with scores of 0.904 (0.845), 0.875 (0.795), 0.973 (0.875), and 0.973 (0.875), respectively. The SVM model had the best sensitivity among the five models, with a training set (test set) score of 0.865 (0.821). The area under the curve of all five models was greater than 0.9 for the training dataset and greater than 0.8 for the test dataset. The DCA of all models showed good clinical application value. The study identified ten indicators that were significant predictors of MCI-SKDS. Conclusion: The risk prediction index derived from machine learning for the MCI-SKDS prediction model is simple and practical; the model demonstrates good predictive value and clinical applicability, and the DT model had the best performance.
Studies have shown that vascular dysfunction is closely related to the pathogenesis of Alzheimer's disease. The middle temporal gyrus region of the brain is susceptible to pronounced impairment in Alzheimer's disease. Identification of the molecules involved in vascular aberrance of the middle temporal gyrus would support elucidation of the mechanisms underlying Alzheimer's disease and discovery of novel targets for intervention. We carried out single-cell transcriptomic analysis of the middle temporal gyrus in the brains of patients with Alzheimer's disease and healthy controls, revealing obvious changes in vascular function. CellChat analysis of intercellular communication in the middle temporal gyrus showed that the number of cell interactions in this region was decreased in Alzheimer's disease patients, with altered intercellular communication of endothelial cells and pericytes being the most prominent. Differentially expressed genes were also identified. Using the CellChat results, AUCell evaluation of the pathway activity of specific cells showed that the obvious changes in vascular function in the middle temporal gyrus in Alzheimer's disease were directly related to changes in the vascular endothelial growth factor (VEGF) A-VEGF receptor (VEGFR) 2 pathway. AUCell analysis identified subtypes of endothelial cells and pericytes directly related to VEGFA-VEGFR2 pathway activity. Two subtypes of middle temporal gyrus cells showed significant alteration in Alzheimer's disease: endothelial cells with high expression of Erb-B2 receptor tyrosine kinase 4 (ERBB4^(high)) and pericytes with high expression of angiopoietin-like 4 (ANGPTL4^(high)). Finally, combining bulk RNA sequencing data and two machine learning algorithms (least absolute shrinkage and selection operator and random forest), four characteristic Alzheimer's disease feature genes were identified: somatostatin (SST), protein tyrosine phosphatase non-receptor type 3 (PTPN3), glutinase (GL3), and tropomyosin 3 (PTM3). These genes were downregulated in the middle temporal gyrus of patients with Alzheimer's disease and may be used to target the VEGF pathway. Alzheimer's disease mouse models demonstrated consistently altered expression of these genes in the middle temporal gyrus. In conclusion, this study detected changes in intercellular communication between endothelial cells and pericytes in the middle temporal gyrus and identified four novel feature genes related to middle temporal gyrus and vascular functioning in patients with Alzheimer's disease. These findings contribute to a deeper understanding of the molecular mechanisms underlying Alzheimer's disease and present novel treatment targets.
Silicone material extrusion (MEX) is widely used for processing liquids and pastes. Owing to the uneven linewidth and elastic extrusion deformation caused by material accumulation, products may exhibit geometric errors and performance defects, leading to a decline in product quality and affecting service life. This study proposes a process parameter optimization method that considers the mechanical properties of printed specimens and production costs. To improve the quality of silicone printing samples and reduce production costs, three machine learning models, kernel extreme learning machine (KELM), support vector regression (SVR), and random forest (RF), were developed to predict these three factors. Training data were obtained through a complete factorial experiment. A new dataset was obtained using the Euclidean distance method, which assigns the elimination factor. Bayesian optimization was used for parameter optimization, and the new dataset was input into the improved double Gaussian extreme learning machine to obtain the improved KELM model. The results showed improved prediction accuracy over SVR and RF. Furthermore, a multi-objective optimization framework was proposed by combining genetic algorithm technology with the improved KELM model. The effectiveness and reasonableness of the model algorithm were verified by comparing the optimized results with the experimental results.
Performing arts and movies have become commercial products with high profit and great market potential. Previous research has developed comprehensive models to forecast the demand for movies. However, it has not paid enough attention to decision support for performing arts, which is a special category unlike movies. For performing arts with high-dimensional categorical attributes and limited samples, determining ticket prices at different levels is still a challenging task for producers and distributors. To address these difficulties, the factorization machine (FM), which can handle huge sparse categorical attributes, is used first in this work. Adaptive stochastic gradient descent (ASGD) and Markov chain Monte Carlo (MCMC) are both explored to estimate the model parameters of FM. FM with ASGD (FM-ASGD) and FM with MCMC (FM-MCMC) both achieve better prediction accuracy than a traditional algorithm. In addition, a multi-output model is proposed to determine the price at multiple price levels simultaneously, which avoids repeatedly training separate models. The results also confirm the prediction accuracy of the multi-output model compared with that of the general single-output model.
Funding: The National Key Research and Development Program of China, No. 2023YFC3206601.
Abstract: Landslides pose a formidable natural hazard across the Qinghai-Tibet Plateau (QTP), endangering both ecosystems and human life. Identifying the driving factors behind landslides and accurately assessing susceptibility are key to mitigating disaster risk. This study integrated multi-source historical landslide data with 15 predictive factors and used several machine learning models, namely Random Forest (RF), Gradient Boosting Regression Trees (GBRT), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost), to generate susceptibility maps. The Shapley additive explanation (SHAP) method was applied to quantify factor importance and explore their nonlinear effects. The results showed that: (1) CatBoost was the best-performing model (CA=0.938, AUC=0.980) in assessing landslide susceptibility, with altitude emerging as the most significant factor, followed by distance to roads and earthquake sites, precipitation, and slope; (2) the SHAP method revealed critical nonlinear thresholds, demonstrating that historical landslides were concentrated at mid-altitudes (1400-4000 m) and decreased markedly above 4000 m, with a parallel reduction in probability beyond 700 m from roads; and (3) landslide-prone areas, comprising 13% of the QTP, were concentrated in the southeastern and northeastern parts of the plateau. By integrating machine learning and SHAP analysis, this study revealed landslide hazard-prone areas and their driving factors, providing insights to support disaster management strategies and sustainable regional planning.
Funding: Supported by the Sichuan Province Nursing Scientific Research Project Plan (H23022) and the 2022 Municipal-University Science and Technology Strategic Cooperation Special Fund of Nanchong Science and Technology Bureau (22SXQT0222).
Abstract: Objective Accurately identifying the key influencing factors of psychological birth trauma in primiparous women is crucial for implementing effective preventive and intervention measures. This study aimed to develop and validate an interpretable machine learning prediction model for identifying the key influencing factors of psychological birth trauma in primiparous women. Methods A multicenter cross-sectional study was conducted on primiparous women in four tertiary hospitals in Sichuan Province, southwestern China, from December 2023 to March 2024. The Childbirth Trauma Index was used to assess psychological birth trauma in primiparous women. Data were collected and randomly divided into a training set (80%, n=289) and a testing set (20%, n=73). Six machine learning models were trained and tested: Linear Regression, Support Vector Regression, Multilayer Perceptron Regression, eXtreme Gradient Boosting Regression, Random Forest Regression, and Adaptive Boosting Regression. The optimal model was selected based on various performance metrics, and its predictive results were interpreted using SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). Results Among the six machine learning models, the Multilayer Perceptron Regression model exhibited the best overall performance in the testing set (MAE=3.977, MSE=24.832, R²=0.507, EVS=0.524, RMSE=4.983). In the testing set, the R² and EVS of the Multilayer Perceptron Regression model increased by 8.3% and 1.2%, respectively, compared to the traditional linear regression model, while the MAE, MSE, and RMSE decreased by 0.4%, 7.3%, and 3.7%, respectively. The SHAP analysis indicated that intrapartum pain, anxiety, postpartum pain, resilience, and planned pregnancy are the most critical influencing factors of psychological birth trauma in primiparous women. The ALE analysis indicated that higher intrapartum pain, anxiety, and postpartum pain scores are risk factors, while higher resilience scores are protective factors. Conclusions Interpretable machine learning prediction models can identify the key influencing factors of psychological birth trauma in primiparous women. SHAP and ALE analyses based on the Multilayer Perceptron Regression model can help healthcare providers understand the complex decision-making logic within a prediction model. This study provides a scientific basis for the early prevention and personalized intervention of psychological birth trauma in primiparous women.
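The error metrics reported for the testing set are linked by simple formulas; for instance, RMSE is the square root of MSE, which agrees with the reported figures (the square root of 24.832 is about 4.983). The following is a minimal pure-Python sketch of MAE, MSE, RMSE, and R², assuming nothing about the study's actual code: the function name `regression_metrics` and the toy scores are hypothetical, and EVS is omitted for brevity.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R2 = 1 - SS_res / SS_tot; undefined when y_true is constant.
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy scores, hypothetical and for illustration only (not study data).
toy_true = [10.0, 12.0, 15.0, 20.0]
toy_pred = [11.0, 12.0, 13.0, 21.0]
metrics = regression_metrics(toy_true, toy_pred)

# Consistency check on the reported testing-set figures: RMSE = sqrt(MSE).
rmse_from_reported_mse = math.sqrt(24.832)
```

Because RMSE is a monotone function of MSE, the two always rank models identically; reporting both mainly helps readers who prefer an error in the original score units.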
Funding: Financially supported by the National Natural Science Foundation of China (No. 52174297).
Abstract: The endpoint carbon content in the converter is critical for the quality of steel products, and accurately predicting this parameter is an effective way to reduce alloy consumption and improve smelting efficiency. However, most scholars currently focus on modifying methods to enhance model accuracy, while overlooking the extent to which input parameters influence accuracy. To address this issue, a prediction model for the endpoint carbon content in the converter was developed in this study using factor analysis (FA) and a support vector machine (SVM) optimized by improved particle swarm optimization (IPSO). Analysis of the factors influencing the endpoint carbon content during the converter smelting process led to the identification of 21 input parameters. Subsequently, FA was used to reduce the dimensionality of the data, and the reduced data were applied to the prediction model. The results demonstrate that the performance of the FA-IPSO-SVM model surpasses that of several existing methods, such as twin support vector regression and support vector machine. The model achieves hit rates of 89.59%, 96.21%, and 98.74% within error ranges of ±0.01%, ±0.015%, and ±0.02%, respectively. Finally, based on the prediction results obtained by sequentially removing input parameters, the parameters were classified into high influence (5%-7%), medium influence (2%-5%), and low influence (0-2%) categories according to their varying degrees of impact on prediction accuracy. This classification provides a reference for selecting input parameters in future prediction models for endpoint carbon content.
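A hit rate within an error range is commonly computed as the share of samples whose absolute prediction error does not exceed the tolerance. The sketch below illustrates that reading; it is an assumption about how the metric is defined, the function name `hit_rate` is hypothetical, and the carbon contents are invented toy values rather than data from the study.

```python
def hit_rate(actual, predicted, tolerance):
    """Fraction of samples whose absolute error is within +/- tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance)
    return hits / len(actual)

# Toy endpoint carbon contents in mass percent (hypothetical values).
actual_c = [0.050, 0.062, 0.040, 0.080]
predicted_c = [0.055, 0.075, 0.041, 0.098]

# Hit rates for the three tolerances named in the abstract.
rates = {tol: hit_rate(actual_c, predicted_c, tol) for tol in (0.01, 0.015, 0.02)}
```

The hit rate can only stay the same or rise as the tolerance widens, which matches the ordering of the three reported values.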
Abstract: BACKGROUND Pancreatic fistula is the most common complication of pancreatic surgery and can cause more serious conditions, including bleeding due to visceral vessel erosion and peritonitis. AIM To develop a machine learning (ML) model for postoperative pancreatic fistula and identify significant risk factors of the complication. METHODS A single-center retrospective clinical study was conducted that included 150 patients who underwent pancreatoduodenectomy. Logistic regression, random forest, and CatBoost were employed for modeling the biochemical leak (symptomless fistula) and fistula grade B/C (clinically significant complication). Performance was estimated by the area under the receiver operating characteristic (ROC) curve (AUC) after 5-fold cross-validation (20% testing and 80% training data). The risk factors were evaluated with the most accurate algorithm, based on the parameter "Importance" (Im) and Kendall correlation, P<0.05. RESULTS The CatBoost algorithm was the most accurate, with an AUC of 74%-86%. The study provided results of ML-based modeling and algorithm selection for pancreatic fistula prediction and risk factor evaluation. From 14 parameters we selected the main pre- and intraoperative prognostic factors of all the fistulas: tumor vascular invasion (Im=24.8%), age (Im=18.6%), and body mass index (Im=16.4%), AUC=74%. The ML model showed that biochemical leak, blood and drain amylase level (Im=21.6% and 16.4%), and blood leukocytes (Im=11.2%) were crucial predictors for subsequent fistula B/C, AUC=86%. Surgical techniques, morphology, and pancreatic duct diameter less than 3 mm were insignificant (Im<5% and no correlations detected). The results were confirmed by correlation analysis. CONCLUSION This study highlights the key predictors of postoperative pancreatic fistula and establishes a robust ML-based model for individualized risk prediction. These findings contribute to the advancement of personalized perioperative care and may guide targeted preventive strategies.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2022YFC3003205), the Chengdu University of Technology Postgraduate Innovative Cultivation Program (Grant No. 10800-000510-01-022), the Sichuan Science and Technology Program (Grant No. 2025ZNSFSC1206), and the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project (Grant No. SKLGP2023Z026).
Abstract: Research on the application of machine learning (ML) models to landslide susceptibility assessments has gained popularity in recent years, with a focus primarily on topographic factors derived from digital elevation models (DEMs). However, few studies have focused on the explanatory effects of these factors on different models, i.e., whether DEM-based factors affect different models in the same way. This study investigated whether different ML models could yield consistent interpretations of DEM-based factors using explanatory algorithms. Six ML models, including a support vector machine, a neural network, extreme gradient boosting, a random forest, linear regression, and K-nearest neighbors, were trained and evaluated on five geospatial datasets derived from different DEMs. Each dataset contained eight DEM-based and six non-DEM-based factors from 8912 landslide samples. Model performance was assessed using accuracy, precision, recall rate, F1-score, kappa coefficient, and receiver operating characteristic curves. Explanatory analyses, including Shapley additive explanations and partial dependence plots, were also employed to investigate the effects of topographic factors on landslide susceptibility. The results indicate that DEM-based factors consistently influenced different ML models across the datasets. Furthermore, tree-based models outperformed the other models in almost all datasets, while the most suitable DEMs were obtained from Copernicus and TanDEM-X. In addition, concave surfaces without potholes on steep slopes are the ideal topographic conditions for landslide formation in the study area. This study can benefit the wider landslide research community by clarifying how topographic factors affect ML models.
Abstract: BACKGROUND Colorectal polyps are precancerous lesions of colorectal cancer. Early detection and resection of colorectal polyps can effectively reduce the mortality of colorectal cancer. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR. AIM To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR. METHODS This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. Additionally, a total of 166 patients were collected to form a prospective validation set. Feature variable screening was conducted using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The optimal models were evaluated based on different performance metrics. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance. RESULTS Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P<0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model. CONCLUSION The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
Funding: Supported by the Science and Technology Support Program of Qiandongnan Prefecture, No. Qiandongnan Sci-Tech Support [2021]12; and the Guizhou Province High-Level Innovative Talent Training Program, No. Qiannan Thousand Talents [2022]201701.
Abstract: BACKGROUND Intensive care unit-acquired weakness (ICU-AW) is a common complication that significantly impacts the patient's recovery process and can even lead to adverse outcomes. Currently, there is a lack of effective preventive measures. AIM To identify significant risk factors for ICU-AW through iterative machine learning techniques and offer recommendations for its prevention and treatment. METHODS Patients were categorized into ICU-AW and non-ICU-AW groups on the 14th day post-ICU admission. Relevant data from the initial 14 d of ICU stay, such as age, comorbidities, sedative dosage, vasopressor dosage, duration of mechanical ventilation, length of ICU stay, and rehabilitation therapy, were gathered. The relationships between these variables and ICU-AW were examined. Utilizing iterative machine learning techniques, a multilayer perceptron neural network model was developed, and its predictive performance for ICU-AW was assessed using the receiver operating characteristic curve. RESULTS Within the ICU-AW group, age, duration of mechanical ventilation, lorazepam dosage, adrenaline dosage, and length of ICU stay were significantly higher than in the non-ICU-AW group. Additionally, the proportions of sepsis, multiple organ dysfunction syndrome, hypoalbuminemia, acute heart failure, respiratory failure, acute kidney injury, anemia, stress-related gastrointestinal bleeding, shock, hypertension, coronary artery disease, malignant tumors, and rehabilitation therapy were significantly higher in the ICU-AW group. The most influential factors contributing to ICU-AW were identified as the length of ICU stay (100.0%) and the duration of mechanical ventilation (54.9%). The neural network model predicted ICU-AW with an area under the curve of 0.941, sensitivity of 92.2%, and specificity of 82.7%. CONCLUSION The main factors influencing ICU-AW are the length of ICU stay and the duration of mechanical ventilation. A primary preventive strategy, when feasible, involves minimizing both ICU stay and mechanical ventilation duration.
Funding: The Shanghai Committee of Science and Technology of China under contract No. 10510502800; the Graduate Student Education Innovation Program Foundation of Shanghai Municipal Education Commission of China; the National Key Science Foundation Research "973" Project of the Ministry of Science and Technology of China under contract No. 2012CB316200.
Abstract: Harmonic analysis, the traditional tidal forecasting method, cannot take into account the impact of noncyclical factors, and tidal prediction models based on the BP neural network are easily limited by the amount of data. Based on the movement of celestial bodies, and considering that historical data affected by nonperiodic weather capture tidal characteristics insufficiently, a tidal prediction method based on the support vector machine (SVM) is designed, and simulation experiments are carried out using tidal data from the Xiamen, Luchaogang, and Weifang tide gauges individually. The results show that the model satisfactorily predicts tides that are influenced by noncyclical factors. They also show that, compared with the harmonic analysis method and the BP neural network method, the proposed prediction method has faster modeling speed, higher prediction precision, and stronger generalization ability.
Funding: Supported by the College of Agriculture, Shiraz University (Grant No. 97GRC1M271143), and by funding from the UK Biotechnology and Biological Sciences Research Council (BBSRC), grant award BBS/E/C/000I0330 – Soil to Nutrition project 3 – Sustainable intensification: optimisation at multiple scales.
Abstract: This investigation assessed the efficacy of 10 widely used machine learning algorithms (MLA), comprising the least absolute shrinkage and selection operator (LASSO), generalized linear model (GLM), stepwise generalized linear model (SGLM), elastic net (ENET), partial least square (PLS), ridge regression, support vector machine (SVM), classification and regression trees (CART), bagged CART, and random forest (RF), for gully erosion susceptibility mapping (GESM) in Iran. The locations of 462 previously existing gully erosion sites were mapped through widespread field investigations, and the observations were arbitrarily divided into 70% (323) for algorithm calibration and 30% (139) for validation. Twelve controlling factors for gully erosion, namely soil texture, annual mean rainfall, digital elevation model (DEM), drainage density, slope, lithology, topographic wetness index (TWI), distance from rivers, aspect, distance from roads, plan curvature, and profile curvature, were ranked in terms of their importance using each MLA. The MLA were compared using a training dataset for gully erosion and statistical measures such as RMSE (root mean square error), MAE (mean absolute error), and R-squared. Based on the comparisons among MLA, the RF algorithm exhibited the minimum RMSE and MAE and the maximum value of R-squared, and was therefore selected as the best model. The variable importance evaluation using the RF model revealed that distance from rivers had the highest significance in influencing the occurrence of gully erosion, whereas plan curvature had the least importance. According to the GESM generated using RF, most of the study area is predicted to have a low (53.72%) or moderate (29.65%) susceptibility to gully erosion, whereas only a small area is identified to have a high (12.56%) or very high (4.07%) susceptibility. The outcome generated by the RF model was validated using the ROC (Receiver Operating Characteristics) curve approach, which returned an area under the curve (AUC) of 0.985, indicating the excellent forecasting ability of the model. The GESM prepared using the RF algorithm can aid decision-makers in targeting remedial actions for minimizing the damage caused by gully erosion.
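The 70%/30% division of the 462 mapped sites into 323 calibration and 139 validation observations can be reproduced with a simple shuffled split. The sketch below is one way to do it and is not the authors' procedure: the function name `split_sites`, the seed, and the use of integer IDs in place of real site records are all assumptions for illustration.

```python
import random

def split_sites(site_ids, calibration_fraction=0.7, seed=0):
    """Shuffle the sites reproducibly and cut them into two disjoint sets."""
    shuffled = list(site_ids)
    random.Random(seed).shuffle(shuffled)
    # Truncate towards zero: int(462 * 0.7) gives 323 calibration sites.
    n_calibration = int(len(shuffled) * calibration_fraction)
    return shuffled[:n_calibration], shuffled[n_calibration:]

# Integer IDs stand in for the 462 mapped gully erosion sites.
sites = list(range(462))
calibration, validation = split_sites(sites)
```

Fixing the seed makes the split repeatable, which matters when several algorithms are compared on the same calibration and validation sets.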
Funding: Funded by the Natural Science Foundation of China (Grant Nos. 41807285, 41972280 and 52179103).
Abstract: To perform landslide susceptibility prediction (LSP), it is important to select appropriate mapping units and landslide-related conditioning factors. The efficient and automatic multi-scale segmentation (MSS) method proposed by the authors promotes the application of slope units. However, LSP modeling based on these slope units has not been performed. Moreover, the heterogeneity of conditioning factors in slope units is neglected, leading to incomplete input variables for LSP modeling. In this study, the slope units extracted by the MSS method are used for LSP modeling, and the heterogeneity of conditioning factors is represented by the internal variations of conditioning factors within each slope unit, using the descriptive statistics of mean, standard deviation, and range. Thus, slope unit-based machine learning models considering internal variations of conditioning factors (variant Slope-machine learning) are proposed. Chongyi County is selected as the case study and is divided into 53,055 slope units. Fifteen original slope unit-based conditioning factors are expanded to 38 slope unit-based conditioning factors by considering their internal variations. Random forest (RF) and multi-layer perceptron (MLP) machine learning models are used to construct variant Slope-RF and Slope-MLP models. Meanwhile, the Slope-RF and Slope-MLP models without considering the internal variations of conditioning factors, and conventional grid unit-based machine learning (Grid-RF and MLP) models, are built for comparison through LSP performance assessments. Results show that the variant Slope-machine learning models have higher LSP performance than the Slope-machine learning models, and that the LSP results of the variant Slope-machine learning models have stronger directivity and practical applicability than those of the Grid-machine learning models. It is concluded that slope units extracted by the MSS method are appropriate for LSP modeling, and that the heterogeneity of conditioning factors within slope units can more comprehensively reflect the relationships between conditioning factors and landslides. The research results have important reference significance for land use and landslide prevention.
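The idea of describing each slope unit by the mean, standard deviation, and range of a conditioning factor over its grid cells can be sketched as a group-and-summarize step. The code below is an illustrative assumption, not the authors' implementation: the function name `slope_unit_features`, the input layout of (slope unit ID, factor value) pairs, the choice of population standard deviation, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def slope_unit_features(cells):
    """Summarize one factor per slope unit from (unit_id, value) grid cells."""
    grouped = defaultdict(list)
    for unit_id, value in cells:
        grouped[unit_id].append(value)
    return {
        unit_id: {
            "mean": mean(values),
            # Population standard deviation; the study's convention is unknown.
            "std": pstdev(values),
            "range": max(values) - min(values),
        }
        for unit_id, values in grouped.items()
    }

# Toy slope angles in degrees for two slope units (hypothetical values).
toy_cells = [(1, 10.0), (1, 20.0), (1, 30.0), (2, 5.0), (2, 5.0)]
features = slope_unit_features(toy_cells)
```

In this toy case the two units are separated by both their means and their spread, which is the kind of within-unit information a single mean value would discard.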
Funding: Supported by Government Assignment, No. 1023022600020-6; RSF Grant, No. 24-15-00549; and the Ministry of Science and Higher Education of the Russian Federation within the Framework of State Support for the Creation and Development of World-Class Research Center, No. 075-15-2022-304.
Abstract: BACKGROUND Ischemic heart disease (IHD) impacts the quality of life and has the highest mortality rate of cardiovascular diseases globally. AIM To compare variations in the parameters of the single-lead electrocardiogram (ECG) during resting conditions and physical exertion in individuals diagnosed with IHD and those without the condition, using vasodilator-induced stress computed tomography (CT) myocardial perfusion imaging as the diagnostic reference standard. METHODS This single-center observational study included 80 participants. The participants were aged ≥40 years and gave written informed consent to participate in the study. Both groups, G1 (n=31) with and G2 (n=49) without post-stress-induced myocardial perfusion defect, underwent cardiologist consultation, anthropometric measurements, blood pressure and pulse rate measurement, echocardiography, cardio-ankle vascular index measurement, bicycle ergometry, and recording of a 3-min single-lead ECG (Cardio-Qvark) before and just after bicycle ergometry, followed by CT myocardial perfusion imaging. LASSO regression with nested cross-validation was used to find the association between Cardio-Qvark parameters and the existence of the perfusion defect. Statistical processing was performed with the R programming language v4.2, Python v3.10, and the Statistica 12 program. RESULTS Bicycle ergometry yielded an area under the receiver operating characteristic curve of 50.7% [95% confidence interval (CI): 0.388-0.625], specificity of 53.1% (95%CI: 0.392-0.673), and sensitivity of 48.4% (95%CI: 0.306-0.657). In contrast, the Cardio-Qvark test performed notably better, with an area under the receiver operating characteristic curve of 67% (95%CI: 0.530-0.801), specificity of 75.5% (95%CI: 0.628-0.88), and sensitivity of 51.6% (95%CI: 0.333-0.695). CONCLUSION With machine learning models, the single-lead ECG had relatively higher diagnostic accuracy than bicycle ergometry, but the difference was not statistically significant. Further investigations are required to uncover the hidden capabilities of single-lead ECG in IHD diagnosis.
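Sensitivity and specificity follow directly from the confusion-matrix counts against the reference standard. The sketch below shows the arithmetic; the function name is hypothetical, and the counts are not reported in the abstract. They are back-calculated assumptions chosen because, with the stated group sizes (31 with and 49 without a perfusion defect), they reproduce the reported percentages.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# ASSUMED counts, back-calculated from the reported percentages and the
# group sizes G1 = 31 (defect) and G2 = 49 (no defect); not study data.
ergometry = sensitivity_specificity(tp=15, fn=16, tn=26, fp=23)
cardio_qvark = sensitivity_specificity(tp=16, fn=15, tn=37, fp=12)

# Express both measures as percentages rounded to one decimal place.
ergometry_pct = tuple(round(100 * value, 1) for value in ergometry)
cardio_qvark_pct = tuple(round(100 * value, 1) for value in cardio_qvark)
```

Under these assumed counts the two tests differ by a single true positive, while the gap in specificity corresponds to 11 fewer false positives among the 49 participants without a defect.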
Funding: This work was funded by the Researchers Supporting Project No. (RSP-2021/102), King Saud University, Riyadh, Saudi Arabia. This work was also supported by the Research Project on Teaching Reform of General Colleges and Universities in Hunan Province (Grant No. HNJG-2020-0261), China.
Abstract: Software-Defined Network (SDN) decouples the control plane of network devices from the data plane. While alleviating the problems present in traditional network architectures, it also brings potential security risks, particularly network Denial-of-Service (DoS) attacks. Although many research efforts have been devoted to identifying new features for DoS attack detection, existing detection methods are less accurate in detecting DoS attacks against client hosts because of the high stealth of such attacks. To solve this problem, a new DoS attack detection method for SDN based on the Deep Factorization Machine (DeepFM) is proposed. First, we select the Growth Rate of Max Matched Packets (GRMMP) in SDN as the detection feature. Then, the DeepFM algorithm is used to extract features from flow rules and classify them into dense and discrete features to detect DoS attacks. After training, the model can be used to infer whether the SDN is under DoS attack, and a DeepFM-based detection method for DoS attacks against client hosts is implemented. Simulation results show that our method can effectively detect DoS attacks in SDN. Compared with the K-Nearest Neighbor (K-NN), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest models, the proposed method achieves higher accuracy, precision, and F1 values.
Abstract: In the live broadcast process, eye movement characteristics can reflect people's attention to a product. However, existing research on interest-degree prediction models does not consider eye movement characteristics. To capture users' interest in a product more effectively, we consider the key eye movement indicators. We first collect eye movement characteristics using the self-developed data processing algorithm fast discriminative model prediction for tracking (FDIMP), and then add data dimensions to the original data set through information filling. In addition, we apply the deep factorization machine (DeepFM) architecture to simultaneously learn combinations of low-level and high-level features. To effectively learn important features and emphasize relatively important ones, the multi-head attention mechanism is applied in the interest model. The experimental results on the public data set Criteo show that, compared with the original DeepFM algorithm, the area under curve (AUC) value was improved by up to 9.32%.
Funding: Supported by the Beijing Municipal Natural Science Foundation, No. 7252262; High Level Chinese Medical Hospital Promotion Project, No. HLCMHPP2023085; National Natural Science Foundation of China, No. 82174463; National Administration of Traditional Chinese Medicine, No. ZYYCXTD-C-C202205; China Academy of Chinese Medical Sciences, No. CI2021A01804 and No. 2022S469.
Abstract: BACKGROUND Colorectal cancer is a common digestive malignancy, and chemotherapy remains a cornerstone of treatment. Myelosuppression, a frequent hematologic toxicity, poses significant clinical challenges. However, no interpretable machine learning-based nomogram exists to predict chemotherapy-induced myelosuppression in colorectal cancer patients. This study aimed to develop and validate an interpretable clinic-machine learning nomogram integrating clinical predictors with multiple algorithms via a feature mapping algorithm. The model provides accurate risk estimation and clinical interpretability, supporting individualized prevention strategies and optimizing decision-making in patients receiving first-line chemotherapy. AIM To develop and validate an interpretable clinic-machine learning nomogram predicting chemotherapy-induced myelosuppression in colorectal cancer. METHODS This retrospective study enrolled 855 colorectal cancer patients receiving first-line chemotherapy. Data were split into training (n=612), validation (n=153), and testing (n=90) cohorts. Ten predictors were identified through least absolute shrinkage and selection operator, decision tree, random forest, and expert consensus. Ten machine learning algorithms were applied, with performance assessed by area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), calibration, and decision curves. The optimal model was integrated into a clinic-machine learning nomogram via the feature mapping algorithm, which was internally validated for predictive accuracy and clinical utility. RESULTS A total of 855 colorectal cancer patients were enrolled, with 765 cases (April 2020 to December 2023) used for model training and validation, and 90 cases (January 2024 to July 2024) for internal testing. Baseline clinical features did not differ significantly between training and validation cohorts (P>0.05). Ten predictors were identified through integrated feature selection and expert consensus, including age, body surface area, body mass index, tumor position, albumin, carcinoembryonic antigen, carbohydrate antigen (CA) 19-9, CA125, chemotherapy regimen, and chemotherapy cycles. Among ten machine learning algorithms, extreme gradient boosting achieved the best validation performance (AUC=0.97, AUPRC=0.92, sensitivity=0.79, specificity=0.92, accuracy=0.88). Logistic regression confirmed extra trees and random forest as independent predictors, which were incorporated into a clinic-machine learning nomogram. The clinic-machine learning nomogram demonstrated superior discrimination (AUC=0.96, AUPRC=0.93, accuracy=0.90, specificity=0.95), good calibration, and greater net clinical benefit across a wide probability range (10%-90%). Internal testing further confirmed its robustness and generalizability (AUC=0.95). CONCLUSION The clinic-machine learning nomogram accurately predicts chemotherapy-induced myelosuppression in colorectal cancer, providing interpretability and clinical utility to support individualized risk assessment and treatment decision-making.
Funding: Funded by the Ministry of Science and Higher Education of the Russian Federation (Project No. FSNM-2024-0005).
Abstract: Permeability is one of the main oil reservoir characteristics. It affects potential oil production, well-completion technologies, the choice of enhanced oil recovery methods, and more. The methods used to determine and predict reservoir permeability have serious shortcomings. This article aims to refine and adapt machine learning techniques using historical data from hydrocarbon field development to evaluate and predict parameters such as the skin factor and permeability of the remote reservoir zone. The article analyzes data from 4045 well tests in oil fields in the Perm Krai (Russia). The performance of different machine learning (ML) algorithms in predicting well permeability is evaluated. Three different real datasets are used to train more than 20 machine learning regressors, whose hyperparameters are optimized using Bayesian Optimization (BO). The resulting models demonstrate significantly better predictive performance than traditional methods, and the best ML model found is one that had never before been applied to this problem. The permeability prediction model is characterized by a high adjusted R² value of 0.799. A promising approach is the integration of machine learning methods and the use of pressure recovery curves to estimate permeability in real time. The work is unique for its approach to predicting pressure recovery curves during well operation without stopping wells, providing primary data for interpretation. These innovations can improve the accuracy of permeability forecasts and reduce the well downtime associated with traditional well-testing procedures. The proposed methods pave the way for more efficient and cost-effective reservoir development, ultimately supporting better decision-making and resource optimization in oil production.
Funding: Financial and intellectual support was provided by Queensland University of Technology (QUT), Australia, through its Higher Degree Research Program, which played a crucial role in the successful completion of this research study.
Abstract: Moisture accumulation within road pavements, particularly in unbound granular materials with or without thin sprayed seals, presents significant challenges in high-rainfall regions such as Queensland. This infiltration often leads to various forms of pavement distress, eventually causing irreversible damage to the pavement structure. The moisture content within pavements is highly dynamic and directly influenced by environmental factors such as precipitation, air temperature, and relative humidity. This variability underscores the importance of monitoring moisture changes using real-time climatic data to assess pavement conditions for operational management, or of incorporating these effects during pavement design based on historical climate data. Consequently, there is an increasing demand for advanced, technology-driven methodologies to predict moisture variations based on climatic inputs. Addressing this gap, the present study employs five traditional machine learning (ML) algorithms, K-nearest neighbors (KNN), regression trees, random forest, support vector machines (SVMs), and Gaussian process regression (GPR), to forecast moisture levels within pavement layers over time, with varying algorithm complexities. Using data collected from an instrumented road in Brisbane, Australia, which includes pavement moisture and climatic factors, the study develops predictive models to forecast moisture content at future time steps. The approach incorporates current moisture content, rather than averaged values, along with seasonality (both daily and annual) and key climatic factors to predict next-step moisture. Model performance is evaluated using R², MSE, RMSE, and MAPE metrics. Results show that ML algorithms can reliably predict long-term moisture variations in pavements, provided optimal hyperparameters are selected for each algorithm. The best-performing algorithms include KNN (with the number of neighbours equal to 15), medium regression tree, medium random forest, coarse SVM, and simple GPR, with medium random forest outperforming the others. The study also identifies the optimal hyperparameter combinations for each algorithm, offering significant advancements in moisture prediction tools for pavement technology.
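The four evaluation metrics named in the preceding abstract (R², MSE, RMSE, and MAPE) can be sketched as follows, using their standard textbook definitions. This is a minimal illustration, not the study's evaluation code: the function name `regression_metrics` and the moisture values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    # MAPE is undefined when an observed value is zero
    mape = 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": mape,
    }


# Hypothetical observed and predicted moisture contents (%), for illustration only
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

R² compares the model against always predicting the mean, whereas MAPE expresses error relative to each observed value, so the two can rank models differently when the observed values span a wide range.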
Funding: Funded by the National Natural Science Foundation of China (Nos. 82405530, 81973921, and 72374068) and the Science and Technology Research Project of Hubei Provincial Department of Education (No. B2023098).
Abstract: Objective: Mild cognitive impairment (MCI) is an age-related neurodegenerative disease whose prevalence increases with age. Within the framework of traditional Chinese medicine, spleen-kidney deficiency syndrome (SKDS) is recognized as the most frequent MCI subtype. Because the onset of MCI is covert and gradual, it is difficult for patients and their families in community settings to distinguish typical aging from pathological changes. There is an urgent need for a preliminary diagnostic tool designed for community-residing older adults with MCI attributed to SKDS (MCI-SKDS). Methods: This investigation enrolled 312 elderly individuals diagnosed with MCI, who were randomly distributed into training and test datasets at a 3:1 ratio. Five machine learning methods, including logistic regression (LR), decision tree (DT), naive Bayes (NB), support vector machine (SVM), and gradient boosting (GB), were used to build a diagnostic prediction model for MCI-SKDS. Accuracy, sensitivity, specificity, precision, F1 score, and area under the curve were used to evaluate model performance. Furthermore, the clinical applicability of the model was evaluated through decision curve analysis (DCA). Results: The DT model performed best on accuracy, precision, specificity, and F1 score in the training set (test set), with scores of 0.904 (0.845), 0.875 (0.795), 0.973 (0.875), and 0.973 (0.875), respectively. The SVM model had the best sensitivity among the five models in the training set (test set), with a score of 0.865 (0.821). The area under the curve of all five models was greater than 0.9 for the training dataset and greater than 0.8 for the test dataset. The DCA of all models showed good clinical application value. The study identified ten indicators that were significant predictors of MCI-SKDS. Conclusion: The risk prediction index derived from machine learning for the MCI-SKDS prediction model is simple and practical; the model demonstrates good predictive value and clinical applicability, and the DT model had the best performance.
Funding: Supported by the Natural Science Foundation of Shanxi Province, No. 20210302123299, and The Belt and Road Program of Shanxi Province, No. 110000261420228002 (both to CZ).
Abstract: Studies have shown that vascular dysfunction is closely related to the pathogenesis of Alzheimer's disease. The middle temporal gyrus region of the brain is susceptible to pronounced impairment in Alzheimer's disease. Identification of the molecules involved in vascular aberrance of the middle temporal gyrus would support elucidation of the mechanisms underlying Alzheimer's disease and discovery of novel targets for intervention. We carried out single-cell transcriptomic analysis of the middle temporal gyrus in the brains of patients with Alzheimer's disease and healthy controls, revealing obvious changes in vascular function. CellChat analysis of intercellular communication in the middle temporal gyrus showed that the number of cell interactions in this region was decreased in Alzheimer's disease patients, with altered intercellular communication of endothelial cells and pericytes being the most prominent. Differentially expressed genes were also identified. Using the CellChat results, AUCell evaluation of the pathway activity of specific cells showed that the obvious changes in vascular function in the middle temporal gyrus in Alzheimer's disease were directly related to changes in the vascular endothelial growth factor (VEGF) A-VEGF receptor (VEGFR) 2 pathway. AUCell analysis identified subtypes of endothelial cells and pericytes directly related to VEGFA-VEGFR2 pathway activity. Two subtypes of middle temporal gyrus cells showed significant alteration in Alzheimer's disease: endothelial cells with high expression of Erb-B2 receptor tyrosine kinase 4 (ERBB4^(high)) and pericytes with high expression of angiopoietin-like 4 (ANGPTL4^(high)). Finally, combining bulk RNA sequencing data and two machine learning algorithms (least absolute shrinkage and selection operator and random forest), four Alzheimer's disease feature genes were identified: somatostatin (SST), protein tyrosine phosphatase non-receptor type 3 (PTPN3), glutinase (GL3), and tropomyosin 3 (PTM3). These genes were downregulated in the middle temporal gyrus of patients with Alzheimer's disease and may be used to target the VEGF pathway. Alzheimer's disease mouse models demonstrated consistently altered expression of these genes in the middle temporal gyrus. In conclusion, this study detected changes in intercellular communication between endothelial cells and pericytes in the middle temporal gyrus and identified four novel feature genes related to middle temporal gyrus and vascular functioning in patients with Alzheimer's disease. These findings contribute to a deeper understanding of the molecular mechanisms underlying Alzheimer's disease and present novel treatment targets.
Funding: Supported by the National Key R&D Program of China (No. 2022YFA1005204l).
Abstract: Silicone material extrusion (MEX) is widely used for processing liquids and pastes. Owing to the uneven linewidth and elastic extrusion deformation caused by material accumulation, products may exhibit geometric errors and performance defects, leading to a decline in product quality and affecting service life. This study proposes a process parameter optimization method that considers the mechanical properties of printed specimens and production costs. To improve the quality of silicone printing samples and reduce production costs, three machine learning models, kernel extreme learning machine (KELM), support vector regression (SVR), and random forest (RF), were developed to predict these three factors. Training data were obtained through a complete factorial experiment. A new dataset was obtained using the Euclidean distance method, which assigns the elimination factor. Parameters were tuned with Bayesian optimization algorithms, and the new dataset was input into the improved double Gaussian extreme learning machine to obtain the improved KELM model. The results showed improved prediction accuracy over SVR and RF. Furthermore, a multi-objective optimization framework was proposed by combining genetic algorithm technology with the improved KELM model. The effectiveness and reasonableness of the model algorithm were verified by comparing the optimized results with the experimental results.
Funding: The Fund of the Science and Technology Commission of Shanghai Municipality (No. 13511506402).
Abstract: Performing arts and movies have become commercial products with high profit and great market potential. Previous research has developed comprehensive models to forecast the demand for movies. However, it did not pay enough attention to decision support for performing arts, a special category unlike movies. For performing arts with high-dimensional categorical attributes and limited samples, determining ticket prices at different levels is still a challenging job for producers and distributors. To address these difficulties, factorization machine (FM), which can handle huge sparse categorical attributes, is used in this work. Adaptive stochastic gradient descent (ASGD) and Markov chain Monte Carlo (MCMC) are both explored to estimate the model parameters of FM. FM with ASGD (FM-ASGD) and FM with MCMC (FM-MCMC) both achieve better prediction accuracy than a traditional algorithm. In addition, a multi-output model is proposed to determine the price at multiple price levels simultaneously, which avoids repeatedly training the models. The results also confirm the prediction accuracy of the multi-output model compared with the general single-output model.
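To illustrate why a factorization machine copes with sparse categorical attributes, the sketch below scores one feature vector with the standard second-order FM formulation: a bias, linear terms, and pairwise interactions whose weights are dot products of per-feature latent vectors. It is an assumption-laden illustration, not the implementation from the abstract above: the function names, the random parameters, and the example vector are hypothetical, and parameter estimation (ASGD or MCMC) is not shown.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order FM score using the O(k*n) reformulation of the pairwise term.

    x: feature values (mostly zeros for one-hot categorical data)
    w0: global bias, w: linear weights, V: n x k latent factor matrix
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        # 0.5 * ((sum)^2 - sum of squares) equals the sum over all pairs i < j
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score computed by looping over every feature pair, O(k*n^2)."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score


# Hypothetical parameters and a one-hot style input, for illustration only
rng = random.Random(0)
n_features, n_factors = 6, 3
w0 = 0.1
w = [rng.uniform(-1, 1) for _ in range(n_features)]
V = [[rng.uniform(-1, 1) for _ in range(n_factors)] for _ in range(n_features)]
x = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]

print(fm_predict(x, w0, w, V))
print(fm_predict_naive(x, w0, w, V))
```

Because each interaction weight is a product of latent vectors rather than a free parameter, the model can estimate interactions between attribute values that rarely or never co-occur in the training data, which is the property that matters when samples are limited.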