Abstract
Objective: To construct an optimal model for predicting in-hospital mortality risk in emergency patients with multiple trauma, based on different supervised machine learning algorithms.
Methods: The clinical data of 817 emergency patients with multiple trauma admitted to Daxing Teaching Hospital, Capital Medical University from January 2019 to December 2023 were analyzed retrospectively. There were 602 males and 215 females, aged 18 to 89 years, with a mean age of (54.82 ± 17.25) years. General patient information and laboratory test indicators served as candidate predictor variables, and the study endpoint was in-hospital death. Patients were split by simple randomization at a 7:3 ratio into a training set (n = 571) and a testing set (n = 246). In the training set, univariate analysis compared the candidate variables between the in-hospital survival and death groups; variables with statistically significant differences between groups were entered into LASSO regression, and those with non-zero coefficients were selected as the final features. Three supervised machine learning algorithms, logistic regression (LR), random forest (RF), and support vector machine (SVM), were used to build models. Model performance was evaluated in the testing set, and predictive efficacy was verified with receiver operating characteristic (ROC) curves. Normally distributed measurement data were expressed as mean ± standard deviation (x̄ ± s) and compared between groups with the t-test; non-normally distributed measurement data were expressed as median and interquartile range [M (Q₁, Q₃)] and compared with the rank-sum test. Count data were expressed as number of cases and percentage [n (%)] and compared with the χ² test or Fisher's exact test.
Results: A total of 817 patients were included, of whom 65 died, a mortality of 8.0%. Univariate analysis on the training set followed by LASSO regression on the statistically significant variables identified 17 variables as risk factors for in-hospital death in emergency patients with multiple trauma: age, albumin (ALB), red blood cell count (RBC), creatine kinase (CK), glucose (GLU), brain natriuretic peptide (BNP), C-reactive protein (CRP), lactic acid, partial pressure of carbon dioxide (PCO₂), low-density lipoprotein cholesterol (LDL-C), prothrombin time (PT), fibrinogen (FIB), fibrin degradation products (FDP), troponin I (TNI), procalcitonin (PCT), injury severity score (ISS), and Glasgow coma scale (GCS). Three supervised machine learning models were built on these 17 variables. The five most important variables were PCO₂, PCT, FDP, PT, and RBC in the LR model; PCO₂, ISS, GLU, ALB, and GCS in the RF model; and PCT, FDP, PCO₂, PT, and GLU in the SVM model. In the testing set, the LR model had an area under the curve (AUC) of 0.952, a specificity of 0.996, an accuracy of 0.963, and a sensitivity (recall) of 0.600. The RF model had an AUC of 0.970, higher than that of the LR and SVM models, with a specificity of 0.987, an accuracy of 0.959, and a sensitivity (recall) of 0.650. The SVM model had an AUC of 0.944, a specificity of 0.996, an accuracy of 0.947, and a sensitivity (recall) of 0.400. Each model had its strengths, but the RF model showed the best overall performance.
Conclusion: The RF model built on 17 optimal variables, including PCO₂, ISS, GLU, ALB, and GCS, shows strong predictive ability for in-hospital death in emergency patients with multiple trauma and warrants further clinical investigation.
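The performance figures in the Results follow from a binary confusion matrix with in-hospital death as the positive class. As a minimal sketch, the Python below computes sensitivity (identical to recall), specificity, and accuracy from such counts. The abstract does not report confusion-matrix counts: the `lr_counts` values are hypothetical, chosen only because they are one set of counts for a 246-patient testing set that rounds to the figures reported for the LR model. The names `ConfusionCounts`, `sensitivity`, `specificity`, and `accuracy` are illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; in-hospital death is the positive class."""
    tp: int  # deaths predicted as deaths
    fn: int  # deaths predicted as survivors
    tn: int  # survivors predicted as survivors
    fp: int  # survivors predicted as deaths


def sensitivity(c: ConfusionCounts) -> float:
    """Sensitivity, also called recall: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Specificity: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Accuracy: correct predictions divided by all predictions."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


# HYPOTHETICAL counts (not reported in the abstract): one combination for
# a 246-patient testing set that rounds to the reported LR metrics.
lr_counts = ConfusionCounts(tp=12, fn=8, tn=225, fp=1)

if __name__ == "__main__":
    print(f"sensitivity/recall: {sensitivity(lr_counts):.3f}")
    print(f"specificity:        {specificity(lr_counts):.3f}")
    print(f"accuracy:           {accuracy(lr_counts):.3f}")
```

Sensitivity and recall are both TP / (TP + FN), so a single value covers both, which is why the abstract reports them together.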
Authors
黄东明
王卫粮
Huang Dongming; Wang Weiliang (Department of Emergency, Daxing Teaching Hospital, Capital Medical University, Beijing 102600, China; Department of Trauma, Daxing Teaching Hospital, Capital Medical University, Beijing 102600, China)
Source
《国际外科学杂志》
2025, Issue 11, pp. 753-760, F0003 (9 pages in total)
International Journal of Surgery
Funding
Hospital-level research project of Beijing Daxing District People's Hospital (4202406497).
Keywords
Multiple trauma
Emergency treatment
Hospital mortality
Supervised machine learning
Predictive model