Abstract
Objective: To develop a machine learning diagnostic model for T-shaped uterus based on quantitative parameters from three-dimensional transvaginal ultrasound.
Methods: A retrospective cross-sectional study was conducted in 304 patients who visited the Hysteroscopy Center of Fuxing Hospital, Capital Medical University, Beijing, China, between July 2021 and June 2024 for adverse reproductive histories such as infertility or recurrent pregnancy loss. Twelve experts (seven clinicians and five sonographers) from Fuxing Hospital and Beijing Obstetrics and Gynecology Hospital of Capital Medical University, Peking University People's Hospital, and Beijing Hospital independently and anonymously assessed each case for T-shaped uterus using a modified Delphi method. Based on the consensus, 56 patients were assigned to the T-shaped uterus group and 248 to the non-T-shaped uterus group. Seven clinical features and 14 sonographic features were initially included, and features with non-zero coefficients were selected by least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Four machine learning algorithms [logistic regression (LR), decision tree (DT), random forest (RF), and support vector machine (SVM)] were then used to build diagnostic models for T-shaped uterus. Using the Python random module, the dataset was randomly divided into five equal subsets, each preserving the original class distribution (T-shaped uterus : non-T-shaped uterus ≈ 1:4), with samples of both classes spread evenly across the subsets. Five-fold cross-validation was performed, with four subsets used for training and one for validation in each round. Model performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, precision, and F1-score. In the RF model, feature importance was determined by the mean decrease in node impurity (Gini impurity) attributed to each feature.
Results: The 304 patients were aged (35±4) years; the T-shaped uterus group was aged (35±5) years and the non-T-shaped uterus group (34±4) years. LASSO regression selected eight features with non-zero coefficients: average lateral wall indentation depth, average lateral wall indentation angle, upper cavity depth, endometrial thickness, uterine cavity area, cavity width at the level of lateral wall indentation, angle formed by the bilateral lateral walls, and average cornual angle (coefficients: 0.125, -0.064, -0.037, -0.030, -0.026, -0.025, -0.025, and -0.024, respectively). The RF model showed the best diagnostic performance: in the training set, AUC was 0.986 (95% CI: 0.980-0.992), sensitivity 0.978, specificity 0.946, precision 0.802, and F1-score 0.881; in the validation set, AUC was 0.948 (95% CI: 0.911-0.985), sensitivity 0.873, specificity 0.919, precision 0.716, and F1-score 0.784. Feature importance analysis of the RF model showed that average lateral wall indentation depth, upper cavity depth, and average lateral wall indentation angle were the three most important features (together accounting for more than 65% of total importance), playing a decisive role in model prediction.
Conclusion: The RF model built on quantitative three-dimensional transvaginal ultrasound parameters shows good diagnostic performance for T-shaped uterus and offers a new auxiliary diagnostic method for clinical practice.
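The stratified five-way split described in the Methods can be illustrated with a minimal sketch using only the Python random module. This is not the authors' code: the function name `stratified_folds`, the seed, and the round-robin assignment are assumptions made for illustration, and the labels are synthetic placeholders that only reproduce the reported group sizes (56 T-shaped, 248 non-T-shaped).

```python
import random
from collections import Counter


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds subsets that keep the class ratio.

    Hypothetical helper: the abstract does not give the actual procedure
    or a random seed.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so every fold receives a
        # near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Synthetic labels matching the reported group sizes:
# 1 = T-shaped uterus (56 cases), 0 = non-T-shaped uterus (248 cases).
labels = [1] * 56 + [0] * 248
folds = stratified_folds(labels)

for k, validation_idx in enumerate(folds):
    # In each round one fold is held out for validation; the other four train.
    training_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
    counts = Counter(labels[i] for i in validation_idx)
    print(
        f"round {k}: train={len(training_idx)}, val={len(validation_idx)}, "
        f"T-shaped={counts[1]}, non-T-shaped={counts[0]}"
    )
```

With these group sizes every fold holds 11 or 12 T-shaped and 49 or 50 non-T-shaped cases, so each validation fold stays close to the overall ratio of roughly 1:4.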
Authors
李斯静
王瑜
黄睿
杨丽曼
吕晓丹
黄晓武
彭雪冰
宋冬梅
马宁
肖豫
周巧云
郭艳
梁娜
刘爽
高侃
闫亚妮
夏恩兰
Li Sijing; Wang Yu; Huang Rui; Yang Liman; Lyu Xiaodan; Huang Xiaowu; Peng Xuebing; Song Dongmei; Ma Ning; Xiao Yu; Zhou Qiaoyun; Guo Yan; Liang Na; Liu Shuang; Gao Kan; Yan Yani; Xia Enlan (Hysteroscopy Center, Fuxing Hospital, Capital Medical University, Beijing 100038, China; Center for Applied Statistics, Renmin University of China, School of Statistics, Renmin University of China, Beijing 100086, China; Department of Ultrasound, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100026, China; Department of Ultrasound, Beijing Hospital, Beijing 100730, China; Department of Obstetrics and Gynecology, Peking University People's Hospital, Beijing 100044, China)
Source
《中华医学杂志》
Peking University Core Journal
2025, Issue 30, pp. 2551-2557 (7 pages)
National Medical Journal of China
Keywords
Ultrasonography
T-shaped uterus
Imaging, three-dimensional
Model, statistical