Abstract
Objective: To construct a clinical disease-specific knowledge graph for stroke from a public medical knowledge graph and electronic medical records.
Methods: The biomedical information ontology system and structured electronic medical records of stroke patients served as knowledge sources. A basic stroke terminology list, a patient feature vocabulary, a stroke concept dictionary, a set of stroke relation triples, and a set of stroke concept terms were constructed in sequence. The relation triples and the concept terms were then imported into a Neo4j database to complete the stroke-specific knowledge graph. Representations of the graph were obtained with knowledge graph embedding models, and link prediction and triple classification were used as evaluation tasks to compare the embeddings produced by models such as TransE, RotatE, and Analogy. In addition, two prediction tasks were designed: whether a patient's hospital stay exceeded 7 days, and whether it exceeded 14 days. The best-performing knowledge graph embedding was fused with a patient representation derived from the original features by the Skip-gram algorithm, and machine learning models were built to complete the prediction tasks and were evaluated. The evaluation metrics were F1 score, area under the curve (AUC) of the receiver operating characteristic (ROC), and AUC of the precision-recall curve.
Results: The constructed stroke-specific knowledge graph contained 215,090 entities and 550,976 relations, and the RotatE model yielded the best graph embedding. Compared with the P-vector, the KGP-vector improved F1 score, ROC AUC, and precision-recall AUC by 0.039, 0.061, and 0.047 on the 7-day task and by 0.089, 0.081, and 0.103 on the 14-day task, respectively.
Conclusion: Combining a public medical knowledge graph with patient data allows a high-quality disease-specific knowledge graph to be built rapidly, and such a graph is expected to support clinical decision-making, diagnosis, and personalized care for stroke.
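The abstract compares embedding models such as TransE and RotatE using link prediction. As a minimal sketch of what these two models score and how a link-prediction rank is derived, the following uses toy embeddings and hypothetical function names; it is not the authors' implementation, and the L1-style distance is one common choice assumed here.

```python
import cmath
import math


def transe_score(h, r, t):
    """TransE: a triple is plausible when h + r lies close to t (L1 distance)."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def rotate_score(h, phases, t):
    """RotatE: each relation dimension rotates the complex head by a phase angle."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti) for hi, p, ti in zip(h, phases, t))


def rank_of_true_tail(h, phases, true_tail, candidates):
    """Link prediction: 1-based rank of the true tail among candidate tails."""
    true_score = rotate_score(h, phases, candidates[true_tail])
    better = sum(
        1
        for name, emb in candidates.items()
        if name != true_tail and rotate_score(h, phases, emb) > true_score
    )
    return better + 1


# Toy 2-dimensional complex embeddings (illustrative values, not study data).
head = [1 + 0j, 0 + 1j]
quarter_turn = [math.pi / 2, math.pi / 2]  # rotate every dimension by 90 degrees
tails = {
    "rotated_head": [0 + 1j, -1 + 0j],  # the head rotated by 90 degrees
    "unrelated": [1 + 0j, 1 + 0j],
}
rank = rank_of_true_tail(head, quarter_turn, "rotated_head", tails)
```

Averaging such ranks (or their reciprocals) over held-out triples gives the usual link-prediction summaries; which summaries the study reports is not stated in the abstract.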
Authors
谢忠壤
王牧雨
范世玉
李一晨
陈卉
XIE Zhongrang; WANG Muyu; FAN Shiyu; LI Yichen; CHEN Hui (School of Biomedical Engineering, Capital Medical University, Beijing 100069, China; Beijing Key Laboratory of Basic Research in Applied Clinical Biomechanics, Capital Medical University, Beijing 100069, China)
Source
《中国医疗设备》
2025, No. 6, pp. 44-48 (5 pages)
China Medical Devices
Funding
National Natural Science Foundation of China (82372094)
Beijing Natural Science Foundation (7252278)
Keywords
knowledge graph
electronic medical record
stroke
biomedical information ontology system
patient representation
predictive model