Abstract
Objective To extract structured information from the text of breast cancer pathology reports of the First Medical Center of Chinese PLA General Hospital in support of clinical research. Methods We reviewed the structure and characteristics of 10,590 breast cancer pathology reports issued at the First Medical Center of Chinese PLA General Hospital from 2005 to 2017. Drawing on clinical research needs and expert experience, we compiled a dictionary of structured fields and a set of extraction rules, and then extracted information from the reports by rule-based pattern matching. Results The method yielded the structured breast pathology fields and values required for clinical research. In an evaluation on 200 randomly sampled pathology reports, both the recall and the accuracy of the structured fields exceeded 90%. Conclusion Rule-based pattern matching is reasonably applicable to breast cancer pathology reports and can extract structured information from the text quickly and effectively.
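The abstract does not reproduce the field dictionary or the extraction rules, so the following is only a minimal sketch of what dictionary-driven, regex-based extraction and its evaluation might look like. The field names, the patterns, the sample report text, and the reading of "accuracy" as field-level precision are all assumptions made for illustration, not the authors' actual rules or data.

```python
import re

# Hypothetical field dictionary: field name -> compiled regex with one capture group.
# These names and patterns are illustrative assumptions, not the rules used in the paper.
FIELD_RULES = {
    "histologic_grade": re.compile(r"(I{1,3}|[ⅠⅡⅢ])级"),
    "er_status": re.compile(r"ER\s*[((]\s*([+-])"),
    "pr_status": re.compile(r"PR\s*[((]\s*([+-])"),
    "her2_score": re.compile(r"HER-?2\s*[((]\s*(\d\+|[+-])"),
    "ki67_percent": re.compile(r"Ki-?67\s*[((][^))]*?(\d{1,3})\s*%"),
}

# Invented report text, written only to exercise the patterns above.
SAMPLE_REPORT = "(左乳)浸润性导管癌,II级。免疫组化:ER(+,90%),PR(-),HER-2(2+),Ki-67(+,30%)。"


def extract_fields(report_text):
    """Apply every rule to the report; a field with no match is set to None."""
    result = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(report_text)
        result[field] = match.group(1) if match else None
    return result


def precision_recall(extracted, gold):
    """Field-level precision and recall against manually annotated values."""
    predicted = {k: v for k, v in extracted.items() if v is not None}
    expected = {k: v for k, v in gold.items() if v is not None}
    correct = sum(1 for k, v in predicted.items() if expected.get(k) == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    fields = extract_fields(SAMPLE_REPORT)
    print(fields)
    # Hypothetical annotation containing one field that no rule covers.
    gold = dict(fields, tumor_size_cm="2.5")
    print(precision_recall(fields, gold))
```

Keeping the rules in a dictionary separate from the matching loop means a new field can be added by writing one pattern, which is the usual appeal of this approach for reports with fairly regular wording.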
Authors
WU Huan (吴欢)
YING Jun (应俊)
WANG Yifei (王逸飞)
HU Huayu (胡华宇)
XU Hongli (徐洪丽)
ZHENG Yiqiong (郑一琼)
Affiliations: Medical Big Data Center, Chinese PLA General Hospital, Beijing 100853, China; School of Medicine, Nankai University, Tianjin 300071, China; Department of General Surgery, the First Medical Center, Chinese PLA General Hospital, Beijing 100853, China
Source
Academic Journal of Chinese PLA Medical School (《解放军医学院学报》)
CAS
2020, Issue 7, pp. 746-751 (6 pages)
Funding
Research and Development Project of the Medical Big Data Center, Chinese PLA General Hospital (2016MBD-018, 2018MBD-005)
Keywords
breast cancer pathological reports
pattern matching
regular expression
information extraction
natural language processing