Abstract
In medical image analysis, the scarcity of Chinese chest X-ray report datasets has held back the development of Chinese chest X-ray report generation. On one hand, building such a dataset requires accurate disease annotation by experts, which is time-consuming and costly. On the other hand, generated reports are usually evaluated with a single natural language generation metric that measures only their similarity to the ground-truth reports, whereas assessing their clinical correctness and effectiveness depends on an accurate disease labeler (classifier). To address both the cost of expert annotation and the lack of such a labeler, this study proposes a disease labeler for Chinese chest X-ray report generation. The labeler uses a dual-BERT architecture to process diagnostic reports and clinical information separately, and builds a hierarchical label learning algorithm from the affiliation between diseases and body parts to improve text classification performance. Using this labeler, the authors constructed a Chinese chest X-ray report dataset of 51,262 report samples. Experiments and analyses on an expert-annotated subset of the Chinese chest X-ray reports validate the effectiveness of the proposed labeler.
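The abstract does not spell out the hierarchical label learning algorithm, so the following is only a minimal sketch of one plausible reading: body-part labels are derived from disease labels through the disease-to-body-part affiliation, and a body-part loss term is added to the disease loss. The label names, the `PARENT` mapping, the `part_weight` parameter, and the form of the loss are all assumptions made for illustration, not the authors' method.

```python
import math

# Hypothetical disease -> body-part hierarchy, for illustration only;
# the paper's actual label set is not given in the abstract.
PARENT = {
    "pneumonia": "lung",
    "nodule": "lung",
    "cardiomegaly": "heart",
    "pleural_effusion": "pleura",
}
DISEASES = sorted(PARENT)
PARTS = sorted(set(PARENT.values()))


def part_targets(disease_targets):
    """A body part is positive when any disease affiliated with it is positive."""
    return {
        part: max(disease_targets[d] for d in DISEASES if PARENT[d] == part)
        for part in PARTS
    }


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for a single probability/target pair."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def hierarchical_loss(disease_probs, part_probs, disease_targets, part_weight=0.5):
    """Mean disease-level BCE plus a weighted mean body-part BCE (assumed form)."""
    parts = part_targets(disease_targets)
    disease_term = sum(
        bce(disease_probs[d], disease_targets[d]) for d in DISEASES
    ) / len(DISEASES)
    part_term = sum(bce(part_probs[p], parts[p]) for p in PARTS) / len(PARTS)
    return disease_term + part_weight * part_term
```

In a full model, `disease_probs` and `part_probs` would presumably come from classification heads on top of the two BERT encoders; here they are plain dictionaries so the label logic can be read in isolation.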
Authors
王梦伟
颜瑞馨
侯泽毅
郎宁
周修庄
WANG Mengwei; YAN Ruixin; HOU Zeyi; LANG Ning; ZHOU Xiuzhuang (School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; Department of Radiology, Peking University Third Hospital, Beijing 100191, China)
Source
《小型微型计算机系统》
Peking University Core Journal
2025, No. 6, pp. 1365-1372 (8 pages)
Journal of Chinese Computer Systems
Funding
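Supported by the National Natural Science Foundation of China (61972046)
Supported by the Zhongguancun Science City-Peking University Third Hospital Clinical Medicine Proof-of-Concept Fund (HDCXZHKC2022202).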
Keywords
multi-label classification
hierarchical labels
BERT
Chinese chest X-ray report dataset
chest X-ray report generation