Abstract
[Objective] This paper proposes an automatic classification method for electronic medical records (EMRs) with imbalanced data, aiming to further improve the classification performance on clinical EMRs. [Methods] First, we used MC-BERT to enhance the semantic representation of EMRs. Then, we designed a corresponding deep neural network framework to improve the model's semantic extraction capability. Finally, we designed a new loss function that uses class-size proportions, a gradient harmonizing mechanism, and inter-class similarity to address two kinds of imbalance: imbalance in the number of samples per class and imbalance in how difficult samples are to classify. [Results] In empirical and comparative experiments on a real-world EMR dataset, the proposed method achieved a precision of 81.37%, a macro-averaged F1 score of 65.89%, and a micro-averaged F1 score of 81.47%, outperforming previously proposed classification methods. [Limitations] The empirical study covered medical records from only one clinical department. [Conclusions] The proposed automatic classification method for imbalanced EMR data can effectively improve EMR classification performance.
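The abstract does not give the loss function's formula. As a rough illustration only, the Python sketch below combines two of the three named ingredients under our own assumptions: inverse class-frequency weights (for the imbalance in sample counts) and GHM-C-style gradient-density weights (for the imbalance in classification difficulty), in the spirit of the gradient harmonizing mechanism from object detection. It is not the authors' loss. The class-similarity term is omitted because the abstract does not say how it enters, and the function names (`class_weights`, `ghm_weights`, `imbalance_aware_loss`), the bin count, the N / (C * n_c) weighting, and the use of 1 - p_y as the per-sample gradient norm for softmax cross-entropy are all hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Hypothetical inverse-frequency weights N / (C * n_c): rare classes weigh more.

    Classes absent from `labels` get 0.0 (they contribute no loss in this batch anyway).
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]


def ghm_weights(grad_norms: Sequence[float], bins: int = 10) -> List[float]:
    """GHM-C-style weights: samples whose gradient norm falls in a crowded bin
    (e.g. the many easy, well-classified samples) are down-weighted.

    Weight = N / (count_in_bin * number_of_non_empty_bins), so weights average to 1.
    """
    n = len(grad_norms)
    # Map g in [0, 1] to a bin index; g == 1.0 is placed in the last bin.
    idx = [min(int(g * bins), bins - 1) for g in grad_norms]
    counts = Counter(idx)
    non_empty = len(counts)
    return [n / (counts[b] * non_empty) for b in idx]


def imbalance_aware_loss(batch_logits: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         num_classes: int,
                         bins: int = 10) -> float:
    """Hypothetical mean cross-entropy reweighted by class frequency and gradient density."""
    p_true = [softmax(z)[y] for z, y in zip(batch_logits, labels)]
    ce = [-math.log(max(p, 1e-12)) for p in p_true]
    # Assumption: 1 - p_y serves as the per-sample gradient norm for softmax CE.
    grad_norms = [1.0 - p for p in p_true]
    cw = class_weights(labels, num_classes)
    gw = ghm_weights(grad_norms, bins)
    return sum(cw[y] * g * l for y, g, l in zip(labels, gw, ce)) / len(labels)


if __name__ == "__main__":
    # Toy batch: 3 samples, 3 classes, class 1 absent and class 2 rare.
    logits = [[2.0, 0.1, -1.0], [1.5, 0.2, 0.0], [0.3, 0.1, 0.9]]
    print(imbalance_aware_loss(logits, [0, 0, 2], num_classes=3))
```

Multiplying the two weights keeps the count-imbalance and difficulty-imbalance corrections separable. Both are recomputed from each mini-batch here; the original GHM formulation also allows a moving average of bin counts, which this sketch leaves out.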
Authors
Zhang Yunqiu (张云秋); Li Bocheng (李博诚); Chen Yan (陈妍)
College of Public Health, Jilin University, Changchun 130021, China; Shenzhen Health Development Research and Data Management Center, Shenzhen 518028, China
Source
Data Analysis and Knowledge Discovery (数据分析与知识发现)
CSSCI
CSCD
Peking University Core Journal
2022, No. 2, pp. 233-241 (9 pages)
Funding
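This paper is one of the research outcomes of the following projects:
Humanities and Social Sciences Planning Project of the Ministry of Education (Project No. 18YJA870017)
Commissioned Project of the Shenzhen Medical Information Center (Project No. 2020(261))
Graduate Innovation Fund Project of Jilin University (Project No. 101832020CX279)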
Keywords
Imbalanced Data
Deep Learning
Electronic Medical Records
Cost-Sensitive Learning