面向不平衡数据的电子病历自动分类研究 (cited by: 1)

Automatic Classification with Unbalanced Data for Electronic Medical Records
Abstract [Objective] This paper proposes an automatic classification method for electronic medical records with unbalanced data, aiming to further improve classification performance on clinical electronic medical records. [Methods] First, we used MC-BERT to enhance the semantic representation of electronic medical records. Then, we designed a deep neural network framework to improve the model's semantic extraction capability. Finally, we designed a new loss function that addresses two kinds of imbalance, in sample counts per category and in classification difficulty per sample, by combining category proportions, a gradient harmonizing mechanism, and category similarity. [Results] In empirical and comparative experiments on a real electronic medical record dataset, the method achieved a precision of 81.37%, a macro-average F1 of 65.89%, and a micro-average F1 of 81.47%, outperforming previously proposed classification methods. [Limitations] The empirical study covered medical records from only a single clinical department. [Conclusions] The proposed method for unbalanced data can effectively improve classification performance on electronic medical records.
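The gap between the reported macro-average F1 (65.89%) and micro-average F1 (81.47%) is typical of unbalanced data: macro-averaging weights every category equally, so weak performance on rare categories pulls it down. The following is a minimal sketch of the standard definitions for single-label classification, not the authors' evaluation code; the function name `f1_scores` and the toy labels are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, each class counting equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy data: 8 records of a common class, 2 of a rare class.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]  # one rare record is missed
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"macro-F1={macro_f1:.4f}, micro-F1={micro_f1:.4f}")
# prints: macro-F1=0.8039, micro-F1=0.9000
```

A single error on the rare class barely moves the micro-average but halves that class's recall, which is why the macro-average drops further.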
Authors Zhang Yunqiu (张云秋); Li Bocheng (李博诚); Chen Yan (陈妍) (College of Public Health, Jilin University, Changchun 130021, China; Shenzhen Health Development Research and Data Management Center, Shenzhen 518028, China)
Source Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list, 2022, No. 2, pp. 233-241 (9 pages)
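Funding This work is one of the research outcomes of the Humanities and Social Sciences Planning Project of the Ministry of Education (Grant No. 18YJA870017), a project commissioned by the Shenzhen Medical Information Center (Grant No. 2020(261)), and the Graduate Innovation Fund of Jilin University (Grant No. 101832020CX279).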
Keywords Unbalanced Data; Deep Learning; Electronic Medical Records; Cost-Sensitive Learning
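The abstract says the loss function uses category proportions, which is a form of cost-sensitive learning. As a rough illustration of that one idea only, the sketch below weights cross-entropy by inverse class frequency. It is a generic technique, not the paper's loss, which also combines a gradient harmonizing mechanism and category similarity that the abstract does not specify. All names and the toy probabilities are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the total weight."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical toy data: 8 records of a common class, 2 of a rare class.
labels = ["common"] * 8 + ["rare"] * 2
weights = inverse_frequency_weights(labels)  # common: 0.625, rare: 2.5

# Assumed model outputs: confident on the common class, unsure on the rare one.
probs = [{"common": 0.9, "rare": 0.1}] * 8 + [{"common": 0.6, "rare": 0.4}] * 2

uniform = {"common": 1.0, "rare": 1.0}
plain_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
print(f"plain={plain_loss:.4f}, weighted={weighted_loss:.4f}")
```

With these weights each class contributes the same total weight, so the poorly predicted rare class raises the loss more than it does under uniform weighting.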