
Research on Text Clustering of Medical Community Questions Based on Multi-Feature Fusion
Abstract Objective: Medical question texts lack contextual semantics and have sparse, high-dimensional features. To improve clustering performance on such texts, this paper proposes a text representation method that fuses text semantic features with topic features. Methods: Using question texts from a medical community as the data source, weighted fastText lexical semantic features were fused with LDA document-topic features to represent each question, and the fused features were used for question text clustering. Clustering performance was evaluated with clustering accuracy (ACC) and normalized mutual information (NMI). Results: The feature-fusion clustering model performed best, with an ACC of 0.577 and an NMI of 0.429, both higher than those of the baseline models compared. Conclusion: The experiments show that feature fusion represents medical question texts more comprehensively, accurately, and effectively, and can serve as a reference for feature representation and clustering-based knowledge discovery on medical question texts.
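The fusion and evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting scheme, the fusion operator, or the clustering algorithm, so everything below is an assumption made for illustration: the word weights are hypothetical (TF-IDF would be one option), fusion is taken to be concatenation of L2-normalized vectors, ACC uses a brute-force search for the best cluster-to-class mapping (practical only for a handful of clusters; the Hungarian algorithm is the usual choice), and NMI is normalized by the arithmetic mean of the two entropies. The toy vectors and labels are made up and are not the paper's data.

```python
import math
from collections import Counter
from itertools import permutations


def weighted_average(word_vectors, weights):
    """Weighted mean of a document's word vectors (weights are hypothetical)."""
    total = sum(weights)
    dim = len(word_vectors[0])
    return [
        sum(w * vec[i] for vec, w in zip(word_vectors, weights)) / total
        for i in range(dim)
    ]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(semantic_vec, topic_vec):
    """Assumed fusion: concatenate the two L2-normalized feature views."""
    return l2_normalize(semantic_vec) + l2_normalize(topic_vec)


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction correct under the best one-to-one cluster-to-class map."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))
    # Pad with None so surplus clusters can stay unmatched (they score 0).
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(padded, len(clusters)):
        hits = sum(pairs[(c, k)] for c, k in zip(clusters, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    count_true, count_pred = Counter(y_true), Counter(y_pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in Counter(zip(y_true, y_pred)).items()
    )
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mutual_info / denom if denom > 0 else 1.0


# Toy document: two 3-d word vectors with made-up weights, plus a
# 2-topic distribution standing in for an LDA document-topic vector.
doc_semantic = weighted_average([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [3.0, 1.0])
doc_topics = [0.8, 0.2]
fused = fuse(doc_semantic, doc_topics)

# Toy labels: cluster ids are arbitrary, so ACC must search over mappings.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 1]
acc = clustering_accuracy(y_true, y_pred)
score = nmi(y_true, y_pred)
```

Normalizing each view before concatenation keeps either feature set from dominating distance computations purely because of its scale. Whether the paper balances the two views this way is not stated in the abstract.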
Authors Shen Xifeng; Li Meiting; Zhang Weining; Nan Jiale; Sun Yuanyuan; Fu Yuwei; Gao Dongping (Institute of Medical Information, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100020, China; Peking Union Medical College Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College)
Source China Digital Medicine (《中国数字医学》), 2022, No. 12, pp. 28-34 (7 pages)
Funding Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2020AAA0104905).
Keywords LDA; fastText model; feature fusion; clustering; question text