Abstract
To assess the current state of research on biomedical text mining and its likely future directions, records of biomedical text mining papers published from January 2000 to March 2015 were retrieved from PubMed, the database of the U.S. National Library of Medicine. The major subject headings of each record were extracted and counted, and the high-frequency terms were retained to build a matrix of high-frequency subject terms against papers. The high-frequency terms were then clustered according to their co-occurrence in the same paper, and the research hotspots were identified from the clustering results and the corresponding class-label papers. The results show three main hotspots in the application of text mining to biomedicine: basic text mining techniques, applications of text mining in bioinformatics, and applications of text mining to the extraction of drug-related facts.
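The co-word procedure summarized in the abstract (frequency cut-off, term-paper matrix, co-occurrence clustering) can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the clustering algorithm, so the Ochiai coefficient and average-linkage agglomerative clustering used here are assumptions, as are the function names, the frequency threshold, and the toy records, which are hypothetical and not taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def high_frequency_terms(records, min_freq):
    """Return the terms that appear in at least min_freq records, sorted."""
    counts = Counter(term for terms in records for term in set(terms))
    return sorted(t for t, c in counts.items() if c >= min_freq)


def term_paper_matrix(records, terms):
    """Binary matrix: one row per term, one column per paper."""
    return [[1 if t in set(rec) else 0 for rec in records] for t in terms]


def ochiai(row_a, row_b):
    """Co-occurrence similarity: shared papers / sqrt(freq_a * freq_b)."""
    both = sum(a & b for a, b in zip(row_a, row_b))
    denom = sqrt(sum(row_a) * sum(row_b))
    return both / denom if denom else 0.0


def cluster_terms(terms, matrix, n_clusters):
    """Average-linkage agglomerative clustering (assumed, not from the paper)."""
    sim = {
        (i, j): ochiai(matrix[i], matrix[j])
        for i, j in combinations(range(len(terms)), 2)
    }
    clusters = [[i] for i in range(len(terms))]

    def linkage(c1, c2):
        # Mean pairwise similarity between the members of two clusters.
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two most similar clusters; a < b, so deleting b is safe.
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(terms[i] for i in c) for c in clusters]


# Hypothetical records: each list holds the major subject terms of one paper.
records = [
    ["Data Mining", "Natural Language Processing"],
    ["Data Mining", "Natural Language Processing"],
    ["Genes", "Proteins"],
    ["Genes", "Proteins"],
    ["Data Mining", "Genes"],
    ["Rare Term"],
]

terms = high_frequency_terms(records, min_freq=2)
matrix = term_paper_matrix(records, terms)
clusters = cluster_terms(terms, matrix, n_clusters=2)
print(clusters)
```

In this toy input, "Rare Term" falls below the threshold and is dropped before clustering, which mirrors the truncation to high-frequency terms described above. A real analysis would more likely rely on an established clustering package than on this hand-written loop.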
Source
Chinese Journal of Medical Library and Information Science (《中华医学图书情报杂志》)
CAS
2016, No. 2, pp. 27-33 (7 pages)
Keywords
Text mining
Biomedical research
Research hotspots