Abstract
Measuring medical text similarity is an important task in medical text mining, and the central challenge is computing similarity quickly over large volumes of medical text. To address the performance limitations of similarity analysis based on the traditional cosine formula alone, this paper proposes a medical text similarity analysis algorithm that combines full-text indexing with the cosine formula. Full-text indexing is used to index the keywords of the medical text data, and a subset of the data is retrieved from the index according to a number of keywords, which reduces computational complexity and improves efficiency. Experiments show that the proposed method outperforms the similarity analysis algorithm based on the traditional cosine formula.
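The idea in the abstract can be illustrated with a minimal sketch: build an inverted index over keywords, use it to retrieve only the documents that share a keyword with the query, and compute cosine similarity on that subset. This is not the authors' implementation. The function names (`build_index`, `cosine`, `similar`), the whitespace tokenizer, and the raw term-frequency weighting are all assumptions made for illustration; real Chinese medical text would need a word segmenter and probably a dedicated full-text search engine.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real word segmentation.
    return text.lower().split()


def build_index(docs):
    """Return (inverted index: term -> set of doc ids, list of term-frequency vectors)."""
    index = defaultdict(set)
    vectors = []
    for doc_id, text in enumerate(docs):
        tf = Counter(tokenize(text))
        vectors.append(tf)
        for term in tf:
            index[term].add(doc_id)
    return index, vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar(query, index, vectors):
    """Score only the documents that share at least one keyword with the query."""
    q = Counter(tokenize(query))
    candidates = set()
    for term in q:
        candidates |= index.get(term, set())
    scored = [(doc_id, cosine(q, vectors[doc_id])) for doc_id in candidates]
    # Highest similarity first; ties broken by document id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


docs = [
    "chest pain and shortness of breath",
    "fracture of the left femur",
    "acute chest pain radiating to left arm",
]
index, vectors = build_index(docs)
results = similar("chest pain", index, vectors)
```

In this toy run the second document shares no keyword with the query, so it is never scored. Skipping such documents is where the saving over an all-pairs cosine computation would come from.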
Source
Microcomputer Applications (《微型电脑应用》)
2014, Issue 1, pp. 25-27 (3 pages)
Funding
Zhanjiang Science and Technology Plan Project (No. 2012C3102009)
Guangdong Medical College Youth Fund Project (No. XQ1353)