
A New Indexing Method Based on Word Proximity for Chinese Text Retrieval (cited by: 1)

Abstract This paper proposes a novel text representation and matching scheme for Chinese text retrieval. At present, the indexing methods of Chinese retrieval systems are either character-based or word-based. Character-based indexing methods, such as bi-gram or tri-gram indexing, suffer from a high rate of false drops caused by mismatches between queries and documents. Word-based indexing systems, on the other hand, have difficulty efficiently identifying all proper nouns, domain-specific terminology, and phrases. The new indexing method uses both the proximity and the mutual information of word pairs to represent text content, so as to overcome the high false-drop rate of character-based systems and the new-word and phrase problems of word-based systems. The evaluation results indicate that the average query precision of proximity-based indexing is 5.2% higher than the best results of TREC-5.
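The abstract above does not give the paper's formulas, so the following is only a minimal sketch of the general idea of scoring proximate word pairs by mutual information. It assumes pre-segmented tokens, a fixed proximity window, and standard pointwise mutual information (PMI) estimated from raw counts; the function names `proximity_pairs` and `pair_pmi` and the `window` parameter are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def proximity_pairs(tokens, window=3):
    """Yield ordered (left, right) token pairs at most `window` positions apart."""
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            yield (left, right)


def pair_pmi(docs, window=3):
    """Score each proximate pair by pointwise mutual information (PMI).

    Assumption: probabilities are simple relative frequencies over all
    tokens and all extracted pairs; the paper may weight them differently.
    """
    token_counts = Counter()
    pair_counts = Counter()
    for tokens in docs:
        token_counts.update(tokens)
        pair_counts.update(proximity_pairs(tokens, window))

    n_tokens = sum(token_counts.values())
    n_pairs = sum(pair_counts.values())

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / n_pairs
        p_a = token_counts[a] / n_tokens
        p_b = token_counts[b] / n_tokens
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Toy, hand-segmented documents (illustrative only).
    docs = [
        ["信息", "检索", "系统"],
        ["信息", "检索", "模型"],
        ["向量", "空间", "模型"],
    ]
    for pair, score in sorted(pair_pmi(docs, window=1).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 3))
```

In a sketch like this, pairs that co-occur more often than their individual frequencies predict receive higher scores and could serve as index terms alongside, or instead of, single characters or words. How the paper combines proximity with mutual information in its actual index is not stated in the abstract.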
Authors 杜林 (Du Lin), 孙玉芳 (Sun Yufang)
Source Journal of Computer Science & Technology (计算机科学技术学报, English edition), indexed in SCIE, EI, CSCD; 2000, No. 3, pp. 280-286 (7 pages)
Keywords information retrieval, vector space model, automatic indexing, proximity-based indexing

