Abstract
This paper discusses two key techniques in developing a Chinese information retrieval system: Chinese word segmentation and retrieval. For word segmentation, it presents an improved forward maximum matching (MM) segmentation algorithm and a correction strategy introduced to resolve ambiguity, and on that basis applies statistical methods to handle unknown (out-of-vocabulary) words. For retrieval, it surveys the principles of several of the most commonly used retrieval models and briefly analyzes the advantages and disadvantages of each. Finally, the proposed segmentation algorithm is tested, and the results indicate that its accuracy and efficiency meet the requirements of practical applications.
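For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of plain forward maximum matching. It is not the paper's improved algorithm, and it omits the correction strategy and the statistical handling of unknown words. The function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are all assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation (illustrative baseline only).

    At each position, take the longest substring (up to max_len
    characters) that appears in the dictionary.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so the loop is guaranteed to advance.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, not taken from the paper.
toy_dictionary = {"信息", "检索", "信息检索", "系统", "中文", "分词"}
print(forward_max_match("中文信息检索系统", toy_dictionary))
```

Because the match is greedy, "信息检索" is kept as one token rather than split into "信息" and "检索". This preference for the longest word is also what produces the segmentation ambiguities that a correction strategy, such as the one the abstract mentions, is meant to address.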
Source
《计算机应用》
CSCD
PKU Core (Peking University Chinese Core Journals index)
2004, Issue 7, pp. 128-131 (4 pages)
Journal of Computer Applications
Keywords
information retrieval
search engine
word segmentation
search technique