Abstract
Automatic recognition of Chinese organization names and their abbreviations has been studied widely and in depth, but little work addresses matching an abbreviated name to its full name. This paper addresses the retrieval of full Chinese organization names from their abbreviations. After studying the structural features of organization names, the authors summarize two types of rules, build a customized word segmentation tool based on keyword classes, and propose an algorithm for matching an abbreviated name with a full name. Combined with a multi-level indexing technique, these components are implemented as a retrieval system based on abbreviated Chinese organization names. Experimental results show that the approach is accurate: the top-ranked result is correct in nearly 95% of cases, and with 510,000 full organization names in the collection the average retrieval time is about 0.21 seconds per query, which meets practical requirements.
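The abstract does not give the matching algorithm, the two rule types, or the layout of the multi-level index, so the following is only a minimal sketch of the general idea under simplifying assumptions. It assumes an abbreviation keeps characters of the full name in their original order, it uses a single-level character index to narrow the candidates, and it ranks shorter full names first. The function names (`is_ordered_subsequence`, `build_char_index`, `retrieve`) and the ranking rule are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def is_ordered_subsequence(abbr: str, full: str) -> bool:
    """Return True if every character of abbr occurs in full, in the same order."""
    it = iter(full)
    # `ch in it` consumes the iterator up to the match, which enforces ordering.
    return all(ch in it for ch in abbr)


def build_char_index(full_names):
    """Map each character to the set of positions of full names containing it."""
    index = defaultdict(set)
    for i, name in enumerate(full_names):
        for ch in set(name):
            index[ch].add(i)
    return index


def retrieve(abbr, full_names, index):
    """Return candidate full names for abbr, shortest first (assumed ranking)."""
    if not abbr:
        return []
    # A candidate must contain every character of the abbreviation.
    candidate_ids = set.intersection(*(index.get(ch, set()) for ch in abbr))
    hits = [
        full_names[i]
        for i in candidate_ids
        if is_ordered_subsequence(abbr, full_names[i])
    ]
    # Assumption: a shorter full name is the more likely expansion.
    return sorted(hits, key=lambda name: (len(name), name))


names = ["北京大学", "北京师范大学", "清华大学"]
idx = build_char_index(names)
print(retrieve("北大", names, idx))
print(retrieve("北师大", names, idx))
```

A pure subsequence test over-generates on a large name list, which is presumably why the paper adds structural rules and keyword-class segmentation before matching; this sketch leaves those steps out.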
Source
Journal of Chinese Information Processing (《中文信息学报》)
CSCD
Peking University Core Journals (北大核心)
2007, No. 1, pp. 38-42 (5 pages)
Keywords
computer application
Chinese information processing
multi-level indexing
fuzzy matching
word segmentation algorithm