Abstract
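Based on the frequency and distribution patterns of Chinese characters in documents, and drawing on the principles of library classification, this paper proposes a set-based index scheme built on the occurrence of Chinese characters in groups of documents (documents with consecutive index numbers). The scheme is compared with the conventional index scheme based on single-value representation, experiments are carried out on a class of short corpora, and some comparison results are given along with some analysis of index adjustment. By modeling the non-uniform distribution of Chinese characters under library classification, the relationships among the parameters of the document index are analyzed theoretically. The analysis shows that, compared with the single-value index, the set-based index can significantly improve indexing efficiency, reduce index space, and also support random dynamic adjustment.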
Chinese words follow their own usage-frequency rules in text, which differ slightly from those of English words. Based on the classification of texts, this paper proposes a set-based approach to Chinese word indexing. Experiments comparing it with the single-value method show that the set-based method can reduce index space and speed up retrieval, and that its efficiency can be further improved later. An analytical model is also established for study purposes.
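The contrast between the two schemes can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that "single-value" means one index entry per document number, and that the set-based scheme records each run of consecutive document numbers as a single (first, last) pair. The function names and the five sample titles are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def single_value_index(docs):
    """Map each character to the sorted list of document numbers containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for ch in set(text):
            index[ch].append(doc_id)
    return dict(index)


def set_based_index(docs):
    """Map each character to (first, last) runs of consecutive document numbers.

    Assumption: a "set" entry is one run of consecutive document numbers.
    """
    index = {}
    for ch, ids in single_value_index(docs).items():
        runs = []
        for doc_id in ids:
            if runs and doc_id == runs[-1][1] + 1:
                # Extend the current run instead of storing a new entry.
                runs[-1] = (runs[-1][0], doc_id)
            else:
                runs.append((doc_id, doc_id))
        index[ch] = runs
    return index


def lookup(index, ch):
    """Expand the runs stored for a character back into document numbers."""
    return [d for first, last in index.get(ch, []) for d in range(first, last + 1)]


# Hypothetical sample: titles ordered so that related documents are adjacent,
# as they would be after classification.
docs = ["图书分类", "图书索引", "图书检索", "汉字频度", "文献检索"]
single = single_value_index(docs)
grouped = set_based_index(docs)
```

In this sample the character 图 occurs in documents 0, 1 and 2, so the single-value index stores three entries while the run-based index stores one. How much space is saved in practice depends on how strongly classification clusters a character's occurrences, which is the relationship the paper analyzes.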
Source
《计算机工程》
CAS
CSCD
PKU Core (北大核心)
1998, No. 7, pp. 5-7, 49 (4 pages in total)
Computer Engineering
Keywords
Set-based index (集合索引)
Document retrieval (文献检索)
Single Chinese character retrieval (单汉字检索)
Set-based index
Chinese word index
Succession degree