Abstract
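Starting from the requirements of subject indexing for government documents, and aimed at the network application of the E-Government Thesaurus (《电子政务主题词表》), this paper explores methods and techniques for subject indexing of government documents and proposes a practical algorithm that combines the dictionary method with N-gram extraction. The algorithm can compensate for the vocabulary gaps that the traditional dictionary method suffers because government documents span many fields and new words appear frequently. Issues in using the thesaurus for assigned-term indexing are also discussed.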
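As an illustration of how a dictionary method and N-gram extraction might be combined, the following is a minimal sketch and not the paper's actual algorithm. It assumes forward maximum matching for the dictionary step and simple frequency counting of character N-grams over the text the dictionary leaves unmatched. The function names, the toy dictionary, the sample text, and the thresholds (`max_len`, `n_values`, `min_count`) are all hypothetical.

```python
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation (assumed dictionary step).

    Characters that start no dictionary word are emitted one at a time.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def unmatched_runs(tokens, dictionary):
    """Join consecutive out-of-dictionary characters into runs.

    Dictionary words and non-alphanumeric characters (punctuation) end a run.
    """
    runs, current = [], ""
    for tok in tokens:
        if tok not in dictionary and tok.isalnum():
            current += tok
        else:
            if current:
                runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


def ngram_candidates(runs, n_values=(2, 3), min_count=2):
    """Count character N-grams inside the runs and keep the frequent ones.

    A gram is dropped when a longer frequent gram contains it with the
    same count, so only the longest repeated string survives.
    """
    counts = Counter()
    for run in runs:
        for n in n_values:
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    frequent = {g: c for g, c in counts.items() if c >= min_count}
    return {
        g: c for g, c in frequent.items()
        if not any(g != h and g in h and c == frequent[h] for h in frequent)
    }


def index_terms(text, dictionary):
    """Return (thesaurus terms found, candidate new terms) with counts."""
    tokens = forward_max_match(text, dictionary)
    known = Counter(t for t in tokens if t in dictionary)
    new = ngram_candidates(unmatched_runs(tokens, dictionary))
    return known, new


# Hypothetical toy data: two thesaurus terms and one repeated unknown word.
dictionary = {"电子政务", "主题词表"}
text = "电子政务主题词表网格化电子政务网格化"
known, new = index_terms(text, dictionary)
print(known)
print(new)
```

In this toy run the dictionary step finds the two thesaurus terms, while the repeated string 网格化, which the dictionary lacks, surfaces as a candidate new term from the N-gram step. A real system would need better candidate filtering than raw frequency, which is outside the scope of this sketch.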
As one of the first e-government standards projects, the draft of the E-Government Thesaurus has been preliminarily completed. With a view to its Internet application, this paper tentatively discusses methods and techniques for subject indexing of government documents. The proposed algorithm combines dictionary-based segmentation with N-gram-based feature extraction. It can compensate for the shortage of vocabulary that arises when traditional dictionary-based segmentation is applied to government documents, which span many fields and see frequent term renewal. The method of extraction of thesaurus and
Source
《高技术通讯》
EI
CAS
CSCD
2003, No. 10, pp. 15-19 (5 pages)
Chinese High Technology Letters
Funding
Supported by the 863 Program (2002AA1Z6711)
Keywords
E-government
Thesaurus application system
Subject indexing algorithm
Chinese language
China
Government documents
are also discussed. Key words: e-government, subject indexing, dictionary-based segmentation, N-gram-based feature extraction