Abstract
In recent years, with the rapid development of the economy, science and technology, and the Internet, large numbers of unlisted words have kept appearing in the literature. Their presence seriously reduces the accuracy and speed of automatic Chinese word segmentation and automatic indexing. This paper reports an unlisted-word mining experiment on the key parts (titles, abstracts, keywords, and first paragraphs) of 1,000 economics-related web pages, with emphasis on the design of the mining steps and the processing methods.
Source
《情报理论与实践》
CSSCI
Peking University Core Journals (北大核心)
2005, No. 5, pp. 478-481 (4 pages)
Information Studies: Theory & Application