Abstract
To address two problems in text classification, the difficulty of obtaining training (test) sets and poorly designed category systems, we built a very large-scale hierarchical web page classification corpus. The corpus has rich field information, a scientifically designed category system, and a storage format that is highly extensible and semantically structured. It is suited to building large training (test) sets for text classification, topic identification, and information retrieval (IR).
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2006, Issue 1, pp. 71-73, 70 (4 pages in total)
New Technology of Library and Information Service