Abstract
Text classification helps users selectively read and process the huge volume of text available on the Internet. This paper presents an example-based mechanism for classifying text titles. It targets title classification under well-defined classification criteria. The method starts from direct matches between a title and each class's subject keyword list. It then adds upper-concept matching based on a classification tree and similarity judgments in a latent semantic space, and combines all three to evaluate how strongly a title relates to each class. Texts are represented with the vector space model, and similarities between texts and classes are computed in the latent semantic space rather than in the original term space. As a result, the method determines the degree of correlation between titles and classes by making full use of the context in which keywords occur, rather than relying solely on their co-occurrence or frequency.
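The abstract combines three signals into one title-to-class relevance score: direct keyword matches, upper-concept matches through a classification tree, and similarity in a latent semantic space. The Python sketch below illustrates that combination under stated assumptions and is not the authors' algorithm. The following are all hypothetical: the weights in `WEIGHTS`, the `parent` dictionary standing in for the classification tree, the toy example data, and the specific LSI convention. That convention projects term-count vectors onto the top-k left singular vectors of the term-by-document matrix, which are found here by plain power iteration rather than a library SVD.

```python
import math
from collections import Counter

# Hypothetical weights for (direct, upper-concept, latent-semantic); not from the paper.
WEIGHTS = (0.5, 0.2, 0.3)


def direct_score(terms, keywords):
    """Share of title terms found verbatim in the class keyword list."""
    hits = sum(1 for t in terms if t in keywords)
    return hits / len(terms) if terms else 0.0


def upper_concept_score(terms, keywords, parent):
    """Share of non-direct title terms with an ancestor (via `parent`) in the keyword list."""
    hits = 0
    for t in terms:
        if t in keywords:
            continue  # already counted by direct_score
        node = parent.get(t)
        while node is not None:  # walk up the classification tree
            if node in keywords:
                hits += 1
                break
            node = parent.get(node)
    return hits / len(terms) if terms else 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


def latent_basis(docs, vocab, k, iters=300):
    """Top-k left singular vectors of the term-by-document matrix A, via power
    iteration on A A^T with Gram-Schmidt against vectors already found."""
    counts = [Counter(d) for d in docs]
    A = [[c[t] for c in counts] for t in vocab]  # rows = terms, columns = docs
    AAt = [[dot(r1, r2) for r2 in A] for r1 in A]
    basis = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(len(vocab))]
        for _ in range(iters):
            w = [dot(row, v) for row in AAt]
            for b in basis:  # remove components along earlier singular vectors
                c = dot(w, b)
                w = [x - c * y for x, y in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                return basis  # matrix rank is below k
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def project(terms, vocab, basis):
    """Map a bag of terms into the k-dimensional latent space."""
    counts = Counter(terms)
    x = [counts[t] for t in vocab]
    return [dot(u, x) for u in basis]


def classify(title, classes, parent, k=2):
    """Return the best class and the combined score for every class."""
    docs = [ex for c in classes.values() for ex in c["examples"]]
    vocab = sorted({t for d in docs for t in d} | set(title))
    basis = latent_basis(docs, vocab, k)
    q = project(title, vocab, basis)
    scores = {}
    for name, c in classes.items():
        # Class vector = sum of its example texts, projected into the latent space.
        centroid = project([t for ex in c["examples"] for t in ex], vocab, basis)
        scores[name] = (WEIGHTS[0] * direct_score(title, c["keywords"])
                        + WEIGHTS[1] * upper_concept_score(title, c["keywords"], parent)
                        + WEIGHTS[2] * cosine(q, centroid))
    return max(scores, key=scores.get), scores


classes = {
    "sports": {"keywords": {"football", "sport"},
               "examples": [["football", "match", "score"], ["tennis", "match", "final"]]},
    "economy": {"keywords": {"market", "economy"},
                "examples": [["stock", "market", "falls"], ["bank", "interest", "rate"]]},
}
parent = {"tennis": "sport", "basketball": "sport", "stock": "finance", "finance": "economy"}
best, scores = classify(["tennis", "final", "score"], classes, parent)
print(best)  # sports
```

In this toy run the title shares no keyword with either class. The "sports" score therefore comes entirely from the upper-concept match (tennis is a child of sport in the tree) and from the latent-space similarity to the sports examples. This illustrates the use of context over bare co-occurrence that the abstract describes. A real system would replace the power iteration with a proper truncated SVD routine.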
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal
2001, Issue 9, pp. 1132-1136 (5 pages)
Journal of Computer Research and Development
Keywords
latent semantic indexing
text title classification
example
information processing
computer
text classification
vector space model