Abstract
Unlike conventional word-based methods for the automatic classification of Chinese short texts, this paper uses the training data as a background corpus and applies an association rule mining algorithm to mine co-occurrence relationships in the training-set texts, creating a feature co-occurrence set that serves as an extension vocabulary. The feature co-occurrence set is then used to perform feature extension on the training texts and the test texts respectively, and a short text classification model is built on the result. Experimental results show that the two improved methods give the short text classification system higher accuracy.
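The abstract does not say which association rule mining algorithm or which thresholds were used, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the texts are already word-segmented, restricts rules to term pairs, and uses hypothetical names (`mine_cooccurrence`, `expand_features`) and illustrative thresholds (`min_support`, `min_confidence`). English tokens stand in for segmented Chinese words.

```python
from collections import Counter
from itertools import combinations


def mine_cooccurrence(docs, min_support=2, min_confidence=0.6):
    """Build a feature co-occurrence set from pairwise association rules.

    Hypothetical helper: returns {term: set of terms it implies}.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    term_count = Counter()
    pair_count = Counter()
    for tokens in docs:
        terms = sorted(set(tokens))  # each text is treated as one transaction
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    expansion = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        # Rule a -> b is kept when confidence = count(a, b) / count(a) is high enough.
        if n_ab / term_count[a] >= min_confidence:
            expansion.setdefault(a, set()).add(b)
        if n_ab / term_count[b] >= min_confidence:
            expansion.setdefault(b, set()).add(a)
    return expansion


def expand_features(tokens, expansion):
    """Append co-occurring terms that are not already present in the text."""
    present = set(tokens)
    extra = set()
    for t in tokens:
        extra |= expansion.get(t, set()) - present
    return list(tokens) + sorted(extra)


# Toy training set used as the background corpus.
train_docs = [
    ["stock", "market", "shares"],
    ["stock", "market", "index"],
    ["football", "match", "goal"],
    ["football", "match", "league"],
]
cooc = mine_cooccurrence(train_docs)

# The same co-occurrence set expands both training and test texts.
expanded_train = [expand_features(d, cooc) for d in train_docs]
expanded_test = expand_features(["stock", "rises"], cooc)
```

In this toy corpus the short test text gains the term `market` because it co-occurs with `stock` in every training text that contains it. The expanded token lists would then be passed to whatever classifier is used; the abstract does not name one.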
Source
Modern Computer (《现代计算机》)
2010, No. 7, pp. 28-31 (4 pages)
Keywords
Short Text Classification
Co-Occurrence Relationship
Feature Extension