Abstract
This paper applies automatic text classification to the automatic classification of book bibliographic records. Book titles and abstracts are first segmented with the ICTCLAS Chinese word segmentation system, and feature terms from the title and from the abstract are assigned different weights. A text feature matrix is then built, and a support vector machine (SVM) is trained and tested on the experimental corpus. To verify the influence of TFIDF weighting on the classification results, a term-frequency feature matrix, a TFIDF feature matrix, and a hybrid feature matrix are tested and compared. The experiments show that the SVM algorithm based on the hybrid feature matrix classifies well. On this basis, the authors build an SVM-based automatic bibliographic classification system.
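The feature construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the records are already segmented into tokens (ICTCLAS is not called here), the weight values `TITLE_WEIGHT` and `ABSTRACT_WEIGHT` are hypothetical because the abstract does not state them, and the IDF formula is the plain `log(N / df)` variant. The hybrid feature matrix is not defined in the abstract, so it is left out.

```python
import math
from collections import Counter

# Hypothetical weights: the abstract does not state the values used.
TITLE_WEIGHT = 2.0
ABSTRACT_WEIGHT = 1.0


def weighted_tf(title_tokens, abstract_tokens):
    """Weighted term counts for one record; title terms count more."""
    tf = Counter()
    for tok in title_tokens:
        tf[tok] += TITLE_WEIGHT
    for tok in abstract_tokens:
        tf[tok] += ABSTRACT_WEIGHT
    return tf


def build_matrices(records):
    """Return (vocab, tf_matrix, tfidf_matrix) for pre-segmented records.

    records: list of (title_tokens, abstract_tokens) pairs.
    """
    tfs = [weighted_tf(title, abstract) for title, abstract in records]
    vocab = sorted({tok for tf in tfs for tok in tf})
    n_docs = len(records)
    # Document frequency: number of records that contain each term.
    df = Counter(tok for tf in tfs for tok in tf)
    # Assumed IDF variant: log(N / df), so a term in every record gets 0.
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    tf_matrix = [[tf.get(tok, 0.0) for tok in vocab] for tf in tfs]
    tfidf_matrix = [
        [tf.get(tok, 0.0) * idf[tok] for tok in vocab] for tf in tfs
    ]
    return vocab, tf_matrix, tfidf_matrix


# Toy, made-up records standing in for segmented titles and abstracts.
records = [
    (["machine", "learning"], ["learning", "algorithm", "book"]),
    (["library", "science"], ["catalog", "book"]),
]
vocab, tf_matrix, tfidf_matrix = build_matrices(records)
```

Each row of `tf_matrix` or `tfidf_matrix` is one record's feature vector. Together with class labels, these rows could be passed to any SVM implementation, for example scikit-learn's `LinearSVC`, to compare the two weightings.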
Source
《图书情报工作》
CSSCI
Peking University Core Journals (北大核心)
2012, No. 9, pp. 114-119 (6 pages)
Library and Information Service
Keywords
machine learning
support vector machine
automatic classification
TFIDF