Abstract
Text categorization is one of the key technologies in information retrieval and text mining. This paper proposes a multi-class text categorization algorithm based on support vector data description (SVDD). In the training phase, SVDD is used to find the minimal hypersphere enclosing the samples of each class while maximizing the classification margin between classes. In the testing phase, a decision criterion based on the average distance to the k nearest neighbors (KNN) in kernel space determines the class to which each sample belongs. Experimental results show that the method has good generalization ability and good time efficiency.
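The testing-phase criterion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the Gaussian kernel and its `gamma` value, the function names, and the choice to compare a test sample against each class's stored samples (which might equally be the support vectors found by SVDD training). The SVDD training step itself is not shown. The sketch uses the standard kernel-space distance d(x, y)^2 = k(x, x) - 2k(x, y) + k(y, y) and assigns the sample to the class with the smallest average distance to its k nearest neighbors.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an illustrative value, not from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance between x and y in the feature space induced by the kernel."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def knn_average_distance(x, samples, k, kernel=rbf_kernel):
    """Average kernel-space distance from x to its k nearest samples.

    Assumes `samples` is non-empty; if it holds fewer than k points,
    all of them are used.
    """
    nearest = sorted(kernel_distance(x, s, kernel) for s in samples)[:k]
    return sum(nearest) / len(nearest)


def classify(x, class_samples, k=3, kernel=rbf_kernel):
    """Return the label whose samples give the smallest k-NN average distance.

    `class_samples` is a hypothetical mapping {label: list of vectors}.
    """
    return min(
        class_samples,
        key=lambda label: knn_average_distance(x, class_samples[label], k, kernel),
    )


# Toy usage with two well-separated classes of 2-D vectors.
train = {
    "a": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)],
    "b": [(1.0, 1.0), (0.9, 1.0), (1.0, 0.9)],
}
predicted = classify((0.05, 0.05), train, k=2)
```

In a real text classifier the vectors would be document feature vectors (for example TF-IDF weights), and the kernel would normally match the one used during SVDD training.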
Source
《电讯技术》
Peking University Core Journal (北大核心)
2014, No. 4, pp. 496-499 (4 pages)
Telecommunication Engineering
Keywords
information retrieval
text mining
text categorization
support vector data description (SVDD)
multi-class classifier