Abstract
Text categorization is the process of automatically assigning a text to a category within a given classification scheme, based on its content. Feature selection strongly affects the quality of the result. Starting from the feature selection stage, we propose an integrate-and-merge feature selection method that combines several feature selection methods and merges closely related features. The resulting feature vectors are then classified and evaluated with a support vector machine (SVM), chosen for its high classification accuracy and its ability to find a globally optimal solution. Experiments show that this feature selection method reduces classification time and improves classification accuracy.
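The abstract does not say which feature selection methods are integrated or how related features are merged, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes document frequency and chi-square as the two scoring methods, rank summation as the integration rule, and Jaccard overlap of document occurrences as the measure of "closely related". All function names and the `threshold` parameter are hypothetical, and the SVM classification step is omitted.

```python
from collections import Counter

def doc_frequency(docs):
    """Score each term by the number of documents that contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return dict(df)

def chi_square(docs, labels):
    """Score each term by its largest chi-square statistic over all classes."""
    n = len(docs)
    scores = {}
    for term in {t for doc in docs for t in doc}:
        best = 0.0
        for cls in set(labels):
            pairs = list(zip(docs, labels))
            a = sum(1 for doc, y in pairs if term in doc and y == cls)
            b = sum(1 for doc, y in pairs if term in doc and y != cls)
            c = sum(1 for doc, y in pairs if term not in doc and y == cls)
            d = n - a - b - c
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def ranks(scores):
    """Map each term to its rank (0 = best); ties are broken alphabetically."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {t: i for i, t in enumerate(ordered)}

def select_and_merge(docs, labels, top_k, threshold=0.8):
    """Integrate two rankings (assumed rule: rank sum), keep the top_k terms,
    then merge terms whose document-occurrence sets overlap strongly."""
    r_df = ranks(doc_frequency(docs))
    r_chi = ranks(chi_square(docs, labels))
    combined = sorted(r_df, key=lambda t: (r_df[t] + r_chi[t], t))[:top_k]
    occurs = {t: {i for i, doc in enumerate(docs) if t in doc} for t in combined}
    groups = []
    for term in combined:
        for group in groups:
            rep = occurs[group[0]]  # compare against the group's first term
            jaccard = len(occurs[term] & rep) / len(occurs[term] | rep)
            if jaccard >= threshold:
                group.append(term)
                break
        else:
            groups.append([term])
    return groups

def vectorize(doc, groups):
    """One dimension per merged group: total count of the group's terms."""
    return [sum(doc.count(t) for t in group) for group in groups]

# Toy corpus of tokenized documents (illustrative data, not from the paper).
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "coach"],
    ["stock", "market", "shares"],
    ["market", "stock", "bank"],
]
labels = ["sport", "sport", "finance", "finance"]

groups = select_and_merge(docs, labels, top_k=4)
vectors = [vectorize(doc, groups) for doc in docs]
```

Merging shrinks the feature space, which is one plausible reading of how the method could cut classification time. The vectors in `vectors` could then be passed to any SVM implementation for the classification and evaluation step.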
Source
《计算机应用与软件》
CSCD
PKU Core Journal
2008, Issue 10, pp. 212-213, 233 (3 pages)
Computer Applications and Software