Abstract
Automatic data classification assigns data to one or more classes according to a given strategy. This paper first introduces two statistics-based automatic classification techniques, the naive Bayes classifier and the support vector machine classifier, and analyzes their strengths and weaknesses. The main weakness of statistics-based classification is that precision tends to fall as the overlap between the classification features of different classes grows, and this limitation is especially pronounced in multi-level classification. To address it and improve classification precision, rule-based automatic classification is introduced to refine and extend the statistical approach. Combining the strengths of both techniques, the paper designs and implements a hybrid classifier system and applies it to classification analysis in a railway transportation information system (TMIS), where it achieves fairly satisfactory classification results.
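The abstract does not say how the statistical and rule-based classifiers are combined, so the following is only a minimal sketch under one plausible assumption: hand-written rules take precedence, and a naive Bayes model handles records that no rule matches. All names (`NaiveBayes`, `hybrid_classify`), the keyword rule, and the toy training records are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            # Laplace smoothing: add 1 to every token count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in text.split():
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def hybrid_classify(text, rules, model):
    """Return the label of the first matching rule, else the model's prediction.

    Assumption: rules override the statistical model. The paper may combine
    the two classifiers differently.
    """
    tokens = text.split()
    for keyword, label in rules:
        if keyword in tokens:
            return label
    return model.predict(text)


# Hypothetical toy data, for illustration only.
texts = [
    "coal wagon freight",
    "freight container cargo",
    "passenger ticket seat",
    "ticket passenger coach",
]
labels = ["freight", "freight", "passenger", "passenger"]
model = NaiveBayes().fit(texts, labels)

# Hypothetical rule: a keyword that should decide the class on its own.
rules = [("hazmat", "dangerous_goods")]

print(hybrid_classify("hazmat freight wagon", rules, model))
print(hybrid_classify("cargo wagon", rules, model))
```

In this sketch the rule layer is meant for cases where classes share many features and the statistical model alone would be ambiguous, which is the weakness the abstract attributes to purely statistical classification.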
Source
《信息技术》
2005, No. 7, pp. 70-73 (4 pages)
Information Technology
Keywords
information processing
data mining
data classification
rule-based classification