Abstract
This paper proposes a performance evaluation model for text classifiers. The model estimates the reliability of a classifier's output from its learning results and naive Bayes, and gives a formula for computing that reliability. The reliability of each sub-classifier is then used as the weight of its result in Bagging, an ensemble learning method, which yields an improved algorithm based on sub-classifier performance evaluation: Performance Bagging (PBagging). With support vector machines (SVM) as the base sub-classifier model, PBagging is applied to a large news corpus from the Japanese news agency Kyodo News. Experiments show that PBagging achieves clearly higher classification accuracy than Bagging.
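The weighting step described in the abstract can be illustrated with a minimal sketch. This record does not reproduce the paper's reliability formula, so the sketch substitutes held-out accuracy as a stand-in; the function names, labels, and weights below are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def estimate_reliability(predicted, actual):
    """Stand-in reliability: fraction of held-out documents labelled correctly.

    Assumption: the paper derives reliability from its own formula, which is
    not given in this record; plain accuracy is used here for illustration.
    """
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def weighted_vote(labels, reliabilities):
    """Combine one label per sub-classifier, each vote weighted by reliability."""
    if len(labels) != len(reliabilities):
        raise ValueError("need exactly one reliability per sub-classifier")
    scores = defaultdict(float)
    for label, weight in zip(labels, reliabilities):
        scores[label] += weight
    return max(scores, key=scores.get)


def majority_vote(labels):
    """Plain Bagging-style vote: every sub-classifier counts equally."""
    return weighted_vote(labels, [1.0] * len(labels))


# Three hypothetical sub-classifiers label the same news item.
labels = ["sports", "politics", "politics"]
reliabilities = [0.9, 0.4, 0.4]

print(majority_vote(labels))
print(weighted_vote(labels, reliabilities))
```

With equal weights the two less reliable sub-classifiers outvote the first one; with reliability weights the single high-reliability vote (0.9) outweighs their combined 0.8, so the two schemes can disagree on the same inputs.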
Source
Computer Engineering (《计算机工程》)
Indexed in: CAS, CSCD, PKU Core (北大核心)
2008, No. 1, pp. 61-63 (3 pages)