Abstract
To address the fact that the conditional independence assumption of Naive Bayes often conflicts with real data, this paper proposes CLIF-NB, a text classification learning method. Using mutual information theory, it computes the maximum correlation probability between feature attributes and replaces linearly inseparable attributes with combined variable sets, thereby relaxing the restriction imposed by the conditional independence assumption. By learning a series of classifiers, the method further reduces classification errors on the training set, yielding a combined CLIF-NB classifier with higher classification accuracy.
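The abstract's key step is measuring dependence between feature attributes with mutual information and merging the most dependent attributes into a combined variable set before applying Naive Bayes. Below is a minimal sketch of that idea; the function names, the threshold, and the strategy of merging only the single highest-scoring pair are hypothetical illustrations, not the paper's actual CLIF-NB algorithm.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def merge_most_dependent(features, threshold=0.1):
    """Merge the feature pair with the highest mutual information above
    `threshold` into one joint feature (a stand-in for the paper's
    variable-set combination step; threshold value is illustrative).

    `features` maps a name to a list of discrete values, all the same length.
    """
    best_pair, best_mi = None, threshold
    for a, b in combinations(features, 2):
        mi = mutual_information(features[a], features[b])
        if mi > best_mi:
            best_pair, best_mi = (a, b), mi
    if best_pair is None:
        return dict(features)  # no pair dependent enough to merge
    a, b = best_pair
    merged = {k: v for k, v in features.items() if k not in best_pair}
    # The joint feature takes values over the product of the two domains.
    merged[f"{a}&{b}"] = list(zip(features[a], features[b]))
    return merged

# Example: f1 and f2 are perfectly correlated, f3 is independent of both,
# so f1 and f2 are combined into a single joint attribute.
feats = {"f1": [0, 0, 1, 1], "f2": [0, 0, 1, 1], "f3": [0, 1, 0, 1]}
print(sorted(merge_most_dependent(feats)))
```

After such a merge, a standard Naive Bayes model treats the combined variable set as one attribute, so the independence assumption is only required between the merged groups rather than between every pair of original features.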
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2005, No. 9, pp. 1575-1577 (3 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National "973" Key Basic Research Program of China (G1998030414)