Abstract
Existing data stream classification models require large numbers of labeled samples for training, but in practical applications labeling that much data is costly. To address this problem, this paper proposes SMEClass, a mixture ensemble classification algorithm for data streams based on semi-supervised learning, which organizes its base classifiers in a mixed mode. K C4.5 decision tree classifiers label the unlabeled data by majority vote, which raises the confidence of the assigned class labels and improves the accuracy of the ensemble classifier. In addition, a Naïve Bayes classifier is added to reduce the noisy data produced during labeling. Experimental results show that, compared with the latest semi-supervised ensemble classification algorithms, SMEClass achieves higher accuracy and has a clear advantage in running time and noise resistance.
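The labeling step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Naïve Bayes classifier suppresses noise, so the rule used here (keep a pseudo-label only when the extra classifier agrees with the tree majority) is an assumption, not the paper's method. The names `majority_vote`, `pseudo_label` and `min_share` are hypothetical, and the toy callables merely stand in for trained C4.5 and Naïve Bayes models.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple

# A classifier is modeled as any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], Hashable]


def majority_vote(voters: Sequence[Classifier], x: Sequence[float]) -> Tuple[Hashable, float]:
    """Return the most common predicted label and the share of voters behind it."""
    votes = Counter(clf(x) for clf in voters)
    label, count = votes.most_common(1)[0]
    return label, count / len(voters)


def pseudo_label(
    voters: Sequence[Classifier],
    checker: Classifier,
    x: Sequence[float],
    min_share: float = 0.5,
) -> Optional[Hashable]:
    """Label x by majority vote of the voters.

    Assumed filtering rule (not taken from the paper): return None when the
    winning share is not above min_share or when the checker disagrees.
    """
    label, share = majority_vote(voters, x)
    if share <= min_share:
        return None
    if checker(x) != label:
        return None
    return label


# Toy stand-ins for K = 3 trained decision trees and one Naive Bayes model.
trees = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.4),
    lambda x: int(x[1] > 0.5),
]
bayes = lambda x: int(x[0] + x[1] > 1.0)

accepted = pseudo_label(trees, bayes, [0.9, 0.8])  # all trees and the checker agree
rejected = pseudo_label(trees, bayes, [0.6, 0.1])  # checker disagrees with the majority
```

In this sketch a rejected sample simply stays unlabeled; whether SMEClass discards, defers, or relabels such samples is not stated in the abstract.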
Authors
REN Zhao-ting (任钊婷)
WANG Zhi-he (王治和)
YANG Yan (杨晏)
School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2013, Issue 12, pp. 7770-7775, 7781 (7 pages in total)
Keywords
data stream
semi-supervised learning
ensemble classification
concept drift
mixture ensemble