Abstract
The main factor determining the performance of ensemble learning is the diversity among the individual learners in the ensemble. Clustering technology is used to speed up AdaBoost. At different noise levels, the new algorithm performs close to AdaBoost. A new approach to AdaBoost's noise-sensitivity problem is also proposed: the technique enables fast noise detection and re-learning after noise elimination, so that when processing noisy data sets the new algorithm clearly outperforms AdaBoost in both overall performance and efficiency.
Since the main factor deciding the performance of ensemble learning is the diversity of component learners, clustering technology is used to speed up AdaBoost in this paper. The performance of the new algorithm is very close to that of AdaBoost on data sets with different noise levels. The new algorithm can detect and eliminate noisy data quickly, and achieves rapid learning on the data sets after noise elimination, which overcomes the noise-sensitive shortcoming of AdaBoost. Its overall performance and efficiency are much better than those of AdaBoost when processing data sets containing noise.
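The abstract does not give the algorithm's details, but the core idea of using clustering to speed up boosting can be sketched as follows. This is a hypothetical illustration, assuming scikit-learn's `KMeans` and `AdaBoostClassifier`: each class is clustered separately and AdaBoost is trained on the cluster centers instead of the full training set, shrinking the data the booster must iterate over.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Cluster each class separately so every representative keeps its label.
reps_X, reps_y = [], []
for label in np.unique(y):
    Xc = X[y == label]
    km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(Xc)
    reps_X.append(km.cluster_centers_)
    reps_y.append(np.full(50, label))
reps_X = np.vstack(reps_X)
reps_y = np.concatenate(reps_y)

# AdaBoost now trains on 100 cluster representatives instead of 2000 points.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(reps_X, reps_y)
print(reps_X.shape[0], round(clf.score(X, y), 2))
```

The number of clusters per class and the choice of clustering algorithm are free parameters here; the paper's actual procedure (including its noise-detection step) may differ.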
Source
《软件学报》
EI
CSCD
Peking University Core Journal (北大核心)
2010, Issue 8, pp. 1889-1897 (9 pages)
Journal of Software
Funding
National Natural Science Foundation of China, No. 60632050