Abstract
Training a support vector machine (SVM) on large-scale datasets is difficult because of the size of the training set. This paper presents an incremental learning algorithm that divides the data into a history sample set and incremental (newly added) sample sets and exploits the geometric distribution of the history samples. First, a forgetting factor is defined for each sample, and the border vectors of the history set, that is, the samples likely to become support vectors, are extracted and used for fast initial training of the SVM. Second, during incremental learning, knowledge about the samples is accumulated and samples are selectively discarded. Numerical experiments on benchmark datasets show that the proposed algorithm preserves accuracy and generalization ability while training considerably faster than the standard SVM and the classical incremental algorithm, making it suitable for large-scale classification and online learning problems.
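The abstract does not give the paper's definitions of a border vector or of the forgetting factor, so the following is only a minimal sketch under stated assumptions. It assumes a two-class problem, treats the samples of each class that lie closest to the opposite class centroid as border vectors, and models the forgetting factor as a weight that decays each round a sample is not a support vector. The names `select_border_vectors`, `update_forgetting`, `keep_ratio`, `decay`, and `threshold` are hypothetical and do not come from the paper.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def select_border_vectors(samples, labels, keep_ratio=0.5):
    """Assumed heuristic: per class, keep the samples nearest the other
    class's centroid, since they are the likeliest support vectors."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    by_class = {c: [x for x, y in zip(samples, labels) if y == c]
                for c in classes}
    cents = {c: centroid(by_class[c]) for c in classes}
    kept = []
    for c in classes:
        other = classes[1] if c == classes[0] else classes[0]
        # Rank this class's samples by distance to the opposite centroid.
        ranked = sorted(by_class[c], key=lambda x: math.dist(x, cents[other]))
        k = max(1, math.ceil(keep_ratio * len(ranked)))
        kept.extend((x, c) for x in ranked[:k])
    return kept


def update_forgetting(factors, is_support, decay=0.5, threshold=0.2):
    """Assumed forgetting rule: support vectors are reset to 1.0, all other
    samples decay; samples whose factor drops below threshold are discarded.
    Returns the updated factors and the indices of the samples to keep."""
    new = [1.0 if s else f * decay for f, s in zip(factors, is_support)]
    keep = [i for i, f in enumerate(new) if f >= threshold]
    return new, keep


# Toy history set: two well-separated classes on a line.
history = [(0, 0), (1, 0), (2, 0), (3, 0), (6, 0), (7, 0), (8, 0), (9, 0)]
history_labels = [-1, -1, -1, -1, 1, 1, 1, 1]
border = select_border_vectors(history, history_labels, keep_ratio=0.5)
```

In a full pipeline the retained `border` samples would be passed to an SVM trainer for the initial model, and `update_forgetting` would be called after each incremental round with the trainer's support-vector indicators to decide which samples to carry forward.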
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journals (北大核心)
2010, Issue 1, pp. 161-163, 171 (4 pages)
Funding
National Natural Science Foundation of China (70601033, 10771213)
Keywords
support vector machine
incremental learning
border vector
forgetting factor
kernel function