Abstract
To reduce large-scale training sample sets, the concept of the k-nearest vector is proposed, and a new method for measuring sample dissimilarity is derived from it. Several properties of this dissimilarity with respect to noise identification and distance to the class boundary are proved. Based on these properties, an efficient sample reduction algorithm for SVM training is proposed. The algorithm first removes noise samples according to the properties of the sample dissimilarity. It then uses the between-class dissimilarity to approximate the distance from a sample to the class boundary and, combining this with sample similarity, removes less important training samples directly from the original sample space. Simulation results show that the reduction algorithm can effectively reduce the sample set and improve training efficiency.
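The abstract outlines the pipeline but does not give the exact definitions of the k-nearest vector or of the dissimilarity measure. The following is therefore only a hypothetical sketch of the two-step idea, not the paper's method. It assumes that a sample's dissimilarity is the fraction of its k nearest neighbours (Euclidean distance) carrying a different label, that a sample whose dissimilarity reaches `noise_threshold` is noise, that a nonzero dissimilarity after de-noising stands in for "close to the class boundary", and that interior samples are thinned with a simple same-class radius test as the similarity criterion. The names `reduce_samples`, `noise_threshold` and `radius` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(X: Sequence[Point], i: int, pool: Sequence[int], k: int) -> List[int]:
    """k nearest neighbours of X[i] among the indices in `pool` (Euclidean).

    Ties are broken by index order because list.sort is stable.
    """
    others = [j for j in pool if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def dissimilarity(X: Sequence[Point], y: Sequence[int], i: int,
                  pool: Sequence[int], k: int) -> float:
    """Hypothetical dissimilarity (assumption, not the paper's formula):
    share of X[i]'s k nearest neighbours in `pool` whose label differs."""
    nbrs = knn(X, i, pool, k)
    return sum(y[j] != y[i] for j in nbrs) / len(nbrs)


def reduce_samples(X: Sequence[Point], y: Sequence[int], k: int = 3,
                   noise_threshold: float = 1.0,
                   radius: float = 1.0) -> List[int]:
    """Return the sorted indices of the retained training samples."""
    if len(X) <= k:
        raise ValueError("need more than k samples")
    everyone = range(len(X))
    # Step 1: drop samples whose neighbourhood is dominated by other classes.
    clean = [i for i in everyone
             if dissimilarity(X, y, i, everyone, k) < noise_threshold]
    if len(clean) <= k:
        return clean
    # Step 2: recompute dissimilarity on the de-noised set; d > 0 is used
    # as a proxy for "near the class boundary".
    kept: List[int] = []
    interior_kept: List[int] = []
    for i in clean:
        if dissimilarity(X, y, i, clean, k) > 0:
            kept.append(i)  # boundary sample: likely support-vector candidate
            continue
        # Interior sample: discard it if an already retained same-class
        # interior sample lies within `radius` (i.e. it is redundant).
        if any(y[j] == y[i] and math.dist(X[i], X[j]) < radius
               for j in interior_kept):
            continue
        interior_kept.append(i)
        kept.append(i)
    return sorted(kept)


if __name__ == "__main__":
    # Two 1-D classes (0..4 and 5..9) plus a mislabelled point at 1.5.
    X = [(float(v),) for v in range(10)] + [(1.5,)]
    y = [0] * 5 + [1] * 5 + [1]
    print(reduce_samples(X, y, k=3, radius=1.5))  # → [0, 2, 4, 5, 6, 7, 9]
```

In this toy run the mislabelled point (index 10) is removed as noise, the three samples next to the class boundary (4, 5, 6) are all kept, and the interior samples are thinned to every other point. Recomputing the dissimilarity after de-noising keeps the noise point from making its neighbours look like boundary samples.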
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in CSCD
2012, No. 7, pp. 20-22 (3 pages)
Funding
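National Natural Science Foundation of China (No. 61005010)
Provincial Natural Science Foundation for Universities of Anhui Province (No. KJ2012B149)
Hefei University Talent Research Fund (No. 11RC06)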
Keywords
large-scale sample set
sample reduction
de-noising
support vector machine
sample dissimilarity