Abstract
This paper proposes a correction method for the offset of the separating hyperplane that arises when a Support Vector Machine (SVM) is trained on imbalanced two-class data. First, Kernel Principal Component Analysis (KPCA) is applied to each class of samples in the kernel space to obtain the principal eigenvalues of each class in the feature space. Second, a ratio of penalty factors for the two classes is set from the two sample sizes and the information carried by their respective eigenvalues. Finally, a new separating hyperplane is obtained by optimization training. This hyperplane corrects the classification error of the standard SVM: compared with the standard SVM, the proposed method not only balances the misclassification rates of the two classes but also reduces them. Experimental results verify the effectiveness of the method.
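The first two steps of the abstract (per-class KPCA eigenvalues, then a penalty ratio) can be illustrated with a minimal NumPy sketch. The abstract does not give the formula that combines sample sizes and eigenvalues, so the rule in `penalty_ratio` is a hypothetical placeholder, not the authors' rule. The RBF kernel, the `gamma` value, the number of retained eigenvalues, and all function names are likewise assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kpca_eigenvalues(X, gamma=0.5, n_components=3):
    """Leading eigenvalues of the centered kernel matrix of one class, scaled by 1/n."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = J @ K @ J                           # kernel matrix centered in feature space
    eig = np.linalg.eigvalsh(Kc)[::-1]       # eigvalsh is ascending; reverse to descending
    return np.maximum(eig[:n_components], 0.0) / n


def penalty_ratio(X_pos, X_neg, gamma=0.5, n_components=3):
    """HYPOTHETICAL rule for C_pos / C_neg (the source abstract gives no formula).

    Assumption: the ratio grows with the size imbalance n_neg / n_pos and with
    the relative feature-space spread of the positive class.
    """
    lam_pos = kpca_eigenvalues(X_pos, gamma, n_components)
    lam_neg = kpca_eigenvalues(X_neg, gamma, n_components)
    size_term = X_neg.shape[0] / X_pos.shape[0]
    spread_term = lam_pos.sum() / lam_neg.sum()
    return size_term * spread_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_neg = rng.normal(0.0, 1.0, size=(60, 2))   # synthetic majority class
    X_pos = rng.normal(2.0, 1.0, size=(12, 2))   # synthetic minority class
    print("C_pos / C_neg =", penalty_ratio(X_pos, X_neg))
```

The resulting ratio could then feed the third step by training a class-weighted SVM, for example through the `class_weight` parameter of scikit-learn's `SVC`, which scales the penalty C per class.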
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2007, No. 12, pp. 2896-2898 (3 pages)
Journal of Computer Applications
Funding
National Natural Science Foundation of China (Grant No. 60574075)
Keywords
imbalanced data
kernel principal component analysis (KPCA)
support vector machine (SVM)
offset