Abstract
Support vector machines (SVMs) offer good classification accuracy, stability, and generalization, and have seen initial use in network traffic classification. Scaling SVMs to large-scale traffic classification remains difficult, however, because of high computational complexity and slow classifier training. This paper proposes a fast SVM method based on bit reduction. The bit-reduction algorithm first aggregates and compresses the initial training set into a new, smaller set of weighted representative samples while losing as little of the original sample information as possible; a weighted SVM algorithm then trains the traffic classifier on this set. Experiments on large-scale traffic datasets show that the fast SVM substantially reduces both classifier training time and prediction time for unknown samples at a small cost in classification accuracy. In addition, provided the dataset is not overcompressed, it is more accurate than a random-sampling SVM at the same compression ratio. The method thus retains the stability and generalization performance of SVM while improving its ability to handle large-scale traffic classification.
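The compression step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that bit reduction means scaling each feature to an integer, dropping the `bits` least significant bits, and merging same-class samples that land in the same bin into one representative (the bin mean) whose weight is the bin size. The function name `bit_reduce` and the parameters `bits` and `scale` are hypothetical, as are the toy flow features.

```python
from collections import defaultdict


def bit_reduce(samples, labels, bits, scale=1):
    """Hypothetical sketch of bit-reduction compression.

    Samples of the same class whose scaled integer features agree after
    dropping the `bits` least significant bits are merged into one
    weighted representative.
    Returns (representatives, representative_labels, weights).
    """
    bins = defaultdict(list)
    for x, y in zip(samples, labels):
        # Quantize each feature to an integer, then discard the low-order bits.
        key = (y, tuple(int(round(v * scale)) >> bits for v in x))
        bins[key].append(x)

    reps, rep_labels, weights = [], [], []
    for (y, _), members in bins.items():
        n = len(members)
        # Representative = per-feature mean of the bin; weight = bin size.
        reps.append([sum(col) / n for col in zip(*members)])
        rep_labels.append(y)
        weights.append(n)
    return reps, rep_labels, weights


# Toy flow features (illustrative values only, not from the paper).
X = [[10, 200], [11, 203], [9, 201], [120, 40], [122, 43]]
y = ["web", "web", "web", "p2p", "p2p"]
reps, rep_labels, weights = bit_reduce(X, y, bits=3)
print(len(X), "samples compressed to", len(reps), "weighted representatives")
```

The representatives and their weights would then be passed to any SVM trainer that supports per-sample weights; scikit-learn's `SVC.fit`, for example, accepts a `sample_weight` argument. Larger values of `bits` compress more aggressively, which corresponds to the overcompression risk the abstract mentions.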
Source
《计算机应用研究》
CSCD
Peking University Core Journal
2012, Issue 6, pp. 2301-2305 (5 pages)
Application Research of Computers
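Funding
Key Program of the Joint Funds of the National Natural Science Foundation of China and Guangdong Province (U0935002)
Guangdong Province Major Science and Technology Special Project (2009A080207008)
Guangzhou Science and Technology Program (2010Z1-D00061)
Guangdong Province Training Program for Outstanding Young Innovative Talents in Higher Education (LYM11057)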
Keywords
support vector machine (SVM)
large-scale network traffic classification
bit reduction
weighted SVM
classifier
classification accuracy