Abstract
Because text classification often involves multiple, heterogeneous data sources, this paper proposes a multiple kernel learning algorithm for SVMs. Combining kernel matrices conically for classification leads to a convex quadratically constrained quadratic program; the algorithm reformulates this problem as a semi-infinite program and shows that it can be solved efficiently by repeatedly reusing standard SVM implementations. Experimental results show that the proposed algorithm scales to hundreds of thousands of examples or to hundreds of kernels, and that it achieves relatively high recall and precision when classifying e-mail text from multiple, heterogeneous data sources.
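To make the phrase "conic combination of kernel matrices" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the sample vectors, the choice of a linear and an RBF kernel, the `gamma` value, and the fixed weights are all illustrative assumptions. In the algorithm described above the weights would be learned, whereas here they are simply given.

```python
import math

def linear_kernel(x, y):
    # Inner product of two feature vectors.
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    # Gaussian (RBF) kernel; gamma is an assumed, illustrative value.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(kernel, samples):
    # Kernel matrix with entry (i, j) = kernel(samples[i], samples[j]).
    return [[kernel(a, b) for b in samples] for a in samples]

def combine_kernels(matrices, weights):
    # Conic combination: a weighted sum with non-negative weights.
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("conic combination requires non-negative weights")
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy feature vectors standing in for three documents.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

K_lin = gram_matrix(linear_kernel, docs)
K_rbf = gram_matrix(rbf_kernel, docs)

# Fixed illustrative weights; a multiple kernel learner would optimize these.
K = combine_kernels([K_lin, K_rbf], [0.3, 0.7])
```

The combined matrix `K` can then be handed to any SVM solver that accepts a precomputed kernel, which is what makes it possible to reuse a standard SVM implementation unchanged inside an outer loop that updates the weights.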
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2007, No. 9, pp. 196-198 (3 pages)
Funding
Supported by a grant from the Science and Technology Department of Zhejiang Province (2005D40089)
Keywords
Text classification
SVM
Multiple kernel learning