Abstract
Support vector machines (SVMs) construct a classifier by mapping the input space through a kernel function and finding the optimal separating hyperplane, and they are among the most effective methods for automatic text classification. XML documents, a newer form of data that combines textual content with structural information, have become an active research topic. Building on the Structured Link Vector Model (SLVM), this paper studies SVM-based automatic classification of XML documents and proposes a kernel function suited to XML document classification, together with a method for learning its parameters based on support vector regression, so that structural analysis and content analysis are integrated. Experiments on the INEX dataset show that the classification accuracy of the method is clearly higher than that of the methods published in the INEX evaluation.
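The abstract does not give the form of the proposed kernel. As a rough illustration only, the sketch below assumes an SLVM-style bilinear kernel in which each document is a list of term-weight vectors, one per XML element type, and a matrix `M` weights the similarity between element types. The names `slvm_kernel`, `gram_matrix`, and `M`, and the toy data, are hypothetical and are not taken from the paper; the paper's learned kernel parameters would take the place of the hand-written `M` here.

```python
from typing import List

Vector = List[float]
Doc = List[Vector]  # assumed layout: one term-weight vector per XML element type


def dot(u: Vector, v: Vector) -> float:
    """Plain inner product of two equal-length term-weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def slvm_kernel(x: Doc, y: Doc, M: List[List[float]]) -> float:
    """Assumed kernel form: k(x, y) = sum over i, j of M[i][j] * <x_i, y_j>.

    M[i][j] weights how much content under element type i in one document
    should count against content under element type j in the other.
    """
    return sum(
        M[i][j] * dot(x[i], y[j])
        for i in range(len(x))
        for j in range(len(y))
    )


def gram_matrix(docs: List[Doc], M: List[List[float]]) -> List[List[float]]:
    """Pairwise kernel values for a document collection."""
    return [[slvm_kernel(a, b, M) for b in docs] for a in docs]


# Toy data: two element types, two terms per element type.
doc_a: Doc = [[1.0, 0.0], [0.0, 2.0]]
doc_b: Doc = [[1.0, 1.0], [0.0, 1.0]]

# Identity M compares each element type only with itself.
M_identity = [[1.0, 0.0], [0.0, 1.0]]
# Off-diagonal weights let different element types contribute as well.
M_linked = [[1.0, 0.5], [0.5, 1.0]]

k_identity = slvm_kernel(doc_a, doc_b, M_identity)
k_linked = slvm_kernel(doc_a, doc_b, M_linked)
G = gram_matrix([doc_a, doc_b], M_linked)
```

With an identity `M` the kernel reduces to a content-only comparison within matching element types; off-diagonal entries are where structural information enters. For this form to be a valid kernel, `M` should be symmetric positive semidefinite. A Gram matrix like `G` could then be passed to any SVM implementation that accepts precomputed kernels.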
Source
Chinese Journal of Computers (《计算机学报》)
EI
CSCD
PKU Core Journal (北大核心)
2011, No. 2, pp. 353-359 (7 pages)
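Funding
Supported by the National Natural Science Foundation of China (60642001, 60875033) and the National High Technology Research and Development Program of China ("863" Program) (2008AA01Z421).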
Keywords
XML document
document classification
kernel method
support vector machines
document model