XML Document Classification Based on Kernel Method
Abstract: The support vector machine (SVM) method builds a classifier by mapping the input space through a kernel function and constructing an optimal separating hyperplane, and it has clear advantages in automatic text classification. An XML document combines textual content with structural information and, as a new form of data, has become an active research topic. Building on the Structured Link Vector Model (SLVM), this paper studies SVM-based automatic classification of XML documents. It proposes a kernel function suited to XML document classification, together with a method for learning the kernel's parameters based on SVM regression, so that structural analysis and content analysis of XML documents are integrated. Experiments on the INEX dataset show that the classification accuracy of the proposed method is significantly higher than the results published for the methods in the INEX evaluation.
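Author: YANG Jian-Wu (杨建武)
Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2011, No. 2, pp. 353-359 (7 pages)
Funding: Supported by the National Natural Science Foundation of China (60642001, 60875033) and the National High Technology Research and Development Program of China ("863" Program) (2008AA01Z421)
Keywords: XML document; document classification; kernel method; support vector machines; document model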
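
The abstract describes a kernel that combines the structure and the content of XML documents, but it does not give the kernel's form. The following is only a minimal sketch of the general idea, not the paper's kernel: each document is split into one bag-of-words vector per element tag, and the kernel sums the dot products between element vectors, weighted by an element-to-element similarity. The function names, the sample documents, and the hand-set weights in `toy_element_weight` are all hypothetical; the paper learns its kernel parameters, whereas here they are fixed by hand.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def element_term_vectors(xml_text):
    """Map each element tag to a bag of words built from the text directly inside it.

    Simplification: tail text (text following a child element) is ignored.
    """
    vectors = defaultdict(Counter)
    for elem in ET.fromstring(xml_text).iter():
        vectors[elem.tag].update((elem.text or "").lower().split())
    return vectors


def dot(a, b):
    """Dot product of two sparse term-count vectors."""
    return sum(count * b[term] for term, count in a.items())


def structured_kernel(doc_a, doc_b, element_weight):
    """k(a, b) = sum over tag pairs (s, t) of element_weight(s, t) * <a_s, b_t>."""
    total = 0.0
    for tag_a, vec_a in doc_a.items():
        for tag_b, vec_b in doc_b.items():
            weight = element_weight(tag_a, tag_b)
            if weight:
                total += weight * dot(vec_a, vec_b)
    return total


def normalized_kernel(doc_a, doc_b, element_weight):
    """Cosine-style normalization so that k(a, a) == 1 for non-empty documents."""
    denom = math.sqrt(
        structured_kernel(doc_a, doc_a, element_weight)
        * structured_kernel(doc_b, doc_b, element_weight)
    )
    return structured_kernel(doc_a, doc_b, element_weight) / denom if denom else 0.0


def toy_element_weight(tag_a, tag_b):
    """Hypothetical, hand-set element similarity (an assumption, not learned)."""
    if tag_a == tag_b:
        return 1.0
    if {tag_a, tag_b} == {"title", "body"}:
        return 0.5  # let title terms partially match body terms
    return 0.0


# Hypothetical sample documents.
d1 = element_term_vectors(
    "<article><title>kernel methods</title>"
    "<body>support vector machines use kernel functions</body></article>"
)
d2 = element_term_vectors(
    "<article><title>support vector machines</title>"
    "<body>kernel functions map data</body></article>"
)
d3 = element_term_vectors(
    "<article><title>cooking pasta</title>"
    "<body>boil water add salt</body></article>"
)

print(structured_kernel(d1, d2, toy_element_weight))  # 4.0
print(structured_kernel(d1, d3, toy_element_weight))  # 0.0
```

With identity weights (1 for equal tags, 0 otherwise) the kernel only compares text found under the same tag; the off-diagonal weight is what lets a term in one document's title match the same term in another document's body. A Gram matrix built from such a kernel could be passed to any SVM implementation that accepts precomputed kernels.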
About the author: YANG Jian-Wu, born in 1973, Ph.D., associate professor. His research interests focus on text mining and information retrieval.

