Multi-hierarchical Classification of Web Text (多层次web文本分类)
Times cited: 12
Abstract: Most traditional text classification methods are based on the vector space model and use a flat category structure, which ignores the hierarchical relationships among categories. Based on latent semantic analysis (LSA), this paper proposes a multi-hierarchical web text classification method. To build the class models, the method walks the category hierarchy tree bottom-up, level by level, and builds one class model for each group of categories that share the same parent node. To classify a new web text, it proceeds top-down, using the class model at each level to classify the text in the LS (latent semantic) space. Since each class model covers only the documents under one parent node, this approach avoids a weakness of the LSA model: singular value decomposition (SVD) is difficult to perform on a single large, high-dimensional sparse matrix. The method also captures the semantic relationships among terms in web text and takes into account how terms are presented in the web page. Experiments show that the multi-hierarchical method achieves better recall and precision than classification methods based on a flat category structure.
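The abstract describes training bottom-up (one class model per group of sibling categories) and classifying top-down in LS space. The following is a minimal Python/NumPy sketch of that general idea, not the authors' implementation. It assumes raw term counts with whitespace tokenization (the paper's weighting by how terms appear in the page is omitted), folds a new document into each node's space as q^T U_k S_k^{-1}, and represents each child category by the centroid of its training documents, compared by cosine similarity. The tree, the documents, and the names `build_models`, `classify`, and `k` are hypothetical. Each node's SVD sees only the terms and documents under that node, so every matrix is smaller than one global term-document matrix.

```python
import numpy as np
from collections import Counter

# Hypothetical toy category tree: parent -> children; leaves have no entry.
TREE = {
    "root": ["sports", "tech"],
    "sports": ["football", "tennis"],
    "tech": ["software", "hardware"],
}


def post_order(node, tree):
    """Yield internal nodes bottom-up: every child before its parent."""
    for child in tree.get(node, []):
        yield from post_order(child, tree)
    if node in tree:
        yield node


def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else -1.0


def build_models(tree, docs_by_leaf, root="root", k=2):
    """Build one LSA model per internal node over its subtree (assumes no empty leaf)."""
    subtree_docs = dict(docs_by_leaf)  # node -> documents under it, filled upward
    models = {}
    for node in post_order(root, tree):
        labels, texts = [], []
        for child in tree[node]:
            for text in subtree_docs[child]:
                labels.append(child)
                texts.append(text.lower().split())
        subtree_docs[node] = [d for c in tree[node] for d in subtree_docs[c]]

        # Term-document count matrix A (terms x docs), local to this node.
        vocab = sorted({t for toks in texts for t in toks})
        index = {t: i for i, t in enumerate(vocab)}
        A = np.zeros((len(vocab), len(texts)))
        for j, toks in enumerate(texts):
            for t, n in Counter(toks).items():
                A[index[t], j] = n

        # Truncated SVD: keep at most k nonzero singular values.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = min(k, int(np.sum(s > 1e-10)))
        U_k, s_k, V_k = U[:, :r], s[:r], Vt[:r].T  # rows of V_k are documents

        # Each child is represented by the centroid of its documents in LS space.
        centroids = {
            c: V_k[[i for i, lab in enumerate(labels) if lab == c]].mean(axis=0)
            for c in tree[node]
        }
        models[node] = (index, U_k, s_k, centroids)
    return models


def fold_in(tokens, index, U_k, s_k):
    """Project a new document into a node's LS space: d = q^T U_k S_k^{-1}."""
    q = np.zeros(len(index))
    for t, n in Counter(tokens).items():
        if t in index:  # terms unseen at this node are ignored
            q[index[t]] = n
    return q @ U_k / s_k


def classify(text, tree, models, root="root"):
    """Descend from the root, choosing the most similar child at each level."""
    tokens = text.lower().split()
    node = root
    while node in tree:
        index, U_k, s_k, centroids = models[node]
        d = fold_in(tokens, index, U_k, s_k)
        node = max(tree[node], key=lambda c: cosine(d, centroids[c]))
    return node


# Usage on a hypothetical toy corpus.
docs = {
    "football": ["goal striker", "goal striker goal striker"],
    "tennis": ["serve racket", "serve racket serve racket"],
    "software": ["code compiler", "code compiler code compiler"],
    "hardware": ["cpu chip", "cpu chip cpu chip"],
}
models = build_models(TREE, docs, k=4)
print(classify("the striker scored a goal", TREE, models))  # football
```

The toy corpus is tiny, so `k=4` keeps every nonzero singular value at the root; on a real corpus `k` would be much smaller than the rank of each node's matrix, which is where LSA's dimensionality reduction comes from.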
Source: Journal of the China Society for Scientific and Technical Information (情报学报), 2005, No. 6, pp. 684-689 (6 pages). Indexed in CSSCI and the Peking University Core Journals list (北大核心).
Funding: Natural Science Foundation of Zhejiang Province
Keywords: text classification; web page cleaning; LSA; LS space