
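Application of Combined Dimensionality Reduction Techniques to Chinese Web Page Classification (组合降维技术在中文网页分类中的应用). Cited by: 3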

Web page categorization based on LSA and features selection
Abstract: In text classification based on the vector space model, the feature vectors of Chinese Web pages are extremely sparse and high-dimensional, so classification efficiency can only be improved by reducing the dimensionality of the vector space. This paper first reduces the feature space by selecting classification features with statistical methods. It then applies latent semantic analysis (LSA) to mine the semantic information among document features and uses matrix singular value decomposition (SVD) to reduce the dimensionality further. The K-nearest neighbor (KNN) method serves as the evaluating classifier. Experiments show that the macro-averaged F1 of the classification results improves by about 5%, which supports the effectiveness of the method.
Author: 李新福 (Li Xinfu)
Source: 《计算机工程与应用》 (Computer Engineering and Applications), CSCD, 北大核心 (Peking University Core Journal), 2007, No. 24, pp. 169-171 (3 pages)
Funding: Natural Science Foundation of Hebei Province (Grant No. F2006001020); Foundation of the Education Bureau of Hebei Province (Grant No. 2005347); Foundation of Hebei University (Grant No. Y2004045)
Keywords: Web page categorization; latent semantic analysis; feature selection; KNN
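
The three stages named in the abstract (statistical feature selection, LSA through SVD, and KNN classification) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say which statistic is used for feature selection, so the chi-square score here is an assumption, as are the cosine similarity used by the KNN step, the function names, and the toy term counts.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square score per term, based on term presence (assumed statistic)."""
    present = (X > 0).astype(float)            # documents x terms
    n_docs = X.shape[0]
    scores = np.zeros(X.shape[1])
    for cls in np.unique(y):
        in_cls = (y == cls).astype(float)
        a = present.T @ in_cls                  # term present, document in class
        b = present.sum(axis=0) - a             # term present, document elsewhere
        c = in_cls.sum() - a                    # term absent, document in class
        d = n_docs - a - b - c                  # term absent, document elsewhere
        num = n_docs * (a * d - b * c) ** 2
        den = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        scores = np.maximum(scores, chi2)       # keep the best score over classes
    return scores

def fit_lsa(X, k):
    """Return a (terms x k) projection built from the top-k right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:k].T

def knn_predict(train, labels, queries, k=3):
    """Majority vote among the k training documents with highest cosine similarity."""
    tn = train / (np.linalg.norm(train, axis=1, keepdims=True) + 1e-12)
    qn = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + 1e-12)
    nearest = np.argsort(-(qn @ tn.T), axis=1, kind="stable")[:, :k]
    return np.array([np.bincount(labels[row]).argmax() for row in nearest])

# Hypothetical toy counts: 6 documents x 7 terms; the last term occurs everywhere.
X_train = np.array([
    [2, 1, 0, 0, 0, 0, 1],
    [0, 2, 1, 0, 0, 0, 1],
    [1, 0, 2, 0, 0, 0, 1],
    [0, 0, 0, 2, 1, 0, 1],
    [0, 0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 0, 2, 1],
], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

# Stage 1: keep the 6 highest-scoring terms.
scores = chi2_scores(X_train, y_train)
selected = np.sort(np.argsort(-scores, kind="stable")[:6])

# Stage 2: project the selected features into a 2-dimensional latent space.
projection = fit_lsa(X_train[:, selected], k=2)
train_reduced = X_train[:, selected] @ projection

# Stage 3: classify unseen documents in the latent space.
X_new = np.array([[0, 1, 1, 0, 0, 0, 0],
                  [0, 0, 0, 1, 0, 1, 1]], dtype=float)
predicted = knn_predict(train_reduced, y_train, X_new[:, selected] @ projection)
print(predicted)
```

Selecting features before the SVD keeps the matrix being decomposed small, which is the motivation the abstract gives for combining the two reductions.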


