
Research on Chinese Short Text Classification Method (中文短文本分类方法研究)

Cited by: 5
Abstract: Unlike conventional word-based methods for the automatic classification of Chinese short texts, this approach treats the training data as a background corpus and applies an association rule mining algorithm to mine the co-occurrence relationships in the training-set texts, building a feature co-occurrence set that serves as an extension vocabulary. The feature co-occurrence set is then used to extend the features of the training texts and of the test texts respectively, and a short text classification model is built on the extended features. Experiments show that the two improved methods give the short text classification system higher accuracy.
Source: Modern Computer (《现代计算机》), 2010, No. 7, pp. 28-31 (4 pages)
Keywords: short text classification; co-occurrence relationship; feature extension
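
The pipeline summarized in the abstract (mine co-occurrence relationships from the training set, then use them to extend both training and test texts) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mine_cooccurrence` and `extend_features`, the thresholds `min_support` and `min_confidence`, and the restriction to pairwise rules are all hypothetical choices, and the English tokens stand in for Chinese words that are assumed to be already segmented and stop-word filtered.

```python
from collections import Counter
from itertools import combinations


def mine_cooccurrence(docs, min_support=2, min_confidence=0.6):
    """Mine pairwise association rules a -> b from tokenized training texts.

    Returns a dict mapping each term to the set of terms that co-occur with
    it often enough (a hypothetical "feature co-occurrence set").
    """
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        terms = sorted(set(doc))  # count each term once per text
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))

    cooc = {}
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        # confidence(a -> b) = support(a, b) / support(a)
        if n_ab / term_count[a] >= min_confidence:
            cooc.setdefault(a, set()).add(b)
        if n_ab / term_count[b] >= min_confidence:
            cooc.setdefault(b, set()).add(a)
    return cooc


def extend_features(doc, cooc):
    """Append co-occurring terms that the short text does not already contain."""
    present = set(doc)
    extra = set()
    for term in doc:
        extra |= cooc.get(term, set()) - present
    return list(doc) + sorted(extra)


# Toy training set (assumed data, for illustration only).
train_docs = [
    ["stock", "rise", "market"],
    ["stock", "market", "fund"],
    ["match", "goal", "team"],
    ["match", "team", "champion"],
]
cooc = mine_cooccurrence(train_docs)

# Extend the training texts and a test text with the same co-occurrence set.
extended_train = [extend_features(d, cooc) for d in train_docs]
extended_test = extend_features(["stock", "fall"], cooc)
```

The extended token lists would then be fed to whatever classifier is in use. A frequent-pattern miner could replace the pairwise counting here without changing the extension step.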

References (7)

  • 1. Second Survey Report on the State of Mobile Phone Short Messages, 2008 [EB/OL]. http://www.12321.cn/viewnews.php?id=10753.
  • 2. Healy, M., Delany, S., Zamolotskikh, A. An Assessment of Case-Based Reasoning for Short Text Message Classification [C]. In: Norman Creaney (ed.) Proceedings of the 16th Irish Conference on Artificial Intelligence & Cognitive Science (AICS'05), 2005: 257-266.
  • 3. Zelikovitz, S., Marquez, F. Transductive Learning for Short-Text Classification Problems using Latent Semantic Indexing [J]. International Journal of Pattern Recognition and Artificial Intelligence, 2005, 19(2): 143-163.
  • 4. Zelikovitz, S. Transductive LSI for Short Text Classification Problems [C]. In: Proceedings of the 17th International FLAIRS Conference, 2004: 556-561.
  • 5. Han Jia-wei, Pei Jian, Yin Yi-wen. Mining Frequent Patterns Without Candidate Generation [C]. In: Chen Wei-dong, Jeffrey F M, Philip A B. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. Dallas, Texas: ACM Press, 2000: 1-12.
  • 6. Wang Yuanzhen, Qian Tieyun, Feng Xiaonian. Automatic Chinese Text Classification Based on Association Rule Mining [J]. 小型微型计算机系统, 2005, 26(8): 1380-1383. Cited by: 13
  • 7. Chinese stop word list [EB/OL]. http://download.csdn.net/source.
