
Application Study of Automatic Text Classification Combined with Language Model
Abstract: This paper studies how the bigram model, a statistical language model, can be applied to automatic text classification. When the traditional Vector Space Model (VSM) computes text similarity, it assumes that feature terms are independent of one another. To address this shortcoming, the paper proposes a method that uses word-pair and word-order information to improve text classification results. Experimental results show that the method is feasible and effective.
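The abstract does not give the paper's scoring formula, so the following is only a minimal sketch under stated assumptions. Each class gets its own bigram language model. Bigram estimates are linearly interpolated with an add-one-smoothed unigram estimate, using a hypothetical weight `lam` with a default of 0.7. A document is assigned to the class under which it has the highest log-likelihood. The class name `BigramClassifier`, the `<s>` start marker and the toy data are illustrative and do not come from the paper. The example also shows the VSM limitation described above: two sentences with the same words in a different order have identical term counts, but their bigrams differ.

```python
import math
from collections import Counter, defaultdict

BOS = "<s>"  # hypothetical start-of-document marker


def bigrams(tokens):
    """Return (previous word, word) pairs, padding the start with BOS."""
    padded = [BOS] + list(tokens)
    return list(zip(padded, padded[1:]))


class BigramClassifier:
    """Sketch: per-class bigram language model with interpolation smoothing.

    P(w | v) = lam * P_ML(w | v) + (1 - lam) * P_add1(w)
    (assumed formulation, not necessarily the paper's)
    """

    def __init__(self, lam=0.7):
        self.lam = lam                        # interpolation weight, must be < 1
        self.unigram = defaultdict(Counter)   # class -> word counts
        self.bigram = defaultdict(Counter)    # class -> (v, w) pair counts
        self.context = defaultdict(Counter)   # class -> counts of v as a left context
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.vocab.update(tokens)
            self.unigram[label].update(tokens)
            for v, w in bigrams(tokens):
                self.bigram[label][(v, w)] += 1
                self.context[label][v] += 1
        return self

    def _prob(self, label, v, w):
        size = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        uni = self.unigram[label]
        # Add-one smoothed unigram keeps every probability strictly positive.
        p_uni = (uni[w] + 1) / (sum(uni.values()) + size)
        ctx = self.context[label][v]
        # Maximum-likelihood bigram estimate; 0 if v never occurred as a context.
        p_bi = self.bigram[label][(v, w)] / ctx if ctx else 0.0
        return self.lam * p_bi + (1 - self.lam) * p_uni

    def log_likelihood(self, tokens, label):
        return sum(math.log(self._prob(label, v, w)) for v, w in bigrams(tokens))

    def predict(self, tokens):
        # Pick the class whose language model gives the document the highest score.
        return max(self.unigram, key=lambda c: self.log_likelihood(tokens, c))


if __name__ == "__main__":
    a, b = ["dog", "bites", "man"], ["man", "bites", "dog"]
    # Bag-of-words term counts (VSM-style) cannot tell these apart...
    print(Counter(a) == Counter(b))        # True
    # ...but per-class bigram models can.
    clf = BigramClassifier().fit([a, b], ["A", "B"])
    print(clf.predict(a), clf.predict(b))  # A B
```

Interpolating with a unigram estimate, rather than using raw bigram counts alone, keeps unseen word pairs from producing a zero probability. A zero would make the log-likelihood undefined. The interpolation weight would normally be tuned on held-out data.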
Author: 赵敏涯 (Zhao Minya)
Source: Computer and Modernization (《计算机与现代化》), 2010, No. 3, pp. 141-143 (3 pages)
Keywords: statistical language model; text classification; smoothing; bigram

