The Limitations of Statistically Based NLP Models (基于统计的语言处理模型的局限性)

Cited by: 7
Abstract: This paper demonstrates the limitations of statistically based natural language processing (NLP) models from the perspective of linguistic theory, by introducing and commenting on the mechanism of statistical language models (SLMs) and their applications. First, it introduces studies of the statistical structure of language carried out under the influence of information theory, especially Chomsky's demonstration that a finite state grammar (FSG) based on a Markov process is not suited to the description of natural language. Then, it shows how SLMs work and where they can be applied by discussing the N-gram model and its use in part-of-speech tagging. It goes on to discuss the recursive property of linguistic structure and the structure-dependent property of linguistic knowledge, and argues that recursive embedding lets an arbitrary number of embedded words disrupt statistical regularities, and that structure dependence undermines the independence assumption on which SLMs rely. Finally, it suggests that the right track for NLP may be to integrate the rule-based and statistics-based approaches, because natural language is a heterogeneous system.
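Author: Yuan Yulin (袁毓林)
Affiliation: Department of Chinese Language and Literature, Peking University (北京大学中文系)
Source: Applied Linguistics (《语言文字应用》), CSSCI, Peking University Core Journal, 2004, No. 2, pp. 99-108 (10 pages)
Funding: Supported by the Ministry of Education "Trans-Century Outstanding Talents Training Program" fund and by a Ministry of Education "Tenth Five-Year Plan" project (01JB740006)
Keywords: language processing; statistical models; finite state grammar; Markov process; recursion; structure-dependent property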
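
The abstract's claim that recursive embedding disrupts local statistical regularities can be illustrated with a minimal sketch. The code below is not taken from the paper: the toy corpus, the function names `train_bigrams` and `bigram_prob`, and the choice of a plain maximum-likelihood bigram model are all assumptions made for illustration.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count left-context tokens and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

# Toy corpus, invented for illustration (an assumption, not data from the paper).
corpus = [
    "the dog barks",
    "the dogs bark",
    "the dog that chased the cats barks",
    "the dogs that chased the cat bark",
]
unigrams, bigrams = train_bigrams(corpus)

# Subject and verb adjacent: the bigram window contains the agreeing noun.
p_adjacent = bigram_prob(unigrams, bigrams, "dog", "barks")

# Subject and verb separated by an embedded clause: the model conditions on
# the embedded noun "cats", which has no agreement relation with the verb.
p_embedded_singular = bigram_prob(unigrams, bigrams, "cats", "barks")
p_embedded_plural = bigram_prob(unigrams, bigrams, "cats", "bark")

print(p_adjacent, p_embedded_singular, p_embedded_plural)
```

In this toy setting the model treats a singular verb as the certain continuation of the plural noun "cats", because the true subject "dog" lies outside the two-word window. Raising N widens the window, but a longer embedded clause can always push the subject back out of it, which is the kind of limitation the abstract attributes to recursion.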