Abstract
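This paper introduces and comments on the working principles of statistically based language processing models and related application examples, and uses them to show the limitations of statistical models from the standpoint of linguistic theory. By discussing the N-gram model and its application to part-of-speech tagging, it shows how statistically based language processing models work and how they are applied. It then discusses the recursive nature of linguistic structure and the structure dependence of linguistic knowledge, and points out that recursive embedding allows statistical regularities to be disrupted by an arbitrary number of embedded words.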
This paper demonstrates the limitations of statistically based natural language processing (NLP) models from the perspective of linguistic theory by introducing and commenting on the mechanism of statistical language models (SLM) and their applications. First, it introduces studies of the statistical structure of language carried out under the influence of information theory, especially Chomsky's demonstration that finite state grammar (FSG) based on the Markov process is not suited to the description of natural language. Then, it shows the mechanism and possible application fields of SLM by discussing the N-gram model and its application to part-of-speech tagging. It goes on to discuss the recursive property of linguistic structure and the structure-dependent property of linguistic knowledge, and argues that recursively nested constructions upset statistical regularity, while the structure dependence of linguistic knowledge undermines the independence assumption on which SLM rests. Finally, it suggests that the right track for NLP may be the integration of rule-based and statistics-based approaches, because natural language is a heterogeneous system.
Source
《语言文字应用》
CSSCI
PKU Core Journal (北大核心)
2004, No. 2, pp. 99-108 (10 pages)
Applied Linguistics
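Funding
Supported by the Ministry of Education "Trans-Century Outstanding Talents Training Program" fund
Supported by a Ministry of Education "Tenth Five-Year Plan" project fund (01JB740006)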
Keywords
language processing
statistical models
finite state grammar
Markov process
recursion
structure-dependent property