Abstract
This paper discusses the principles and methods that word sense discrimination should follow when it serves Chinese information processing. It first delimits the range of polysemous words that are the object of automatic word sense disambiguation by computer. It then points out that, for large-scale real text, sense discrimination must be operational, that is, the sense inventory should be complete and discrete. Finally, it argues that context is the ultimate basis on which a computer distinguishes word meanings, so sense discrimination for information processing should rely mainly on the syntactic behavior of words.
Source
《语言文字应用》
CSSCI
PKU Core (Peking University Core Journals)
2006, No. 2, pp. 126-133 (8 pages)
Applied Linguistics
Funding
National 973 Program (2004CB318102)
China Postdoctoral Science Foundation (2004035029)
National 863 Program (2001AA114210, 2002AA117010)
Keywords
sense
polysemous words
word sense tagging
word sense disambiguation
corpus