Abstract
In research on sentence similarity for example-based machine translation (EBMT), identifying the predicate head is essential for grasping the overall structure of a sentence. Using 3000 Chinese simple sentences annotated with their predicate heads as the training set, and taking each candidate word's own syntactic attributes together with its context as its classification features, this research acquires knowledge for recognizing predicate heads by building a statistical decision tree model. The statistical decision tree was then applied to recognize predicate heads automatically, with fairly satisfactory test results, and the problems of applying it to this task are also discussed.
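The abstract does not spell out the exact features or the splitting criterion, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each word of a sentence is a candidate, described by its own POS tag plus the tags of its left and right neighbours. The tree is grown with the ID3 information-gain criterion, a stand-in that may differ from the paper's statistical decision tree. The five tagged sentences are invented toy data, not the 3000-sentence training set. The tag set and all names (`context_features`, `build_tree`, `predict_heads`, etc.) are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy data (NOT the paper's corpus): (POS tags, index of predicate head).
# Assumed tags: r=pronoun, n=noun, v=verb, a=adjective, d=adverb.
TRAIN = [
    (["r", "v", "n"], 1),       # 他 吃 苹果 "he eats apples"
    (["n", "d", "a"], 2),       # 天气 很 好 "the weather is very good"
    (["n", "a", "v", "n"], 2),  # 学生 认真 学习 汉语 "students study Chinese diligently"
    (["n", "a"], 1),            # 花 红 "the flower is red"
    (["r", "d", "v"], 2),       # 他 不 去 "he is not going"
]
FEATURES = ["pos", "prev", "next"]  # assumed feature set


def context_features(tags, i):
    """Candidate word's own POS tag plus the tags of its immediate neighbours."""
    return {
        "pos": tags[i],
        "prev": tags[i - 1] if i > 0 else "<s>",
        "next": tags[i + 1] if i + 1 < len(tags) else "</s>",
    }


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """ID3: a leaf is a label; an inner node is (feature, branches, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [k for k, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[k] for k in idx],
                                     [labels[k] for k in idx], rest)
    return (best, branches, majority)


def classify(tree, row):
    # Walk down to a leaf; an unseen feature value falls back to the node's majority.
    while isinstance(tree, tuple):
        feature, branches, default = tree
        if row[feature] not in branches:
            return default
        tree = branches[row[feature]]
    return tree


def train(data):
    """Turn every word of every sentence into one labelled training row."""
    rows, labels = [], []
    for tags, head in data:
        for i in range(len(tags)):
            rows.append(context_features(tags, i))
            labels.append("head" if i == head else "other")
    return build_tree(rows, labels, FEATURES)


def predict_heads(tree, tags):
    """Indices of the words that the tree labels as predicate head."""
    return [i for i in range(len(tags))
            if classify(tree, context_features(tags, i)) == "head"]


tree = train(TRAIN)
print(predict_heads(tree, ["n", "v", "n"]))  # 小明 看 书 "Xiaoming reads a book" -> [1]
print(predict_heads(tree, ["r", "a"]))       # 她 高兴 "she is happy" -> [1]
```

On this toy data the tree splits first on the candidate's own tag and consults the next-word tag only for adjectives, which is why the adjective in `["r", "a"]` is labelled as the head while the adjective in 学生 认真 学习 汉语 is not.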
Source
Journal of Peking University (Natural Science Edition)
CAS
CSCD
PKU Core Journal
1998, No. 2, pp. 221-230 (10 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
National 863 Program (National High Technology Research and Development Program of China)
National Natural Science Foundation of China
Keywords
natural language processing
corpus
machine translation
knowledge acquisition
predicate head
statistical decision tree