Abstract
This paper proposes a new method for collocation translation. Within a maximum entropy framework, the collocation translation model makes full use of all available information derived from both monolingual and bilingual corpora. Instead of relying heavily on bilingual parallel corpora, as earlier methods do, our approach can train translation models using monolingual corpora. In addition to inside-collocation information, the model also exploits contextual information. The EM (Expectation Maximization) algorithm is applied to estimate contextual word translation probabilities using a monolingual corpus. Our model can also integrate features derived from bilingual corpora when they are available. Experiments show that our approach outperforms existing monolingual-corpus-based methods for collocation translation and achieves better results when a bilingual corpus is available.
Source
《哈尔滨工业大学学报》
EI
CAS
CSCD
Peking University Core Journals
2007, No. 11, pp. 1790-1795 (6 pages)
Journal of Harbin Institute of Technology
Keywords
collocation
maximum entropy
monolingual corpora
expectation maximization algorithm