Abstract
Example-based machine translation (EBMT) systems require a very large corpus of translation examples, typically on the order of a million sentence pairs or more. A central difficulty for any practical EBMT system is therefore how to quickly select from this corpus a set of candidate examples that are reasonably similar to the input sentence, so that they can be passed to later modules such as sentence similarity computation and analogical translation construction. This paper proposes a multi-layer retrieval algorithm for candidate translation examples based on word surface features and information entropy. Tests on the multi-strategy machine translation system IHSMTS show that the algorithm addresses this problem well.
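The abstract does not spell out the layers of the algorithm, so the following is only a minimal illustrative sketch of how a layered candidate retrieval of this general kind might look, not the paper's method. It assumes two layers: a cheap filter that uses an inverted index over surface words, followed by a ranking that weights shared words by their self-information (used here as a simple stand-in for the entropy-based weighting). The names `build_index`, `retrieve`, `min_shared`, and `top_k` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(examples):
    """Build an inverted index (word -> example ids) and per-word weights.

    `examples` is a list of (source_words, target_text) pairs.
    """
    index = defaultdict(set)
    doc_freq = Counter()
    for ex_id, (source_words, _target) in enumerate(examples):
        for word in set(source_words):
            index[word].add(ex_id)
            doc_freq[word] += 1
    total = len(examples)
    # Assumption: weight each word by its self-information -log2(p),
    # so that rare words count more than frequent ones.
    weights = {w: -math.log2(df / total) for w, df in doc_freq.items()}
    return index, weights


def retrieve(query_words, examples, index, weights, min_shared=1, top_k=3):
    """Return ids of up to `top_k` candidate examples for the query."""
    query_set = set(query_words)

    # Layer 1: keep only examples sharing at least `min_shared` words
    # with the query, found through the inverted index.
    shared = Counter()
    for word in query_set:
        for ex_id in index.get(word, ()):
            shared[ex_id] += 1
    survivors = [ex_id for ex_id, n in shared.items() if n >= min_shared]

    # Layer 2: rank survivors by a weighted Dice-style overlap.
    # Words unseen in the corpus get weight 0 in this sketch.
    def score(ex_id):
        ex_set = set(examples[ex_id][0])
        common = sum(weights.get(w, 0.0) for w in query_set & ex_set)
        total = (sum(weights.get(w, 0.0) for w in query_set)
                 + sum(weights.get(w, 0.0) for w in ex_set))
        return 2 * common / total if total else 0.0

    ranked = sorted(survivors, key=lambda i: (-score(i), i))
    return ranked[:top_k]
```

In this sketch the first layer touches only the posting lists of the query words, so the more expensive scoring runs on a small subset of the corpus rather than on every sentence pair; a real system would presumably hand the surviving candidates to the sentence similarity module mentioned in the abstract.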
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2005, No. 3, pp. 330-334 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60272088).
Keywords
example-based machine translation (EBMT)
translation example corpus
candidate translation example
word surface features
information entropy