Abstract
Bilingual alignment is one of the important topics in natural language processing research. Combining two classic approaches to alignment, the sentence-length-based method and the lexicon-based method, we partition bilingual parallel texts by searching for anchors within each paragraph and thereby achieve alignment at the sentence level. The work provides a tool for building bilingual corpora and lays the groundwork for compiling bilingual teaching dictionaries.
Source
Computer & Digital Engineering (《计算机与数字工程》)
2007, No. 11, pp. 153-157 (5 pages)
Keywords
bilingual (parallel) corpora
bilingual alignment
sentence alignment
dictionary compilation