Abstract
This paper proposes a method for aligning sentences in real bilingual texts using sentence length and location information. The method is based on the observation that sentence pairs of a given length have similar location distributions across a bilingual text. It uses 1:1 sentence beads, instead of high-frequency words, as candidate anchors, which keeps the method general. Evaluations on many different kinds of test data show that the method achieves good alignment performance, is robust and language-independent, and effectively solves the sentence alignment problem for real bilingual texts.
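The core idea of the abstract, treating a source/target sentence pair whose lengths and relative positions agree as a candidate 1:1 anchor, can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the midpoint-based relative position, the expected length ratio `ratio`, the thresholds `ratio_tol` and `pos_tol`, the longest-increasing-subsequence filter that keeps anchors in order, and all function names are hypothetical choices made for illustration.

```python
from bisect import bisect_left
from typing import List, Tuple

Anchor = Tuple[int, int]  # (source sentence index, target sentence index)


def relative_positions(lengths: List[int]) -> List[float]:
    """Relative position (0..1) of each sentence's midpoint, measured in characters."""
    total = sum(lengths)
    positions, offset = [], 0
    for n in lengths:
        positions.append((offset + n / 2) / total)
        offset += n
    return positions


def candidate_anchors(src_len: List[int], tgt_len: List[int],
                      ratio: float = 1.0, ratio_tol: float = 0.2,
                      pos_tol: float = 0.05) -> List[Anchor]:
    """Hypothetical anchor test: for each source sentence, pick the target
    sentence closest in relative position whose length ratio is near `ratio`."""
    ps, pt = relative_positions(src_len), relative_positions(tgt_len)
    cands: List[Anchor] = []
    for i, (ls, xs) in enumerate(zip(src_len, ps)):
        if ls == 0:
            continue  # skip empty sentences to avoid division by zero
        best = None  # (position gap, target index)
        for j, (lt, xt) in enumerate(zip(tgt_len, pt)):
            gap = abs(xs - xt)
            if gap > pos_tol:
                continue  # positions too far apart in the two texts
            if abs(lt / ls - ratio) > ratio_tol * ratio:
                continue  # lengths do not match the expected ratio
            if best is None or gap < best[0]:
                best = (gap, j)
        if best is not None:
            cands.append((i, best[1]))
    return cands


def monotone_chain(cands: List[Anchor]) -> List[Anchor]:
    """Keep the largest subset of candidates (sorted by source index) whose
    target indices strictly increase, via longest increasing subsequence."""
    tails: List[int] = []     # smallest ending target index for each run length
    tail_idx: List[int] = []  # index into cands of that ending element
    prev = [-1] * len(cands)  # back-pointers for reconstruction
    for k, (_, j) in enumerate(cands):
        p = bisect_left(tails, j)
        if p == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[p] = j
            tail_idx[p] = k
        prev[k] = tail_idx[p - 1] if p > 0 else -1
    chain: List[Anchor] = []
    k = tail_idx[-1] if tail_idx else -1
    while k != -1:
        chain.append(cands[k])
        k = prev[k]
    return chain[::-1]
```

For example, `monotone_chain(candidate_anchors([40, 12, 55, 30, 70], [42, 13, 50, 33, 68]))` pairs each source sentence with the target sentence at the same index. In an anchor-based aligner, pairs like these would typically act as fixed points that split the texts into shorter segments for finer alignment; for a language pair such as Chinese and English, `ratio` would need to be estimated from data rather than left at 1.0.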
Source
Journal of Harbin Institute of Technology, 2006, No. 5, pp. 689-692 (4 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
Supported by the National Natural Science Foundation of China (60435020)
Keywords
sentence alignment
bilingual corpus
anchors
length and location