Aligning Sentences in Chinese-English Corpora with Extended Length-based Approach

Cited by: 24
Abstract: This paper presents an extended approach to aligning sentences in Chinese-English parallel corpora. The approach is built primarily on a length-based statistical alignment method, adds lexical information drawn from a bilingual lexicon, and applies a punctuation-based method as a post-processing step. It avoids complex Chinese processing such as word segmentation and part-of-speech tagging, while introducing keyword information into the statistical method to improve sentence alignment accuracy. The bilingual corpus used is the LDC collection of bilingual news reports about Hong Kong. The system is implemented with a dynamic programming algorithm. Compared with the purely length-based approach and the lexical approach, the extended approach improves sentence alignment accuracy, and the experimental results are satisfactory.
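Source: Journal of Chinese Information Processing (《中文信息学报》), CSCD, Peking University Core Journal, 2005, No. 5, pp. 31-36, 58 (7 pages)
Funding: Commissioned research project of the National Institute of Information and Communications Technology, Japan
Keywords: artificial intelligence; machine translation; sentence alignment; Chinese processing; bilingual corpus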
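The combination described in the abstract (length-based scoring, a lexical signal from a bilingual lexicon, and dynamic programming) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost function follows the general shape of length-based aligners, and the priors `PRIORS`, the length ratio `C`, the variance `S2`, the weight `LEXICON_BONUS`, and the function names are all assumptions chosen for illustration. Keywords are matched as raw substrings, so no Chinese word segmentation is needed. The punctuation-based post-processing step is not shown.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 4.0              # assumed English characters per Chinese character
S2 = 6.8             # assumed variance of the length ratio
LEXICON_BONUS = 1.0  # assumed cost reduction per matched keyword pair


def length_cost(zh_len, en_len):
    """Negative log of a two-sided normal tail probability on the length mismatch."""
    if zh_len == 0 and en_len == 0:
        return 0.0
    mean = (zh_len + en_len / C) / 2.0
    delta = (en_len - zh_len * C) / math.sqrt(mean * S2)
    p = math.erfc(abs(delta) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|delta|))
    return -math.log(max(p, 1e-300))            # guard against log(0)


def lexical_bonus(zh_text, en_text, lexicon):
    """Count lexicon pairs whose Chinese key and English value both occur."""
    en_lower = en_text.lower()
    hits = sum(1 for zh, en in lexicon.items()
               if zh in zh_text and en in en_lower)
    return LEXICON_BONUS * hits


def align(zh_sents, en_sents, lexicon=None):
    """Return a list of (zh_indices, en_indices) groups with minimum total cost."""
    lexicon = lexicon or {}
    n, m = len(zh_sents), len(en_sents)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                zh = "".join(zh_sents[i:ni])
                en = " ".join(en_sents[j:nj])
                c = (cost[i][j] - math.log(prior)
                     + length_cost(len(zh), len(en))
                     - lexical_bonus(zh, en, lexicon))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk back from the final cell to recover the aligned groups.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups


if __name__ == "__main__":
    zh = ["今天香港天气很好。", "政府发表了新的报告。"]
    en = ["The weather in Hong Kong is fine today.",
          "The government has released a new report."]
    print(align(zh, en, {"香港": "hong kong", "政府": "government"}))
```

Subtracting a fixed bonus per keyword match is only one simple way to mix lexical evidence into a length-based score; a real system would need to tune that weight, and the length parameters, on held-out aligned data.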