Chinese-Uyghur sentence alignment based on a hybrid strategy (cited by: 3)
Abstract: This paper proposes a hybrid algorithm for aligning Chinese-Uyghur sentences in parallel texts. The method requires no Chinese preprocessing such as word segmentation or part-of-speech tagging. Instead, it uses lexical co-occurrence information in the bilingual corpus to extract Chinese-Uyghur word correspondences automatically, uses them as the dictionary for lexicon-based alignment, and combines this with a length-based method to align sentences. Experiments confirm the effectiveness of the hybrid algorithm: Chinese-Uyghur sentence alignment reaches 97.5% precision and 97.1% recall.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 34, pp. 143-145, 170 (4 pages)
Funding: National Natural Science Foundation of China (Nos. 60663006, 60963017); Xinjiang Autonomous Region University Scientific Research Program (Nos. XJEDU2009I05, XJEDU2009S08)
Keywords: bilingual corpora; sentence alignment; hybrid strategy
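
The abstract gives no formulas, so the following is only a minimal sketch of how a co-occurrence lexicon and a length-based score might be combined in one dynamic-programming aligner. Character bigrams as the segmentation-free Chinese unit, the Dice coefficient, the 0.9 threshold, the fixed skip penalty of 3.0, the bead set, and every function name are assumptions made for illustration, not the paper's actual method. The romanized sentences are toy data, not taken from the paper's corpus.

```python
from collections import Counter


def zh_units(sentence):
    """Character bigrams: a segmentation-free stand-in for Chinese words."""
    chars = [c for c in sentence if not c.isspace()]
    return {a + b for a, b in zip(chars, chars[1:])}


def build_lexicon(seed_pairs, min_dice=0.9):
    """Keep (Chinese bigram, Uyghur token) pairs whose Dice coefficient,
    computed over already paired sentences, reaches min_dice."""
    zh_count, ug_count, joint = Counter(), Counter(), Counter()
    for zh, ug in seed_pairs:
        zs, us = zh_units(zh), set(ug.split())
        zh_count.update(zs)
        ug_count.update(us)
        joint.update((z, u) for z in zs for u in us)
    return {
        (z, u) for (z, u), n in joint.items()
        if 2 * n / (zh_count[z] + ug_count[u]) >= min_dice
    }


def bead_cost(zh, ug, lexicon, ratio):
    """Lower is better: length mismatch minus the share of Uyghur tokens
    that have a lexicon partner on the Chinese side."""
    if not zh or not ug:
        return 3.0  # assumed fixed penalty for a 1-0 or 0-1 bead
    zl, ul = len(zh) * ratio, len(ug)
    length_cost = abs(zl - ul) / (zl + ul)
    zs, us = zh_units(zh), set(ug.split())
    matched = sum(1 for u in us if any((z, u) in lexicon for z in zs))
    return length_cost - matched / len(us)


def align(zh_sents, ug_sents, lexicon):
    """Dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 beads."""
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    n, m = len(zh_sents), len(ug_sents)
    # Expected Uyghur characters per Chinese character, estimated from the text.
    ratio = sum(map(len, ug_sents)) / max(1, sum(map(len, zh_sents)))
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for di, dj in beads:
                if i + di > n or j + dj > m:
                    continue
                zh = "".join(zh_sents[i:i + di])
                ug = " ".join(ug_sents[j:j + dj])
                cost = best[i][j] + bead_cost(zh, ug, lexicon, ratio)
                if cost < best[i + di][j + dj]:
                    best[i + di][j + dj] = cost
                    back[i + di][j + dj] = (di, dj)
    # Trace back from the end to recover the chosen beads.
    out, i, j = [], n, m
    while i or j:
        di, dj = back[i][j]
        out.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return out[::-1]


# Toy seed pairs (illustrative only) used to extract the lexicon.
seed = [
    ("我读书", "men kitab oqudum"),
    ("他去学校", "u mektepke bardi"),
    ("我去学校", "men mektepke bardim"),
    ("他读书", "u kitab oqudi"),
]
lexicon = build_lexicon(seed)

# The second Chinese sentence corresponds to two Uyghur segments here.
result = align(["我读书", "他去学校"],
               ["men kitab oqudum", "u mektepke", "bardi"],
               lexicon)
print(result)
```

Each returned pair lists the indices of the Chinese and Uyghur sentences grouped into one bead. Subtracting the lexical score from the length cost is just one simple way to let dictionary evidence override a misleading length ratio; a weighted sum or a probabilistic model would be equally plausible readings of "hybrid".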

