
Research on Automatic Construction Techniques for Chinese-English Bilingual Dictionaries (cited by: 1)

The Automatic Construction of Bilingual Dictionary
Abstract: Chinese-English dictionaries bridge the Chinese and English languages and serve as a tool for communication between China and the rest of the world. In the information age, techniques for automatically constructing bilingual dictionaries play an important role in machine translation and cross-language retrieval. This paper gives a fairly comprehensive analysis and summary of methods for automatic bilingual dictionary construction and their key technologies, and proposes a method that extracts bilingual word pairs from a Chinese-English parallel corpus to build a bilingual dictionary automatically. After the corpus is aligned at the sentence level, the Chinese and English texts are each word-segmented and part-of-speech tagged. Chinese and English word units are then extracted, and the association probability between them is computed to align words across the two languages, from which the bilingual dictionary is generated. In dictionary-construction experiments on real corpora the method achieved good results, and its word alignment outperformed the traditional IBM model approach.
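The pipeline summarized in the abstract (sentence-aligned corpus, tokenized text, association scores between word units, best-scoring pairs kept as dictionary entries) can be illustrated with a minimal sketch. The abstract does not state how the association probability is computed, so the Dice coefficient is used here purely as a stand-in assumption. The function name `build_dictionary`, the `min_score` threshold, and the toy corpus are all hypothetical, and part-of-speech filtering is omitted.

```python
from collections import Counter
from itertools import product


def build_dictionary(sentence_pairs, min_score=0.5):
    """Return {source_word: (best_target_word, dice_score)}.

    sentence_pairs: iterable of (source_tokens, target_tokens) for
    sentence-aligned, already segmented text. The Dice coefficient is an
    assumed association measure, not the one used in the paper.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        # Count each word once per sentence pair (sentence-level co-occurrence).
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))

    best = {}
    for (s, t), c in pair_count.items():
        # Dice coefficient: 2 * joint count / (sum of marginal counts).
        score = 2.0 * c / (src_count[s] + tgt_count[t])
        if score >= min_score and (s not in best or score > best[s][1]):
            best[s] = (t, score)
    return best


# Toy sentence-aligned corpus; English tokens are assumed to be lemmatized.
pairs = [
    (["我", "喜欢", "书"], ["i", "like", "book"]),
    (["他", "喜欢", "猫"], ["he", "like", "cat"]),
    (["我", "读", "报纸"], ["i", "read", "newspaper"]),
    (["他", "读", "书"], ["he", "read", "book"]),
]
lexicon = build_dictionary(pairs)
for zh, (en, score) in sorted(lexicon.items()):
    print(zh, en, round(score, 2))
```

On a corpus this small every Chinese word has a single best-scoring English candidate. On real data, ties and low-frequency noise would need extra handling, which is presumably where part-of-speech information and a better-founded probability model come in.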
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2011, No. 4, pp. 402-409 (8 pages)
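Funding: China Postdoctoral Science Foundation (grant no. 20100470392); Pre-research Fund of the Institute of Scientific and Technical Information of China, project "Research on Theories and Methods for Term Extraction and Matching in Multilingual Scientific and Technical Literature" (grant no. YY-2010019); Key Discipline-Building Project of the Institute of Scientific and Technical Information of China (grant no. 2009KP01-3-3); National Key Technology R&D Program of China (grant no. 2006BAH03B02).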
Keywords: bilingual dictionary; automatic construction; word alignment


