
Chinese-Slavic Mongolian Named Entity Translation Based on Word Alignment
Abstract: Named entity translation between Chinese and Slavic Mongolian is important for cross-lingual information processing across the two languages, yet applying machine translation directly does not give satisfactory results. To address this problem, the paper proposes a method for automatically extracting Chinese-Slavic Mongolian named entity translation pairs from a Chinese-Slavic Mongolian parallel corpus. Only the Chinese side needs named entity annotation. All candidate named entity translation pairs are then extracted with a sliding window over the bilingual HMM word alignment results. Finally, a maximum entropy model that combines five features filters the candidate translation units and selects, for each Chinese named entity, the Slavic Mongolian translation unit with the highest confidence. Experimental results show that the method outperforms the HMM-based method and yields named entity translation pairs with relatively high precision even when the alignment model is only partially accurate.
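Source: Acta Scientiarum Naturalium Universitatis Pekinensis (《北京大学学报(自然科学版)》), 2016, No. 1, pp. 148-154 (7 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: Supported by the National Natural Science Foundation of China (61362028).
Keywords: named entity; recognition; translation; bilingual word alignment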
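
The candidate-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the word alignment is already available as a set of (source index, target index) links, and every name in it (`candidate_spans`, `link_consistency`, `slack`, `max_len`) is hypothetical. The scoring function is a simple link-consistency ratio used as a stand-in for the paper's five-feature maximum entropy model, whose features the abstract does not list. The example sentence pair and its alignment are made up for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]   # half-open [start, end) token span
Link = Tuple[int, int]   # (source index, target index) alignment link


def candidate_spans(ne_span: Span, alignment: Set[Link], target_len: int,
                    slack: int = 1, max_len: int = 6) -> List[Span]:
    """Enumerate target-side windows around the tokens aligned to a source NE.

    `slack` widens the aligned region on both sides so that a partially
    wrong alignment can still yield the correct window (an assumption of
    this sketch, not a parameter taken from the paper).
    """
    start, end = ne_span
    linked = sorted({t for s, t in alignment if start <= s < end})
    if not linked:
        return []  # no alignment evidence for this entity
    lo = max(0, linked[0] - slack)
    hi = min(target_len, linked[-1] + 1 + slack)
    return [(i, j)
            for i in range(lo, hi)
            for j in range(i + 1, min(hi, i + max_len) + 1)]


def link_consistency(ne_span: Span, span: Span, alignment: Set[Link]) -> float:
    """Stand-in score: share of links touching either span that connect both."""
    s0, s1 = ne_span
    t0, t1 = span
    inside = sum(1 for s, t in alignment if s0 <= s < s1 and t0 <= t < t1)
    touching = sum(1 for s, t in alignment if s0 <= s < s1 or t0 <= t < t1)
    return inside / touching if touching else 0.0


# Illustrative sentence pair; the entity is the third Chinese token.
source = ["我", "在", "乌兰巴托", "工作"]
target = ["Би", "Улаанбаатарт", "ажилладаг"]
alignment = {(0, 0), (1, 1), (2, 1), (3, 2)}
ne_span = (2, 3)

candidates = candidate_spans(ne_span, alignment, len(target))
best = max(candidates, key=lambda sp: link_consistency(ne_span, sp, alignment))
print(target[best[0]:best[1]])
```

On this toy input the window covering only the second target token scores highest. In the setting the abstract describes, the scoring step would instead be a trained maximum entropy classifier over the candidate windows.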


