Study on a Feature-Rich Collocation Translation Model (基于多特征的搭配翻译模型研究)

Cited by: 1

Abstract: This paper proposes a new method for collocation translation. Within a maximum entropy framework, the method makes full use of information derived from both monolingual and bilingual corpora. Instead of relying heavily on bilingual parallel corpora, as earlier methods do, it can train translation models using monolingual corpora. In addition to inside-collocation information, the model exploits contextual information. The EM (Expectation Maximization) algorithm is applied to estimate context-based word translation probabilities from a monolingual corpus. The model can also integrate features derived from bilingual corpora when they are available. Experiments show that the approach outperforms existing monolingual-corpus-based methods for collocation translation and achieves still better results when a bilingual corpus is available.

Authors: Chen Yin (陈鄞), Lü Yajuan (吕雅娟), Li Sheng (李生)

Affiliations: Ministry of Education-Microsoft Key Laboratory, Harbin Institute of Technology; Microsoft Research Asia

Source: Journal of Harbin Institute of Technology (《哈尔滨工业大学学报》), 2007, No. 11, pp. 1790-1795 (6 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Keywords: collocation; maximum entropy; monolingual corpora; expectation maximization (EM) algorithm

Classification code: TP391.2 [Automation and Computer Technology: Computer Application Technology]
