Abstract: Grammatical error correction (GEC) suffers from a fairly severe data-sparsity problem, which directly hinders the application of machine translation methods to this task. This paper is the first to adopt a sampling decoding strategy in the back-translation stage, and compares the effect that pseudo-parallel sentence pairs synthesized under different decoding strategies have on training grammatical error correction models. Experimental results on the standard CoNLL-2014 Test Set show that the proposed method significantly improves grammatical error correction performance.
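The key difference between sampling decoding and greedy/beam decoding in back-translation is that sampling draws each token from the full output distribution, so repeated runs produce varied noisy pseudo-sources rather than the single most likely one. A minimal pure-Python illustration of that contrast (toy vocabulary and logits are hypothetical, not from the paper):

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(tokens, logits):
    # Greedy/beam-style decoding always selects the argmax token.
    return tokens[logits.index(max(logits))]

def sample_pick(tokens, logits, rng):
    # Sampling decoding draws from the whole softmax distribution,
    # yielding more diverse (noisier) pseudo-source tokens.
    probs = softmax(logits)
    r = rng.random()
    acc = 0.0
    for tok, p in zip(tokens, probs):
        acc += p
        if r < acc:
            return tok
    return tokens[-1]

# Hypothetical next-token candidates for a noising "error model".
tokens = ["their", "there", "they're"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)
greedy = [greedy_pick(tokens, logits) for _ in range(5)]
sampled = [sample_pick(tokens, logits, rng) for _ in range(5)]
# Greedy output is identical on every run; sampled output varies,
# which is why sampling-based back-translation yields more varied
# synthetic errors for GEC training data.
```

In a real system the distribution would come from a trained correct-to-erroneous translation model; this sketch only shows why the two decoding strategies yield different pseudo-parallel data.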
Abstract: Back-translation is the translation of a translated text back into the language of the original text. It is an effective way to ensure accurate translation and is assumed to be beneficial to improving learners' English competence. This thesis, based on the author's back-translation training exercises, makes a contrastive study of an English source text and its back-translation from the aspects of word choice, collocation, and sentence mechanics. The results show that through back-translation practice, learners strengthen their awareness of the language differences between English and Chinese. Besides, as a special kind of translation, back-translation has implications for English learning and Chinese-English translation, such as serving as a recitation method and a testing tool.
Funding: Project supported by the Brazilian National Council for Scientific and Technological Development (CNPq) (No. 309545/2021-8).
Abstract: Large language models (LLMs) excel in multilingual translation tasks, yet often struggle with culturally and semantically rich Chinese texts. This study introduces the framework of back-translation (BT) powered by LLMs, or LLM-BT, to evaluate Chinese → intermediate language → Chinese translation quality across five LLMs and three traditional systems. We construct a diverse corpus containing scientific abstracts, historical paradoxes, and literary metaphors, reflecting the complexity of Chinese at the lexical and semantic levels. Using our modular NLPMetrics system, including bilingual evaluation understudy (BLEU), character F-score (CHRF), translation edit rate (TER), and semantic similarity (SS), we find that LLMs outperform traditional tools in cultural and literary tasks. However, the results of this study uncover a high-dimensional behavioral phenomenon, the paradox of poetic intent, where surface fluency is preserved but metaphorical or emotional depth is lost. Additionally, some models exhibit verbatim BT, suggesting a form of data-driven quasi-self-awareness, particularly under repeated or cross-model evaluation. To address BLEU's limitations for Chinese, we propose a Jieba-segmentation BLEU variant that incorporates word-frequency and n-gram weighting, improving sensitivity to lexical segmentation and term consistency. Supplementary tests show that in certain semantic dimensions, LLM outputs approach the fidelity of human poetic translations, despite lacking a deeper metaphorical intent. Overall, this study reframes traditional fidelity vs. fluency evaluation into a richer, multi-layered analysis of LLM behavior, offering a transparent framework that contributes to explainable artificial intelligence and identifies new research pathways in cultural natural language processing and multilingual LLM alignment.
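The motivation for a segmentation-aware BLEU is that standard BLEU over raw Chinese characters treats every character as a token, so multi-character terms are never matched as units. A minimal sketch of BLEU computed over word-segmented tokens (in practice the tokens would come from `jieba.lcut`; the example sentence and the add-one smoothing are illustrative assumptions, and the paper's word-frequency weighting is not reproduced here):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def segmented_bleu(hyp_tokens, ref_tokens, max_n=4):
    # BLEU over word-segmented tokens (e.g. produced by jieba.lcut)
    # rather than raw characters, so multi-character Chinese terms
    # count as single matching units.
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ng = Counter(ngrams(hyp_tokens, n))
        ref_ng = Counter(ngrams(ref_tokens, n))
        overlap = sum((hyp_ng & ref_ng).values())   # clipped counts
        total = max(sum(hyp_ng.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty, as in standard BLEU.
    bp = min(1.0, math.exp(1 - len(ref_tokens) / max(len(hyp_tokens), 1)))
    return bp * math.exp(log_avg)

# Hypothetical pre-segmented sentence, as a segmenter might emit it.
ref = ["机器", "翻译", "质量", "评估"]
hyp = ["机器", "翻译", "质量", "评估"]
score = segmented_bleu(hyp, ref)
```

A perfect word-level match scores 1.0, whereas a character-level BLEU on the same pair would reward partial character overlaps inside words; plugging in different segmenters lets one probe the sensitivity to lexical segmentation that the abstract describes.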