Abstract
Traditional English translation systems suffer from low speech recognition accuracy and flat, monotone output speech, so they cannot accurately convey the speaker's tone. To address this, an English translation system based on speech recognition and tone speech synthesis is designed. The system terminal consists mainly of speech recognition, language translation, tone recognition, tone conversion, and tone speech synthesis modules. A tone speech synthesis model based on a conditional variational autoencoder (CVAE) synthesizes tone-bearing speech for the English sentences produced by speech recognition and language translation, and this forms the basis for the design and implementation of a portable English translation terminal. Experiments show that the error between the fundamental frequency (F0) curve of speech synthesized by the CVAE-based model and that of the original speech is only 0.02, so the two curves are very close. In subjective evaluation, the model's mean opinion score (MOS) for speech naturalness is 3.84 with a variance of only 0.004, and the mean score for emotional tone consistency is 3.72 with a variance of 0.002. Taken together, these results indicate that the model generates speech well, with both diversity and accuracy. In application, the model improves the speech recognition and tone speech synthesis performance of the English translation terminal, and overall system performance is strong.
Authors
TU Qiongyin (涂琼引)
CHENG Nan (成南)
Chongqing Vocational College of Light Industry, Chongqing 401329, China
Source
Automation & Instrumentation (《自动化与仪器仪表》)
2023, No. 1, pp. 251-256 (6 pages)
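Funding
"Research on Approaches to Improving the Teaching Ability of Teachers in Higher Vocational Colleges in the New Era" (203767)
"Research on the Reform of Ideological and Political Teaching in Public English Courses at Higher Vocational Colleges under the 'Quality Improvement and Excellence Cultivation' Initiative" (22SKGH607)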
Keywords
speech recognition
English translation
CVAE
system terminal
tone speech synthesis