Inverse short-time Fourier transform-based Hmong language speech synthesis method
Abstract: Speech synthesis for minority languages is an important branch of speech synthesis research and has attracted considerable attention in human-computer interaction. To address the high complexity and slow inference speed of existing two-stage speech synthesis models, this paper proposes a Hmong language speech synthesis method based on the inverse short-time Fourier transform. Following the process of speech feature extraction, the method reduces the use of upsampling convolutions to lower model complexity, and uses the inverse short-time Fourier transform to reconstruct the speech waveform from its phase and amplitude spectra, giving a fast conversion from the frequency domain to the time domain. In addition, a residual encoder extracts features from the text so that more of the input text information is retained. To verify the effectiveness of the proposed method, the self-built Hmong speech corpus HmongSpeech (download link: http://sxjxsf.gzmu.edu.cn/info/1728/1214.htm) is used as the benchmark dataset, and the method is compared with typical two-stage and single-stage models. The experimental results show that the proposed method improves inference speed by 4 to 5 times without reducing the quality of the synthesized speech, and that its real-time factor is 0.01, which meets the requirements of real-time applications. The method is also robust: the word error rate of the synthesized speech is only 1.02%.
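The frequency-to-time step described in the abstract can be illustrated with a minimal sketch. It only shows how a waveform is rebuilt from amplitude and phase spectra by an inverse short-time Fourier transform with windowed overlap-add; it is not the paper's model. In the paper a neural network would supply the two spectra, whereas here they are taken from a forward transform of a known signal so the result can be checked. The function names, `n_fft`, `hop`, and the Hann window are assumptions chosen for illustration, and the naive DFT stands in for the FFT a real system would use.

```python
import cmath
import math


def hann(n_fft):
    # Periodic Hann window (assumed here; the paper's window is not stated).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def dft(frame):
    # Naive O(N^2) DFT, kept for clarity; a real system would use an FFT.
    n_fft = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
        for k in range(n_fft)
    ]


def idft_real(spectrum):
    # Inverse DFT, returning the real part of each time-domain sample.
    n_fft = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft)
            for k in range(n_fft)).real / n_fft
        for n in range(n_fft)
    ]


def stft(signal, n_fft, hop):
    # Forward transform: one (amplitude, phase) pair of lists per frame.
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        spec = dft([signal[start + n] * window[n] for n in range(n_fft)])
        frames.append(([abs(c) for c in spec], [cmath.phase(c) for c in spec]))
    return frames


def istft(frames, n_fft, hop):
    # Inverse transform: rebuild complex spectra from amplitude and phase,
    # invert each frame, then overlap-add with window-squared normalisation.
    window = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    out = [0.0] * length
    norm = [0.0] * length
    for i, (mag, phase) in enumerate(frames):
        frame = idft_real([cmath.rect(m, p) for m, p in zip(mag, phase)])
        for n in range(n_fft):
            out[i * hop + n] += frame[n] * window[n]
            norm[i * hop + n] += window[n] ** 2
    # Samples where the summed window energy is zero cannot be recovered.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


# Usage: round-trip a short test tone through amplitude/phase and back.
signal = [math.sin(2 * math.pi * 3 * t / 64) for t in range(64)]
frames = stft(signal, n_fft=16, hop=4)
rebuilt = istft(frames, n_fft=16, hop=4)
# Sample 0 is skipped because the periodic Hann window is zero there.
max_err = max(abs(a - b) for a, b in zip(signal[1:], rebuilt[1:]))
```

Because the inverse transform is a fixed, non-learned operation, a model that predicts spectra and then applies it needs fewer upsampling layers than one that generates every waveform sample directly, which is the source of the speed-up the abstract reports.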
Authors: CAI Shan (蔡姗); WANG Lin (王林); GUO Sheng (郭胜); ZOU Xue (邹雪); WU Lei (吴磊) (College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang 550025, China; Key Laboratory of Pattern Recognition and Intelligent System of Guizhou Province, Guiyang 550025, China)
Source: Journal of Applied Acoustics (《应用声学》), Peking University Core Journal, 2025, No. 2, pp. 339-349 (11 pages)
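Funding: Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2023]一般143); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2023]061号, 黔教技[2023]012号); Maker Space Project of the Guizhou Provincial Department of Science and Technology, "Qianmin Zhumeng Maker Space" (黔科合平台人才ZCKJ[2021]007).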
Keywords: Hmong language speech synthesis; Inverse short-time Fourier transform; Inference speed; Residual encoder