Abstract
Accent conversion (AC) aims to convert speech from a source accent to a target accent while preserving the source speaker's timbre and the linguistic content. Existing AC models generalize poorly to speech outside the distribution of their training data, which seriously limits their application. To address this, a zero-shot AC model based on k-nearest-neighbor (kNN) regression of speech content features is proposed. On the one hand, the 23rd layer of WavLM is used as the content encoder to extract content features from both source-accented and target-accented speech, and kNN regression replaces each source-accented content feature with its nearest neighbors in a pool built from the target-accented content features, thereby achieving accent conversion. On the other hand, to preserve the source speaker's timbre in the converted speech, a multi-speaker vocoder is constructed that fuses the resulting target-accented content features with the source speaker's timbre feature, extracted by a speaker encoder, to synthesize speech with the target accent. Because no source-accented speech is required at the training stage, the model can convert speech in a variety of source accents to the target accent, and thus generalizes well. Experimental results show that the proposed model achieves better objective and subjective evaluation results than existing parallel and non-parallel AC models.
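The kNN regression step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `knn_regress`, the value `k=4`, the use of cosine similarity, and the averaging of the selected neighbors are all assumptions made for illustration, and random arrays stand in for WavLM content features.

```python
import numpy as np

def knn_regress(source_feats, target_pool, k=4):
    """Replace each source frame with the mean of its k nearest pool frames.

    source_feats: (T_src, D) content features of the source-accented utterance
    target_pool:  (T_pool, D) content features pooled from target-accented speech
    k, cosine similarity and mean aggregation are assumptions, not from the paper.
    """
    # L2-normalise rows so that a dot product equals cosine similarity
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    pool = target_pool / np.linalg.norm(target_pool, axis=1, keepdims=True)
    sim = src @ pool.T  # (T_src, T_pool)
    # Indices of the k most similar pool frames per source frame (unordered)
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    # Average the selected (un-normalised) pool frames
    return target_pool[idx].mean(axis=1)  # (T_src, D)

# Random stand-ins for content features (hypothetical sizes)
rng = np.random.default_rng(0)
source_feats = rng.standard_normal((50, 16))
target_pool = rng.standard_normal((400, 16))
converted = knn_regress(source_feats, target_pool, k=4)
```

In this sketch every output frame is built only from target-accented frames, which is why no source-accented speech would be needed for training; the converted features would then be passed, together with a speaker timbre feature, to the vocoder.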
Authors
罗宜鑫
陈宁
薛宇航
肖阳阳
LUO Yixin; CHEN Ning; XUE Yuhang; XIAO Yangyang (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; China Telecom Wanwei Information Technology Co., Ltd., Lanzhou 730000, China)
Source
《华东理工大学学报(自然科学版)》
Peking University Core Journal (北大核心)
2025, Issue 4, pp. 497-504 (8 pages)
Journal of East China University of Science and Technology
Funding
National Natural Science Foundation of China, General Program (61771196).
Keywords
accent conversion
kNN regression
zero-shot learning
voice conversion
vocoder