Zero-Shot Accent Conversion Model Based on the kNN Regression of Content Features


Abstract: Accent conversion (AC) aims to convert speech from a source accent to a target accent while preserving the source speaker's timbre and the speech content. Existing AC models generalize poorly to speech that falls outside the distribution of their training data, which seriously limits their application. To address this, a zero-shot AC model based on kNN (k-nearest neighbors) regression of speech content features is proposed. On the one hand, the 23rd layer of WavLM is adopted as the content encoder to extract content features from both source-accented and target-accented speech, and kNN regression replaces each source-accented content feature with its nearest neighbors in a pool built from the target-accented content features, thereby achieving accent conversion. On the other hand, to preserve the source speaker's timbre in the converted speech, a multi-speaker vocoder is constructed to fuse the resulting target-accented content features with the source speaker's timbre feature, extracted by a speaker encoder, and to synthesize speech with the target accent. Because no source-accented speech is required at the training stage, the model can convert speech in a variety of source accents to the target accent. Experimental results show that the proposed model achieves better objective and subjective evaluation results than existing parallel or non-parallel AC models.
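The kNN regression step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `knn_regress`, the choice of k = 4, cosine similarity as the distance measure, and averaging the neighbors are all assumptions made for illustration, and random arrays stand in for real WavLM layer-23 features (the dimension 1024 is likewise assumed).

```python
import numpy as np

def knn_regress(source_feats, target_pool, k=4):
    """Replace each source frame with the mean of its k nearest pool frames.

    source_feats: (n_src, dim) content features of the source-accented speech.
    target_pool:  (n_pool, dim) content features of target-accented speech.
    k, cosine similarity, and mean pooling are illustrative assumptions.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    pool = target_pool / np.linalg.norm(target_pool, axis=1, keepdims=True)
    sim = src @ pool.T  # (n_src, n_pool)

    # Indices of the k most similar pool frames for every source frame.
    topk = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Average the selected (unnormalised) pool frames: (n_src, k, dim) -> (n_src, dim).
    return target_pool[topk].mean(axis=1)

# Random stand-ins for real features (assumed shapes).
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(50, 1024))
target_pool = rng.normal(size=(500, 1024))
converted = knn_regress(source_feats, target_pool, k=4)
```

In this sketch every output frame is built only from target-pool frames, which is why no source-accented speech would be needed at training time; the converted features would then be passed, together with a speaker embedding, to the vocoder.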

Authors: LUO Yixin; CHEN Ning; XUE Yuhang; XIAO Yangyang (School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; China Telecom Wanwei Information Technology Co. Ltd, Lanzhou 730000, China)

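Affiliations: School of Information Science and Engineering, East China University of Science and Technology; China Telecom Wanwei Information Technology Co. Ltd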

Source: Journal of East China University of Science and Technology (Natural Science Edition), PKU Core Journal, 2025, No. 4, pp. 497-504 (8 pages)

Funding: National Natural Science Foundation of China, General Program (61771196).

Keywords: accent conversion; kNN regression; zero-shot learning; voice conversion; vocoder

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
