2 articles found
Braille Image-to-Speech Generation Based on Joint Fine-Tuning of CLIP and FastSpeech2
1
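Authors: Sun Enwei (孙恩威), Xu Chun (徐春). 《计算机时代》, 2025, No. 5, pp. 28-34, 39 (8 pages in total)
To improve reading efficiency for visually impaired people, this work builds a braille image-to-speech conversion framework for Chinese-language settings: CLIPViT-H/14-KNN-FastSpeech2. It adopts a strategy of independent pre-training followed by joint fine-tuning. First, Chinese CLIP and the FastSpeech2 text-to-speech model are each pre-trained on public datasets and their convergence is verified. Then, on that basis, the models are jointly fine-tuned on a braille image dataset. Experimental results show that the model improves on PER and other metrics, confirming both that it can still synthesize high-quality speech with limited data and that the joint training strategy is effective.
Keywords: braille image; image-to-speech conversion; CLIP; FastSpeech2; joint fine-tuning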
Study of Prosody Enhancement of FastSpeech2 Speech Synthesis System Based on BERT
2
Authors: WEI Yi, ZHAO Si-jia, SI Zhan-jun. 《印刷与数字媒体技术研究》, 2025, No. 6, pp. 303-314 (12 pages in total)
The traditional FastSpeech2 offers high generation efficiency and natural-sounding speech, but it remains limited in prosody modeling, especially in its lack of an effective link between semantics and prosody. To improve the prosodic expressiveness of synthesized speech, this study proposes ProsodySpeech, a speech synthesis system that incorporates the BERT pre-trained language model. By introducing the Pre-trained Language Model Adapter (PLM Adapter) and the Semantic-Prosody Mapping Network (SPMN), and by fully utilizing the deep semantic information extracted by BERT, the system strengthens its control over prosodic features such as pitch, energy, and duration. The proposed model aligns and maps semantic information to prosody parameters through a shared semantic processing layer, a global self-attention mechanism, and a specially designed prosody mapping branch. Experimental results show that the proposed model outperforms VITS and StyleTTS2 in Mean Opinion Score (MOS), with a clearer advantage in prosodic naturalness and expressive richness. This verifies the model's effectiveness in enhancing prosodic expression and shows that its synthesized speech is closer to natural human speech.
Keywords: Speech synthesis; BERT; FastSpeech2; Prosody enhancement