Abstract
Conventional voice conversion algorithms based on the Gaussian mixture model (GMM) over-smooth the predicted spectrum of the target speaker, which degrades the quality of the converted speech. This paper analyzes the causes of the over-smoothing and proposes an improved conversion algorithm that takes dynamic inter-frame features into account. Continuity and variance terms are added to the objective function used for parameter estimation, so that the mapped features are continuous across frames and their variance is maximized within a syllable, which overcomes the over-smoothing. Experimental results show that the algorithm raises the subjective opinion score of converted speech quality from 3.11 to 3.89 while still effectively converting the speaker's individuality, indicating that dynamic features are important for the quality of voice conversion.
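The over-smoothing that the abstract refers to can be illustrated with a small numerical sketch. The code below is not the paper's algorithm. It only implements the widely used conventional GMM mapping, in which each source frame is converted to the conditional expectation of the target given the source under a joint GMM. The one-dimensional features, the two mixture components, and every parameter value are assumptions chosen purely for illustration, and the function names `gmm_convert` and `sample_joint` are hypothetical.

```python
import numpy as np


def gmm_convert(x, weights, mu_x, mu_y, var_x, cov_xy):
    """Conventional frame-by-frame GMM mapping for 1-D features:
    F(x) = sum_i p(i | x) * (mu_y[i] + cov_xy[i] / var_x[i] * (x - mu_x[i]))."""
    x = np.asarray(x, dtype=float)[:, None]  # shape (T, 1)
    # Log of the weighted source-side Gaussian likelihood per component.
    log_lik = (np.log(weights)
               - 0.5 * np.log(2.0 * np.pi * var_x)
               - 0.5 * (x - mu_x) ** 2 / var_x)
    log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_lik)
    post /= post.sum(axis=1, keepdims=True)  # posteriors p(i | x)
    per_component = mu_y + cov_xy / var_x * (x - mu_x)
    return (post * per_component).sum(axis=1)


def sample_joint(n, weights, mu_x, mu_y, var_x, var_y, cov_xy, rng):
    """Draw n paired (source, target) samples from the joint GMM."""
    comp = rng.choice(len(weights), size=n, p=weights)
    x = np.empty(n)
    y = np.empty(n)
    for i in range(len(weights)):
        idx = comp == i
        cov = [[var_x[i], cov_xy[i]], [cov_xy[i], var_y[i]]]
        xy = rng.multivariate_normal([mu_x[i], mu_y[i]], cov,
                                     size=int(idx.sum()))
        x[idx] = xy[:, 0]
        y[idx] = xy[:, 1]
    return x, y


# Illustrative (assumed) joint GMM parameters, not taken from the paper.
weights = np.array([0.5, 0.5])
mu_x = np.array([-2.0, 2.0])
mu_y = np.array([-1.0, 3.0])
var_x = np.array([1.0, 1.0])
var_y = np.array([1.0, 1.0])
cov_xy = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
x, y = sample_joint(20000, weights, mu_x, mu_y, var_x, var_y, cov_xy, rng)
converted = gmm_convert(x, weights, mu_x, mu_y, var_x, cov_xy)

# The converted features vary less than the true target features.
print("target variance:   ", y.var())
print("converted variance:", converted.var())
```

Because the mapping outputs a conditional mean, the law of total variance implies that the converted features cannot vary more than the true target features, which is one way to see why the predicted spectrum looks over-smoothed. The variance term described in the abstract is aimed at this loss of variance.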
Source
《清华大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journal list (北大核心)
2006, No. 10, pp. 1767-1770, 1775 (5 pages in total)
Journal of Tsinghua University (Science and Technology)
Funding
Supported by the National Natural Science Foundation of China (60275014)