Voice conversion using dynamic inter-frame features (cited by: 1)

Abstract: In conventional Gaussian mixture model (GMM)-based voice conversion, the predicted target-speaker spectrum is over-smoothed, which degrades the quality of the converted speech. This paper analyzes the causes of over-smoothing and proposes an improved voice conversion algorithm that takes inter-frame dynamic features into account: continuity and variance terms are added to the objective function used for parameter estimation. As a result, the predicted features are continuous from frame to frame and their variance is maximized within each syllable, which overcomes the over-smoothing. Experiments show that, while the converted speech still effectively takes on the target speaker's individuality, the algorithm raises the mean opinion score (MOS) for the quality of the converted speech from 3.11 to 3.89, showing that dynamic features are important for high-quality voice conversion.
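The record gives only a verbal summary of the method, so the following is an illustrative sketch, not the authors' algorithm. It assumes the standard frame-wise GMM mapping of Stylianou et al. (reference 2), i.e. the conditional mean E[y | x], and, as one common way to bring in inter-frame dynamics, maximum-likelihood trajectory estimation over static and delta features in the style of Toda et al. (reference 7). The scalar feature, the two-component GMM parameters, the delta window 0.5(c[t+1] - c[t-1]), the zero-mean delta targets, and the simple rescaling in the hypothetical `match_variance` helper (a stand-in for a proper variance term in the objective) are all assumptions made for illustration.

```python
import math

# Hypothetical joint GMM over a scalar source feature x and target feature y.
# Each tuple: (weight, mean_x, mean_y, var_xx, cov_yx). Values are made up.
COMPONENTS = [
    (0.5, -1.0, -2.0, 0.5, 0.6),
    (0.5, 1.0, 2.0, 0.5, 0.6),
]


def gaussian(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x):
    """Frame-wise GMM mapping: E[y|x] = sum_i p(i|x) * (mu_y + cov_yx / var_xx * (x - mu_x))."""
    likes = [w * gaussian(x, mx, vxx) for w, mx, _, vxx, _ in COMPONENTS]
    total = sum(likes)
    return sum(
        like / total * (my + cyx / vxx * (x - mx))
        for like, (_, mx, my, vxx, cyx) in zip(likes, COMPONENTS)
    )


def delta_row(t, n):
    """Weights of the delta feature 0.5 * (c[t+1] - c[t-1]); edge frames are replicated."""
    row = [0.0] * n
    row[min(t + 1, n - 1)] += 0.5
    row[max(t - 1, 0)] -= 0.5
    return row


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (fine for short sequences)."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ml_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Most likely trajectory c given per-frame Gaussian targets for c_t and delta c_t.

    Solves (W^T D^-1 W) c = W^T D^-1 mu, where W stacks the static and delta rows.
    """
    n = len(static_mean)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for t in range(n):
        static_row = [0.0] * n
        static_row[t] = 1.0
        for row, mean, var in ((static_row, static_mean[t], static_var[t]),
                               (delta_row(t, n), delta_mean[t], delta_var[t])):
            for i in range(n):
                if row[i] == 0.0:
                    continue
                b[i] += row[i] * mean / var
                for j in range(n):
                    a[i][j] += row[i] * row[j] / var
    return solve(a, b)


def match_variance(traj, target_var):
    """Hypothetical helper: rescale a trajectory about its mean to a target variance."""
    mean = sum(traj) / len(traj)
    var = sum((c - mean) ** 2 for c in traj) / len(traj)
    if var == 0.0:
        return list(traj)
    scale = math.sqrt(target_var / var)
    return [mean + scale * (c - mean) for c in traj]


if __name__ == "__main__":
    source = [-1.5, -1.0, -0.2, 0.3, 1.0, 1.6]
    static = [convert_frame(x) for x in source]
    n = len(static)
    # Hypothetical delta targets: zero mean, so the delta terms act as a continuity penalty.
    smooth = ml_trajectory(static, [0.1] * n, [0.0] * n, [0.2] * n)
    print([round(c, 3) for c in match_variance(smooth, 4.0)])
```

Because the delta rows couple neighboring frames, `ml_trajectory` trades per-frame accuracy for continuity, and with zero-mean delta targets it can only shrink the spread of the sequence; the rescaling step then restores a chosen variance, which is the intuition behind putting a variance term into the objective to counter over-smoothing.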

Authors: ZHANG Xiaozhou, HUANG Dezhi, CAI Lianhong

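Affiliations: Key Laboratory of Pervasive Computing (Ministry of Education), Department of Computer Science and Technology, Tsinghua University; France Telecom R&D (Beijing) Co., Ltd.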

Source: Journal of Tsinghua University (Science and Technology), 2006, no. 10, pp. 1767-1770, 1775 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals

Funding: National Natural Science Foundation of China (Grant No. 60275014)

Keywords: voice conversion; Gaussian mixture model (GMM); dynamic feature; variance

Classification number: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

Citation network

References (7)

1. ZUO Guoyu, LIU Wenju, RUAN Xiaogang. Research and progress of voice conversion technology[J]. Acta Electronica Sinica, 2004, 32(7): 1165-1172. Cited by: 32
2. Stylianou Y, Cappé O, Moulines E. Continuous probabilistic transform for voice conversion[J]. IEEE Trans Speech and Audio Proc, 1998, 6: 131-142.
3. Kawahara H, Masuda-Katsuse I, de Cheveigné A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds[J]. Speech Communication, 1999, 27: 187-207.
4. MOU Xiaolong, HU Qixiu, WU Wenhu. A text-independent speaker identification system using a composite strategy[J]. Journal of Tsinghua University (Science and Technology), 1997, 37(3): 16-19. Cited by: 6
5. Toda T, Saruwatari H, Shikano K. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum[C]// Proc ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 2001, 2: 841-844.
6. CHEN Yining, CHU Min, Chang E, et al. Voice conversion with smoothed GMM and MAP adaptation[C]// Proc Eurospeech, Geneva, Switzerland, 2003: 2413-2416.
7. Toda T, Black A W, Tokuda K. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter[C]// Proc ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, USA, 2005: 9-12.

Secondary references (56 in total; 10 listed)

1. Kuwabara H, Sagisaka Y. Acoustic characteristics of speaker individuality: control and conversion[J]. Speech Communication, 1995, 16(2): 165-173.
2. Klatt D, Klatt L C. Analysis, synthesis, and perception of voice quality variations among female and male talkers[J]. J Acoust Soc Am, 1990, 87(2): 820-857.
3. Milenkovic P H. Voice source model for continuous control of pitch period[J]. J Acoust Soc Am, 1993, 93(2): 1087-1096.
4. Matsumoto H, et al. Multidimensional representation of personal quality of vowels and its acoustical correlates[J]. IEEE Trans Audio and Electroacoustics, 1973, 21(5): 428-436.
5. Furui S. Research on individuality features in speech waves and automatic speaker recognition techniques[J]. Speech Communication, 1986, 5(2): 183-197.
6. Lee K S, et al. A new voice transformation based on both linear and nonlinear prediction[A]// Proc ICSLP[C]. Philadelphia, USA: ESCA, 1996: 1401-1404.
7. Arslan L M. Speaker transformation algorithm using segmental codebooks (STASC)[J]. Speech Communication, 1999, 28(3): 211-226.
8. Mizuno H, Abe M. Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt[J]. Speech Communication, 1995, 16(2): 165-173.
9. Yoshimura T, et al. Speaker interpolation in HMM-based speech synthesis system[A]// Proc Eurospeech[C]. Rhodes, Greece: ESCA, 1997: 2523-2526.
10. Childers D G. Glottal source modeling for voice conversion[J]. Speech Communication, 1995, 16(2): 127-138.

Co-citing literature (36 in total; 10 listed)

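1. WU Mei, FENG Ruijie. On the design and implementation of a voice conversion system[J]. Central Asia Information, 2010(S1): 61-63.
2. ZUO Guoyu, LIU Wenju, RUAN Xiaogang. Application of voice conversion technology to telephone speech recognition (in English)[J]. Journal of System Simulation, 2005, 17(2): 448-452.
3. ZUO Guoyu, LIU Wenju, RUAN Xiaogang. A Mandarin voice conversion method using a tone-mapping codebook[J]. Journal of Data Acquisition and Processing, 2005, 20(2): 144-149. Cited by: 4
4. LU Xiaoshan, WANG Junfa, TIAN Lan. Separability of pitch features in speaker recognition and their application[J]. Journal of Shandong University (Engineering Science), 2005, 35(4): 56-58.
5. FU Min, CHENG Defu. Application of support vector regression to voice conversion[J]. Audio Engineering, 2006, 30(3): 45-48. Cited by: 1
6. KANG Yongguo, SHUANG Zhiwei, TAO Jianhua, ZHANG Wei. A voice conversion algorithm based on a hybrid mapping model[J]. Acta Acustica, 2006, 31(6): 555-562. Cited by: 13
7. WANG Haixiang, DAI Beiqian, LU Wei, ZHANG Jian. Source-target voice conversion based on formant parameters and classified linear weighting[J]. Journal of University of Science and Technology of China, 2006, 36(11): 1153-1159.
8. WANG Haixiang. Source-target voice conversion based on an RBF neural network[J]. Electronic Measurement Technology, 2006, 29(6): 60-63.
9. SUN Jun, DAI Beiqian, ZHANG Jian. Source-target speaker F0~t conversion based on unit-segment features and GMM[J]. Journal of Signal Processing, 2007, 23(2): 283-287.
10. WANG Hui, WANG Xiaojun, MA Jun. Design and implementation of an audio preamplifier in CMOS technology[J]. Chinese Journal of Electron Devices, 2007, 30(3): 870-873.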

Co-cited literature (13 in total; 10 listed)

1. ZUO Guoyu, LIU Wenju, RUAN Xiaogang. Research and progress of voice conversion technology[J]. Acta Electronica Sinica, 2004, 32(7): 1165-1172. Cited by: 32
2. SUN Jun, DAI Beiqian, ZHANG Jian. Source-target speaker glottal waveform conversion based on GMM and a probability-corrected codebook[J]. Journal of Data Acquisition and Processing, 2007, 22(1): 19-24. Cited by: 2
3. Abe M, Nakamura S, Shikano K. Voice conversion through vector quantization[C]// Proc ICASSP, 1988, (1): 655-658.
4. Arslan L M. Speaker transformation algorithm using segmental codebooks[J]. Speech Communication, 1999, 28(3): 211-226.
5. Stylianou Y, Cappé O, Moulines E. Continuous probabilistic transform for voice conversion[J]. IEEE Trans on Speech and Audio Processing, 1998, 6(2): 131-142.
6. Kain A, Macon M W. Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction[C]// Proc ICASSP, 2001, (2): 813-816.
7. Quatieri T F. Discrete-Time Speech Signal Processing: Principles and Practice (Chinese edition)[M]. Beijing: Publishing House of Electronics Industry, 2004.
8. Toda T, Saruwatari H, Shikano K. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum[C]// Proc ICASSP, 2001, (2): 841-844.
9. CHEN Yining, CHU Min, Chang E. Voice conversion with smoothed GMM and MAP adaptation[C]// Proc Eurospeech, Geneva, Switzerland, 2003, (1): 2413-2416.
10. Toda T, Black A W, Tokuda K. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter[C]// Proc ICASSP, 2005, (1): 9-12.

Citing literature (1)

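1. ZHANG Bing, YU Yibiao. Speaker conversion based on an improved GMM and prosody combined with the short-time spectrum[J]. Journal of Signal Processing, 2009, 25(4): 548-552. Cited by: 2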

Secondary citing literature (2)

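1. LI Li, YU Yibiao. Voice conversion using suprasegmental prosodic features combined with the short-time spectrum[J]. Journal of Signal Processing, 2012, 28(2): 289-294. Cited by: 3
2. LIU Yongjun, ZHANG Lifei, LIU Wei. Endpoint detection of medical speech signals in noisy environments[J]. Journal of Changshu Institute of Technology, 2017, 31(4): 75-79. Cited by: 1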

