

Voice conversion based on classified linearly weighted transformation of formant parameters
Abstract  Voice conversion transforms a source speaker's speech so that it carries the acoustic characteristics of a target speaker. Because the mapping of vocal-tract parameters is the key to high-quality reconstructed speech, vocal-tract formant parameters, estimated by a root-finding method based on linear prediction (LP) analysis, are chosen as the features to be converted. To reduce the transformation error caused by inaccurate classification in the classified linear transformation (CLT) algorithm, a classified linearly weighted transformation (WCLT) algorithm based on a radial basis function (RBF) neural network is presented. Objective and subjective evaluations of the converted speech were conducted on the MSRA Mandarin speech database, and experiments examined how the number of classes and the size of the training set affect the two algorithms. The results show that WCLT outperforms CLT, to some extent overcomes the spectral over-smoothing of Gaussian mixture model (GMM) based conversion, and still achieves good conversion quality with only a small amount of training data.
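The formant-extraction step the abstract describes, estimating formants as roots of the LP polynomial A(z), can be sketched as below. This is a minimal illustration, not the paper's implementation: the function names, model order, and the synthetic one-resonance test signal are all assumptions.

```python
import numpy as np

def lpc(signal, order):
    """LP coefficients by the autocorrelation method (Toeplitz normal equations)."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1 : n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate(([1.0], -a))  # A(z) = 1 - sum_k a_k z^{-k}

def formants(signal, fs, order=12, max_bw=400.0):
    """Estimate formant frequencies as angles of the complex roots of A(z)."""
    roots = np.roots(lpc(signal, order))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency (Hz)
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> bandwidth (Hz)
    keep = (bws < max_bw) & (freqs > 90)        # drop broad and near-DC roots
    return np.sort(freqs[keep])
```

Roots with small bandwidth correspond to sharp spectral peaks, which is why the bandwidth threshold separates formant candidates from the remaining LP roots.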
Source: Journal of University of Science and Technology of China (《中国科学技术大学学报》, JUSTC), 2006, No. 11, pp. 1153-1159 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journal list.
Keywords: voice conversion; formant parameters; radial basis function neural network; classified linear transformation; classified linearly weighted transformation
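The weighting idea behind WCLT, replacing CLT's hard class decision with a soft blend of per-class linear maps, can be illustrated with a small sketch. The Gaussian-kernel memberships here stand in for the paper's trained RBF network, and the class centers, matrices, and offsets are hypothetical.

```python
import numpy as np

def memberships(x, centers, sigma=1.0):
    """Soft class memberships from Gaussian RBF kernels, normalized to sum to 1."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma**2))
    return w / w.sum()

def clt(x, centers, A, b):
    """Classified linear transform: hard-assign x to its nearest class's map."""
    c = int(np.argmin(np.sum((centers - x) ** 2, axis=1)))
    return A[c] @ x + b[c]

def wclt(x, centers, A, b, sigma=1.0):
    """Weighted CLT: blend every class's affine map by its RBF membership."""
    w = memberships(x, centers, sigma)
    return sum(wi * (Ai @ x + bi) for wi, Ai, bi in zip(w, A, b))
```

Near a class boundary CLT switches transforms discontinuously, so a misclassified frame gets an entirely wrong map; WCLT interpolates between the maps, which is the mechanism the abstract credits for reducing classification error.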
