Funding: Supported by the National Natural Science Foundation of China (No. 60872105), the Program for Science & Technology Innovative Research Team of Qing Lan Project in Higher Educational Institutions of Jiangsu, and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).
Abstract: This paper presents an improved voice conversion system based on Gaussian mixture models (GMMs) that also changes the time scale of speech. The Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum (STRAIGHT) model is adopted to extract the spectral features, and the GMMs are trained to generate the conversion function. The spectral features of the source speech are then converted by this function. The time scale of the speech is changed by extracting the converted features and adding them to the spectrum. The converted voice was evaluated by subjective and objective measurements. The results confirm that the transformed speech not only approximates the characteristics of the target speaker but is also more natural and more intelligible.
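The abstract does not state which form the GMM conversion function takes. As a hedged illustration only, the sketch below uses the widely used joint-density formulation, in which each mixture component contributes a linear regression from source to target weighted by its posterior probability. It is restricted to scalar features for brevity, and all names (`gmm_convert`, `mu_x`, `cov_yx`, and so on) are hypothetical rather than taken from the paper.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a scalar source feature x to the target space.

    Assumed joint-density form (not confirmed by the paper):
        F(x) = sum_i p(i | x) * (mu_y_i + cov_yx_i / var_xx_i * (x - mu_x_i))

    Each component is a dict with keys:
    weight, mu_x, mu_y, var_xx, cov_yx.
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_xx"]) for c in components
    ]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("x is too far from every component (likelihood underflow)")
    # Posterior probability of each component given x.
    posteriors = [l / total for l in likelihoods]
    # Posterior-weighted sum of per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["cov_yx"] / c["var_xx"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Usage with made-up parameters; in practice they would be estimated
# from time-aligned source and target features.
example_components = [
    {"weight": 0.5, "mu_x": -1.0, "mu_y": -2.0, "var_xx": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 1.0, "mu_y": 2.0, "var_xx": 1.0, "cov_yx": 0.5},
]
converted = gmm_convert(0.3, example_components)
```

For real spectral features the scalars become vectors and the variance and covariance terms become matrices, but the structure of the mapping is the same.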
Funding: Supported by the National Natural Science Foundation of China (61271359, 61071215), the Suzhou Science and Technology Development Plan (SYG201001), and the Key Joint Laboratory of Soochow University and JieMei Biomedical Engineering Instrument.
Abstract: A method for converting whispered speech to normal speech using an extended bilinear transformation is proposed. Because the formants of whispered speech deviate by different degrees in different frequency bands, the spectrum of the whispered speech is divided into partitions that are processed separately. On the basis of this partitioned spectrum, a conversion function is established that converts whispered speech to normal speech. Because the offset of whispered speech relative to normal speech is non-linear, an expansion factor is introduced into the bilinear transform function so that it corresponds more closely to the actual requirements of whisper-to-normal conversion. This factor takes both the non-linear shift of the spectrum and the compression of the formant bandwidth into account, thereby reducing the spectral distortion distance in the conversion. The experimental results show that the proposed conversion improves both the sound quality and the intelligibility of whispered speech.
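For orientation, the sketch below shows the standard first-order all-pass (bilinear) frequency warping that methods of this kind typically start from. The paper's expansion factor and its band-wise partitioning are not specified in the abstract and are deliberately not reproduced here. The function name `bilinear_warp` and the parameter name `alpha` are assumptions.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (0..pi radians).

    Standard all-pass bilinear warping with |alpha| < 1:
        omega' = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega)))

    alpha = 0 leaves the frequency axis unchanged; alpha > 0 moves
    frequencies upward and alpha < 0 moves them downward, while the
    endpoints 0 and pi stay fixed.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


# Usage: warp a uniform grid of 9 frequencies with an illustrative alpha.
grid = [k * math.pi / 8 for k in range(9)]
warped_grid = [bilinear_warp(w, -0.2) for w in grid]
```

Because a single `alpha` bends the whole axis with one curve, it cannot shift different bands by different amounts, which is consistent with the abstract's motivation for handling the spectrum in partitions and extending the transform.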
Abstract: A speech compression coding method based on the Wavelet Transform and Vector Quantization (VQ) is developed and studied. The Wavelet Transform or Wavelet Packet Transform is used to process the speech signal, VQ is then used to compress the wavelet coefficients, and entropy coding is used to decrease the bit rate. The experimental results show that a speech signal sampled at 8 kHz with 8-bit quantization, i.e., a bit rate of 64 kbit/s, can be compressed to 6-8 kbit/s while retaining high speech quality and a low delay of only 8 ms.
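The pipeline and bit-rate figures above can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet and a toy three-entry codebook, neither of which is stated in the abstract, and it omits the entropy coding stage. All function names are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform (even-length input)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def vq_encode(vector, codebook):
    """Return the index of the codeword nearest to vector (squared Euclidean)."""
    def distance(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bit/s of uncompressed PCM speech."""
    return sample_rate_hz * bits_per_sample


# Bit-rate arithmetic from the abstract: 8 kHz * 8 bit = 64 kbit/s.
source_rate = pcm_bit_rate(8000, 8)
ratio_at_8k = source_rate / 8000   # compression ratio at 8 kbit/s
ratio_at_6k = source_rate / 6000   # compression ratio at 6 kbit/s

# Toy pipeline: transform a short frame, then quantize the approximation
# coefficients as one 2-dimensional vector against an illustrative codebook.
approx, detail = haar_step([1.0, 1.0, 2.0, 2.0])
toy_codebook = [[0.0, 0.0], [1.5, 3.0], [3.0, 1.5]]
index = vq_encode(approx, toy_codebook)
```

Compressing 64 kbit/s to 6-8 kbit/s corresponds to a compression ratio between 8:1 and roughly 10.7:1. In a VQ coder only the codeword index is transmitted, so a codebook of N entries costs about log2(N) bits per vector before entropy coding.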