Funding: Project (61072087) supported by the National Natural Science Foundation of China; Project (2011-035) supported by the Shanxi Province Scholarship Foundation, China; Project (20120010) supported by the Universities High-tech Foundation Projects, China; Project (2013021016-1) supported by the Youth Science and Technology Foundation of Shanxi Province, China; Projects (2013011016-1, 2012011014-1) supported by the Natural Science Foundation of Shanxi Province, China.
Abstract: Speech enhanced with the traditional wavelet threshold function suffers from audible oscillation distortion and a low signal-to-noise ratio (SNR). To solve these problems, a new continuous, differentiable threshold function for speech enhancement is presented. First, the function uses narrow threshold regions, which preserves small-amplitude speech components and improves speech quality. Second, exploiting the properties of continuous differentiability and non-fixed deviation, the function for each region is derived step by step mathematically. This ensures that the enhanced speech is continuous and smooth and removes the audible oscillation distortion. Finally, combining the function with Bark wavelet packets further improves perceived quality for human listeners. Experimental results show that, compared with existing wavelet-threshold speech enhancement algorithms, the method effectively increases the segmental SNR and the PESQ (perceptual evaluation of speech quality) score of the enhanced speech.
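The abstract does not give the new threshold function in closed form, so the sketch below only illustrates the properties it targets. It compares the traditional hard threshold, which is discontinuous at |w| = T and is commonly blamed for oscillation artifacts, and the soft threshold, which is continuous but biases every surviving coefficient by a fixed T. These are set against a hypothetical smooth shrinkage chosen for this example. That function is zero inside the threshold, continuous and differentiable at |w| = T, and has a deviation that decays for large coefficients. The function smooth_threshold and its exponential form are assumptions, not the authors' formula. The Bark wavelet packet decomposition is not shown.

```python
import math


def hard_threshold(w: float, t: float) -> float:
    """Traditional hard threshold: keep w if |w| > t, else 0 (jumps at |w| = t)."""
    return w if abs(w) > t else 0.0


def soft_threshold(w: float, t: float) -> float:
    """Traditional soft threshold: shrink |w| by t (continuous, fixed bias t)."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def smooth_threshold(w: float, t: float) -> float:
    """Hypothetical C^1 threshold for illustration only (NOT the paper's function).

    Zero for |w| <= t. For |w| > t it returns sign(w) * |w| * (1 - exp(-u^2))
    with u = (|w| - t) / t, so value and slope are both 0 at |w| = t, and the
    deviation |w| - |f(w)| = |w| * exp(-u^2) decays toward 0 for large |w|.
    """
    a = abs(w)
    if a <= t:
        return 0.0
    u = (a - t) / t
    return math.copysign(a * (1.0 - math.exp(-u * u)), w)


def universal_threshold(sigma: float, n: int) -> float:
    """Donoho-Johnstone universal threshold sigma * sqrt(2 ln n)."""
    return sigma * math.sqrt(2.0 * math.log(n))


if __name__ == "__main__":
    t = universal_threshold(sigma=0.1, n=1024)
    eps = 1e-6
    rules = [("hard", hard_threshold), ("soft", soft_threshold), ("smooth", smooth_threshold)]
    for name, f in rules:
        jump = f(t + eps) - f(t - eps)   # size of any jump across the threshold
        dev = 10 * t - f(10 * t)         # deviation for a large coefficient
        print(f"{name:6s} jump at T = {jump:.4f}, deviation at 10T = {dev:.4f}")
```

Running the script prints, for each rule, the jump across the threshold and the deviation at a coefficient of 10T. Only the hard rule jumps, and only the soft rule keeps a deviation equal to T.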
Funding: supported by the National Natural Science Foundation of China (Grant Nos. 61471014, 61231015).
Abstract: Many noise-reduction speech enhancement algorithms are based on a binary masking decision (termed the hard decision), which may cause some regions of the synthesized speech to be discarded. To address this problem, a soft decision is often adopted instead for speech restoration. In this paper, building on a new form of speech and noise models, we present two model-based soft-decision techniques. The first estimates a ratio mask generated by the exact Bayesian estimators of speech and noise. The second addresses the fact that a local criterion (LC) that is optimal at one SNR may not be appropriate at other SNRs, and therefore estimates a probabilistic mask with a variable LC. Experimental results show that the proposed methods achieve better speech quality than the reference methods.
Abstract: A new speech enhancement method is proposed that performs soft-thresholding denoising in the wavelet domain to suppress acoustic background noise. To prevent quality degradation of unvoiced sounds during denoising, the unvoiced regions are separated from the noisy speech signal and treated differently from the voiced regions. With this preprocessing, the noise components of the corrupted speech can be effectively suppressed while the important information in the unvoiced regions is not distorted. Simulation results demonstrate the validity of the method.
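The following is a minimal sketch of wavelet soft-threshold denoising with separate handling of unvoiced regions. The abstract does not say how unvoiced regions are detected or how they are treated, so both are assumptions here. Frames with a high zero-crossing rate are flagged as unvoiced. Their detail coefficients are then thresholded more gently, using the threshold scaled by unvoiced_scale, so that noise-like fricative energy is not wiped out. For brevity the transform is a single-level Haar DWT. The noise level uses the common median-absolute-deviation estimate, and the threshold is the universal threshold. A zero-crossing test alone also flags pure-noise frames as unvoiced, so a practical detector would combine it with energy or other features.

```python
import math
import statistics
from typing import List, Sequence, Tuple

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One-level orthonormal Haar DWT; len(x) must be even."""
    approx = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Inverse of haar_dwt (perfect reconstruction)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


def soft(w: float, t: float) -> float:
    """Soft threshold: shrink |w| toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    flips = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return flips / max(len(frame) - 1, 1)


def denoise(x: Sequence[float], frame_len: int = 256,
            zcr_unvoiced: float = 0.4, unvoiced_scale: float = 0.25) -> List[float]:
    """Haar soft-threshold denoising with a gentler threshold in unvoiced frames.

    The ZCR rule and unvoiced_scale are hypothetical stand-ins for the paper's
    unvoiced handling. Both len(x) and frame_len must be even so that every
    Haar coefficient pair lies inside one frame.
    """
    if len(x) % 2 or frame_len % 2:
        raise ValueError("signal length and frame_len must be even")
    approx, detail = haar_dwt(x)
    # Robust noise estimate: median absolute detail coefficient / 0.6745.
    sigma = statistics.median([abs(d) for d in detail]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))  # universal threshold
    unvoiced = [zero_crossing_rate(x[s:s + frame_len]) > zcr_unvoiced
                for s in range(0, len(x), frame_len)]
    shrunk = []
    for i, d in enumerate(detail):
        frame = (2 * i) // frame_len  # frame that holds samples 2i and 2i+1
        shrunk.append(soft(d, t * unvoiced_scale if unvoiced[frame] else t))
    return haar_idwt(approx, shrunk)
```

Scaling the threshold rather than skipping thresholding entirely is a design choice. It still removes some noise in unvoiced frames while keeping most of their high-frequency content.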
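Abstract: Time-frequency spectral models estimate speech inaccurately. To address this, the paper obtains the log-domain probability density functions of noise and speech through a model transformation. It then uses the logarithmic relationship among noisy speech, clean speech and noise, together with minimum mean square error (MMSE) estimation theory, to derive a time-frequency mask for estimating the log spectrum of speech. A soft mask is also derived from the log-domain probability distributions of speech and noise. This mask weights the log subbands of the noisy speech to reduce noise and improve the accuracy of the speech estimate. Simulation results show that, compared with the unprocessed noisy speech, the proposed method improves noise suppression by more than 3 dB. The MMSE-based time-frequency mask and the soft mask improve perceptual quality by 27.7% and 29.4% on average, respectively, and intelligibility by 12.7% and 14.3% on average, respectively.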
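The abstract relates the log spectra of noisy speech, clean speech and noise but does not give the resulting estimator. One standard way to make that relation tractable is the log-max (MixMax) approximation y ≈ max(x, n) on log-power values. The sketch below assumes this approximation, together with independent Gaussian models for the clean and noise log-powers in one subband. Under those assumptions, the posterior probability that speech dominates the bin acts as a soft mask. The MMSE estimate then blends the observed value y with the mean of x truncated below y. This illustrates the idea only; it is not the paper's model transformation or its exact mask.

```python
import math
from typing import Tuple


def _phi(z: float) -> float:
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z: float) -> float:
    """Standard normal cdf via erfc (accurate in the lower tail)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def mmse_log_bin(y: float, mu_x: float, var_x: float,
                 mu_n: float, var_n: float) -> Tuple[float, float]:
    """MMSE estimate of clean log-power x from noisy log-power y.

    Assumes the log-max approximation y = max(x, n) with independent
    x ~ N(mu_x, var_x) and n ~ N(mu_n, var_n). Returns (x_hat, p_speech),
    where p_speech = P(x > n | y) plays the role of a soft mask.
    """
    sx, sn = math.sqrt(var_x), math.sqrt(var_n)
    zx, zn = (y - mu_x) / sx, (y - mu_n) / sn
    # Joint density that speech sets y (x = y, n < y) vs. noise sets y (n = y, x < y).
    lik_speech = _phi(zx) / sx * _Phi(zn)
    lik_noise = _phi(zn) / sn * _Phi(zx)
    total = lik_speech + lik_noise
    p_speech = lik_speech / total if total > 0.0 else 0.5
    # If noise dominates, x is only known to lie below y: truncated-Gaussian mean.
    cdf = _Phi(zx)
    e_x_below = mu_x - sx * _phi(zx) / cdf if cdf > 0.0 else y
    x_hat = p_speech * y + (1.0 - p_speech) * e_x_below
    return x_hat, p_speech


if __name__ == "__main__":
    print(mmse_log_bin(y=10.0, mu_x=10.0, var_x=1.0, mu_n=0.0, var_n=1.0))  # speech-dominated
    print(mmse_log_bin(y=10.0, mu_x=0.0, var_x=1.0, mu_n=10.0, var_n=1.0))  # noise-dominated
```

When speech is expected to be much louder than noise, the mask is close to 1 and the estimate returns y almost unchanged. When noise dominates, the mask approaches 0 and the estimate falls back toward the speech prior.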