Spectrum warping based on sub-glottal resonances in speaker-independent speech recognition
Abstract: To reduce the degradation that inter-speaker variation causes in speaker-independent speech recognition, this paper normalizes each speaker's features during feature extraction using a speaker-specific perceptual warping factor. The warping factor is estimated from the second sub-glottal resonance, which arises from acoustic coupling between the sub-glottal system and the vocal tract. Compared with using the third formant of the vocal tract as the reference frequency, the sub-glottal resonance is largely independent of speech content and therefore reflects speaker characteristics more faithfully. The factor is used to normalize perceptual minimum variance distortionless response (PMVDR) coefficients, which are more noise-robust than traditional Mel-frequency cepstral coefficients (MFCC); the normalized coefficients are then used to train and test classical hidden Markov model (HMM) acoustic models. Experiments show that, relative to conventional recognition features and to spectrum warping based on the third formant, the proposed method lowers the word error rate by 4% and 3%, respectively, on clean speech and by 9% and 5%, respectively, in noise, improving the performance of the speaker-independent recognition system.
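The abstract does not give the warping function itself, so the following is only a minimal sketch of the general idea of resonance-based frequency warping. It assumes a plain linear warp whose factor is the ratio of a reference second sub-glottal resonance to the speaker's own; the paper's perceptual warping is likely different. The function names `warping_factor` and `warp_spectrum` and the resonance values are hypothetical, chosen for illustration.

```python
import numpy as np


def warping_factor(sg2_speaker_hz, sg2_reference_hz):
    """Assumed definition: ratio of a reference resonance to the speaker's resonance."""
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def warp_spectrum(power_spectrum, alpha):
    """Linearly rescale the frequency axis of a spectrum.

    Output bin k takes the (interpolated) value found at bin k / alpha of the
    input, so a peak at input bin p moves to roughly bin alpha * p.
    """
    spectrum = np.asarray(power_spectrum, dtype=float)
    bins = np.arange(spectrum.size)
    # Positions in the original spectrum to read from, clipped to the valid range.
    source_positions = np.clip(bins / alpha, 0, spectrum.size - 1)
    return np.interp(source_positions, bins, spectrum)


# Illustrative values only (not taken from the paper).
alpha = warping_factor(sg2_speaker_hz=1400.0, sg2_reference_hz=1540.0)

# Synthetic spectrum with a single peak at bin 100.
bins = np.arange(257)
spectrum = np.exp(-0.5 * ((bins - 100) / 3.0) ** 2)
warped = warp_spectrum(spectrum, alpha)
```

In a full front end the warped spectrum would feed the subsequent feature computation (PMVDR in the paper), with one factor estimated per speaker.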
Source: Acta Acustica (《声学学报》), 2010, No. 5, pp. 580-586 (7 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.
