Sub-band power normalized perceptual linear predictive coefficients for robust automatic speech recognition

Cited by: 15
Abstract: To improve the noise robustness of perceptual linear predictive (PLP) coefficients, this work proposes sub-band power normalized perceptual linear predictive (SPNPLP) coefficients, a feature based on power bias subtraction. PLP captures the most useful information in speech and fits well with the assumptions used in hidden Markov models, so automatic speech recognition (ASR) systems using PLP achieve good recognition rates in quiet environments. In noisy environments, however, their recognition performance drops sharply. Power bias subtraction, which suppresses background excitation, is applied to normalize the sub-band power of PLP; the resulting SPNPLP features increase the robustness of ASR against additive background noise. The features were evaluated on an isolated-word recognition task with a grammar of 501 entries and on a large vocabulary continuous speech recognition (LVCSR) task. Compared with standard PLP, SPNPLP improved Chinese character recognition accuracy by 11.26% and 9.2% absolute on the two tasks, respectively. The experimental results show that SPNPLP is consistently more robust to noise than PLP.
Source: Acta Acustica (《声学学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2012, No. 6, pp. 667-672 (6 pages)
Funding: National Natural Science Foundation of China (grants 10925419, 90920302, 10874203, 60875014, 61072124, 11074275)
Keywords: speech recognition system; linear predictive coefficients; noise robustness; sub-band power; perceptual; noisy environment; continuous speech recognition; hidden Markov models

References (22)

  • 1. Gong Y. Speech recognition in noisy environments: A survey. Speech Communication, 1995; 16: 261-291.
  • 2. Huang X, Hon H W. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. Prentice Hall PTR, 2001.
  • 3. Moreno P. Speech recognition in noisy environments. Ph.D. thesis, Carnegie Mellon University, 1996.
  • 4. Gales M J F. The generation and use of regression class trees for MLLR adaptation. Cambridge University, Tech. Rep. CUED/F-INFENG/TR263, 1996.
  • 5. Varga A, Moore R. Hidden Markov model decomposition of speech and noise. ICASSP, 1990; 2: 845-848.
  • 6. Ghitza O. Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment. Journal of Phonetics, 1988; 16: 109-123.
  • 7. Gajic B, Paliwal K K. Robust speech recognition in noisy environments based on subband spectral centroid histograms. IEEE Trans. Audio, Speech, and Language Processing, 2006; 14: 600-608.
  • 8. De La Torre A et al. Non-linear transformations of the feature space for robust speech recognition. ICASSP, 2006: 401-404.
  • 9. Du J, Wang R H. Cepstral shape normalization (CSN) for robust speech recognition. ICASSP, 2008: 4389-4392.
  • 10. Honig F et al. Revising perceptual linear prediction (PLP). Eurospeech, 2005: 2997-3000.

Co-citing documents: 3

Co-cited documents: 107

Citing documents: 15

Second-level citing documents: 42
