Stream Weight Training Based on MCE for Audio-Visual LVCSR (cited by: 1)

Abstract: In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model (HMM). Stream weight training based on the minimum classification error (MCE) criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present lattice rescoring and Viterbi approaches for calculating the loss function for continuous speech. The experimental results show that, with clean audio, state-based stream weights trained by the Viterbi approach reduce the word error rate by 36.1% relative to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system significantly improves robustness in noisy environments.
Authors: Liu Peng (刘鹏), Wang Zuoying (王作英)
Source: Tsinghua Science and Technology (English edition of the Journal of Tsinghua University, Science and Technology), 2005, No. 2, pp. 141-144 (4 pages). Indexed in SCIE, EI, CAS.
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 863-306-ZD03-01-2)
Keywords: audio-visual speech recognition (AVSR); large vocabulary continuous speech recognition (LVCSR); discriminative training; minimum classification error (MCE)
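To make the abstract's idea concrete, here is a minimal sketch of MCE training of state-based stream weights in the Viterbi setting. It is not the authors' implementation. It uses the textbook MCE recipe: a smoothed misclassification measure d = -g_ref + (1/eta) log(mean_n exp(eta g_n)), a sigmoid loss l = 1/(1 + exp(-gamma d)), and gradient descent. Several assumptions are made here and are not taken from the paper. There are two streams, and the per-state weights are constrained so that the visual weight equals 1 minus the audio weight. Viterbi alignments are held fixed, so each hypothesis score is the sum over states s of lam_a[s] * A[s] + (1 - lam_a[s]) * V[s], where A and V are per-state accumulated audio and visual log-likelihoods. Competitors are given as an N-best-style list rather than a lattice. The function names, eta, gamma, the learning rate, the clipping bounds and the toy numbers are all hypothetical.

```python
import numpy as np


def mce_loss_and_grad(lam_a, ref_a, ref_v, comp_a, comp_v, eta=1.0, gamma=1.0):
    """MCE sigmoid loss and its gradient w.r.t. per-state audio weights.

    lam_a          : (S,) audio weight per state; visual weight is 1 - lam_a
                     (sum-to-one constraint is an assumption of this sketch).
    ref_a, ref_v   : (S,) audio/visual log-likelihoods accumulated per state
                     along the fixed Viterbi path of the correct transcription.
    comp_a, comp_v : (N, S) the same quantities for N competing hypotheses.
    """
    lam_v = 1.0 - lam_a
    # Discriminant functions: weighted sums of stream log-likelihoods.
    g0 = float(ref_a @ lam_a + ref_v @ lam_v)
    g = comp_a @ lam_a + comp_v @ lam_v              # (N,)
    # Smoothed max of competitors, computed stably: (1/eta) log mean exp(eta g).
    m = np.max(eta * g)
    e = np.exp(eta * g - m)
    g_comp = (m + np.log(np.mean(e))) / eta
    # Misclassification measure: d < 0 means the reference outscores competitors.
    d = -g0 + g_comp
    loss = 1.0 / (1.0 + np.exp(-gamma * d))
    # Chain rule: dl/dd * dd/dlam_a, using dg/dlam_a = A - V per state.
    w = e / e.sum()                                  # softmax weights of competitors
    dd = -(ref_a - ref_v) + w @ (comp_a - comp_v)    # (S,)
    grad = gamma * loss * (1.0 - loss) * dd
    return float(loss), grad


def train_stream_weights(lam_a, utterances, lr=0.01, epochs=10, eps=1e-3):
    """Per-utterance gradient descent; weights clipped to stay inside (0, 1)."""
    lam_a = np.asarray(lam_a, dtype=float).copy()
    for _ in range(epochs):
        for ref_a, ref_v, comp_a, comp_v in utterances:
            _, grad = mce_loss_and_grad(lam_a, ref_a, ref_v, comp_a, comp_v)
            lam_a = np.clip(lam_a - lr * grad, eps, 1.0 - eps)
    return lam_a


# Made-up toy data: audio favours the competitor, video favours the reference.
ref_a, ref_v = np.array([-10.0, -10.0]), np.array([-5.0, -5.0])
comp_a, comp_v = np.array([[-9.0, -9.0]]), np.array([[-8.0, -8.0]])
utts = [(ref_a, ref_v, comp_a, comp_v)]

lam0 = np.array([0.5, 0.5])
lam = train_stream_weights(lam0, utts)
loss0, grad0 = mce_loss_and_grad(lam0, *utts[0])
loss1, _ = mce_loss_and_grad(lam, *utts[0])
print(lam, loss0, loss1)
```

In this toy case the visual stream separates the reference from the competitor, while the audio stream does not. Training therefore lowers the audio weight in each state, and the MCE loss falls as a result. Clipping is one simple way to keep the weights valid. A reparameterisation such as lam_a = sigmoid(u) would be an equally reasonable choice.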