Stream Weight Training Based on MCE for Audio-Visual LVCSR (Cited by: 1)

Abstract: In this paper we address the problem of audio-visual speech recognition in the framework of the multi-stream hidden Markov model. Stream weight training based on the minimum classification error criterion is discussed for use in large vocabulary continuous speech recognition (LVCSR). We present the lattice rescoring and Viterbi approaches for calculating the loss function of continuous speech. The experimental results show that, in the case of clean audio, using state-based stream weights trained by the Viterbi approach yields a 36.1% relative word error rate reduction compared to an audio-only speech recognition system. Further experimental results demonstrate that our audio-visual LVCSR system provides significant enhancement of robustness in noisy environments.

Authors: 刘鹏 (Liu Peng), 王作英 (Wang Zuoying)

Affiliation: Department of Electronic Engineering

Source: Tsinghua Science and Technology (清华大学学报（自然科学版）（英文版）), indexed in SCIE, EI, CAS; 2005, No. 2, pp. 141-144 (4 pages)

Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 863-306-ZD03-01-2)

Keywords: audio-visual speech recognition (AVSR); large vocabulary continuous speech recognition (LVCSR); discriminative training; minimum classification error (MCE)

Classification code: TN912.3 [Electronics and Telecommunications: Communication and Information Systems]

Citation network
Related literature

Co-cited literature (11)

1. 刘鹏, 王作英. 多模式汉语连续语音识别中视觉特征的提取和应用[J]. 中文信息学报, 2004, 18(4): 79-84. (Cited by: 6)
2. 谢磊, 付中华, 蒋冬梅, 赵荣椿, Werner Verhelst, Hichem Sahli, Jan Conlenis. 一种稳健的基于VisemicLDA的口形动态特征及听视觉语音识别[J]. 电子与信息学报, 2005, 27(1): 64-68. (Cited by: 4)
3. CHEN T H. Audiovisual speech processing[J]. IEEE Signal Processing Magazine, 2001, 18(1): 9-21.
4. NETI C, POTAMIANOS G, LUETTIN J, et al. Audio-visual speech recognition, Final Workshop Report[R]. [S.l.]: Center for Language and Speech Processing, 2000.
5. MIYAJIMA C, TOKUDA K, KITAMURA T. Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights[C]//Proc of ICSLP2000. 2000: 1023-1026.
6. NAKAMURA S, ITO H, SHIKANO K. Stream weight optimization of speech and lip image sequence for audio-visual speech recognition[C]//Proc of ICSLP2000. 2000: 20-24.
7. POTAMIANOS G, GRAF H P. Discriminative training of HMM stream exponents for audio-visual speech recognition[C]//Proc of Int Conf Acoust Speech Signal Process. Seattle: [s.n.], 1998: 3733-3736.
8. TAMURA S, IWANO K, FURUI S. A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs[C]//Proc of ICASSP2004. Montreal: [s.n.], 2004: 857-860.
9. CHOW Y L. Maximum mutual information estimation of HMM parameters for continuous speech recognition using the N-best algorithm[C]//Proc of IEEE Intl Conf Acoust, Speech, Signal Processing. 1990: 701-704.
10. ZHANG Xiao-zheng, MERSEREAU R M, CLEMENTS M. Bimodal fusion in audio-visual speech recognition[C]//Proc of International Conference on Image Processing. 2002: 964-967.

Citing literature (1)

1. 秦伟, 韦岗. 多数据流隐马尔可夫模型的流权值优化方法[J]. 计算机应用研究, 2007, 24(11): 100-102.

1. YANG Zhanlei, LIU Wenju, CHAO Hao. Integrating induced probability into decoding for large vocabulary continuous speech recognition[J]. Chinese Journal of Acoustics, 2012, 31(3): 338-352. (Cited by: 2)
2. QIAN Yanmin, XU Ji, LIU Jia. Multi-Stream Posterior Features and Combining Subspace Gmms for Low Resource LVCSR[J]. Chinese Journal of Electronics, 2013, 22(2): 291-295. (Cited by: 2)
3. 郭奋卓, 秦素娟, 温巧燕, 朱甫臣. Cryptanalysis and Improvement of Two GHZ-State-Based QSDC Protocols[J]. Chinese Physics Letters, 2010, 27(9): 34-37. (Cited by: 1)
4. 吴尊敬, 曹志刚. Improved MFCC-Based Feature for Robust Speaker Identification[J]. Tsinghua Science and Technology, 2005, 10(2): 158-161. (Cited by: 7)
5. Liguo Shi. Application of Task-based Teaching Method to College Audio-visual English Teaching[J]. International Journal of Technology Management, 2015(9): 65-67.
6. 刘迪源, 郭武. 基于区分性准则的Bottleneck特征及其在LVCSR中的应用[J]. 数据采集与处理, 2016, 31(2): 331-337. (Cited by: 2)
7. 信号处理、分析与设计[J]. 电子科技文摘, 2000(11): 53-54.
8. 赵贤宇, Ou Zhijian, Wang Zuoying. Using vector Taylor series with noise clustering for speech recognition in non-stationary noisy environments[J]. High Technology Letters, 2006, 12(1): 18-23.
9. 樊占军, 高超. SCTP协议在NAT技术上的应用研究[J]. 网络安全技术与应用, 2014(2): 47-48.
10. 张磊, 陈晶, 项学智, 贾梅梅. 结合关键词混淆网络的关键词检出系统[J]. 智能系统学报, 2010, 5(5): 432-435. (Cited by: 2)
