Research on the relationship between the weight coefficients of the product HMM and instantaneous SNR in bimodal speech recognition

Abstract: To achieve higher speech recognition accuracy under complex conditions such as noise contamination, a new product Hidden Markov Model (HMM) for bimodal speech recognition is proposed, and the relationship between the model's weight coefficients and the instantaneous Signal-to-Noise Ratio (SNR) is studied and established. Starting from independently trained audio and visual HMMs, the method builds a two-dimensional training model and applies a re-estimation strategy to improve accuracy. The Generalized Probabilistic Descent (GPD) algorithm is introduced to adjust the weight coefficients of the audio and visual features. Experimental results show that the proposed method achieves good, stable recognition performance in noisy environments.
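The abstract does not give the exact form of the SNR-to-weight relationship or of the GPD criterion. As a minimal sketch only, the Python below assumes: (1) the audio stream weight λ_A is a logistic function of each frame's instantaneous SNR, with parameters alpha and beta, and the visual weight is 1 − λ_A; (2) each word class is scored by the weighted sum of frame-level audio and visual log emission scores along a fixed product-HMM state alignment; (3) GPD minimizes the standard sigmoid-smoothed minimum-classification-error loss against the single best competing class. The function names audio_weight, class_score and gpd_step, and the parameters alpha, beta, lr and gamma, are hypothetical and are not taken from the paper.

```python
import math


def audio_weight(snr_db, alpha, beta):
    # Hypothetical logistic mapping from instantaneous SNR (dB) to the audio
    # stream weight lambda_A in (0, 1); the visual weight is 1 - lambda_A.
    return 1.0 / (1.0 + math.exp(-(alpha * snr_db + beta)))


def class_score(audio_frames, visual_frames, lams):
    # g_k = sum_t [lam_t * log b_A(t) + (1 - lam_t) * log b_V(t)], using the
    # frame-level log emission scores of one class along a fixed alignment.
    return sum(lam * a + (1.0 - lam) * v
               for a, v, lam in zip(audio_frames, visual_frames, lams))


def gpd_step(audio, visual, snr_db, correct, alpha, beta, lr=0.05, gamma=1.0):
    """One GPD (MCE) update of the weight-mapping parameters alpha and beta.

    audio[k][t], visual[k][t]: log emission scores of class k at frame t.
    snr_db[t]: instantaneous SNR of frame t. correct: index of the true class.
    Returns (new_alpha, new_beta, loss_before_update).
    """
    lams = [audio_weight(s, alpha, beta) for s in snr_db]
    scores = [class_score(a, v, lams) for a, v in zip(audio, visual)]
    # Best-scoring competitor (hard-max form of the misclassification measure).
    rival = max((k for k in range(len(scores)) if k != correct),
                key=lambda k: scores[k])
    d = scores[rival] - scores[correct]        # d > 0 means a recognition error
    loss = 1.0 / (1.0 + math.exp(-gamma * d))  # sigmoid-smoothed 0/1 loss
    dl_dd = gamma * loss * (1.0 - loss)

    grad_alpha = grad_beta = 0.0
    for t, (lam, s) in enumerate(zip(lams, snr_db)):
        # Derivative of d with respect to lam_t.
        dd_dlam = ((audio[rival][t] - visual[rival][t])
                   - (audio[correct][t] - visual[correct][t]))
        dlam_dz = lam * (1.0 - lam)  # derivative of the logistic mapping
        grad_alpha += dl_dd * dd_dlam * dlam_dz * s  # dz/dalpha = snr_db[t]
        grad_beta += dl_dd * dd_dlam * dlam_dz       # dz/dbeta = 1
    return alpha - lr * grad_alpha, beta - lr * grad_beta, loss
```

For example, calling gpd_step from alpha = beta = 0 with audio scores that favour the correct class, visual scores that mildly favour a rival, and a 20 dB SNR on every frame raises both parameters, which shifts weight toward the audio stream and lowers the loss on the next step. The hard-max competitor is a simplification of the soft-max (η-norm) misclassification measure commonly used with GPD.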

Authors: Zhao Hui (赵晖), Gu Yaqiang (顾亚强), Tang Chaojing (唐朝京)

Affiliation: College of Electronic Science and Engineering, National University of Defense Technology

Source: Journal of Computer Applications (《计算机应用》), 2009, Issue B12, pp. 279-281, 285 (4 pages); indexed in CSCD and the Peking University Core Journals list

Funding: "11th Five-Year Plan" Weapons and Equipment Pre-Research Project (51329060101)

Keywords: bimodal speech recognition; product Hidden Markov Model (HMM); weight coefficient; re-estimation; Generalized Probabilistic Descent (GPD) algorithm

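CLC numbers: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]; TP391.42 [Automation and Computer Technology: Computer Application Technology]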
