Abstract
This paper presents an articulatory feature (AF)-based audio-visual dual-stream dynamic Bayesian network (DBN) speech recognition model (AF_AV_DBN). The conditional probability relationships of each node and the asynchrony constraints between articulatory features are defined. Speech recognition experiments are carried out on an audio-visual connected-digit speech database, and the model is compared with two single-stream DBN models (audio-only and video-only) at different signal-to-noise ratios (SNRs). The results show that AF_AV_DBN achieves the best recognition performance when the SNR of the audio stream is low, and that its recognition rate declines relatively gently as the noise increases. This indicates that the model is highly robust to noise and better suited to speech recognition in low-SNR environments.
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2009, Issue 7, pp. 2481-2483 (3 pages)
Application Research of Computers
Funding
Supported by the National Natural Science Foundation of China (Grant No. 60703104)
Keywords
dynamic Bayesian network (DBN)
articulatory feature
audio-visual
speech recognition