Abstract
Most speaker recognition systems rely on acoustic signals. This paper presents a novel speaker recognition approach based on lip movement. Using the discrete cosine transform (DCT), visual features are extracted from image sequences of the speaker talking; these features reflect both the physiological characteristics of the speaker's mouth and the behavioural characteristics of the speaker's lip movement. Based on these features, a hybrid static-dynamic model is built for each speaker, in which the dynamic model is a semi-continuous hidden Markov model (SCHMM). Both a speaker identification system and a speaker verification system are implemented on a small visual corpus. For identification, accuracy reaches 100% in text-dependent mode and 99.7% in text-independent mode; for verification, the equal error rate (EER) is 0.09% in text-dependent mode and 0.33% in text-independent mode.
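As an illustration of the DCT-based feature extraction mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the mouth region has already been cropped to a small grayscale block, and it assumes that the low-frequency (top-left) coefficients are the ones kept as features; the abstract does not state the selection rule, the block size, or the number of coefficients. The names `dct2`, `low_frequency_features`, and `keep` are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a rectangular list-of-lists of numbers."""
    n_rows, n_cols = len(block), len(block[0])

    def scale(k, n):
        # Orthonormal scaling: the DC term is weighted differently from the rest.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            total = 0.0
            for x in range(n_rows):
                for y in range(n_cols):
                    total += (
                        block[x][y]
                        * math.cos(math.pi * (2 * x + 1) * u / (2 * n_rows))
                        * math.cos(math.pi * (2 * y + 1) * v / (2 * n_cols))
                    )
            coeffs[u][v] = scale(u, n_rows) * scale(v, n_cols) * total
    return coeffs


def low_frequency_features(roi, keep=3):
    """Flatten the top-left keep x keep DCT coefficients into a feature vector.

    Keeping the low-frequency corner is an assumption made for this sketch,
    not a rule taken from the paper.
    """
    coeffs = dct2(roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


# Hypothetical 4x4 mouth-region block with uniform intensity.
flat_roi = [[2.0] * 4 for _ in range(4)]
features = low_frequency_features(flat_roi, keep=3)
```

For a uniform block, all of the energy falls into the first (DC) coefficient and the remaining coefficients are zero up to rounding error, which is why a few low-frequency coefficients can summarize a smooth image region compactly. Applying the function to each frame of a sequence would yield one feature vector per frame.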
Source
《计算机工程与应用》
CSCD
Peking University Core Journal
2006, No. 12, pp. 85-88 (4 pages)
Computer Engineering and Applications