Spectrum warping based on sub-glottal resonances in speaker-independent speech recognition (声门下共鸣的谱规整用于非特定人的语音识别)

Abstract: To reduce the degradation that inter-speaker variation causes in speaker-independent speech recognition, this paper normalizes each speaker's features during feature extraction with a speaker-specific perceptual warping factor. The warping factor is estimated from the second sub-glottal resonance, which arises from acoustic coupling between the sub-glottal system and the vocal tract. Compared with using the third formant of the vocal tract as the reference frequency, the second sub-glottal resonance is less affected by the linguistic content of the speech and better reflects speaker-specific characteristics. The factor is used to normalize perceptual minimum variance distortionless response (PMVDR) coefficients, which are more robust to noise than traditional Mel-frequency cepstral coefficients (MFCC), and the normalized coefficients are used to train and test classical hidden Markov models (HMM). In the experiments, relative to conventional MFCC features and to spectrum warping based on the third formant, the word error rate fell by 4% and 3%, respectively, on clean speech and by 9% and 5%, respectively, on noisy speech, showing that the method improves the performance of a speaker-independent recognition system.
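The abstract does not give the form of the warping function, so the following is only a minimal sketch of the general idea, assuming a piecewise-linear warp of the kind commonly used for speaker normalization. The function names, the 8000 Hz band edge, the 0.85 knee, and the resonance values in the usage note are all hypothetical and are not taken from the paper.

```python
def warping_factor(sg2_speaker_hz: float, sg2_reference_hz: float) -> float:
    """Assumed form: the ratio that maps the speaker's second sub-glottal
    resonance (Sg2) onto a chosen reference Sg2."""
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("resonance frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def warp_frequency(freq_hz: float, alpha: float,
                   nyquist_hz: float = 8000.0, knee: float = 0.85) -> float:
    """Piecewise-linear warp (an assumption, not the paper's function).

    Frequencies below the breakpoint are scaled by alpha; above it a second
    linear segment joins the curve to the band edge so that nyquist_hz maps
    to itself and the mapping stays monotonic.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= freq_hz <= nyquist_hz:
        raise ValueError("frequency outside the analysis band")
    # Breakpoint chosen so the scaled segment never exceeds the band edge.
    f0 = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f0:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * f0) / (nyquist_hz - f0)
    return alpha * f0 + slope * (freq_hz - f0)
```

For example, with an illustrative speaker Sg2 of 1500 Hz and a reference of 1400 Hz, `warping_factor(1500.0, 1400.0)` gives a factor below 1, and `warp_frequency(1500.0, alpha)` lands on the reference resonance while the band edge is left unchanged. In a real front end the warp would be applied to the frequency axis of the spectrum before the cepstral coefficients are computed.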

Authors: 侯丽敏, 黄振华, 谢娟敏

Affiliation: School of Communication and Information Engineering, Shanghai University

Source: Acta Acustica (《声学学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2010, No. 5, pp. 580-586 (7 pages)

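Keywords: speaker-independent speech recognition; resonance; Mel cepstral parameters; hidden Markov model; normalization; speech recognition system; extraction process; reference frequency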

Classification code: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]
