
Speech emotion recognition using stacked generative and discriminative hybrid models (Cited by: 3)
Abstract: Generative models and discriminative models have complementary strengths and weaknesses in modeling within-class distributions, optimizing classification boundaries, and capturing the dynamic variation of emotion. This paper fuses the two kinds of models and performs speech emotion recognition with stacked generative/discriminative hybrid models. First, 63-dimensional utterance-level feature vectors are reduced to the 12 best features by Fisher discriminant analysis, and these feed a stacked model based on wavelet neural networks (WNN). Then, Sequential Forward Selection (SFS) picks 8 frame-level features out of the full 69, and two kinds of multidimensional Gaussian mixture model (GMM) likelihood outputs (one with the same dimension as the feature vector, one with the same dimension as the number of GMM mixtures) are proposed to build the stacked generative/discriminative hybrid models. Experimental results on the Berlin emotional speech database show that (1) the hybrid generative/discriminative models achieve significantly higher recognition rates than WNN, GMM, HMM (hidden Markov model), or SVM (support vector machine) used alone; (2) the stacked generative/discriminative hybrid models outperform the WNN-based stacked models; (3) the GMM-MAP/SVM tandem fusion model with M = 13 mixtures and D-dimensional likelihoods (MAP: maximum a posteriori) is the optimal stacked generative/discriminative hybrid model, reaching a recognition rate of up to 85.1%.
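The tandem structure described in the abstract — per-class GMM likelihood outputs passed to a discriminative classifier — can be sketched as below. This is an illustrative reconstruction on toy data, not the authors' implementation: the feature dimensionality, mixture count, class layout, and SVM kernel here are all assumptions for the sake of a runnable example.

```python
# Sketch of a stacked generative/discriminative ("tandem") model:
# one GMM per emotion class produces log-likelihood features,
# which an SVM then classifies. Toy data stands in for the
# 8-dimensional frame-level emotion features of the paper.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def make_class(mean, n=60, dim=8):
    # Toy stand-in for one emotion class's feature frames.
    return rng.normal(mean, 1.0, size=(n, dim))

X = np.vstack([make_class(m) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 60)

# Stage 1 (generative): fit one GMM per class (diagonal covariances
# keep the fit stable on few samples; mixture count is an assumption).
gmms = {c: GaussianMixture(n_components=3, covariance_type="diag",
                           random_state=0).fit(X[y == c])
        for c in np.unique(y)}

def loglik_features(samples):
    # Map each sample to its vector of per-class log-likelihoods.
    return np.column_stack([g.score_samples(samples) for g in gmms.values()])

# Stage 2 (discriminative): an SVM classifies the likelihood vectors.
svm = SVC(kernel="rbf").fit(loglik_features(X), y)
print(svm.score(loglik_features(X), y))  # training accuracy
```

The design point is that the GMMs summarize how well each class's distribution explains a sample, and the SVM learns the decision boundary in that low-dimensional likelihood space rather than in the raw feature space.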
Source: Acta Acustica (《声学学报》), EI / CSCD / Peking University Core, 2013, No. 2: 231-240 (10 pages)
Funding: National 863 Program; National Natural Science Foundation of China

References (19)

  • 1 Elif B, Erzin E, Eroglu E C et al. Improving automatic emotion recognition from speech signals. 10th Annual Conference of the International Speech Communication Association (Brighton, United Kingdom, September 6-10, 2009), 2009: 324-327.
  • 2 Yang B, Lugger M. Emotion recognition from speech signals using new harmony features. Signal Processing, 2010; 90(5): 1415-1423.
  • 3 Kim E H, Hyun K H, Kim S H et al. Improved emotion recognition with a novel speaker-independent feature. IEEE Trans. on Mechatronics, 2009; 14(3): 317-325.
  • 4 Park J S, Kim J H, Oh Y H. Feature vector classification based speech emotion recognition for service robots. IEEE Trans. on Consumer Electronics, 2009; 55(3): 1590-1596.
  • 5 Bitouk D, Verma R, Nenkova A. Class-level spectral features for emotion recognition. Speech Communication, 2010; 52(7-8): 613-625.
  • 6 Suryannarayana C, Amitava C, Sugata M. Support vector machines employing cross-correlation for emotional speech recognition. Journal of the International Measurement Confederation, 2009; 42(4): 611-618.
  • 7 Zhang Jianping, Li Ming, Suo Hongbin, Yang Lin, Fu Qiang, Yan Yonghong. Application of long-term speech features in speaker recognition[J]. Acta Acustica (声学学报), 2010, 35(2): 267-269. (Cited by: 8)
  • 8 Zhang Jun, Wei Gang, Yu Hua. A multi-stream robust speech recognition method based on probability-weighted feature-component outputs[J]. Acta Acustica (声学学报), 2008, 33(2): 102-108. (Cited by: 2)
  • 9 Yan Binfeng, Zhu Xiaoyan, Zhang Zhijiang, Zhang Fan. A robust speech recognition method based on adjacent space[J]. Journal of Software (软件学报), 2007, 18(4): 878-883. (Cited by: 5)
  • 10 Huang Y M, Zhang C B, Xu X L. Speech emotion recognition research based on wavelet neural network for robot pet. In: 5th International Conference on Intelligent Computing, 2009: 993-1000.

Secondary references (26)

  • 1 Xie Lei, Fu Zhonghua, Jiang Dongmei, Zhao Rongchun, Werner Verhelst, Hichem Sahli, Jan Conlenis. A robust visemic-LDA-based dynamic mouth-shape feature for audio-visual speech recognition[J]. Journal of Electronics & Information Technology (电子与信息学报), 2005, 27(1): 64-68. (Cited by: 4)
  • 2 Zhang Jun, Wei Gang. A noise-adaptive multi-stream composite sub-band speech recognition method[J]. Journal of Electronics & Information Technology (电子与信息学报), 2006, 28(7): 1183-1187. (Cited by: 3)
  • 3 Zhao Rui, Wang Zuoying. Joint compensation of channel and noise in speech recognition[J]. Acta Acustica (声学学报), 2006, 31(5): 466-470. (Cited by: 11)
  • 4 Campbell W M, Sturim D E, Reynolds D A. Support vector machines using GMM supervectors for speaker verification. IEEE Signal Processing Letters, 2006; 13(5).
  • 5 Reynolds D A, Rose R C. An integrated speech-background model for robust speaker identification. ICASSP-92: II-185 - II-188.
  • 6 Pelecanos J, Sridharan S. Feature warping for robust speaker verification. In: Proc. ISCA Workshop on Speaker Recognition, 2001.
  • 7 Campbell W M, Sturim D, Reynolds D A, Solomonoff A. SVM based speaker verification using a GMM supervector kernel and NAP variability compensation. ICASSP, 2006: 97-100.
  • 8 Kenny P, Boulianne G, Ouellet P, Dumouchel P. Joint factor analysis versus eigenchannels in speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing, 2007; 15(4): 1435-1447.
  • 9 Auckenthaler R, Carey M, Lloyd-Thomas H. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 2000; 10: 42-54.
  • 10 Dehak Najim, Dumouchel Pierre, Kenny Patrick. Modeling prosodic features with joint factor analysis for speaker verification. IEEE Trans. Audio, Speech and Language Processing, 2007.

Co-cited literature: 12

Similarly cited literature: 34

Citing literature: 3

Secondary citing literature: 14
