Emotion recognition from speech is an important field of research in human computer interaction. In this letter the framework of Support Vector Machines (SVM) with Gaussian Mixture Model (GMM) supervector is introduce...Emotion recognition from speech is an important field of research in human computer interaction. In this letter the framework of Support Vector Machines (SVM) with Gaussian Mixture Model (GMM) supervector is introduced for emotional speech recognition. Because of the importance of variance in reflecting the distribution of speech, the normalized mean vectors potential to exploit the information from the variance are adopted to form the GMM supervector. Comparative experiments from five aspects are conducted to study their corresponding effect to system performance. The experiment results, which indicate that the influence of number of mixtures is strong as well as influence of duration is weak, provide basis for the train set selection of Universal Background Model (UBM).展开更多
在文本无关的说话人识别中,韵律特征由于其对信道环境噪声不敏感等特性而被应用于话者识别任务中。本文对韵律参数采用基于高斯混合模型超向量的支持向量机建模方法,并将类内协方差特征映射方法应用于模型超向量上,单系统的性能比传统...在文本无关的说话人识别中,韵律特征由于其对信道环境噪声不敏感等特性而被应用于话者识别任务中。本文对韵律参数采用基于高斯混合模型超向量的支持向量机建模方法,并将类内协方差特征映射方法应用于模型超向量上,单系统的性能比传统方法的混合高斯-通用背景模型(Gaussian mixture model-universalbackground model,GMM-UBM)基线系统有了40.19%的提升。该方法与本文的基于声学倒谱参数的确认系统融合后,能使整体系统的识别性能有9.25%的提升。在NIST(National institute of standards and technology mixture)2006说话人测试数据库上,融合后的系统能够取得4.9%的等错误率。展开更多
基金Supported by the National Natural Science Foundation of China (No. 61105076)Natural Science Foundation of Anhui Province of China (No. 11040606M127) as well as Key ScientificTechnological Project of Anhui Province (No. 11010202192)
文摘Emotion recognition from speech is an important field of research in human computer interaction. In this letter the framework of Support Vector Machines (SVM) with Gaussian Mixture Model (GMM) supervector is introduced for emotional speech recognition. Because of the importance of variance in reflecting the distribution of speech, the normalized mean vectors potential to exploit the information from the variance are adopted to form the GMM supervector. Comparative experiments from five aspects are conducted to study their corresponding effect to system performance. The experiment results, which indicate that the influence of number of mixtures is strong as well as influence of duration is weak, provide basis for the train set selection of Universal Background Model (UBM).
文摘在文本无关的说话人识别中,韵律特征由于其对信道环境噪声不敏感等特性而被应用于话者识别任务中。本文对韵律参数采用基于高斯混合模型超向量的支持向量机建模方法,并将类内协方差特征映射方法应用于模型超向量上,单系统的性能比传统方法的混合高斯-通用背景模型(Gaussian mixture model-universalbackground model,GMM-UBM)基线系统有了40.19%的提升。该方法与本文的基于声学倒谱参数的确认系统融合后,能使整体系统的识别性能有9.25%的提升。在NIST(National institute of standards and technology mixture)2006说话人测试数据库上,融合后的系统能够取得4.9%的等错误率。