Abstract
This paper proposes a speaker identification method based on classified feature sub-space Gaussian mixture models fused by a neural network (FS-GMM/NN). Clustering analysis of the feature vectors divides each speaker's training speech into several classes (subsets). A Gaussian mixture model (GMM) is then trained for each class. The number of mixtures for each class depends on how many feature vectors the class contains. Finally, a neural network (NN) fuses the outputs of the class GMMs. The method was tested on text-independent speaker identification with 100 male speakers. The FS-GMM/NN system outperformed the baseline, unclassified GMM (B-GMM) system in both identification performance and noise robustness. It also trains more efficiently, and it effectively reduces both the number of mixtures per speaker model and the required test speech length.
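The abstract does not specify several details: the clustering algorithm, the rule that maps a class's size to its mixture count, how test speech is scored, and the NN topology. The Python/NumPy sketch below is one hypothetical reading of the training and scoring stages, not the authors' implementation:

- k-means splits one speaker's training vectors into C classes.
- A proportional allocation rule, which is our assumption, assigns mixtures to each class.
- Each class gets a diagonal-covariance GMM trained by EM.
- A test utterance is scored against every class GMM, giving the C-dimensional score vector that the fusion NN would take as input.

The NN itself is not shown. All function names (`train_fs_gmm`, `allocate_mixtures`, `class_scores`, etc.) are illustrative.

```python
import numpy as np


def kmeans(X, C, iters=20, seed=0):
    """Split feature vectors X (N x D) into C classes with plain k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=C, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from each vector to each center: N x C.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(C):
            if np.any(labels == c):  # keep the old center if a class empties
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers


def allocate_mixtures(counts, total):
    """HYPOTHETICAL rule: share `total` mixtures among classes in proportion
    to their vector counts, at least 1 each (that floor can push the sum
    above `total` when some classes are tiny)."""
    counts = np.asarray(counts, dtype=float)
    raw = total * counts / counts.sum()
    m = np.maximum(1, np.floor(raw).astype(int))
    # Give leftover mixtures to the classes with the largest remainders.
    for i in np.argsort(-(raw - np.floor(raw)), kind="stable"):
        if m.sum() >= total:
            break
        m[i] += 1
    return m.tolist()


def log_weighted_densities(X, w, means, var):
    """log(w_m) + log N(x | mu_m, diag(var_m)) per frame and mixture: N x M."""
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(w)[None, :]
            - 0.5 * np.log(2 * np.pi * var).sum(axis=1)[None, :]
            - 0.5 * (diff ** 2 / var[None, :, :]).sum(axis=2))


def fit_diag_gmm(X, M, iters=30, seed=0, var_floor=1e-3):
    """Train a diagonal-covariance GMM with (at most) M mixtures by EM."""
    N = len(X)
    M = min(M, N)  # cannot seed more mixtures than there are vectors
    rng = np.random.default_rng(seed)
    means = X[rng.choice(N, size=M, replace=False)].copy()
    var = np.tile(X.var(axis=0) + var_floor, (M, 1))
    w = np.full(M, 1.0 / M)
    for _ in range(iters):
        # E-step: responsibilities, normalized in the log domain for stability.
        lp = log_weighted_densities(X, w, means, var)
        r = np.exp(lp - lp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and diagonal variances.
        nk = r.sum(axis=0) + 1e-10
        w = nk / N
        means = (r.T @ X) / nk[:, None]
        var = np.maximum((r.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return w, means, var


def avg_loglik(X, gmm):
    """Average per-frame log-likelihood of X under one GMM (log-sum-exp)."""
    lp = log_weighted_densities(X, *gmm)
    mx = lp.max(axis=1)
    return float(np.mean(mx + np.log(np.exp(lp - mx[:, None]).sum(axis=1))))


def train_fs_gmm(X, C=3, total_mixtures=16):
    """Classify one speaker's training vectors, then train one GMM per class.
    Assumes k-means leaves no class empty."""
    X = np.asarray(X, dtype=float)
    labels, centers = kmeans(X, C)
    counts = np.bincount(labels, minlength=C)
    mix = allocate_mixtures(counts, total_mixtures)
    models = [fit_diag_gmm(X[labels == c], mix[c], seed=c) for c in range(C)]
    return centers, mix, models


def class_scores(X_test, models):
    """C-dimensional vector of class-GMM scores: the input an NN would fuse."""
    X_test = np.asarray(X_test, dtype=float)
    return np.array([avg_loglik(X_test, g) for g in models])
```

Diagonal covariances and log-domain normalization are common choices for speaker-ID GMMs because they keep EM cheap and numerically stable. Whether the paper uses them is not stated in the abstract. The fusion stage could be sketched separately as a small classifier over these score vectors, trained across all speakers.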
Source
Journal of Electronics & Information Technology (《电子与信息学报》), 2004, Issue 10, pp. 1607-1612 (6 pages)
Indexed in EI, CSCD, and the Peking University Core Journals list
Funding
National Natural Science Foundation of China (60272039)
Natural Science Foundation of Anhui Province (01042205)