Voiceprint Recognition Based on LSTM Neural Network

Abstract: Voiceprint recognition identifies a speaker from their voice by exploiting individual differences in biological characteristics. Because voiceprints can be captured without contact, are easy to collect, and remain stable over time, the technique has a wide range of applications. Existing statistical-model methods are limited in that they extract only a single kind of feature and generalize poorly. In recent years, with the rapid development of artificial intelligence and deep learning, neural network models have begun to emerge in the field of voiceprint recognition. This paper proposes a voiceprint recognition method based on a Long Short-Term Memory (LSTM) neural network that uses spectrograms to extract voiceprint features as the model input, thereby achieving text-independent voiceprint recognition. A spectrogram jointly represents the frequency and energy information of a speech signal along the time axis, so it expresses richer voiceprint features. An LSTM network excels at capturing temporal features and places particular emphasis on information along the time dimension, which suits the characteristics of speech data better than other neural network models do. The method effectively combines the long-term learning ability of the LSTM network with the sequential features of voiceprint spectrograms. Experimental results show a recognition accuracy of 84.31% on the THCHS-30 speech dataset. On 3 s short utterances recorded in a natural environment, the method reaches an accuracy of 96.67%, outperforming existing methods based on the Gaussian Mixture Model (GMM) and the Convolutional Neural Network (CNN).
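The pipeline described above (spectrogram frames fed step by step into an LSTM whose final state identifies the speaker) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the frame length, hop size, Hamming window, log-power scaling, hidden size, the choice to classify from the last hidden state, and the names `log_spectrogram` and `TinyLSTMClassifier` are all hypothetical. The weights are random rather than trained, and a synthetic two-tone signal stands in for THCHS-30 speech.

```python
import cmath
import math
import random


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-power spectrogram: Hamming-windowed frames, naive DFT per frame.

    Returns one row per frame (time axis); each row holds frame_len // 2 + 1
    log-power values (frequency axis). Frame/hop sizes are assumptions.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            s = sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(math.log(abs(s) ** 2 + 1e-10))  # epsilon avoids log(0)
        frames.append(row)
    return frames


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM over spectrogram frames, softmax over speakers.

    Weights are random and untrained: this shows the forward pass only.
    """

    def __init__(self, input_dim, hidden_dim, num_speakers, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate (input, forget, output, candidate),
        # each applied to the concatenation [x_t; h_{t-1}].
        self.W = {gate: mat(hidden_dim, input_dim + hidden_dim) for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifoc"}
        self.W_out = mat(num_speakers, hidden_dim)

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def forward(self, frames):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state (long-term memory)
        for x in frames:             # one spectrogram frame per time step
            z = list(x) + h
            i = [sigmoid(v) for v in self._affine("i", z)]
            f = [sigmoid(v) for v in self._affine("f", z)]
            o = [sigmoid(v) for v in self._affine("o", z)]
            g = [math.tanh(v) for v in self._affine("c", z)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        # Classify the whole utterance from the last hidden state.
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.W_out]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Usage: a synthetic 0.25 s two-tone signal at 8 kHz stands in for real speech.
sr = 8000
wave = [math.sin(2 * math.pi * 300 * t / sr)
        + 0.5 * math.sin(2 * math.pi * 1200 * t / sr) for t in range(sr // 4)]
spec = log_spectrogram(wave, frame_len=64, hop=32)
model = TinyLSTMClassifier(input_dim=len(spec[0]), hidden_dim=8, num_speakers=3)
probs = model.forward(spec)
print(len(spec), len(spec[0]), [round(p, 3) for p in probs])
```

Each spectrogram row is one LSTM time step, so the cell state `c` can carry information across frames; this is the time-dimension modeling the abstract credits the LSTM with. A practical system would use an FFT library and a deep-learning framework, and would train the weights, instead of the O(N²) DFT and hand-written, untrained cell shown here.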

Authors: LIU Xiao-xuan (刘晓璇), JI Yi (季怡), LIU Chun-ping (刘纯平); School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Affiliation: School of Computer Science and Technology, Soochow University

Source: Computer Science (《计算机科学》), 2021, Issue S02, pp. 270-274 (5 pages); indexed in CSCD and the Peking University Core Journals list

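Funding: Chun-Tsung Undergraduate Research Endowment (秦惠䇹与李政道中国大学生见习进修基金); General Program of the National Natural Science Foundation of China (61773272); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (19KJA230001).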

Keywords: voiceprint recognition; Long Short-Term Memory; spectrogram; neural network; deep learning

CLC number: TP391.4 [Automation and Computer Technology: Computer Application Technology]