
Voiceprint Recognition Based on LSTM Neural Network    Citations: 19

Abstract: Voiceprint recognition identifies a speaker from their voice by exploiting individual differences in biological characteristics. Voiceprints can be captured without physical contact, are easy to acquire, and yield stable features, so they have a wide range of applications. Existing statistical-model methods are limited by relying on a single type of extracted feature and by weak generalization ability. In recent years, with the rapid development of deep learning, neural network models have come to prominence in voiceprint recognition. This paper proposes a voiceprint recognition method based on a Long Short-Term Memory (LSTM) neural network, which uses spectrograms to extract voiceprint features as the model input and thereby achieves text-independent voiceprint recognition. A spectrogram jointly represents the frequency and energy information of a speech signal along the time axis, so it expresses richer voiceprint features. LSTM networks are good at capturing temporal features and emphasize information along the time dimension, which makes them a better fit for speech data than other neural network models. The method effectively combines the long-term learning ability of LSTM networks with the sequential features of voiceprint spectrograms. Experimental results show a recognition accuracy of 84.31% on the THCHS-30 speech dataset. For 3 s short utterances recorded in a natural environment, the method reaches an accuracy of 96.67%, outperforming existing methods based on the Gaussian Mixture Model (GMM) and the Convolutional Neural Network (CNN).
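The abstract describes a two-stage pipeline: a spectrogram turns the speech signal into a sequence of per-frame frequency/energy vectors, and an LSTM reads that sequence over time to predict the speaker. The sketch below illustrates that data flow in plain NumPy. It is not the authors' implementation: the abstract gives no frame length, window, FFT size, network size, or output layer, so the 16 kHz sample rate, 25 ms Hamming frames with a 10 ms hop, 512-point FFT, per-bin normalization, single-layer LSTM, and softmax over the final hidden state are all assumptions, and the names `log_spectrogram` and `TinyLSTMSpeakerClassifier` are hypothetical. The weights are random and untrained, so the output only demonstrates shapes, not recognition performance.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, frame_ms=25, hop_ms=10, n_fft=512):
    """Log-power spectrogram from a Hamming-windowed short-time Fourier transform.

    Returns an array of shape (num_frames, n_fft // 2 + 1): one row per time
    step, one column per frequency bin, holding the log energy of that bin.
    (Frame/hop/FFT sizes are assumed values, not taken from the paper.)
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = np.hamming(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    # rfft zero-pads each frame to n_fft samples and keeps the non-negative bins.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power + 1e-10)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMSpeakerClassifier:
    """Hypothetical single-layer LSTM with a softmax speaker classifier on top."""

    def __init__(self, input_dim, hidden_dim, num_speakers, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the input (i), forget (f), candidate (g), output (o) gates.
        self.W = rng.uniform(-scale, scale, (4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.W_out = rng.uniform(-scale, scale, (num_speakers, hidden_dim))
        self.b_out = np.zeros(num_speakers)
        self.hidden_dim = hidden_dim

    def forward(self, x_seq):
        """Run the LSTM over time steps (rows of x_seq); return speaker probabilities."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state carrying long-term memory
        for x_t in x_seq:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i = sigmoid(z[:H])
            f = sigmoid(z[H:2 * H])
            g = np.tanh(z[2 * H:3 * H])
            o = sigmoid(z[3 * H:])
            c = f * c + i * g
            h = o * np.tanh(c)
        # Classify from the final hidden state with a numerically stable softmax.
        logits = self.W_out @ h + self.b_out
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()


# Usage: 3 s of synthetic noise stands in for a real utterance.
rng = np.random.default_rng(42)
audio = rng.standard_normal(3 * 16000)
spec = log_spectrogram(audio)
# Normalize each frequency bin over time so gate inputs stay in a reasonable range.
spec = (spec - spec.mean(axis=0)) / (spec.std(axis=0) + 1e-8)
model = TinyLSTMSpeakerClassifier(input_dim=spec.shape[1], hidden_dim=64, num_speakers=10)
probs = model.forward(spec)
predicted = int(np.argmax(probs))
```

Feeding the spectrogram row by row is what lets the LSTM's cell state accumulate speaker information across the whole utterance, which is the property the abstract credits for its fit to speech data. In a real system the weights would be learned from labeled recordings (for example with a deep learning framework) rather than drawn at random.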
Authors: LIU Xiao-xuan (刘晓璇), JI Yi (季怡), LIU Chun-ping (刘纯平) (School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University core journal list, 2021, Issue S02, pp. 270-274 (5 pages)
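Funding: Chun-Tsung Endowment for Chinese undergraduate research (秦惠䇹与李政道中国大学生见习进修基金); National Natural Science Foundation of China General Program (61773272); Major Project of Natural Science Research of Jiangsu Higher Education Institutions (19KJA230001).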
Keywords: voiceprint recognition; Long Short-Term Memory; spectrogram; neural network; deep learning