
Deep clustering speaker speech separation based on temporal convolutional network (基于时间卷积网络的深度聚类说话人语音分离)

Cited by: 1
Abstract: The cocktail party problem has long been a difficult problem in speech separation, mainly because it is a speaker-independent separation problem: no prior information about the speakers is available in advance. Building on the deep clustering method proposed by Jonathan et al., this work proposes an improved deep clustering model based on a temporal convolutional network. The ideal binary mask is used as the separation target, and experiments are carried out on a public Chinese speech dataset. The results show that, compared with the traditional deep clustering model, the proposed model improves training speed, the quality of the separated speech, and objective speech intelligibility.
Authors: WANG Xin (王昕); JIANG Zhi-xiang (蒋志翔); ZHANG Yang (张杨); KOU Jin-qiao (寇金桥); CHANG Xin-xu (常新旭); XU Dong-dong (徐冬冬). Affiliation: Beijing Computer Technology and Application Institute, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2020, No. 9, pp. 2630-2635 (6 pages)
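Funding: "13th Five-Year Plan" Pre-research Project Fund of the Information Systems Bureau, Equipment Development Department (31511040401); Equipment Pre-research Field Fund Project (61400040201).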
Keywords: speech separation; deep clustering model; temporal convolutional network; dilated convolution; causal convolution; ideal binary mask
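
To make two of the terms above concrete, here is a minimal sketch in plain Python of an ideal binary mask and a causal dilated 1-D convolution. It is an illustration only, not the authors' implementation. The function names, the "target magnitude greater than interferer magnitude" mask criterion, and the zero left-padding are assumptions chosen for the example; the record itself does not give these details.

```python
from typing import List, Sequence


def ideal_binary_mask(
    target_mag: Sequence[Sequence[float]],
    interferer_mag: Sequence[Sequence[float]],
) -> List[List[int]]:
    """Return 1 where the target dominates a time-frequency bin, else 0.

    Assumption: both inputs are magnitude spectrograms of the same shape,
    and a bin is assigned to the target when its magnitude is strictly larger.
    """
    return [
        [1 if t > i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_mag, interferer_mag)
    ]


def causal_dilated_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zeros, so y[t]
    depends only on the current and past inputs (causal), and the taps are
    spaced `dilation` samples apart (dilated).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the signal start
                acc += w * x[idx]
        out.append(acc)
    return out
```

With a two-tap kernel and `dilation=2`, each output sample combines the current input with the input two steps earlier. Stacking such layers with growing dilation is how a temporal convolutional network covers a long context with few layers.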
