Abstract
The "cocktail party problem" has long been a difficult problem in speech separation, mainly because it is a speaker-independent separation task in which no prior information about the speakers is available. Building on the deep clustering method proposed by Jonathan et al., this paper proposes an improved deep clustering model based on a temporal convolutional network, uses the ideal binary mask as the separation target, and runs experiments on a public Chinese speech data set. The results show that, compared with the traditional deep clustering model, the proposed model improves training speed, the quality of the separated speech, and objective speech intelligibility.
Authors
王昕
蒋志翔
张杨
寇金桥
常新旭
徐冬冬
WANG Xin; JIANG Zhi-xiang; ZHANG Yang; KOU Jin-qiao; CHANG Xin-xu; XU Dong-dong (Beijing Computer Technology and Application Institute, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China)
Source
《计算机工程与设计》
Peking University Core Journal (北大核心)
2020, No. 9, pp. 2630-2635 (6 pages)
Computer Engineering and Design
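Funding
13th Five-Year Plan Pre-research Project Fund of the Information Systems Bureau, Equipment Development Department (31511040401)
Equipment Pre-research Field Fund Project (61400040201)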
Keywords
speech separation
deep clustering model
temporal convolutional network
dilated convolution
causal convolution
ideal binary mask