Abstract
To address speaker speech extraction, this paper proposes a single-channel (monophonic) speaker speech extraction method based on deep neural network multi-task learning with an embedded attention mechanism. The algorithm unifies speech separation and speech extraction in a single framework by embedding a speaker attention mechanism into a spectrum-mapping separation model. Guided by auxiliary speaker information, the attention mechanism produces time-varying attention weights, which are used to separate out the internal embedding vector of the target speaker. An extraction model then applies nonlinear processing to this embedding vector to estimate the mask corresponding to the target speaker, from which the target speaker's speech is extracted. Speech extraction experiments were carried out on the TIMIT dataset. The experimental results verify the feasibility and effectiveness of the proposed algorithm and show a clear advantage in speaker speech extraction performance.
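The abstract names the processing chain but not the network details. The pure-Python sketch below illustrates only that data flow: per-frame attention guided by a speaker vector, a weighted target embedding, a nonlinear mask estimate, and masking of the mixture spectrum. Everything concrete here is an assumption, not the authors' method: the candidate embeddings are taken as given (standing in for the separation branch), attention is scaled dot-product attention, the extraction model is a single linear layer with a sigmoid, and the multi-task objective is a fixed weighted sum (`lam`) of two MSE losses. All function names are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(frame_embeddings, speaker_vec):
    """Hypothetical speaker attention for one frame.

    frame_embeddings: K candidate embeddings (length D each) for this frame,
    assumed to come from a separation branch.
    speaker_vec: auxiliary speaker embedding (length D).
    The weights are recomputed for every frame, so they vary over time.
    """
    d = len(speaker_vec)
    scores = [dot(h, speaker_vec) / math.sqrt(d) for h in frame_embeddings]
    weights = softmax(scores)
    # Target-speaker embedding = attention-weighted sum of the candidates.
    target = [sum(w * h[i] for w, h in zip(weights, frame_embeddings))
              for i in range(d)]
    return weights, target


def extract(mixture_mag, embeddings, speaker_vec, W, b):
    """Estimate a [0, 1] mask per time-frequency bin and apply it.

    mixture_mag: T x F magnitude spectrogram of the mixture.
    embeddings: T x K x D candidate embeddings.
    W (F x D) and b (F): assumed single-layer extraction model.
    """
    masks, estimate, all_weights = [], [], []
    for t, frame in enumerate(mixture_mag):
        weights, z = attend(embeddings[t], speaker_vec)
        mask = [sigmoid(dot(W[f], z) + b[f]) for f in range(len(frame))]
        masks.append(mask)
        estimate.append([m * y for m, y in zip(mask, frame)])
        all_weights.append(weights)
    return masks, estimate, all_weights


def mse(a, b):
    """Mean squared error between two T x F lists."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2
               for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def multitask_loss(sep_est, sep_ref, ext_est, ext_ref, lam=0.5):
    """Assumed multi-task objective: weighted separation + extraction loss."""
    return lam * mse(sep_est, sep_ref) + (1.0 - lam) * mse(ext_est, ext_ref)


if __name__ == "__main__":
    rng = random.Random(0)
    T, F, K, D = 4, 6, 2, 3  # frames, frequency bins, candidates, embedding size
    mixture = [[rng.random() for _ in range(F)] for _ in range(T)]
    emb = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
           for _ in range(T)]
    spk = [rng.gauss(0, 1) for _ in range(D)]
    W = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(F)]
    b = [0.0] * F
    masks, est, weights = extract(mixture, emb, spk, W, b)
    for t, w in enumerate(weights):
        print(t, [round(x, 3) for x in w])  # per-frame weights, each sums to 1
```

Because the mask passes through a sigmoid, every estimated bin stays between zero and the corresponding mixture magnitude; in a trained system `W`, `b`, and the embeddings would be learned rather than random.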
Authors
GUO Zhikai
YANG Mingkun
JIANG Guofeng
TAO Qi
LIU Huanhuan
MA Hongqiang
(Aircraft Maintenance NCO Academy of Air Force Engineering University, Xinyang 464099, China)
Source
Computer Measurement & Control
2023, No. 10, pp. 174-181 (8 pages)
Keywords
deep neural network
monophonic speaker speech extraction
multi-task learning
embedded attention mechanism