Multi-prototype driven graph neural network for speaker diarization
Abstract: Recently, applying graph neural networks (GNNs) to session-level modeling has proven effective for speaker diarization. However, most existing GNN variants rely solely on local structure information and ignore global speaker information, so they cannot fully compensate for the lack of speaker information in the speaker diarization task. This paper proposes a multi-prototype driven graph neural network (MPGNN) for representation learning. Within each session, MPGNN effectively combines local and global speaker information and simultaneously remaps x-vectors into a new embedding space that is better suited to clustering. A key component is the multi-prototype learning module, which is designed to be dynamic and adaptive so that it captures more accurate global speaker information. Experimental results show that MPGNN significantly outperforms the baseline systems, achieving diarization error rates (DERs) of 3.33% and 3.52% on AMI_SDM and 5.66% and 6.52% on CALLHOME.
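The abstract does not give MPGNN's equations. As a rough illustration only, the NumPy sketch below shows one way a single graph layer could mix the two kinds of information the abstract describes. Local structure comes from normalized aggregation over a k-nearest-neighbour cosine graph of a session's x-vectors. Global speaker information comes from prototypes recomputed per session with a few soft k-means iterations. That soft k-means step is a stand-in for the paper's dynamic, adaptive multi-prototype learning, whose actual form is not stated here. All function names, the graph construction, `k`, the temperature, and the random projection weights are hypothetical. In a trained model the weights would be learned.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length (zero rows stay zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def knn_adjacency(emb, k):
    """Hypothetical local graph: symmetric k-NN over cosine similarity, with self-loops."""
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)            # a node is not its own neighbour
    idx = np.argsort(-sim, axis=1)[:, :k]     # k most similar segments per row
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(len(emb)), k)
    adj[rows, idx.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)              # make the graph undirected
    np.fill_diagonal(adj, 1.0)                # self-loops keep each node's own features
    return adj


def soft_assign(emb, protos, temp):
    """Softmax over cosine similarity to each prototype (rows sum to 1)."""
    logits = emb @ protos.T / temp
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def soft_prototypes(emb, num_protos, n_iter=10, temp=0.1):
    """Hypothetical per-session prototypes via soft k-means (not the paper's module)."""
    centers = [emb[0]]                        # deterministic farthest-point init
    for _ in range(1, num_protos):
        dist = np.min(np.stack([1.0 - emb @ c for c in centers]), axis=0)
        centers.append(emb[np.argmax(dist)])
    protos = np.stack(centers)
    for _ in range(n_iter):
        assign = soft_assign(emb, protos, temp)
        protos = l2_normalize(assign.T @ emb)  # weighted mean, back on the unit sphere
    return protos, soft_assign(emb, protos, temp)


def mpgnn_like_layer(x, w_local, w_global, k=3, num_protos=2):
    """One illustrative layer fusing local (graph) and global (prototype) context."""
    emb = l2_normalize(x)
    adj = knn_adjacency(emb, k)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))   # degree >= 1 thanks to self-loops
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    local = norm_adj @ emb                         # symmetric-normalized neighbour mean
    _, assign = soft_prototypes(emb, num_protos)
    protos, _ = soft_prototypes(emb, num_protos)
    global_ctx = assign @ protos                   # each segment's soft prototype summary
    h = np.maximum(local @ w_local + global_ctx @ w_global, 0.0)  # ReLU
    return l2_normalize(h), assign


# Toy session: two synthetic "speakers" with orthogonal centroids, 6 segments each.
rng = np.random.default_rng(0)
centroids = np.zeros((2, 16))
centroids[0, :8] = 1.0
centroids[1, 8:] = 1.0
x = np.repeat(centroids, 6, axis=0) + 0.1 * rng.normal(size=(12, 16))
w_local = rng.normal(size=(16, 8)) / 4.0   # random stand-ins for learned weights
w_global = rng.normal(size=(16, 8)) / 4.0

h, assign = mpgnn_like_layer(x, w_local, w_global, k=3, num_protos=2)
print(h.shape)                 # (12, 8)
print(assign.argmax(axis=1))   # [0 0 0 0 0 0 1 1 1 1 1 1]
```

On two well-separated synthetic speakers, the soft prototype assignments split the segments by speaker, and the remapped embeddings `h` would then go to a clustering back end. Summing the local and global terms before the ReLU is just one simple fusion choice for this sketch; the paper may combine them differently.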
Authors: Mao Qingqing, Jia Hongjie, Zhu Bisong (School of Computer Science & Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013, China)
Source: Application Research of Computers (《计算机应用研究》), a Peking University core journal, 2025, No. 6, pp. 1778-1783 (6 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK20190838).
Keywords: speaker diarization; graph neural network; local structure information; global speaker information; multi-prototype learning