Multi-talker audio–visual speech recognition towards diverse scenarios
Authors: Yuxiao LIN, Tao JIN, (+2 more authors), Xize CHENG, Zhou ZHAO, Fei WU. Frontiers of Information Technology & Electronic Engineering, 2025, Issue 11, pp. 2310-2323 (14 pages)
Recently, audio–visual speech recognition (AVSR) has attracted increasing attention. However, most existing works simplify the complex challenges of real-world applications and focus only on scenarios with two speakers and perfectly aligned audio-video clips. In this work, we study the effect of speaker number and modal misalignment on the AVSR task, and propose an end-to-end AVSR framework for more realistic conditions. Specifically, we propose a speaker-number-aware mixture-of-experts (SA-MoE) mechanism to explicitly model the characteristic differences between scenarios with different numbers of speakers, and a cross-modal realignment (CMR) module for robust handling of asynchronous inputs. We also exploit the underlying difference in difficulty and introduce a new training strategy named challenge-based curriculum learning (CBCL), which forces the model to focus on difficult, challenging data instead of simple data to improve efficiency.
Keywords: Speech recognition and synthesis; Multi-modal recognition; Curriculum learning; Multi-talker speech recognition