Acoustic scene classification based on cross-modal attention and gating fusion
Abstract To address the insufficient capture of inter-modal correlations and the low efficiency of feature fusion in acoustic scene classification, an acoustic scene classification model based on cross-modal attention and gating fusion is proposed. A cross-modal attention module enables bidirectional interaction between the acoustic and visual modalities and dynamically captures the correlations between them. A gating fusion module dynamically adjusts the weights of the acoustic and visual modalities so that features are fused adaptively, and residual enhancement and a dual-path pooling strategy are introduced to improve the robustness of the features. The proposed model is compared with other methods for the same task along three dimensions: accuracy, frame rate, and number of model parameters. Simulation results show that the proposed model maintains high accuracy while achieving better overall classification performance than the other methods, which demonstrates its effectiveness and practicality.
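The abstract names the building blocks but gives no equations, so the following is only a minimal NumPy sketch of how such a pipeline could be wired together, not the authors' implementation. Everything concrete in it is an assumption: single-head scaled dot-product attention, time-aligned acoustic and visual sequences of equal length and width, a sigmoid gate computed from the concatenated features, residual connections around the attention outputs, "dual-path pooling" read as mean pooling concatenated with max pooling, and random weights in place of trained ones. All function and parameter names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    # Queries come from one modality, keys and values from the other.
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def gated_fusion(audio, visual, w_g, b_g):
    # Assumed gate form: g = sigmoid([audio; visual] W + b), element-wise in [0, 1].
    gate = sigmoid(np.concatenate([audio, visual], axis=-1) @ w_g + b_g)
    return gate * audio + (1.0 - gate) * visual, gate

def dual_path_pool(feats):
    # Assumed reading of "dual-path pooling": mean and max over time, concatenated.
    return np.concatenate([feats.mean(axis=0), feats.max(axis=0)])

def forward(audio, visual, params):
    # Bidirectional interaction: each modality attends to the other.
    a_from_v = cross_attention(audio, visual, *params["a2v"])
    v_from_a = cross_attention(visual, audio, *params["v2a"])
    # Assumed placement of residual enhancement: add attention output to its input.
    audio_enh = audio + a_from_v
    visual_enh = visual + v_from_a
    fused, gate = gated_fusion(audio_enh, visual_enh, params["w_g"], params["b_g"])
    return dual_path_pool(fused), gate

def init_params(dim, rng):
    # Random placeholder weights; a real model would learn these.
    def proj():
        return rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return {
        "a2v": [proj() for _ in range(3)],
        "v2a": [proj() for _ in range(3)],
        "w_g": rng.standard_normal((2 * dim, dim)) / np.sqrt(2 * dim),
        "b_g": np.zeros(dim),
    }

rng = np.random.default_rng(0)
T, D = 10, 16  # hypothetical sequence length and feature width
audio = rng.standard_normal((T, D))
visual = rng.standard_normal((T, D))
params = init_params(D, rng)
pooled, gate = forward(audio, visual, params)
print(pooled.shape, gate.shape)
```

In this sketch the pooled vector has width 2 * D and would feed a classifier head; the gate has one value per time step and feature, so the balance between modalities can vary across the clip rather than being fixed globally.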
Authors WEI Juan (韦娟); ZHOU Huiwen (周惠文); NING Fangli (宁方立) (School of Communication Engineering, Xidian University, Xi'an 710071, China; School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an 710072, China)
Source Systems Engineering and Electronics (《系统工程与电子技术》), Peking University Core Journal, 2025, No. 11, pp. 3543-3550 (8 pages)
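Funding Supported by the National Natural Science Foundation of China (52475132), the Key Research and Development Program of Shaanxi Province (2024GX-ZDCYL-01-16), the Aeronautical Science Foundation (20200015053001), and the Xi'an Key Industrial Chain Technology Research Fund (23ZDCYJSGG0006-2023).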
Keywords acoustic scene classification; cross-modal attention; dynamic gating; adaptive fusion