Abstract
Remote sensing image captioning (RSIC) is a task that combines computer vision and natural language processing, aiming to convert remote sensing images into natural language descriptions. This paper proposes an image captioning method based on a dual-branch attention Mamba network. Within this network, a bidirectional scanning Mamba block is designed: the latest Mamba architecture encodes global image features, and a bidirectional scanning mechanism enhances the model's spatial perception of the image. The dual-branch attention module uses a lightweight channel-spatial attention mechanism to focus on and refine local image features, thereby improving model performance. Image captioning experiments on the UCM-Captions and Sydney-Captions datasets show that the proposed method outperforms other existing methods.
Authors
王鹏
周凯立
祝好
王幸运
杜君
WANG Peng; ZHOU Kaili; ZHU Hao; WANG Xingyun; DU Jun (Shenzhen Research Institute, Nanjing University of Aeronautics and Astronautics, Shenzhen 518110, Guangdong, China; Key Laboratory of Remote Sensing Application and Innovation, Chongqing 401147, China; Shanghai Aerospace Radio Equipment Research Institute, Shanghai 200090, China)
Source
Aerospace Shanghai (Chinese & English) (《上海航天(中英文)》)
2026, No. 1, pp. 74-81 (8 pages)
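Funding
National Natural Science Foundation of China (61801211)
Open Project of the Key Laboratory of Satellite Remote Sensing Digital Application and Innovation (LRSAI-2025008)
Shanghai Aerospace Science and Technology Innovation Fund (SAST2024-052)
Guangdong Basic and Applied Basic Research Foundation (2025A1515010258)
Shenzhen Science and Technology Program (JCYJ20240813180005007)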