Abstract
Autoencoder-based image fusion algorithms struggle to highlight infrared (IR) salient targets, existing fusion strategies rarely account for global structure and local detail simultaneously, and most fusion algorithms over-emphasize statistical metrics while neglecting the needs of high-level vision tasks. To address these problems, a semantic-segmentation-guided image fusion method is proposed, with a hybrid cross-feature mechanism designed as its fusion strategy. First, shallow and deep skip connections are introduced between the encoder and decoder, and features are fused with a maximum-value selection strategy to emphasize salient targets and reduce redundant information. Second, the fusion strategy adopts a hybrid cross-feature mechanism that fuses features from different modalities through cross-attention and convolutional operations within a single framework, integrating global context with local fine-grained information. Finally, the fused image is fed into a segmentation network, where a semantic loss guides high-level semantic information back into the fusion network, yielding fused images rich in semantic information. Experimental results show that, compared with seven baseline algorithms on the RoadScene dataset, the proposed method achieves average improvements of 33.93%, 112.81%, 49.89%, 27.64%, and 23.87% in the SD, MI, VIFF, Qabf, and AG objective metrics, respectively. In the semantic segmentation task on the MSRS dataset, its intersection-over-union (IoU) for the car, person, and bicycle categories exceeds that of seven state-of-the-art algorithms by 3.47%, 6.37%, and 9.57% on average, respectively.
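The two fusion rules named in the abstract can be illustrated in miniature. The following is a minimal NumPy sketch, not the paper's implementation: the single-head attention, the mean-filter stand-in for the convolutional branch, and the summation of the two branches are all assumptions made for illustration only.

```python
import numpy as np

def max_select_fuse(feat_a, feat_b):
    """Element-wise maximum selection over two feature maps
    (the skip-connection fusion rule described in the abstract)."""
    return np.maximum(feat_a, feat_b)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feat, context_feat, d):
    """Single-head cross-attention: tokens from one modality attend to
    tokens from the other (the global-context branch). Shapes: (n, d), (m, d)."""
    scores = query_feat @ context_feat.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ context_feat

def conv3x3(feat):
    """3x3 mean filter standing in for the local convolutional branch
    (captures fine-grained neighborhood detail)."""
    h, w = feat.shape
    padded = np.pad(feat, 1, mode="edge")
    out = np.zeros_like(feat)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

def hybrid_cross_fuse(ir, vis):
    """Hypothetical hybrid cross-feature fusion: a global cross-attention
    branch plus a local convolutional branch, combined by summation."""
    h, w = ir.shape
    global_branch = cross_attention(ir, vis, w)  # rows as tokens of dim w
    local_branch = conv3x3(ir) + conv3x3(vis)
    return global_branch + local_branch
```

In the actual network these operations would act on learned multi-channel feature maps rather than raw single-channel images; the sketch only shows how a maximum-selection rule and a cross-attention-plus-convolution combination fit together.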
Authors
JI Sai, QIAO Liwei, SUN Yajie (College of Computer Science, Cyber Science and Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Information Engineering, Taizhou University, Taizhou, Jiangsu 225300, China)
Source
Computer Science (《计算机科学》), Peking University Core Journal, 2026, No. 2, pp. 253-263 (11 pages)
Funding
National Natural Science Foundation of China (62172292).
Keywords
Image fusion
Infrared and visible image
Cross-attention mechanism
Convolution
Semantic segmentation