Abstract
Existing methods for semantic segmentation of remote sensing images face significant challenges. Convolutional neural network (CNN)-based methods lack long-range modeling capability, which limits their segmentation performance in complex remote sensing scenes. Transformer-based methods have a computational complexity that grows quadratically with the input image size, making it difficult to balance segmentation performance and computational efficiency. Recently, the visual state space (VSS) model has attracted wide attention for its ability to model global dependencies with linear computational complexity. To address these problems, a semantic segmentation network for remote sensing images that combines CNN and VSS is proposed, aiming to balance performance and efficiency. The network consists of a CNN-based encoder and a VSS-based decoder, which together model local information and capture long-range contextual dependencies. Multi-scale depthwise convolution and a coordinate attention mechanism are introduced to construct a multi-scale feed-forward network (MSFFN) that replaces the feed-forward network (FFN) in the original VSS. This alleviates the spatial discontinuity among pixels in local regions of the 2D image caused by the sequential scanning mechanism, while enhancing multi-scale feature representation. In addition, a spatial channel aggregated enhancement module (SCAEM) is designed to fully fuse the shallow detail information from the encoder with the global semantic information from the decoder, achieving efficient feature aggregation. An auxiliary segmentation head is used to guide gradient propagation and feature learning, promoting more accurate segmentation outputs. Comparative experiments against several state-of-the-art semantic segmentation networks on the Vaihingen, Potsdam and LoveDA datasets show that the proposed network outperforms the other segmentation networks on all three public datasets.
Authors
蔺月妮
汪西莉
LIN Yueni; WANG Xili (School of Artificial Intelligence and Computer Science, Shaanxi Normal University, Xi’an 710119, Ch)
Source
《计算机科学与探索》
Peking University Core Journal
2025, No. 12, pp. 3290-3302 (13 pages)
Journal of Frontiers of Computer Science and Technology
Funding
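Second Tibetan Plateau Scientific Expedition and Research Program, a Ministry of Science and Technology special project for Tibetan Plateau scientific expeditions (2019QZKK0405)
National Natural Science Foundation of China (42361056)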
Keywords
remote sensing images
semantic segmentation
visual state space
multi-scale features
convolutional neural network