Abstract
To address the problems that color information and depth information cannot be effectively fused in RGB-D (Red Green Blue Depth) semantic segmentation and that multi-scale context information cannot be fully extracted, this study proposes an RGB-D semantic segmentation method based on a dual-stream aggregation Transformer. A Transformer backbone extracts multi-level features from the color images and the depth images. A channel attention cross-fusion module and a depth-enhanced RGB operation compensate for the modality gap between the features at each level, completing the fusion of the dual-modal information. A multi-layer aggregation decoder module integrates the multi-level, multi-scale context features, reducing information loss during transmission and producing more accurate and comprehensive segmentation. Experimental results show that the proposed method achieves a mean Intersection over Union (mIoU) of 52.9%, a pixel accuracy of 78.0%, and a mean pixel accuracy of 66.0% on the NYU-Dv2 dataset. On the Cityscapes dataset, the mIoU of the proposed method reaches 79.8% with low-resolution input images.
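The paper's implementation is not part of this record; the PyTorch sketch below only illustrates one plausible form of the channel attention cross-fusion and depth-enhanced RGB steps described in the abstract. The class name `ChannelAttentionCrossFusion`, the squeeze-and-excitation-style gates, and the reduction ratio are all assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of a channel-attention cross-fusion step between the
# RGB and depth feature streams; names and structure are assumptions.
import torch
import torch.nn as nn

class ChannelAttentionCrossFusion(nn.Module):
    """Reweights each stream's channels with attention derived from the
    other stream, then fuses the two (one plausible reading of the
    channel attention cross-fusion module described in the abstract)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation style gate: global pooling + bottleneck MLP.
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )
        self.rgb_gate = gate()    # attention computed from the RGB stream
        self.depth_gate = gate()  # attention computed from the depth stream

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Cross-apply the gates: depth features modulate RGB and vice versa,
        # one way to compensate for the modality gap between the streams.
        rgb_enhanced = rgb * self.depth_gate(depth)
        depth_enhanced = depth * self.rgb_gate(rgb)
        # Depth-enhanced RGB: fold the reweighted depth features back into
        # the RGB stream before the result is passed to the decoder.
        return rgb_enhanced + depth_enhanced

# Example: fuse one level of 64-channel features at 1/4 resolution.
fusion = ChannelAttentionCrossFusion(channels=64)
rgb_feat = torch.randn(1, 64, 120, 160)
depth_feat = torch.randn(1, 64, 120, 160)
fused = fusion(rgb_feat, depth_feat)  # shape: (1, 64, 120, 160)
```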
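The three reported metrics have standard definitions in semantic segmentation; for reference, the sketch below computes all three from a class confusion matrix. The helper name `segmentation_metrics` is illustrative; the paper's exact evaluation code is not given here.

```python
# Conventional computation of mIoU, pixel accuracy, and mean pixel
# accuracy from a confusion matrix; standard practice, not the authors' code.
import numpy as np

def segmentation_metrics(conf: np.ndarray):
    """conf[i, j] counts pixels of ground-truth class i predicted as class j."""
    tp = np.diag(conf).astype(float)
    gt_per_class = conf.sum(axis=1)    # ground-truth pixels per class
    pred_per_class = conf.sum(axis=0)  # predicted pixels per class
    # Pixel accuracy: correctly labelled pixels over all pixels.
    pa = tp.sum() / conf.sum()
    # Mean pixel accuracy: per-class recall, averaged over classes.
    mpa = np.mean(tp / np.maximum(gt_per_class, 1))
    # mIoU: per-class intersection over union, averaged over classes.
    iou = tp / np.maximum(gt_per_class + pred_per_class - tp, 1)
    return iou.mean(), pa, mpa
```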
Authors
GE Mengjiao; SU Wen; HE Ye; CHEN Jiawei; GAO Jinfeng (School of Information Science and Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source
Electronic Science and Technology, 2025, No. 12, pp. 79-85 (7 pages)
Funding
National Natural Science Foundation of China (62006209); Zhejiang Provincial Natural Science Foundation (LY24F020010)