Weakly Supervised Semantic Segmentation Based on White Box Transformer and Dynamic Convolution

Abstract: Weakly supervised semantic segmentation (WSSS) based on image-level labels has attracted much attention because it can train a segmentation network with only lightweight image-level labels, reducing the annotation burden. Class activation maps (CAMs) are a common tool in this field, but their quality is limited by the sparsity of the initial localization and by insufficient feature representation capacity. Existing methods based on Vision Transformers refine CAMs through self-attention, but their black-box nature leads to scattered attention regions. In addition, static convolution struggles to adapt to multi-scale targets, and the cross-entropy loss is easily dominated by easy samples. To address these issues, this paper proposes a WSSS method based on a white-box Transformer and dynamic convolution. First, a sparse-coding white-box Transformer module generates high-precision CAMs through an interpretable sparse coding mechanism, effectively suppressing background noise. Second, a dynamic conditional convolution module adaptively adjusts its convolution kernel parameters to extract features accurately from multi-scale targets. Finally, Focal Loss dynamically down-weights easy samples, improving the model's segmentation accuracy on hard samples. Compared with mainstream methods on the PASCAL VOC 2012 and MS COCO 2014 validation sets, the proposed method improves performance by 1.6 and 1.3 percentage points, respectively. The experimental results indicate that the proposed model obtains more complete class activation maps.
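The abstract names three standard building blocks without giving their formulas. The NumPy sketch below illustrates the textbook versions of those techniques, under stated assumptions, and is not the authors' implementation. `class_activation_map` is the classic CAM: a ReLU over the classifier-weighted sum of feature channels, then normalization. `dynamic_conv1x1` is a CondConv-style dynamic convolution restricted to 1x1 kernels for brevity; it is an assumption that the paper's "dynamic conditional convolution" mixes expert kernels with sigmoid routing. `focal_loss` is the binary focal loss, with the commonly used defaults gamma=2 and alpha=0.25 assumed. All function and parameter names are hypothetical.

```python
import numpy as np


def class_activation_map(features, class_weights, class_idx):
    """Classic CAM for one image (hypothetical helper).

    features: (K, H, W) last-layer feature maps.
    class_weights: (C, K) linear classifier weights.
    Returns an (H, W) map scaled to [0, 1].
    """
    # Weighted sum over the K channels -> (H, W)
    cam = np.tensordot(class_weights[class_idx], features, axes=(0, 0))
    cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def dynamic_conv1x1(x, expert_kernels, routing_weights):
    """CondConv-style dynamic 1x1 convolution (assumed form, hypothetical helper).

    x: (C_in, H, W) input features for one image.
    expert_kernels: (E, C_out, C_in) bank of expert kernels.
    routing_weights: (E, C_in) maps pooled features to per-expert scores.
    """
    pooled = x.mean(axis=(1, 2))                          # global average pool -> (C_in,)
    r = 1.0 / (1.0 + np.exp(-(routing_weights @ pooled)))  # sigmoid routing -> (E,)
    kernel = np.tensordot(r, expert_kernels, axes=(0, 0))  # input-conditioned kernel (C_out, C_in)
    return np.einsum("oc,chw->ohw", kernel, x)             # apply as a 1x1 convolution


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss, averaged over all elements.

    probs: predicted foreground probabilities; targets: 0/1 labels.
    The (1 - p_t) ** gamma factor shrinks the loss of well-classified samples.
    """
    probs = np.clip(probs, eps, 1.0 - eps)                 # avoid log(0)
    p_t = np.where(targets == 1, probs, 1.0 - probs)       # probability of the true class
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)   # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma=0` and `alpha=0.5`, `focal_loss` reduces to half the ordinary binary cross-entropy. With `gamma=2`, a confidently correct sample (p_t = 0.9) keeps only 1% of its cross-entropy before alpha weighting, while a hard one (p_t = 0.1) keeps 81%. This is the "suppress easy samples" behavior the abstract describes.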

Authors: YAN Ge (严格); LIU Jin-feng (刘进锋). School of Information Engineering, Ningxia University, Yinchuan 750021, China

Affiliation: School of Information Engineering, Ningxia University

Source: Computer Technology and Development (《计算机技术与发展》), 2026, No. 1, pp. 38-45 (8 pages)

Funding: Ningxia Natural Science Foundation (2023AAC03126).

Keywords: weakly supervised learning; semantic segmentation; image-level labels; white box Transformer; dynamic convolution; class activation map

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]