Semi-supervised video object segmentation method based on spatio-temporal decoupling and regional robustness enhancement

Abstract: Memory-based methods for semi-supervised Video Object Segmentation (VOS) suffer from object occlusion caused by interactions between objects and from interference by similar objects or noise in the background. To address these problems, a semi-supervised VOS method based on spatio-temporal decoupling and regional robustness enhancement was proposed. Firstly, a structured Transformer architecture was constructed to remove the feature information shared by all pixels, emphasizing the differences between individual pixels and mining the key features of the objects in each video frame. Secondly, the similarity between the current frame and the long-term memory frames was decoupled into two key dimensions, spatio-temporal correlation and object importance. This allows pixel-level spatio-temporal features and object features to be analyzed more precisely, which addresses the occlusion caused by interactions between objects. Finally, a Regional Strip Attention (RSA) module was designed that uses the object location information stored in long-term memory to increase attention on the foreground region and suppress background noise. Experimental results show that the proposed method outperforms the retrained AOT (Associating Objects with Transformers) model by 1.7 percentage points in J&F on the DAVIS 2017 validation set and by 1.6 percentage points in overall score on the YouTube-VOS 2019 validation set, indicating that it effectively addresses these problems of semi-supervised VOS.
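The first step in the abstract removes the feature information shared by all pixels so that the differences between pixels stand out. The abstract does not say how the structured Transformer does this. The following is therefore only a minimal sketch under an illustrative assumption: the shared information is modeled as the per-channel mean over all pixel features and is removed by subtraction (token centering). The function name `remove_shared_component` is hypothetical.

```python
from typing import List

Vector = List[float]


def remove_shared_component(features: List[Vector]) -> List[Vector]:
    """Subtract the per-channel mean over all pixels.

    Hypothetical stand-in for "removing information shared by all pixels";
    the paper's actual layer may differ.
    """
    n = len(features)
    if n == 0:
        return []
    dim = len(features[0])
    # Mean feature vector across every pixel: the component all pixels share.
    mean = [sum(f[c] for f in features) / n for c in range(dim)]
    # What remains is each pixel's deviation from that shared component.
    return [[f[c] - mean[c] for c in range(dim)] for f in features]


# Three "pixels" with 2-channel features; channel 0 is identical everywhere.
pixels = [[5.0, 1.0], [5.0, 3.0], [5.0, 5.0]]
print(remove_shared_component(pixels))  # [[0.0, -2.0], [0.0, 0.0], [0.0, 2.0]]
```

After centering, a channel that takes the same value at every pixel (channel 0 in the example) becomes zero and no longer contributes to pixel-to-pixel similarities. Only the varying channel still distinguishes the pixels.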
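The second step in the abstract splits the similarity between the current frame and long-term memory into spatio-temporal correlation and object importance. The sketch below is minimal and rests on three hypothetical assumptions: correlation is a scaled dot product between a query pixel and each memory key, importance is a positive per-memory-pixel weight, and the two terms are added in log space before a softmax. The function `decoupled_attention` and its parameters are illustrative names, not the authors' code.

```python
import math
from typing import List, Sequence


def decoupled_attention(query: Sequence[float],
                        memory_keys: Sequence[Sequence[float]],
                        importance: Sequence[float]) -> List[float]:
    """Attention weights of one current-frame pixel over memory pixels.

    Hypothetical decoupling: logit_j = correlation(query, key_j) + log(importance_j),
    where correlation is a scaled dot product and importance_j > 0 is an
    assumed per-memory-pixel object weight.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = []
    for key, weight in zip(memory_keys, importance):
        if weight <= 0:
            raise ValueError("importance weights must be positive")
        correlation = scale * sum(q * k for q, k in zip(query, key))
        logits.append(correlation + math.log(weight))
    # Softmax, shifted by the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two memory pixels that correlate equally with the query; the first is
# assumed to be three times as important (e.g. more likely the target).
weights = decoupled_attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [3.0, 1.0])
print([round(w, 3) for w in weights])  # [0.75, 0.25]
```

Adding log(importance) to a logit is equivalent to multiplying that exponentiated correlation by its importance. Under this assumption, a distractor that correlates well with the query but has low importance therefore receives less attention. The paper may combine the two factors differently.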
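For the Regional Strip Attention (RSA) module, the abstract states only that object location information from long-term memory is used to raise attention on the foreground and suppress background noise. The following is a hypothetical gate in the spirit of strip pooling (averaging along whole rows and columns). It averages an assumed foreground-probability map from memory along horizontal and vertical strips, adds each pixel's row and column averages, and squashes the result with a sigmoid. The names `strip_gate` and `apply_gate` and the `gain` and `bias` constants are illustrative, not taken from the paper.

```python
import math
from typing import List

Grid = List[List[float]]


def strip_gate(mask_prior: Grid, gain: float = 6.0, bias: float = -2.0) -> Grid:
    """Hypothetical strip-attention gate built from a memory mask prior.

    mask_prior[i][j] is an assumed foreground probability in [0, 1] taken
    from long-term memory; gain and bias are illustrative constants.
    """
    h, w = len(mask_prior), len(mask_prior[0])
    # Horizontal strips: average foreground probability of each row.
    row_avg = [sum(row) / w for row in mask_prior]
    # Vertical strips: average foreground probability of each column.
    col_avg = [sum(mask_prior[i][j] for i in range(h)) / h for j in range(w)]
    # Each pixel combines its row and column strip, squashed to (0, 1).
    return [
        [1.0 / (1.0 + math.exp(-(gain * (row_avg[i] + col_avg[j]) + bias)))
         for j in range(w)]
        for i in range(h)
    ]


def apply_gate(features: Grid, gate: Grid) -> Grid:
    """Scale each feature by its gate; pixels far from the object shrink most."""
    return [[f * g for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(features, gate)]


prior = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
features = [[1.0] * 3 for _ in range(3)]
out = apply_gate(features, strip_gate(prior))
print(round(out[1][1], 3), round(out[0][0], 3))  # 0.881 0.119
```

Pixels that share a row or column with the remembered object location keep more of their response than corner pixels that share neither. This is the foreground-emphasis and background-suppression behavior the abstract describes, realized here with a deliberately simple gate.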
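The reported DAVIS 2017 gain is measured in J&F, the mean of region similarity J (the intersection over union of the predicted and ground-truth masks) and boundary accuracy F (an F-measure over mask contours). The following sketch computes both for small binary masks. As a simplifying assumption, it matches boundary pixels exactly, whereas the official DAVIS evaluation tolerates small boundary displacements, so its F values will not match benchmark numbers. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask: 1 = object, 0 = background


def region_j(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection over union of two binary masks."""
    inter = union = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return 1.0 if union == 0 else inter / union


def boundary_pixels(mask: Mask) -> Set[Tuple[int, int]]:
    """Object pixels with a 4-neighbour that is background or outside the image."""
    if not mask:
        return set()
    h, w = len(mask), len(mask[0])
    edge = set()
    for i in range(h):
        for j in range(w):
            if not mask[i][j]:
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if not (0 <= ni < h and 0 <= nj < w) or not mask[ni][nj]:
                    edge.add((i, j))
                    break
    return edge


def boundary_f(pred: Mask, gt: Mask) -> float:
    """Simplified boundary accuracy F: exact pixel matching, no distance tolerance."""
    bp, bg = boundary_pixels(pred), boundary_pixels(gt)
    if not bp and not bg:
        return 1.0
    matched = len(bp & bg)
    if matched == 0:
        return 0.0
    precision, recall = matched / len(bp), matched / len(bg)
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask) -> float:
    """J&F: the mean of region similarity J and boundary accuracy F."""
    return (region_j(pred, gt) + boundary_f(pred, gt)) / 2


gt = [[1, 1], [1, 1]]
pred = [[1, 1], [0, 0]]
print(region_j(pred, gt), round(boundary_f(pred, gt), 4), round(j_and_f(pred, gt), 4))
# 0.5 0.6667 0.5833
```

In this example the prediction covers half the object (J = 0.5). All of its boundary pixels lie on the true boundary, but they cover only half of it (precision 1, recall 0.5, F ≈ 0.667). Averaging the two gives J&F ≈ 0.583.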
Authors: CHEN Pengyu (陈鹏宇), NIE Xiushan (聂秀山), LI Nanjun (李南君), LI Tuo (李拓)
Affiliations: School of Computer Science and Technology, Shandong Jianzhu University, Jinan, Shandong 250101, China; Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Company Limited, Jinan, Shandong 250013, China; State Key Laboratory of High-end Server & Storage Technology (Inspur Group Company Limited), Jinan, Shandong 250013, China
Source: Journal of Computer Applications (《计算机应用》), Peking University core journal (北大核心), 2025, No. 5, pp. 1379-1386 (8 pages)
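Funding: National Natural Science Foundation of China (62176141); Natural Science Foundation of Shandong Province for Distinguished Young Scholars (ZR2021JQ26).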
Keywords: Video Object Segmentation (VOS); spatio-temporal decoupling; semi-supervised learning; Transformer; strip attention