Semi-supervised video object segmentation method based on spatio-temporal decoupling and regional robustness enhancement


Abstract: Memory-based methods for semi-supervised Video Object Segmentation (VOS) suffer from object occlusion caused by inter-object interactions and from interference by similar objects or noise in the background. To address these issues, a semi-supervised VOS method based on spatio-temporal decoupling and regional robustness enhancement was proposed. Firstly, a structural Transformer architecture was constructed to remove the feature information shared by all pixels, highlighting the differences among pixels and thoroughly mining the key features of objects in video frames. Secondly, the similarity between the current frame and the long-term memory frames was decoupled into two key dimensions: spatio-temporal correlation and object importance. This decoupling allowed a more precise analysis of pixel-level spatio-temporal features and object features, thereby addressing the object occlusion caused by inter-object interactions. Finally, a Regional Strip Attention (RSA) module was designed to strengthen attention to the foreground region and suppress background noise by using the object location information in long-term memory. Experimental results show that the proposed method outperforms the retrained AOT (Associating Objects with Transformers) model by 1.7 percentage points in J&F on the DAVIS 2017 validation set and by 1.6 percentage points in overall score on the YouTube-VOS 2019 validation set, indicating that the proposed method effectively addresses these problems in semi-supervised VOS.
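For readers unfamiliar with the J&F score reported in the abstract, the sketch below illustrates how it is commonly defined on the DAVIS benchmark: J is the region similarity (intersection over union of the predicted and ground-truth masks), F is a contour accuracy score, and J&F is their arithmetic mean. This is only an illustrative sketch, not the authors' evaluation code. The function names `region_similarity` and `j_and_f`, the toy masks, and the F value of 0.75 are all hypothetical and chosen for illustration; the boundary-based computation of F is not shown.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j_score, f_score):
    """J&F is the arithmetic mean of the J and F scores."""
    return (j_score + f_score) / 2.0


# Toy masks (hypothetical data, not taken from the paper).
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]

j = region_similarity(pred, gt)   # 2 overlapping pixels out of 4 in the union
f = 0.75                          # hypothetical contour accuracy, supplied by hand
score = j_and_f(j, f)
print(j, score)
```

On this reading, a gain of 1.7 percentage points in J&F corresponds to an increase of 0.017 in the averaged score when scores are expressed on a 0 to 1 scale.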

Authors: CHEN Pengyu; NIE Xiushan; LI Nanjun; LI Tuo

Affiliations: School of Computer Science and Technology, Shandong Jianzhu University, Jinan, Shandong 250101, China; Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Company Limited, Jinan, Shandong 250013, China; State Key Laboratory of High-end Server & Storage Technology (Inspur Group Company Limited), Jinan, Shandong 250013, China

Source: Journal of Computer Applications (《计算机应用》, Peking University Core Journal), 2025, No. 5, pp. 1379-1386 (8 pages)

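Funding: National Natural Science Foundation of China (62176141); Shandong Provincial Natural Science Foundation for Distinguished Young Scholars (ZR2021JQ26).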

Keywords: Video Object Segmentation (VOS); spatio-temporal decoupling; semi-supervised learning; Transformer; strip attention

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

