Abstract: Video camouflaged object detection (VCOD) has become a fundamental task in computer vision that has attracted significant attention in recent years. Unlike image camouflaged object detection (ICOD), VCOD requires not only spatial cues but also motion cues. Thus, effectively exploiting spatiotemporal information is crucial for generating accurate segmentation results. Current VCOD methods, which typically focus on exploring motion representations, often integrate spatial and motion features ineffectively, leading to poor performance in diverse scenarios. To address these issues, we design a novel spatiotemporal network with an encoder-decoder structure. During the encoding stage, an adjacent space-time memory module (ASTM) is employed to extract high-level temporal features (i.e., motion cues) from the current frame and its adjacent frames. In the decoding stage, a selective space-time aggregation module is introduced to efficiently integrate spatial and temporal features. Additionally, a multi-feature fusion module is developed to progressively refine the rough prediction using the information provided by multiple types of features. Furthermore, we incorporate multi-task learning into the proposed network to obtain more accurate predictions. Experimental results show that the proposed method outperforms existing cutting-edge baselines on VCOD benchmarks.
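The abstract does not specify the ASTM's internals. As a purely illustrative sketch of the memory-readout pattern it evokes, where queries from the current frame attend over keys and values pooled from adjacent frames, the following PyTorch module is a hedged approximation; every name, dimension, and the residual fusion step is an assumption rather than the authors' design.

```python
# Hypothetical sketch of an adjacent space-time memory readout, in the spirit
# of the ASTM described above. It follows the common memory-attention pattern
# used in video segmentation: query = current frame, keys/values = neighbours.
import torch
import torch.nn as nn

class AdjacentSpaceTimeMemory(nn.Module):
    def __init__(self, channels: int = 256, key_dim: int = 64):
        super().__init__()
        self.to_query = nn.Conv2d(channels, key_dim, 1)   # current frame -> queries
        self.to_key = nn.Conv2d(channels, key_dim, 1)     # adjacent frames -> keys
        self.to_value = nn.Conv2d(channels, channels, 1)  # adjacent frames -> values

    def forward(self, curr: torch.Tensor, adjacent: list[torch.Tensor]) -> torch.Tensor:
        b, c, h, w = curr.shape
        q = self.to_query(curr).flatten(2)                                          # (B, K, HW)
        mem_k = torch.cat([self.to_key(f).flatten(2) for f in adjacent], dim=2)     # (B, K, T*HW)
        mem_v = torch.cat([self.to_value(f).flatten(2) for f in adjacent], dim=2)   # (B, C, T*HW)
        attn = torch.softmax(mem_k.transpose(1, 2) @ q / q.shape[1] ** 0.5, dim=1)  # (B, T*HW, HW)
        motion = (mem_v @ attn).view(b, c, h, w)   # temporal (motion) features read from memory
        return curr + motion                       # residual fusion with the spatial features

# Usage: backbone features for frame t and its neighbours t-1, t+1.
astm = AdjacentSpaceTimeMemory()
f_t = torch.randn(1, 256, 24, 24)
out = astm(f_t, [torch.randn(1, 256, 24, 24), torch.randn(1, 256, 24, 24)])
```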
Funding: Supported by the STI 2030-Major Projects (No. 2021ZD0201404).
Abstract: Camouflaged object detection (COD) refers to the task of identifying and segmenting objects that blend seamlessly into their surroundings, posing a significant challenge for computer vision systems. In recent years, COD has garnered widespread attention due to its potential applications in surveillance, wildlife conservation, autonomous systems, and more. While several surveys on COD exist, they are often limited in the number and scope of papers covered, particularly regarding the rapid advances made in the field since mid-2023. To fill this void, we present the most comprehensive review of COD to date, encompassing both theoretical frameworks and practical contributions to the field. This paper explores various COD methods across four domains, including both image-level and video-level solutions, from the perspectives of traditional and deep learning approaches. We thoroughly investigate the correlations between COD and other methods for camouflaged scenarios, thereby laying the theoretical foundation for subsequent analyses. Furthermore, we delve into novel tasks such as referring-based COD and collaborative COD, which have not been fully addressed in previous works. Beyond object-level detection, we also summarize extended methods for instance-level tasks, including camouflaged instance segmentation, counting, and ranking. Additionally, we provide an overview of commonly used benchmarks and evaluation metrics in COD tasks, and conduct a comprehensive evaluation of deep learning-based techniques in both image and video domains, considering both qualitative and quantitative performance. Finally, we discuss the limitations of current COD models and propose nine promising directions for future research, focusing on addressing inherent challenges and exploring novel, meaningful technologies. This comprehensive examination aims to deepen the understanding of COD models and related methods in camouflaged scenarios. For those interested, a curated list of COD-related techniques, datasets, and additional resources can be found at https://github.com/ChunmingHe/awesome-concealed-objectsegmentation.
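As a concrete companion to the benchmark discussion, the snippet below implements two evaluation metrics that are standard across the COD literature: mean absolute error (MAE) and the adaptively thresholded F-measure with the conventional beta^2 = 0.3. The adaptive threshold of twice the mean prediction value is the widely used convention, not something prescribed by this particular survey.

```python
# Standard COD evaluation metrics for a single prediction map.
import numpy as np

def mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred in [0, 1], gt binary mask; lower is better."""
    return float(np.abs(pred - gt).mean())

def adaptive_f_measure(pred: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    """F-beta with beta^2 = 0.3, the weighting used throughout the COD literature."""
    thresh = min(2.0 * pred.mean(), 1.0)          # common adaptive-threshold convention
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return float((1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))
```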
Funding: National Natural Science Foundation of China (Grant Nos. 62005049 and 62072110); Natural Science Foundation of Fujian Province (Grant No. 2020J01451).
Abstract: Accurate segmentation of camouflaged objects in aerial imagery is vital for improving the efficiency of UAV-based reconnaissance and rescue missions. However, camouflaged object segmentation is increasingly challenging due to advances in both camouflage materials and biological mimicry. Although multispectral-RGB technology shows promise, conventional dual-aperture multispectral-RGB imaging systems are constrained by imprecise and time-consuming registration and fusion across modalities, limiting their performance. Here, we propose the Reconstructed Multispectral-RGB Fusion Network (RMRF-Net), which reconstructs RGB images into multispectral ones, enabling efficient multimodal segmentation using only an RGB camera. Specifically, RMRF-Net employs a divergent-similarity feature correction strategy to minimize reconstruction errors and includes an efficient boundary-aware decoder to enhance object contours. Notably, we establish the first real-world aerial multispectral-RGB semantic segmentation dataset of camouflaged objects, covering 11 object categories. Experimental results demonstrate that RMRF-Net outperforms existing methods, achieving 17.38 FPS on the NVIDIA Jetson AGX Orin with only a 0.96% drop in mIoU compared to an RTX 3090, showing its practical applicability in multimodal remote sensing.
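The reconstruction branch of RMRF-Net is not detailed in the abstract. The sketch below only illustrates the general idea of predicting pseudo-multispectral bands from RGB and feeding the fused stack to a segmenter; the band count, layer choices, and fusion by concatenation are hypothetical, and the divergent-similarity correction is not modeled here.

```python
# Illustrative-only head that maps RGB to pseudo-multispectral bands and
# concatenates both for a downstream segmenter. Not RMRF-Net's actual design.
import torch
import torch.nn as nn

class SpectralReconstructionHead(nn.Module):
    def __init__(self, num_bands: int = 8):  # band count is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_bands, 1), nn.Sigmoid(),  # per-band reflectance in [0, 1]
        )

    def forward(self, rgb: torch.Tensor) -> torch.Tensor:
        ms = self.net(rgb)                  # (B, num_bands, H, W) pseudo-multispectral
        return torch.cat([rgb, ms], dim=1)  # fused RGB + reconstructed bands

head = SpectralReconstructionHead()
fused = head(torch.randn(2, 3, 128, 128))   # -> (2, 11, 128, 128)
```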
Abstract: This paper introduces the deep gradient network (DGNet), a novel deep framework that exploits object gradient supervision for camouflaged object detection (COD). It decouples the task into two connected branches, i.e., a context encoder and a texture encoder. The essential connection is the gradient-induced transition, which represents a soft grouping between context and texture features. Benefiting from this simple but efficient framework, DGNet outperforms existing state-of-the-art COD models by a large margin. Notably, our efficient version, DGNet-S, runs in real time (80 fps) and achieves results comparable to the cutting-edge model JCSOD-CVPR21 with only 6.82% of its parameters. Application results also show that the proposed DGNet performs well in polyp segmentation, defect detection, and transparent object segmentation tasks. The code will be made available at https://github.com/GewelsJI/DGNet.
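One plausible reading of the gradient-induced transition, sketched below, is that texture-branch features produce soft per-group weights that gate grouped context features. The group count and layer choices here are assumptions; the released DGNet code is the authoritative reference for the actual operator.

```python
# Hedged approximation of a "soft grouping" transition between a context and
# a texture branch: texture features yield softmax group weights that gate
# channel groups of the context features.
import torch
import torch.nn as nn

class GradientInducedTransition(nn.Module):
    def __init__(self, channels: int = 64, groups: int = 8):
        super().__init__()
        self.groups = groups
        self.group_logits = nn.Conv2d(channels, groups, 3, padding=1)  # from texture branch

    def forward(self, context: torch.Tensor, texture: torch.Tensor) -> torch.Tensor:
        b, c, h, w = context.shape
        weights = torch.softmax(self.group_logits(texture), dim=1)  # (B, G, H, W) soft grouping
        grouped = context.view(b, self.groups, c // self.groups, h, w)
        gated = grouped * weights.unsqueeze(2)                      # weight each channel group
        return gated.view(b, c, h, w) + context                     # residual combination

git = GradientInducedTransition()
out = git(torch.randn(1, 64, 48, 48), torch.randn(1, 64, 48, 48))
```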
Funding: Supported by the National Natural Science Foundation of China (Nos. 62302167 and U23A20343), the Shanghai Sailing Program (23YF1410500), and the Chenguang Program of the Shanghai Education Development Foundation and Shanghai Municipal Education Commission (23CGA34).
Abstract: Confusing object detection (COD), which targets objects such as glass, mirrors, and camouflaged objects, is a burgeoning visual detection task centered on pinpointing and distinguishing concealed targets within intricate backgrounds by leveraging deep learning methodologies. Despite garnering increasing attention in computer vision, most existing works lean toward formulating task-specific solutions rather than in-depth analyses of methodological structures. As of now, there is a notable absence of a comprehensive systematic review focusing on recently proposed deep learning-based models for these specific tasks. To fill this gap, our study presents a pioneering review that covers both the models and the publicly available benchmark datasets, while also identifying potential directions for future research in this field. Current datasets primarily focus on single confusing object detection at the image level, with some studies extending to video-level data. We conduct an in-depth analysis of deep learning architectures, revealing that current state-of-the-art (SOTA) COD methods demonstrate promising performance in single object detection. We also compile and provide detailed descriptions of widely used datasets relevant to these detection tasks. Our endeavor extends to discussing the limitations observed in current methodologies, alongside proposed solutions aimed at enhancing detection accuracy. Additionally, we deliberate on relevant applications and outline future research trajectories, aiming to catalyze advancements in the field of glass, mirror, and camouflaged object detection.
Funding: Supported by the Fundamental Research Funds for the Central Universities (Nankai University, No. 63243150).
Abstract: We introduce a novel bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). It comprises two essential components: the localization module (LM) and the reconstruction module (RM) with our proposed bilateral reference (BiRef). The LM aids in object localization using global semantic information. Within the RM, we utilize BiRef for the reconstruction process, where hierarchical patches of images provide the source reference and gradient maps serve as the target reference. These components collaborate to generate the final predicted maps. We also introduce auxiliary gradient supervision to enhance the focus on regions with finer details. In addition, we outline practical training strategies tailored for DIS to improve map quality and the training process. To validate the general applicability of our approach, we conduct extensive experiments on four tasks, showing that BiRefNet exhibits remarkable performance and outperforms task-specific cutting-edge methods across all benchmarks. Our codes are publicly available at https://github.com/ZhengPeng7/BiRefNet.
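The abstract states that gradient maps serve as the target reference and as an auxiliary supervision signal, without fixing how the maps are computed. A common choice is a Sobel gradient magnitude; the snippet below uses Sobel as an assumed stand-in, not as BiRefNet's confirmed operator.

```python
# Assumed Sobel-based gradient map for use as a target reference and an
# auxiliary gradient-supervision signal.
import torch
import torch.nn.functional as F

def gradient_map(gray: torch.Tensor) -> torch.Tensor:
    """gray: (B, 1, H, W) in [0, 1] -> per-pixel gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)                      # Sobel y kernel
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def gradient_loss(pred_grad: torch.Tensor, image_gray: torch.Tensor) -> torch.Tensor:
    # Push a predicted gradient map toward the image-derived reference.
    return F.l1_loss(pred_grad, gradient_map(image_gray))
```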
Abstract: The burgeoning field of Camouflaged Object Detection (COD) seeks to identify objects that blend into their surroundings. Despite the impressive performance of recent learning-based models, their robustness is limited: existing methods may misclassify salient objects as camouflaged ones, despite these contradictory characteristics. This limitation may stem from a lack of multi-pattern training images, leading to reduced robustness against salient objects. To overcome this scarcity, we introduce CamDiff, a novel approach inspired by AI-Generated Content (AIGC). Specifically, we leverage a latent diffusion model to synthesize salient objects in camouflaged scenes, while using the zero-shot image classification ability of the Contrastive Language-Image Pre-training (CLIP) model to prevent synthesis failures and ensure that the synthesized objects align with the input prompt. The synthesized image thus retains its original camouflage labels while incorporating salient objects, yielding camouflaged scenes with richer characteristics. User studies show that the salient objects in our synthesized scenes attract more of the user's attention; such samples therefore pose a greater challenge to existing COD models. CamDiff enables flexible editing and efficient large-scale dataset generation at low cost. It significantly enhances the training and testing phases of COD baselines, granting them robustness across diverse domains. Our newly generated datasets and source code are available at https://github.com/drlxj/CamDiff.
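The abstract describes two stages, latent-diffusion synthesis plus a CLIP zero-shot acceptance check, so a minimal sketch can be assembled from off-the-shelf components. The specific checkpoints, the "empty background" counter-label, and the 0.5 acceptance threshold below are illustrative assumptions, not CamDiff's published configuration.

```python
# Sketch of the CamDiff recipe: inpaint a salient object into a camouflaged
# scene, then keep the result only if CLIP recognizes the prompted object.
import torch
from diffusers import StableDiffusionInpaintPipeline
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def synthesize(scene: Image.Image, mask: Image.Image, prompt: str) -> Image.Image | None:
    # mask: white pixels mark the region where the salient object is painted.
    out = pipe(prompt=prompt, image=scene, mask_image=mask).images[0]
    # Zero-shot check: does CLIP see the prompted object rather than background?
    inputs = proc(text=[prompt, "empty background"], images=out,
                  return_tensors="pt", padding=True)
    probs = clip(**inputs).logits_per_image.softmax(dim=-1)
    return out if probs[0, 0] > 0.5 else None  # reject synthesis failures
```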