Although conventional object detection methods achieve high accuracy through extensively annotated datasets, acquiring such large-scale labeled data remains challenging and cost-prohibitive in numerous real-world applications. Few-shot object detection aims to localize and classify objects in images using only limited annotated examples. However, its inherent challenge is that the available samples are not diverse enough to fully characterize the feature distribution, which degrades model performance. Inspired by contrastive learning principles, we propose an Implicit Feature Contrastive Learning (IFCL) module to address this limitation and augment feature diversity for more robust representation learning. This module generates augmented support sample features in a mixed feature space and implicitly contrasts them with query Region of Interest (RoI) features. This approach facilitates more comprehensive learning of both intra-class feature similarity and inter-class feature diversity, thereby enhancing the model's object classification and localization capabilities. Extensive experiments on PASCAL VOC show that our method improves on the baseline model FPD by 3.2%, 1.8%, and 2.3% in the 10-shot setting on the three Novel Sets, respectively.
Few-shot point cloud 3D object detection (FS3D) aims to identify and locate objects of novel classes within point clouds using knowledge acquired from annotated base classes and a minimal number of samples from the novel classes. Because the training data are imbalanced, existing FS3D methods based on fully supervised learning tend to overfit to the base classes. This impairs the network's ability to generalize knowledge learned from base classes to novel classes and prevents it from extracting distinctive foreground and background representations for novel-class objects. To address these issues, this thesis proposes a category-agnostic contrastive learning approach that enhances generalization and identification for almost unseen categories by constructing pseudo-labels and positive-negative sample pairs that are unrelated to specific classes. First, this thesis designs a proposal-wise context contrastive module (CCM). By reducing the distance between foreground point features and increasing the distance between foreground and background point features within a region proposal, CCM helps the network extract more discriminative foreground and background feature representations without relying on categorical annotations. Second, this thesis uses a geometric contrastive module (GCM), which enhances the network's geometric perception by applying contrastive learning to the foreground point features associated with basic geometric components such as edges, corners, and surfaces, so that these components obtain more distinguishable representations. This thesis also combines category-aware contrastive learning with the two modules above to maintain categorical distinctiveness. Extensive experimental results on the FS-SUNRGBD and FS-ScanNet datasets demonstrate the effectiveness of this method, with average precision exceeding the baseline by up to 8%.
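The foreground/background contrast described for CCM can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1; the abstract does not specify the loss, so the function names (`cosine`, `context_contrastive_loss`), the similarity measure, and the temperature are hypothetical choices rather than the thesis's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def context_contrastive_loss(foreground, background, temperature=0.1):
    """InfoNCE-style loss over the point features of one region proposal.

    Assumption (not taken from the source): every pair of foreground points
    is a positive pair, and every (foreground, background) pair is a negative.
    """
    losses = []
    for i, anchor in enumerate(foreground):
        # Total similarity mass assigned to background (negative) points.
        neg = sum(math.exp(cosine(anchor, b) / temperature) for b in background)
        for j, positive in enumerate(foreground):
            if i == j:
                continue
            pos = math.exp(cosine(anchor, positive) / temperature)
            losses.append(-math.log(pos / (pos + neg)))
    # A proposal with fewer than two foreground points has no positive pairs.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    separated = context_contrastive_loss(
        [[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [-0.1, 1.0]]
    )
    mixed = context_contrastive_loss(
        [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.1], [0.1, 1.0]]
    )
    print(separated < mixed)
```

The loss is small when foreground features cluster together and sit away from background features, and it grows when the two groups are interleaved. No class labels enter the computation, which is the category-agnostic property the abstract emphasizes.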
In metric-based meta-learning detection models, the distribution of training samples in the metric space strongly influences detection performance, yet traditional meta-detectors usually ignore this influence. In addition, the design of the metric space can be disturbed by background noise in the training samples. To tackle these issues, we propose a metric space optimisation method based on hyperbolic geometry attention and class-agnostic activation maps. First, the geometric properties of hyperbolic space are used to establish a structured metric space. Feature samples of different classes are embedded into the hyperbolic space with extremely low distortion. This metric space is better suited to representing tree-like structures between categories for image scene analysis. Meanwhile, a novel similarity measure function based on the Poincaré distance is proposed to evaluate the distance between various types of objects in the feature space. In addition, class-agnostic activation maps (CCAMs) are employed to re-calibrate the weight of foreground feature information and suppress background information. Finally, the decoder processes the high-level feature information as the decoding of the query object and detects objects by predicting their locations and corresponding task encodings. Experimental evaluation is conducted on the Pascal VOC and MS COCO datasets. The experimental results show that the authors' method outperforms strong few-shot detection baselines.
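For reference, the standard Poincaré-ball distance that such a similarity measure builds on is d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The sketch below implements only this textbook formula; the paper's novel similarity function is not given in the abstract and is not reproduced here, and the name `poincare_distance` and the `eps` guard are illustrative assumptions.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Standard distance between two points of the open unit (Poincare) ball.

    This is the textbook formula, not the paper's proposed similarity measure.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps guards against division by a vanishing denominator near the boundary.
    denom = max((1.0 - sq_norm_u) * (1.0 - sq_norm_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


if __name__ == "__main__":
    near_origin = poincare_distance([0.1, 0.0], [0.0, 0.1])
    near_boundary = poincare_distance([0.9, 0.0], [0.0, 0.9])
    print(near_origin < near_boundary)
```

Distances grow rapidly toward the boundary of the ball: two points at the same angular separation are much farther apart near the boundary than near the origin. This is the property that lets tree-like category hierarchies embed with low distortion.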
Traditional object detectors based on deep learning rely on large numbers of labeled samples, which are expensive to obtain. Few-shot object detection (FSOD) attempts to solve this problem by learning to detect objects from a few labeled samples, but its performance is often unsatisfactory because samples are scarce. We believe that the main factors restricting the performance of few-shot detectors are that (1) positive samples are scarce and (2) the quality of positive samples is low. Therefore, we put forward a novel few-shot object detector based on YOLOv4 that improves both the quantity and the quality of positive samples. First, we design a hybrid multivariate positive sample augmentation (HMPSA) module to increase the quantity and diversity of positive samples while suppressing negative samples. Then, we design a selective non-local fusion attention (SNFA) module to help the detector better learn the target features and improve the feature quality of positive samples. Finally, we optimize the loss function to make it more suitable for the FSOD task. Experimental results on PASCAL VOC and MS COCO demonstrate that our few-shot object detector performs competitively with other state-of-the-art detectors.
Few-shot object detection has received much attention for its ability to detect novel-class objects using limited annotated data. Transfer learning-based solutions have become popular because they combine simple training with good accuracy. However, it is still challenging to enrich feature diversity during training, and fine-grained features are also insufficient for novel-class detection. To deal with these problems, this paper proposes a novel few-shot object detection method based on dual-domain feature fusion and patch-level attention. On top of the original base domain, an elementary domain with more category-agnostic features is superposed to construct a two-stream backbone, which helps enrich feature diversity. To better integrate the various features, a dual-domain feature fusion is designed, in which feature pairs of the same size are complementarily fused to extract more discriminative features. Moreover, a patch-wise feature refinement termed patch-level attention is presented to mine internal relations among the patches, which enhances adaptability to novel classes. In addition, a weighted classification loss is introduced to assist the fine-tuning of the classifier by combining extra features from the FPN of the base training model. In this way, the few-shot detection quality for novel-class objects is improved. Experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the method.
The task of few-shot object detection is to classify and locate objects using only a few annotated samples. Although many studies have tried to solve this problem, the results are still not satisfactory. Recent studies have found that the class margin significantly impacts the classification and representation of the targets to be detected. Most methods use the loss function to balance the class margin, but the results show that loss-based methods bring only a tiny improvement on the few-shot object detection problem. In this study, the authors propose a transformer-based class encoding method to balance the class margin, which makes the model pay more attention to the essential information in the features and thus increases its ability to recognize the sample. In addition, the authors propose a multi-target decoding method that aggregates RoI vectors generated from multi-target images with multiple support vectors, which can significantly improve the detector's performance on multi-target images. Experiments on the Pascal visual object classes (VOC) and Microsoft Common Objects in Context (COCO) datasets show that our proposed Few-Shot Object Detection via Class Encoding and Multi-Target Decoding significantly improves upon baseline detectors (average accuracy improvement of up to 10.8% on VOC and 2.1% on COCO), achieving competitive performance. In general, we propose a new way to regulate the class margin between support set vectors and a way to aggregate features for images containing multiple objects, and we achieve remarkable results. Our method is implemented on mmfewshot, and the code will be available later.
Funding (IFCL paper): funded by the China Chongqing Municipal Science and Technology Bureau, grant numbers CSTB2024TIAD-CYKJCXX0009, CSTB2024NSCQ-LZX0043, and CSTB2022NSCQ-MSX0288; the Chongqing Municipal Commission of Housing and Urban-Rural Development, grant number CKZ2024-87; the Chongqing University of Technology Graduate Education High-Quality Development Project, grant number gzlsz202401; the Chongqing University of Technology-Chongqing LINGLUE Technology Co., Ltd. Electronic Information (Artificial Intelligence) Graduate Joint Training Base; the Postgraduate Education and Teaching Reform Research Project in Chongqing, grant number yjg213116; and the Chongqing University of Technology-CISDI Chongqing Information Technology Co., Ltd. Computer Technology Graduate Joint Training Base.
Funding (hyperbolic metric space paper): National Natural Science Foundation of China, Grant/Award Number: 61602157; Henan scientific and technological project, Grant/Award Number: 242102210020; Basal Research Fund, Grant/Award Number: NSFRF240618.
Funding (YOLOv4-based detector paper): the China National Key Research and Development Program (Grant No. 2016YFC0802904), the National Natural Science Foundation of China (Grant No. 61671470), and the 62nd batch of funded projects of the China Postdoctoral Science Foundation (Grant No. 2017M623423) provided funding for conducting the experiments.
Funding (dual-domain feature fusion paper): supported in part by the Beijing Natural Science Foundation (Nos. L233030 and 2022MQ05) and in part by the National Natural Science Foundation of China (Nos. 62073322, 61836015, and 61633020).
Funding (class encoding and multi-target decoding paper): this work was supported by STI 2030-Major Projects No. 2021ZD0201403, in part by NSFC No. 62088101 Autonomous Intelligent Unmanned Systems, and in part by the Open Research Project of the State Key Laboratory of Industrial Control Technology, Zhejiang University, China (No. ICT2022B04).