Funding: Supported in part by the Beijing Natural Science Foundation (Nos. L233030 and 2022MQ05) and in part by the National Natural Science Foundation of China (Nos. 62073322, 61836015, and 61633020).
Abstract: Few-shot object detection has received much attention for its ability to detect objects of novel classes from limited annotated data. Transfer learning-based solutions have become popular because training is simple and accuracy is good; however, enriching feature diversity during training remains challenging, and fine-grained features are insufficient for novel class detection. To address these problems, this paper proposes a few-shot object detection method based on dual-domain feature fusion and patch-level attention. On top of the original base domain, an elementary domain with more category-agnostic features is superposed to construct a two-stream backbone, which enriches feature diversity. To better integrate the various features, a dual-domain feature fusion is designed in which feature pairs of the same size are complementarily fused to extract more discriminative features. Moreover, a patch-wise feature refinement, termed patch-level attention, is presented to mine internal relations among patches, which enhances adaptability to novel classes. In addition, a weighted classification loss is introduced to assist fine-tuning of the classifier by combining extra features from the FPN of the base training model. In this way, detection quality for novel class objects is improved. Experiments on the PASCAL VOC and MS COCO datasets verify the effectiveness of the method.
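The abstract does not specify how patch-level attention is computed, so the following is only a generic illustration of the underlying idea: refining each patch feature by relating it to every other patch. It is a minimal sketch of plain scaled dot-product self-attention with a residual connection and no learned projections; the function names `softmax` and `patch_self_attention` and the toy inputs are assumptions for illustration, not the authors' module.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def patch_self_attention(patches):
    """Refine each patch vector using its relations to all patches.

    Assumption: identity query/key/value projections (nothing is learned),
    scaled dot-product scores, and a residual connection. This is generic
    self-attention, not the paper's patch-level attention.
    """
    dim = len(patches[0])
    scale = math.sqrt(dim)
    refined = []
    for query in patches:
        # Similarity of this patch to every patch, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in patches]
        weights = softmax(scores)
        # Attention-weighted mixture of all patch vectors.
        mixed = [sum(w * p[d] for w, p in zip(weights, patches))
                 for d in range(dim)]
        # Residual connection keeps the original patch information.
        refined.append([q + m for q, m in zip(query, mixed)])
    return refined


# Hypothetical toy input: three 2-D patch features.
toy_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_refined = patch_self_attention(toy_patches)
```

In a real detector the patches would come from a feature map and the projections would be learned; the sketch only shows how each output vector mixes information from all patches.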
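Abstract: Designing patch-level feature representations is a fundamental research topic in computer vision; a good patch-level representation can effectively improve the performance of related algorithms such as image classification and object recognition. SIFT (scale-invariant feature transform) and HOG (histogram of oriented gradient) are outstanding examples of hand-crafted patch-level features. However, differences between hand-crafted patch features often fail to reflect the similarity between image patches well enough. The kernel descriptor (KD) method offers a new way to generate patch-level features: building on a match kernel between image patches, it applies kernel principal component analysis (KPCA) to obtain the feature representation, and it achieves good performance in image classification. However, the method needs all joint basis vectors to generate kernel descriptor features, which leads to high time complexity. To address this problem, an algorithm for generating patch-level feature representations is proposed, called the efficient patch-level descriptor (EPLd). The algorithm is built on incomplete Cholesky decomposition, automatically selects a small number of representative image patches to improve efficiency, and uses the MMD (maximum mean discrepancy) distance to compute the similarity between images. Experimental results show that the algorithm achieves excellent performance in image/scene classification.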
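To make the MMD (maximum mean discrepancy) distance mentioned in this abstract concrete, here is a minimal sketch of the standard biased estimate of squared MMD between two sets of patch feature vectors. The Gaussian kernel, the `gamma` value, the function names, and the toy data are assumptions for illustration; the abstract does not state which kernel or estimator EPLd uses.

```python
import math


def gaussian_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y) with x in xs, y in ys."""
    total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sets of vectors.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by an average over all pairs.
    """
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))


# Hypothetical toy data: each image is a small set of 2-D patch features.
image_a = [[0.0, 0.0], [1.0, 0.0]]
image_b = [[0.0, 0.1], [1.0, 0.1]]   # patches close to those of image_a
image_c = [[5.0, 5.0], [6.0, 5.0]]   # patches far from those of image_a

near = mmd_squared(image_a, image_b)
far = mmd_squared(image_a, image_c)
```

A smaller value means the two patch sets have more similar distributions, so `near` comes out smaller than `far` here. Treating each image as a set of patch descriptors lets images with different numbers of patches be compared directly.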