Current studies in few-shot semantic segmentation mostly utilize meta-learning frameworks to obtain models that can be generalized to new categories. However, these models, trained on base classes with sufficient annotated samples, are biased towards those base classes, which results in semantic confusion and ambiguity between base classes and new classes. One strategy is to use an additional base learner to recognize objects of the base classes and then refine the prediction produced by the meta learner. In this setting, the interaction between the two learners and the way their results are combined are both important. This paper proposes a new model, the Distilling Base and Meta (DBAM) network, which uses a self-attention mechanism and contrastive learning to enhance few-shot segmentation performance. First, the self-attention-based ensemble module (SEM) is proposed to produce a more accurate adjustment factor for fusing the two learners' predictions. Second, the prototype feature optimization module (PFOM) is proposed to provide an interaction between the two learners, enhancing the ability to distinguish the base classes from the target class by introducing a contrastive learning loss. Extensive experiments demonstrate that our method improves performance on PASCAL-5^(i) under both 1-shot and 5-shot settings.
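The fusion of the two learners' predictions can be illustrated with a minimal sketch: wherever the base learner detects a base-class object, the meta learner's foreground score is suppressed, scaled by an adjustment factor. This is a simplified stand-in for the SEM described above, not the paper's actual module; the names `fuse_predictions` and `alpha` are illustrative.

```python
import numpy as np

def fuse_predictions(meta_fg, base_fg, alpha):
    """Combine the meta learner's novel-class foreground map with the base
    learner's base-class map: suppress the meta score by alpha wherever the
    base learner fires, then clip back into [0, 1]."""
    return np.clip(meta_fg - alpha * base_fg, 0.0, 1.0)

# toy 2x2 score maps: pixel (0, 1) is confidently a base-class object
meta_fg = np.array([[0.9, 0.8], [0.2, 0.1]])
base_fg = np.array([[0.0, 1.0], [0.0, 0.0]])
out = fuse_predictions(meta_fg, base_fg, alpha=0.5)
```

With `alpha = 0.5`, the base-class pixel's meta score drops from 0.8 to 0.3 while the other pixels are untouched.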
Abstract: Few-shot image semantic segmentation aims to achieve pixel-level classification for novel classes using only a few labeled examples. The standard approach first trains the segmentation model on base classes and then adapts it to novel classes. Although existing methods have achieved remarkable performance in few-shot image semantic segmentation, they still face the following challenges. Traditional methods typically rely on mask average pooling to generate single-category prototype vectors and perform feature matching via metric learning, but they exhibit significant limitations in modeling inter-category relationships and addressing complex background interference. Inspired by the analogy-based transfer mechanisms in cognitive psychology, we propose a Generalized Prototype Network (GPNet) to enhance the model's generalization ability for unseen categories and improve robustness in feature matching. GPNet consists of two key modules. The first is a generalized prototype enhancement module, which explores potential inter-category relationships to construct more discriminative category prototype representations. The second is a multi-scale feature alignment module, which dynamically aligns support and query features across multiple scales using an attention mechanism, thus mitigating background interference in complex scenarios. Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches on several few-shot semantic segmentation benchmarks, validating its effectiveness and generalization capabilities.
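The "traditional" prototype pipeline that GPNet improves on (mask average pooling followed by metric matching) can be sketched in a few lines. This is a minimal illustration of the baseline, not GPNet itself; cosine similarity is assumed as the metric.

```python
import numpy as np

def mask_average_pooling(feat, mask):
    """Average the support features over the foreground mask to obtain one
    class prototype of shape (C,). feat: (C, H, W); mask: (H, W) in {0, 1}."""
    fg = mask.sum()
    return (feat * mask).sum(axis=(1, 2)) / max(fg, 1e-6)

def cosine_match(proto, query_feat):
    """Cosine similarity between the prototype and every query location.
    Returns an (H, W) score map; thresholding it yields the predicted mask."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-6)
    p = proto / (np.linalg.norm(proto) + 1e-6)
    return np.einsum('c,chw->hw', p, q)

# toy example: 4-channel features on a 2x2 grid, mask selects one pixel
feat = np.random.rand(4, 2, 2)
mask = np.array([[1, 0], [0, 0]])
proto = mask_average_pooling(feat, mask)
scores = cosine_match(proto, feat)
```

Because the mask selects a single pixel, the prototype equals that pixel's feature vector and its cosine score there is (numerically) 1; note how the pooling collapses all spatial structure into one vector, which is exactly the limitation the abstract criticizes.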
Funding: supported by the National Natural Science Foundation of China (U1904119); the Research Programs of the Henan Science and Technology Department (232102210033, 232102210054); the Chongqing Natural Science Foundation (CSTB2023NSCQ-MSX0070); the Henan Province Key Research and Development Project (231111212000); the Aviation Science Foundation (20230001055002); and the Henan Center for Outstanding Overseas Scientists (GZS2022011).
Abstract: The key to the success of few-shot semantic segmentation (FSS) is the efficient use of a limited annotated support set to accurately segment novel classes in the query set. Because the support set contains so few samples, FSS faces challenges such as intra-class differences, background (BG) mismatches between query and support sets, and ambiguous segmentation between the foreground (FG) and BG in the query set. To address these issues, this paper proposes a multi-module network called CAMSNet, which includes four modules: the General Information Module (GIM), the Class Activation Map Aggregation (CAMA) module, the Self-Cross Attention (SCA) Block, and the Feature Fusion Module (FFM). In CAMSNet, the GIM employs an improved triplet loss, which concatenates word embedding vectors and support prototypes as anchors and uses local support features of FG and BG as positive and negative samples, helping to address intra-class differences. Then, for the first time, the Class Activation Map (CAM) from Weakly Supervised Semantic Segmentation (WSSS) is applied to FSS within the CAMA module, replacing the traditional use of cosine similarity to locate query information. Subsequently, the SCA Block processes the support and query features aggregated by the CAMA module, significantly enhancing the understanding of the input information, leading to more accurate predictions and effectively addressing BG mismatch and ambiguous FG-BG segmentation. Finally, the FFM combines general class information with the enhanced query information to achieve accurate segmentation of the query image. Extensive experiments on PASCAL-5^(i) and COCO-20^(i) demonstrate that CAMSNet yields superior performance and sets a new state of the art.
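The GIM's anchor construction can be illustrated with a minimal sketch of a standard triplet loss, where the anchor concatenates a word embedding with a support prototype and FG/BG local features serve as positive and negative samples. The dimensions, margin, and example vectors below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss with L2 distances: pull the positive toward the
    anchor and push the negative at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

# hypothetical 3-dim word embedding concatenated with a 3-dim support prototype
word_emb = np.array([1.0, 0.0, 0.0])
support_proto = np.array([0.0, 1.0, 0.0])
anchor = np.concatenate([word_emb, support_proto])  # 6-dim anchor, as in GIM

fg_feat = anchor + 0.1   # foreground local feature: close to the anchor
bg_feat = -anchor        # background local feature: far from the anchor
loss = triplet_loss(anchor, fg_feat, bg_feat)
```

Here the foreground feature is already much closer to the anchor than the background feature, so the hinge is inactive and the loss is zero; swapping the roles of the two features produces a positive loss.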
Abstract: Semantic segmentation of novel object categories with limited labeled data remains a challenging problem in computer vision. Few-shot segmentation methods aim to address this problem by recognizing objects from specific target classes given only a few examples. Previous approaches for few-shot semantic segmentation typically represent target classes using class prototypes, which are matched with the features of the query set to obtain segmentation results. However, class prototypes are usually obtained by applying global average pooling to masked support images, and global pooling discards much structural information, which may reduce the accuracy of model predictions. To address this issue, we propose a Category-Guided Frequency Modulation (CGFM) method. CGFM is designed to learn category-specific information in the frequency space and leverage it to provide two-stage guidance for the segmentation process. First, to self-adaptively activate class-relevant frequency bands while suppressing irrelevant ones, we leverage the Dual-Perception Gaussian Band Pre-activation (DPGBP) module to generate Gaussian filters from class embedding vectors. Second, to further enhance category-relevant frequency components in the activated bands, we design a Support-Guided Category Response Enhancement (SGCRE) module to effectively introduce support frequency components into the modulation of query frequency features. Experiments on the PASCAL-5^(i) and COCO-20^(i) datasets demonstrate the promising performance of our model.
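A minimal sketch of band-selective frequency modulation in the spirit of DPGBP: a radial Gaussian filter is applied to the 2D spectrum of a query feature map, passing one frequency band and attenuating the rest. The filter parameters here are fixed for illustration; in CGFM they are generated from class embedding vectors.

```python
import numpy as np

def gaussian_bandpass(h, w, center, sigma):
    """Radial Gaussian band-pass filter over an h x w spectrum: responds most
    strongly to frequencies at radius `center` from the spectrum origin."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fy**2 + fx**2)
    return np.exp(-((r - center) ** 2) / (2 * sigma**2))

def modulate(feat, center, sigma):
    """Filter a (H, W) feature map in the frequency domain: FFT, multiply by
    the Gaussian band, inverse FFT back to the spatial domain."""
    spec = np.fft.fft2(feat)
    filt = gaussian_bandpass(*feat.shape, center, sigma)
    return np.real(np.fft.ifft2(spec * filt))

feat = np.random.rand(8, 8)
# a very wide band centered at DC passes everything: near-identity mapping
out = modulate(feat, center=0.0, sigma=1000.0)
```

Shrinking `sigma` or moving `center` away from zero suppresses low or high frequencies respectively, which is the knob a learned Gaussian generator would turn per category.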
Funding: supported in part by the Key Research and Development Program of Guangdong Province (2021B0101200001) and the Guangdong Basic and Applied Basic Research Foundation (2020B1515120071).
Abstract: Few-shot semantic segmentation aims at training a model that can segment novel classes in a query image with only a few densely annotated support exemplars. It remains a challenge because of large intra-class variations between the support and query images. Existing approaches utilize 4D convolutions to mine semantic correspondence between the support and query images, but they still suffer from heavy computation, sparse correspondence, and large memory consumption. We propose the axial assembled correspondence network (AACNet) to alleviate these issues. The key component of AACNet is the proposed axial assembled 4D kernel, which constructs the basic block of the semantic correspondence encoder (SCE). Furthermore, we propose deblurring equations to provide more robust correspondence for the SCE and design a novel fusion module to mix correspondences in a learnable manner. Experiments on PASCAL-5^(i) reveal that AACNet achieves a mean intersection-over-union score of 65.9% for 1-shot segmentation and 70.6% for 5-shot segmentation, surpassing the state-of-the-art method by 5.8% and 5.0%, respectively.
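The motivation for an axial 4D kernel can be sketched as follows: instead of one dense k×k×k×k kernel (k^4 weights) over the 4D correlation tensor, two k×k kernels filter the support dimensions and the query dimensions in turn (2k^2 weights). The naive numpy code below illustrates that idea only; it is not the paper's actual kernel assembly.

```python
import numpy as np

def filter_pair_axes(t, kernel, axes):
    """Correlate a (k, k) kernel over two chosen axes of a 4D tensor with
    zero padding, leaving the other two axes untouched."""
    k = kernel.shape[0]
    p = k // 2
    pad = [(0, 0)] * 4
    for a in axes:
        pad[a] = (p, p)
    tp = np.pad(t, pad)
    out = np.zeros_like(t)
    for i in range(k):
        for j in range(k):
            sl = [slice(None)] * 4
            sl[axes[0]] = slice(i, i + t.shape[axes[0]])
            sl[axes[1]] = slice(j, j + t.shape[axes[1]])
            out += kernel[i, j] * tp[tuple(sl)]
    return out

corr = np.random.rand(3, 3, 3, 3)   # (Hs, Ws, Hq, Wq) correlation tensor
ks = np.ones((3, 3)) / 9            # kernel over the support dimensions
kq = np.ones((3, 3)) / 9            # kernel over the query dimensions
out = filter_pair_axes(filter_pair_axes(corr, ks, (0, 1)), kq, (2, 3))
```

For k = 3 the dense kernel needs 81 weights per channel pair while the axial pair needs 18, which is where the computation and memory savings the abstract mentions come from.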
基金This work is supported by the National Natural Science Foundation of China under Grant No.62001341the National Natural Science Foundation of Jiangsu Province under Grant No.BK20221379the Jiangsu Engineering Research Center of Digital Twinning Technology for Key Equipment in Petrochemical Process under Grant No.DTEC202104.
Abstract: This paper focuses on the task of few-shot 3D point cloud semantic segmentation. Despite some progress, this task still encounters many issues due to the insufficient samples given, e.g., incomplete object segmentation and inaccurate semantic discrimination. To tackle these issues, we first introduce part-whole relationships into the task of 3D point cloud semantic segmentation to capture semantic integrity, empowered by dynamic capsule routing with a 3D Capsule Network (CapsNet) module in the embedding network. Concretely, the dynamic routing amalgamates geometric information of the 3D point cloud data to construct higher-level feature representations that capture the relationships between object parts and their wholes. Second, we design a multi-prototype enhancement module to enhance prototype discriminability. Specifically, the single-prototype enhancement mechanism is expanded to a multi-prototype version for capturing rich semantics. Besides, the shot-correlation within each category is calculated via the interaction of different samples to enhance intra-category similarity. Ablation studies show that the part-whole relations and the proposed multi-prototype enhancement module help to achieve complete object segmentation and improve semantic discrimination. Moreover, with these two modules integrated, quantitative and qualitative experiments on two public benchmarks, S3DIS and ScanNet, indicate the superior performance of the proposed framework on the task of 3D point cloud semantic segmentation compared to state-of-the-art methods.
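Dynamic capsule routing itself is the standard routing-by-agreement procedure; the minimal numpy version below (toy sizes, no learned transformation matrices) shows how lower-level "part" capsules vote for higher-level "whole" capsules and how coupling coefficients are raised where the votes agree. It is a generic sketch, not the paper's 3D CapsNet module.

```python
import numpy as np

def squash(v, axis=-1):
    """Capsule squashing non-linearity: keeps direction, maps norm into [0, 1)."""
    n2 = (v ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1 + n2)) * v / np.sqrt(n2 + 1e-9)

def dynamic_routing(u_hat, iters=3):
    """Routing-by-agreement over prediction vectors u_hat: (n_in, n_out, d).
    Each iteration: softmax the logits into coupling coefficients, sum the
    weighted votes, squash, then raise logits where votes agree with output."""
    n_in, n_out, d = u_hat.shape
    b = np.zeros((n_in, n_out))
    v = None
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # couplings
        s = (c[..., None] * u_hat).sum(axis=0)                # weighted votes
        v = squash(s)                                         # (n_out, d)
        b = b + (u_hat * v[None]).sum(axis=-1)                # agreement
    return v, c

u_hat = np.random.rand(6, 2, 4)   # 6 part capsules voting for 2 whole capsules
v, c = dynamic_routing(u_hat)
```

The squashed output norms stay below 1 (they act as part-presence probabilities), and each part's coupling coefficients over the wholes sum to 1.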
基金support from the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No.XDA27000000.
Abstract: Deep Convolutional Neural Networks (DCNNs) can capture discriminative features from large datasets. However, how to incrementally learn new samples without forgetting old ones and recognize novel classes that arise in a dynamically changing world, e.g., classifying newly discovered fish species, remains an open problem. We address an even more challenging and realistic setting of this problem in which new class samples are insufficient, i.e., Few-Shot Class-Incremental Learning (FSCIL). Current FSCIL methods augment the training data to alleviate overfitting on novel classes. By contrast, we propose Filter Bank Networks (FBNs), which augment the learnable filters to capture fine-detailed features for adapting to future new classes. In the forward pass, FBNs augment each convolutional filter into a virtual filter bank containing the canonical filter, i.e., itself, and multiple transformed versions. During back-propagation, FBNs explicitly stimulate fine-detailed features to emerge and collectively align all gradients of each filter bank to learn the canonical filter. FBNs capture pattern variants that do not yet exist in the pretraining session, making it easy to incorporate new classes in the incremental learning phase. Moreover, FBNs introduce model-level prior knowledge to efficiently utilize the limited few-shot data. Extensive experiments on the MNIST, CIFAR100, CUB200, and Mini-ImageNet datasets show that FBNs consistently outperform the baseline by a significant margin, reporting new state-of-the-art FSCIL results. In addition, we contribute a challenging FSCIL benchmark, Fishshot1K, which contains 8261 underwater images covering 1000 ocean fish species. The code is included in the supplementary materials.
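The filter-bank idea can be sketched in two steps: expand one canonical filter into transformed copies for the forward pass, then map each copy's gradient back into the canonical frame and average during back-propagation, so the whole bank trains a single set of weights. The specific transforms below (90-degree rotations and a flip) are illustrative assumptions; the paper's transform set may differ.

```python
import numpy as np

def filter_bank(canonical):
    """Expand one canonical 2D filter into a virtual bank: itself plus
    rotated and flipped versions (forward-pass augmentation)."""
    return [canonical,
            np.rot90(canonical, 1),
            np.rot90(canonical, 2),
            np.rot90(canonical, 3),
            np.fliplr(canonical)]

def align_gradients(grads):
    """Map each bank member's gradient back to the canonical frame with the
    inverse transform, then average: all members update one canonical filter."""
    inv = [lambda g: g,
           lambda g: np.rot90(g, -1),
           lambda g: np.rot90(g, -2),
           lambda g: np.rot90(g, -3),
           lambda g: np.fliplr(g)]
    return sum(f(g) for f, g in zip(inv, grads)) / len(grads)

canonical = np.arange(9.0).reshape(3, 3)
bank = filter_bank(canonical)
```

A sanity check on the alignment: if each member's gradient happens to be the corresponding transform of one gradient `g`, the aligned average recovers `g` exactly.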
Funding: supported by the National Natural Science Foundation of China (No. 62272027), the Beijing Natural Science Foundation (No. 4232012), and the Henan Postdoctoral Foundation (No. 335614).
Abstract: Medical image segmentation is critical for clinical diagnosis, but the scarcity of annotated data limits robust model training, making few-shot learning indispensable. Existing methods often suffer from two issues: performance degradation due to significant inter-class variations in pathological structures, and overreliance on attention mechanisms with high computational complexity (O(n^2)), which hinders the efficient modeling of long-range dependencies. In contrast, the state space model (SSM) offers linear complexity (O(n)) and superior efficiency, making it a key solution. To address these challenges, we propose PPFFR (parallel prototype filter and feature refinement) for few-shot medical image segmentation. The proposed framework comprises three key modules. First, we propose the prototype refinement (PR) module to construct refined class subgraphs from encoder-extracted features of both support and query images, generating support prototypes with minimized inter-class variation. We then propose the parallel prototype filter (PPF) module to suppress background interference and enhance the correlation between support and query prototypes. Finally, we implement the feature refinement (FR) module to further enhance segmentation accuracy and accelerate model convergence using the SSM's robust long-range dependency modeling, integrated with multi-head attention (MHA) to preserve spatial details. Experimental results on the Abd-MRI dataset demonstrate that FR with MHA outperforms FR alone in segmenting the left kidney, right kidney, liver, and spleen, as well as in mean accuracy, confirming MHA's role in improving precision. In extensive experiments conducted on three public datasets under the 1-way 1-shot setting, PPFFR achieves Dice scores of 87.62%, 86.74%, and 79.71%, respectively, consistently surpassing state-of-the-art few-shot medical image segmentation methods. As the critical component, the SSM ensures that PPFFR balances performance with efficiency. Ablation studies validate the effectiveness of the PR, PPF, and FR modules. The results indicate that explicit inter-class variation reduction and SSM-based feature refinement can enhance accuracy without heavy computational overhead. In conclusion, PPFFR effectively enhances inter-class consistency and computational efficiency for few-shot medical image segmentation. This work provides insights for few-shot learning in medical imaging and inspires lightweight architecture designs for clinical deployment.
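The linear-time property the abstract attributes to the SSM comes from its recurrent scan: each output depends only on a running state, so one pass over the sequence costs O(n), versus O(n^2) for pairwise attention. A scalar toy recurrence makes this concrete; it is a generic SSM sketch, not PPFFR's actual layer, and the coefficients a, b, c are arbitrary.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=1.0, c=1.0):
    """Minimal 1-D state space recurrence:
        h[t] = a * h[t-1] + b * x[t],   y[t] = c * h[t]
    One sequential pass: O(n) time, O(1) state."""
    h = 0.0
    y = np.empty_like(x)
    for t, xt in enumerate(x):
        h = a * h + b * xt
        y[t] = c * h
    return y

# impulse input: the output is the geometrically decaying impulse response a**t
x = np.array([1.0, 0.0, 0.0, 0.0])
y = ssm_scan(x)
```

The decaying impulse response (1, 0.9, 0.81, 0.729, ...) shows how the state carries long-range information forward without ever forming an n×n attention matrix.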
Abstract: This paper presents a novel approach for tire-pattern classification, aimed at conducting forensic analysis of tire marks discovered at crime scenes. The classification model proposed in this study accounts for the intricate and dynamic nature of tire prints found in real-world scenarios, including accident sites. To address this complexity, the classifier model was developed to harness the meta-learning capabilities of few-shot learning algorithms (learning-to-learn). The model is designed and optimized to effectively classify both tire patterns exhibited on wheels and tire-indentation marks visible on surfaces due to friction. This is achieved by employing a semantic segmentation model to extract the tire-pattern marks within the image. These marks are subsequently used as a mask channel, combined with the original image, and fed into the classifier to perform classification. Overall, the proposed model follows a three-step process: (i) the Bilateral Segmentation Network is employed to derive the semantic segmentation of the tire pattern within a given image; (ii) using the semantic image in conjunction with the original image, the model learns and clusters groups to generate vectors that define the relative position of the image in the test set; (iii) the model performs predictions based on these learned features. Empirical verification demonstrates that using the semantic model to extract the tire patterns before classification increases overall classification accuracy by ~4%.
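The mask-channel construction between steps (i) and (ii) is straightforward to sketch: the segmentation output is stacked as a fourth channel on the RGB image before it is fed to the classifier. This is a minimal illustration with random data; the function name is illustrative.

```python
import numpy as np

def add_mask_channel(image, mask):
    """Concatenate the predicted tire-pattern mask as an extra channel so the
    classifier sees both the raw appearance and the segmented pattern.
    image: (H, W, 3) float; mask: (H, W) binary."""
    return np.concatenate([image, mask[..., None].astype(image.dtype)], axis=-1)

img = np.random.rand(16, 16, 3)           # toy RGB image
mask = (np.random.rand(16, 16) > 0.5)     # toy binary segmentation mask
x = add_mask_channel(img, mask)           # (16, 16, 4) classifier input
```

The classifier's first convolution then simply expects 4 input channels instead of 3; no other architectural change is needed for this fusion scheme.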
Inverse Synthetic Aperture Radar (ISAR) images of complex targets have a low signal-to-noise ratio (SNR), fuzzy edges, and large differences in scattering intensity, which limits the recognition performance of ISAR systems. Data scarcity poses a further challenge to accurate component recognition. To address component recognition in complex ISAR targets, this paper adopts semantic segmentation and proposes a few-shot semantic segmentation framework that fuses multimodal features. The scarcity of available data is mitigated by a two-branch scattering feature encoding structure. High-resolution features are then obtained by fusing the texture features of the ISAR image with the scattering quantization information of complex-valued echoes, achieving significantly higher structural adaptability. Meanwhile, a scattering trait enhancement module and a statistical quantification module are designed: edge texture is enhanced based on the scattering quantization property, which alleviates the difficulty of segmenting blurred edges under low-SNR conditions. The coupling of query and support samples is strengthened through four-dimensional convolution. Additionally, to overcome fusion challenges caused by information differences between modalities, multimodal feature fusion is guided by an equilibrium comprehension loss. In this way, the performance potential of the fusion framework is fully unleashed and the decision risk is effectively reduced. Experiments demonstrate the clear advantages of the proposed framework in multimodal feature fusion, and it retains strong component segmentation capability under low-SNR and edge-blurring conditions.
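The four-dimensional coupling of query and support samples typically operates on a correlation volume with one entry per pair of (query position, support position). A minimal sketch of building such a volume from cosine similarities; the feature values and helper names are illustrative, and the paper's actual 4-D convolution over this volume is more elaborate:

```python
import math

# Sketch: a 4-D query/support correlation volume, the kind of tensor a
# 4-D convolution would refine.  corr[qy][qx][sy][sx] is the cosine
# similarity between one query feature and one support feature.
def correlation_4d(query, support):
    """query, support: H x W grids of feature vectors."""
    def cos(u, v):
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return sum(a * b for a, b in zip(u, v)) / (nu * nv or 1.0)
    return [[[[cos(q, s) for s in srow] for srow in support]
             for q in qrow] for qrow in query]
```

Convolving over all four spatial axes lets matches at neighboring query and support positions reinforce each other, which is the coupling effect described above.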
Current mainstream click-based interactive segmentation methods encode all user clicks uniformly. Such encoding means that user interactions provide the neural network only with the target's location, and that every click carries the same influence. However, clicks at different stages have different influence: early interactions select the target's outline, while mid- and late-stage interactions tend to fine-tune local details of the segmentation result. The influence of early clicks should therefore be enlarged appropriately, so that the target outline is obtained faster, while the influence of mid- and late-stage clicks should be weakened to prevent overshoot or ambiguity from harming the convergence of interactive segmentation. (1) This paper proposes a Dynamic Disk Coding (DDC) algorithm, which encodes each user click as a disk with a specific radius, thereby adding prior information about the click's influence. (2) This paper proposes an interactive segmentation network, DDC-Net, which strengthens the interaction information through a preprocessing module and mixes the interaction information with semantic information in both the shallow and deep layers of the segmentation network, alleviating the gradual attenuation of interaction information as the network deepens. (3) This paper proposes an improved simulated-training strategy that lets the network fully learn, during training, the different influences of clicks with different coding radii, so that the proposed method balances convergence speed and convergence. Experiments show that the proposed deep interactive segmentation method with dynamic disk coding is sound and effective, achieving average improvements of 3.63% and 2.44%, respectively, over the baseline methods.
Current studies in few-shot semantic segmentation mostly use meta-learning frameworks to obtain models that generalize to new categories. However, models trained on base classes with sufficient annotated samples are biased towards those base classes, resulting in semantic confusion and ambiguity between base classes and new classes. One strategy is to use an additional base learner to recognize objects of the base classes and then refine the predictions output by the meta learner; in this setting, the interaction between the two learners and the way their results are combined are both important. This paper proposes a new model, the Distilling Base and Meta (DBAM) network, which uses a self-attention mechanism and contrastive learning to enhance few-shot segmentation performance. First, a self-attention-based ensemble module (SEM) is proposed to produce a more accurate adjustment factor for fusing the two learners' predictions. Second, a prototype feature optimization module (PFOM) is proposed to provide interaction between the two learners, enhancing the ability to distinguish the base classes from the target class by introducing a contrastive learning loss. Extensive experiments demonstrate that our method improves performance on PASCAL-5i under both 1-shot and 5-shot settings.
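One way an adjustment factor could fuse the two learners' outputs is per pixel: the base learner's base-class confidence down-weights the meta learner's foreground score, suppressing pixels that likely belong to a base class. The fixed factor below is a stand-in for the one SEM predicts, so this is an assumption-laden simplification, not DBAM's exact fusion rule:

```python
# Sketch: fuse meta-learner foreground scores with base-learner
# base-class predictions via a scalar adjustment factor.
def fuse_predictions(meta_fg, base_pred, factor=0.5):
    """Per-pixel: suppress meta foreground where the base learner
    claims a base-class object (both inputs are H x W score maps)."""
    return [[m * (1.0 - factor * b) for m, b in zip(mr, br)]
            for mr, br in zip(meta_fg, base_pred)]
```

Where the base learner is silent (score 0), the meta prediction passes through unchanged; where it fires strongly, the novel-class foreground score is attenuated in proportion to the factor.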