The unsupervised vehicle re-identification task aims at identifying specific vehicles in surveillance videos without using annotation information. Because vehicles are more similar in appearance than pedestrians, pseudo-labels generated through clustering are ineffective at mitigating the impact of noise, and the separation between inter-class and intra-class feature distances remains inadequate. To address these issues, we design a dual contrastive learning method based on knowledge distillation. During each iteration, we use a teacher model to randomly partition the entire dataset into two sub-domains according to the clustering pseudo-label categories. By conducting contrastive learning between two student models, we extract more discernible vehicle identity cues and alleviate the problem of imbalanced data distribution. We then propose a context-aware pseudo-label refinement strategy that leverages contextual features by progressively associating granularity information from different bottleneck blocks. To produce more trustworthy pseudo-labels and lessen noise interference during clustering, context-aware scores are obtained by computing the similarity between global features and contextual ones, and these scores are then incorporated into the pseudo-label encoding process. Extensive experiments on publicly available datasets show that the proposed method performs excellently in overcoming label noise and optimizing the data distribution.
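The abstract does not give the exact scoring formula, but the idea of rating each sample by the agreement between its global and contextual features can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the metric and a simple threshold for discarding untrustworthy pseudo-labels; the function names and the threshold are hypothetical.

```python
import numpy as np

def context_aware_scores(global_feats, context_feats):
    """Cosine similarity between each sample's global feature and its
    contextual feature; higher scores mark more trustworthy samples."""
    g = global_feats / np.linalg.norm(global_feats, axis=1, keepdims=True)
    c = context_feats / np.linalg.norm(context_feats, axis=1, keepdims=True)
    return np.sum(g * c, axis=1)  # shape (N,)

def refine_pseudo_labels(scores, labels, threshold=0.5):
    """Keep the clustering pseudo-label only when the context-aware
    score is high enough; low-score samples become outliers (-1)."""
    refined = labels.copy()
    refined[scores < threshold] = -1
    return refined

# Toy example: 4 samples, 8-D features, 2 pseudo-label clusters.
rng = np.random.default_rng(0)
g = rng.normal(size=(4, 8))
c = g + 0.05 * rng.normal(size=(4, 8))  # contexts close to globals
c[3] = -g[3]                            # one sample with conflicting context
scores = context_aware_scores(g, c)
labels = np.array([0, 0, 1, 1])
refined = refine_pseudo_labels(scores, labels)
```

Only the sample whose contextual feature contradicts its global feature loses its pseudo-label; the paper instead feeds the scores into the pseudo-label encoding, but the gating intuition is the same.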
Unsupervised vehicle re-identification (Re-ID) methods have garnered widespread attention due to their potential in real-world traffic monitoring. However, existing unsupervised domain adaptation techniques often rely on pseudo-labels generated from the source domain, which struggle to effectively address the diversity and dynamic nature of real-world scenarios. Given the limited variety of common vehicle types, enhancing the model's generalization capability across these types is crucial. To this end, an innovative approach called meta-type generalization (MTG) is proposed. By dividing the training data into meta-train and meta-test sets based on vehicle type information, a novel gradient interaction computation strategy is designed to enhance the model's ability to learn type-invariant features. Integrated into the ResNet50 backbone, the MTG model achieves improvements of 4.50% and 12.04% on the VeRi-776 and VRAI datasets, respectively, compared with traditional unsupervised algorithms, and surpasses current state-of-the-art methods. This achievement holds promise for application in intelligent traffic systems, enabling more efficient urban traffic solutions.
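MTG's exact gradient interaction rule is not given in the abstract; the general meta-train/meta-test pattern it builds on can be sketched with a MAML-style step on a toy linear model. Everything below (the inner/outer learning rates, the way the two gradients are combined) is an assumption for illustration, not the paper's update.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of mean squared error for a linear model y ~ X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def mtg_step(w, meta_train, meta_test, inner_lr=0.1, outer_lr=0.05):
    Xtr, ytr = meta_train
    Xte, yte = meta_test
    # Inner step on the meta-train vehicle types ...
    w_fast = w - inner_lr * mse_grad(w, Xtr, ytr)
    # ... then an outer update combining gradients from both splits, so
    # the model is rewarded for features that transfer across types.
    g = mse_grad(w, Xtr, ytr) + mse_grad(w_fast, Xte, yte)
    return w - outer_lr * g

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(60, 2))
y = X @ w_true
types = rng.integers(0, 2, size=60)  # pretend two vehicle types
w = np.zeros(2)
for _ in range(200):
    w = mtg_step(w, (X[types == 0], y[types == 0]),
                    (X[types == 1], y[types == 1]))
```

Because the meta-test gradient is evaluated after the meta-train update, parameters that only help one type split are penalized, which is the intuition behind learning type-invariant features.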
Person re-identification (re-ID) aims to match images of the same pedestrian across different cameras. It plays an important role in the field of security and surveillance. Although it has been studied for many years, it is still considered an unsolved problem. Since the rise of deep learning, the accuracy of supervised person re-ID on public datasets has reached a very high level. However, these methods are difficult to apply to real-life scenarios because they require a large amount of labeled training data. Pedestrian identity labeling, especially cross-camera identity labeling, is laborious and expensive. Why can we not apply a pre-trained model directly to an unseen camera network? Because of the domain bias between the source and target environments, accuracy on the target dataset is always low; a model trained in a shopping mall, for example, obviously needs to adapt to the new environment of an airport. Recently, several approaches have been proposed to solve this problem, including clustering-based methods, GAN-based methods, co-training methods, and unsupervised domain adaptation methods.
In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult due to the absence of intermediate state supervision, making smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the most suitable intermediate domain for the model to gradually adapt to the feature distribution changes from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. Our framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG module is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM module, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance, achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which uses an attention mechanism to strengthen the model's focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. Our proposed method outperforms existing techniques in all common UDA re-ID tasks and improves the mean Average Precision (mAP) by 2.3% in the Market-to-Duke task compared with state-of-the-art (SOTA) methods.
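Maximum mean discrepancy (MMD), one of the two alignment terms the CFCM uses, has a standard closed form that can be sketched directly; the kernel choice and bandwidth below are assumptions, and this biased empirical estimator is only one of several common variants.

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between two samples, with an RBF
    kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 4))       # "source domain" features
tgt_near = rng.normal(0.1, 1.0, size=(200, 4))  # slightly shifted target
tgt_far = rng.normal(2.0, 1.0, size=(200, 4))   # strongly shifted target
# The discrepancy grows as the target drifts away from the source, which
# is what makes MMD usable as an alignment loss.
```

Minimizing this quantity over learned features pulls the two empirical feature distributions together, which is the alignment role it plays inside CFCM.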
In view of the weak ability of convolutional neural networks to explicitly learn spatial invariance, and of the possible loss of discriminative features caused by occlusion and background interference in pedestrian re-identification tasks, a person re-identification method combining spatial feature learning and multi-granularity feature fusion was proposed. First, an attention spatial transformation network (A-STN) was proposed to learn spatial features and solve the problem of misaligned pedestrian spatial features. The network was then divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. Among them, the global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses overlay pooling to enhance each local feature while learning the correlations between multi-granularity features. The local fine-grained fusion branch uses differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local and global features. Finally, the proposed method was compared with existing methods on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results were better than those of the comparative methods, which verifies the effectiveness of the proposed method.
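The abstract does not define "overlay" or "differential" pooling, but the simpler building blocks it mentions, fusing different poolings in the global branch and pooling horizontal regions for local branches, can be sketched. The stripe partition below is a common re-ID convention assumed for illustration, not taken from the paper.

```python
import numpy as np

def global_branch(feat_map):
    """feat_map: (C, H, W) convolutional feature map. Fuse average
    pooling (context) with max pooling (salient responses)."""
    avg = feat_map.mean(axis=(1, 2))   # (C,)
    mx = feat_map.max(axis=(1, 2))     # (C,)
    return np.concatenate([avg, mx])   # (2C,) fused global feature

def horizontal_stripes(feat_map, parts=4):
    """Coarse local features: split the map into horizontal stripes and
    pool each one, a common granularity scheme in person re-ID."""
    C, H, W = feat_map.shape
    bounds = np.linspace(0, H, parts + 1, dtype=int)
    return [feat_map[:, a:b, :].mean(axis=(1, 2))
            for a, b in zip(bounds, bounds[1:])]

fm = np.arange(2 * 8 * 4, dtype=float).reshape(2, 8, 4)
g = global_branch(fm)                    # one (2C,) global descriptor
locals_ = horizontal_stripes(fm, parts=4)  # four (C,) stripe descriptors
```

Concatenating complementary poolings is what "enriches the global features by fusing different pooling features" amounts to in its most basic form.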
Existing pedestrian re-identification techniques do not perform well, and traditional methods achieve low recognition accuracy. To address this, a feature fusion network is proposed in this paper, which combines the CNN features extracted by ResNet with manually annotated attributes in a unified feature space. ResNet solves the problems of network degradation and difficult convergence in multi-layer CNN training and extracts deeper features. An attribute combination method is adopted for the manually annotated attributes, and the CNN features are constrained by the hand-crafted features through back-propagation. A loss measurement function is then used to optimize the network's identification results. In further tests on the public datasets VIPeR, PRID, and CUHK, the experimental results show that the method achieves a high cumulative matching score.
This paper explores the Vision Transformer (ViT) backbone for Unsupervised Domain Adaptive (UDA) person Re-Identification (Re-ID). While some recent studies have validated ViT for supervised Re-ID, no study has yet used ViT for UDA Re-ID. We observe that the ViT structure provides a unique advantage for UDA Re-ID: it has a prompt (the learnable class token) at its bottom layer that can be used to efficiently condition the deep model on the underlying domain. To exploit this advantage, we propose a novel two-stage UDA pipeline named Prompting And Tuning (PAT), which consists of a prompt learning stage and a subsequent fine-tuning stage. In the first stage, PAT roughly adapts the model from the source to the target domain by learning prompts for the two domains, while in the second stage, PAT fine-tunes the entire backbone for further adaptation to increase accuracy. Although both stages adopt pseudo-labels for training, we show that they have different data preferences. With these two preferences, prompt learning and fine-tuning integrate well with each other and jointly yield a competitive PAT method for UDA Re-ID.
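The mechanism PAT exploits, a learnable token prepended to the ViT patch sequence, can be sketched in a few lines. PAT's actual prompt design and training are not reproduced here; the per-domain token values and function names are assumptions for illustration.

```python
import numpy as np

def with_domain_prompt(patch_embeds, domain_prompt):
    """patch_embeds: (N, D) patch tokens; domain_prompt: (D,) learnable
    token. Returns the (N + 1, D) sequence the transformer consumes."""
    return np.vstack([domain_prompt[None, :], patch_embeds])

D = 16
patches = np.zeros((196, D))          # 14 x 14 patch embeddings
source_prompt = np.full(D, 0.1)       # stand-ins for learned per-domain tokens
target_prompt = np.full(D, -0.1)
src_seq = with_domain_prompt(patches, source_prompt)
tgt_seq = with_domain_prompt(patches, target_prompt)
# Only the first token differs between the two sequences, so a frozen
# backbone is conditioned on the domain at negligible parameter cost.
```

Because self-attention lets every patch attend to this token, swapping one D-dimensional vector reconditions the whole network, which is why prompt learning is such a cheap first adaptation stage.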
Distinguishing identity-unrelated background information from discriminative identity information poses a challenge in unsupervised vehicle re-identification (Re-ID). Re-ID models suffer from varying degrees of background interference caused by continuous scene variations. The recently proposed Segment Anything Model (SAM) has demonstrated exceptional performance in zero-shot segmentation tasks, and combining SAM with vehicle Re-ID models can efficiently separate vehicle identity from background information. This paper proposes a method that combines SAM-driven masked autoencoder (MAE) pre-training with background-aware meta-learning for unsupervised vehicle Re-ID. The method consists of three sub-modules. First, the segmentation capacity of SAM is used to separate the vehicle identity region from the background. Because SAM cannot be robustly employed in exceptional situations, such as those with ambiguity or occlusion, a spatially constrained vehicle background segmentation method is presented to obtain accurate background segmentation results in the vehicle Re-ID downstream task. Second, SAM-driven MAE pre-training uses these segmentation results to select patches belonging to the vehicle and to mask the other patches, allowing the MAE to learn identity-sensitive features in a self-supervised manner. Finally, we present a background-aware meta-learning method that fits varying degrees of background interference in different scenarios by combining different background region ratios. Our experiments demonstrate that the proposed method achieves state-of-the-art performance in handling background interference variations.
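The patch-selection step in the second sub-module, deciding per patch whether it belongs to the vehicle given a segmentation mask, reduces to pooling the mask over the patch grid. The patch size and keep-threshold below are illustrative assumptions; the paper's exact selection rule is not specified in the abstract.

```python
import numpy as np

def vehicle_patch_mask(seg_mask, patch=4, keep_thresh=0.5):
    """seg_mask: (H, W) binary vehicle-vs-background segmentation.
    Returns a per-patch boolean grid: True = patch is mostly vehicle,
    so it stays visible; background patches get masked for MAE."""
    H, W = seg_mask.shape
    grid = seg_mask.reshape(H // patch, patch, W // patch, patch)
    frac = grid.mean(axis=(1, 3))  # vehicle-pixel fraction per patch
    return frac >= keep_thresh

seg = np.zeros((8, 8), dtype=float)
seg[0:4, 0:4] = 1.0  # a "vehicle" region aligned with the top-left patch
keep = vehicle_patch_mask(seg, patch=4)  # 2 x 2 patch grid
```

Masking only non-vehicle patches forces the MAE's reconstruction signal to come from vehicle regions, which is how the pre-training becomes identity-sensitive rather than background-sensitive.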
Existing unsupervised person re-identification (Re-ID) methods have achieved remarkable performance by adopting an alternate clustering-training manner. However, they still suffer from camera variation, which results in an inconsistent feature space and unreliable pseudo-labels that severely degrade performance. In this paper, we propose a cross-camera self-distillation (CCSD) framework for unsupervised person Re-ID to alleviate the effect of camera variation. Specifically, in the clustering phase, we propose a camera-aware cluster refinement mechanism, which first splits each cluster into multiple clusters according to the camera views and then refines them into more compact clusters. In the training phase, we first obtain the similarity between the samples and the refined clusters from the same and different cameras, and then transfer the knowledge of the similarity distribution from intra-camera to cross-camera. Since the intra-camera similarity is free from camera variation, our knowledge distillation approach is able to learn a more consistent feature space across cameras. Extensive experiments demonstrate the superiority of our proposed CCSD over state-of-the-art approaches for unsupervised person Re-ID.
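The first step of the camera-aware refinement, splitting every cluster by camera view, is a plain relabeling and can be sketched exactly; the subsequent compaction step depends on model features and is omitted here.

```python
def camera_aware_split(cluster_ids, camera_ids):
    """Split every cluster into per-camera sub-clusters: each distinct
    (cluster, camera) pair receives its own new pseudo-label."""
    mapping, new_labels = {}, []
    for c, cam in zip(cluster_ids, camera_ids):
        key = (c, cam)
        if key not in mapping:
            mapping[key] = len(mapping)  # next unused label
        new_labels.append(mapping[key])
    return new_labels

clusters = [0, 0, 0, 1, 1]   # pseudo-labels from clustering
cameras  = [1, 1, 2, 1, 3]   # camera each image was captured by
refined = camera_aware_split(clusters, cameras)
```

Each sub-cluster now contains images from a single camera, so its similarity statistics are free of camera variation, which is precisely what the later intra-camera-to-cross-camera distillation relies on.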
In dairy farming, ensuring the health of each cow and minimizing economic losses requires individual monitoring, achieved through cow Re-Identification (Re-ID). Computer vision-based Re-ID relies on visually distinguishing features, such as the distinctive coat patterns of breeds like Holstein. However, annotating every cow in each farm is cost-prohibitive. Our objective is to develop Re-ID methods applicable to both labeled and unlabeled farms, accommodating new individuals and diverse environments. Unsupervised Domain Adaptation (UDA) techniques bridge this gap, transferring knowledge from labeled source domains to unlabeled target domains, but have mainly been designed for pedestrian and vehicle Re-ID applications. Our work introduces Cumulative Unsupervised Multi-Domain Adaptation (CUMDA) to address the challenges of limited identity diversity and diverse farm appearances. CUMDA accumulates knowledge from all domains, enhancing specialization in known domains and improving generalization to unseen domains. Our contributions include a CUMDA method that adapts to multiple unlabeled target domains while preserving source domain performance, along with extensive cross-dataset experiments on three cattle Re-ID datasets. These experiments demonstrate significant enhancements in source preservation, target domain specialization, and generalization to unseen domains.
Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances and different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model to identify a person's locally significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. DCLS uses global and local similarity metrics to semantically align the global and local regions of a person, and further determines the proximity associations between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
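DCLS combines global and local similarity metrics in a way the abstract does not fully specify, but its basic ingredient, softening a hard pseudo-label toward the labels of a sample's nearest neighbors, can be sketched. The mixing weight and neighbor set below are assumptions for illustration.

```python
import numpy as np

def neighbor_smoothed_label(onehot, neighbor_onehots, alpha=0.3):
    """onehot: (K,) hard pseudo-label; neighbor_onehots: (k, K) labels of
    the k nearest neighbours. Returns a soft label mixing the sample's
    own label with the average neighbour label."""
    return (1 - alpha) * onehot + alpha * neighbor_onehots.mean(axis=0)

K = 3
own = np.eye(K)[0]            # hard pseudo-label: class 0
neigh = np.eye(K)[[0, 0, 1]]  # two neighbours agree, one disagrees
soft = neighbor_smoothed_label(own, neigh, alpha=0.3)
```

When neighbors disagree with the clustering assignment, probability mass leaks toward their classes, so a noisy hard label stops dominating the training signal; DCLS applies this idea at both the global and the local-region level.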
At present, research on unsupervised single-modality person re-identification focuses mainly on visible-light images. With the spread of new infrared cameras, unsupervised infrared person re-identification has also shown its research value. Because infrared images have low contrast and lack color and texture detail, global information is crucial for infrared person re-identification. This paper designs an unsupervised infrared person re-identification network based on F-ResGAM. The network first preprocesses images with a wavelet transform to enhance feature extraction, and then introduces a Global Attention Mechanism (GAM) into the ResNet50 backbone to attend to more global information. In addition, because infrared pseudo-labels are noisy, this paper proposes a Group Sampling based on Sample Expansion (GSSE) strategy to further optimize pseudo-label generation, thereby improving recognition accuracy. Experimental results show that the proposed optimizations effectively improve the accuracy of unsupervised infrared person re-identification, with the rank metrics in particular improving significantly.
Unsupervised Person Re-Identification (UPR) is widely applied in security engineering, smart cities, and similar scenarios. However, many existing UPR algorithms ignore local feature matching and spatial position information during feature extraction, and may discard a large number of unclustered samples during pseudo-label clustering. To overcome these drawbacks, this paper proposes an unsupervised person re-identification method based on local feature matching and hybrid contrastive learning (LHFC). First, to address the network's inability to extract feature information from different spatial positions, a self-similar non-local attention mechanism (Non-local) is introduced into the ResNet50 feature-extraction backbone. To address local feature mismatch, a local feature matching module (Aligned) is designed, which considers the matching of human body structure while learning image similarity. Finally, to address the insufficient feature extraction caused by discarding unclustered samples during training, a hybrid cluster-level and instance-level memory (HCL) is proposed to store cluster-level identity features and outlier instance features. To verify the effectiveness of the model, it is compared with 12 existing unsupervised methods on two public datasets (Market-1501, DukeMTMC-ReID), and ablation experiments examine the influence of Non-local, Aligned, and HCL. The comparative results show that LHFC achieves mAP scores of 84.4% and 71.5% on Market-1501 and DukeMTMC-ReID, improvements of 3.5% and 1.9% over CACL, the best-performing of the 12 compared methods. The ablation results show that Non-local, Aligned, and HCL each improve accuracy: introducing Non-local into ResNet50 helps extract more useful pedestrian feature information and better capture the spatial relationships between local features; the Aligned module effectively fuses the corresponding human body structure information; and HCL reduces the errors introduced by pseudo-labels in the later stages of training.
Objective: Unsupervised person re-identification alleviates the high annotation cost of supervised methods, and unsupervised cross-domain adaptation is the most common re-identification scheme. Existing UDA (unsupervised domain adaptive) person re-identification methods easily introduce pseudo-label noise during clustering and discriminate poorly among similar-looking people. Method: To address these problems, and building on the intra-class convergence, intra-class continuity, and inter-class dispersion of features, a cross-domain unsupervised person re-identification method based on neighbor optimization is proposed. A pre-trained model is first obtained on the source domain with supervision, and unsupervised training is then performed on the target domain. To strengthen the model's ability to distinguish highly similar pedestrians, a neighborhood adversarial loss is designed: each sample forms pairs with the other samples, and the pair with the highest class certainty is set against the pair with the highest uncertainty. To make intra-class sample features converge in the same direction, a feature-continuity loss is designed: the feature-distance curve is center-normalized, pulling a sample's k-nearest feature distances closer while maintaining the inherent differences of the curve. Result: Ablation experiments confirm the effectiveness of each part of the loss function, and comparative experiments show that the proposed method outperforms existing methods, reaching Rank-1 and mean average precision (mAP) scores of 92.8% and 84.1% on Market-1501 (1501 identities dataset from Market) and 83.9% and 71.1% on DukeMTMC-reID (multi-target multi-camera person re-identification dataset from Duke University). Conclusion: The designed neighborhood adversarial loss and neighborhood continuity loss strengthen the model's ability to distinguish similar people and thus effectively improve person re-identification performance.
Person re-identification (person re-id) aims to match observations of pedestrians from different cameras. It is a challenging task in real-world surveillance systems and draws extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show that the proposed iBoW descriptor outperforms other unsupervised methods. By combining it with efficient metric learning algorithms, we obtain competitive accuracy compared with existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market1501.
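A plain Bag-of-Words image descriptor, the baseline that iBoW improves on, can be sketched in a few lines: quantize each local feature to its nearest visual word and histogram the assignments. The paper's specific improvements are not reproduced; the toy codebook below is an assumption.

```python
import numpy as np

def bow_descriptor(local_feats, codebook):
    """Quantize local features to their nearest codeword and return an
    L1-normalized visual-word histogram — the core of any BoW-style
    image descriptor."""
    # Pairwise distances between the (N, D) features and (K, D) codebook.
    d = np.linalg.norm(local_feats[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                 # nearest visual word per feature
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])  # toy 2-word vocabulary
feats = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, -0.1]])
h = bow_descriptor(feats, codebook)
```

Because the histogram discards where in the image each word occurred, the descriptor tolerates pose and viewpoint changes, which is one reason BoW-style representations suit unsupervised re-id.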
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 62461037, 62076117, and 62166026; the Jiangxi Provincial Natural Science Foundation under Grant Nos. 20224BAB212011, 20232BAB202051, 20232BAB212008, and 20242BAB25078; and the Jiangxi Provincial Key Laboratory of Virtual Reality under Grant No. 2024SSY03151.
Funding: supported by the National Natural Science Foundation of China (No. 61976098) and the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (No. 2022J06023).
Funding: supported by the Foshan Science and Technology Innovation Team Project (No. FS0AA-KJ919-4402-0060) and the National Natural Science Foundation of China (No. 62263018).
Funding: This work was supported by the National Key Research and Development Program of China in the 13th Five-Year Plan (No. 2016YFB0801301) and in the 14th Five-Year Plan (Nos. 2021YFFO602103, 2021YFF0602102, and 20210Y1702).
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and Jiangxi Grant Nos. 20224BAB212011, 20232BAB212008, and 20232BAB202051.
Funding: supported by the National Natural Science Foundation of China (No. 62176097) and the Outstanding Youth Foundation of Hubei Province (No. 2022CFA055).
Abstract: Existing unsupervised person re-identification (Re-ID) methods have achieved remarkable performance by adopting an alternating clustering-training scheme. However, they still suffer from camera variation, which results in an inconsistent feature space and unreliable pseudo labels that severely degrade performance. In this paper, we propose a cross-camera self-distillation (CCSD) framework for unsupervised person Re-ID to alleviate the effect of camera variation. Specifically, in the clustering phase, we propose a camera-aware cluster refinement mechanism, which first splits each cluster into multiple clusters according to the camera views and then refines them into more compact clusters. In the training phase, we first obtain the similarity between the samples and the refined clusters from the same and different cameras, and then transfer the knowledge of the similarity distribution from intra-camera to cross-camera. Since the intra-camera similarity is free from camera variation, our knowledge distillation approach is able to learn a feature space that is more consistent across cameras. Extensive experiments demonstrate the superiority of the proposed CCSD over state-of-the-art approaches to unsupervised person Re-ID.
Abstract: In dairy farming, ensuring the health of each cow and minimizing economic losses requires individual monitoring, achieved through cow Re-Identification (Re-ID). Computer vision-based Re-ID relies on visually distinguishing features, such as the distinctive coat patterns of breeds like Holstein. However, annotating every cow in each farm is cost-prohibitive. Our objective is to develop Re-ID methods applicable to both labeled and unlabeled farms, accommodating new individuals and diverse environments. Unsupervised Domain Adaptation (UDA) techniques bridge this gap by transferring knowledge from labeled source domains to unlabeled target domains, but have mainly been designed for pedestrian and vehicle Re-ID applications. Our work introduces Cumulative Unsupervised Multi-Domain Adaptation (CUMDA) to address the challenges of limited identity diversity and diverse farm appearances. CUMDA accumulates knowledge from all domains, enhancing specialization in known domains and improving generalization to unseen domains. Our contributions include a CUMDA method that adapts to multiple unlabeled target domains while preserving source domain performance, along with extensive cross-dataset experiments on three cattle Re-ID datasets. These experiments demonstrate significant enhancements in source preservation, target domain specialization, and generalization to unseen domains.
Funding: supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, the Jiangxi Key Laboratory of Smart City under Grant No. 20192BCD40002, and the Jiangxi Provincial Natural Science Foundation under Grant No. 20224BAB212011.
Abstract: Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances but different identities being assigned the same label after clustering. Moreover, the identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, and a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the re-identification model's ability to identify a person's locally significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. DCLS uses global and local similarity metrics to semantically align the global and local regions of a person, and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
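The core idea of neighbor-based label smoothing can be illustrated with a generic stand-in for DCLS: each sample's one-hot pseudo label is mixed with the labels of its most similar neighbors. The weighting scheme and hyperparameters below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def neighbor_label_smoothing(onehot, sim, k=1, alpha=0.3):
    """Smooth each one-hot pseudo label with its k most similar
    neighbours' labels (a generic sketch of neighbor label smoothing)."""
    smoothed = np.array(onehot, dtype=float)
    for i in range(len(onehot)):
        order = np.argsort(-sim[i])                  # most similar first
        nbrs = [j for j in order if j != i][:k]      # exclude the sample itself
        nbr_mean = np.mean([onehot[j] for j in nbrs], axis=0)
        smoothed[i] = (1 - alpha) * onehot[i] + alpha * nbr_mean
    return smoothed

onehot = np.eye(3)[[0, 0, 1]]                        # pseudo labels 0, 0, 1
sim = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
sm = neighbor_label_smoothing(onehot, sim, k=1, alpha=0.3)
print(sm[2])  # sample 2 borrows 30% of its nearest neighbour's label
```

Samples whose neighbors agree with them keep a hard label, while samples surrounded by differently-labeled neighbors receive a softened target, which dampens the effect of noisy cluster assignments.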
Abstract: Current research on unsupervised single-modality person re-identification focuses mainly on visible-light images. With the spread of new infrared cameras, unsupervised infrared person re-identification has also shown its research value. Because infrared images have low contrast and lack color and texture detail, global information is crucial for infrared person re-identification. This paper designs an unsupervised infrared person re-identification network based on F-ResGAM. The network first preprocesses images with a wavelet transform to enhance feature extraction, and then introduces a Global Attention Mechanism (GAM) into the ResNet50 backbone to attend to more global information. In addition, since infrared pseudo-labels are relatively noisy, this paper proposes a Group Sampling based on Sample Expansion (GSSE) strategy to further optimize pseudo-label generation, improving the model's recognition accuracy. Experimental results show that the proposed optimizations effectively improve the accuracy of unsupervised infrared person re-identification, with particularly notable gains in the rank metrics.
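As a reference point for the wavelet preprocessing mentioned above, a single level of a 2-D Haar decomposition can be written in a few lines; whether F-ResGAM uses the Haar basis specifically is not stated, so this is purely an illustrative example of a wavelet transform.

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar wavelet decomposition."""
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a - b + c - d) / 4.0   # horizontal detail
    hl = (a + b - c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

img = np.array([[1.0, 1.0], [1.0, 1.0]])  # constant image
ll, lh, hl, hh = haar2d(img)              # all detail bands are zero
```

For a constant image the detail sub-bands vanish and only the approximation survives, which is why such decompositions separate coarse structure from fine texture.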
Abstract: Unsupervised Person Re-Identification (UPR) is widely applied in scenarios such as security engineering and smart cities. However, many existing UPR algorithms ignore local feature matching and spatial-position feature information during feature extraction, and may discard large numbers of unclustered samples during pseudo-label clustering. To overcome these shortcomings, this paper proposes an unsupervised person re-identification method based on local feature matching and hybrid contrastive learning (LHFC). First, to address the network's inability to extract feature information from different spatial positions, a self-similar non-local attention mechanism (Non-local) is introduced into the ResNet50 feature-extraction backbone. To address mismatched local features, a local feature matching module (Aligned) is designed, which accounts for the matching of human body structure while learning image similarity. Finally, to address the insufficient feature extraction caused by discarding unclustered samples during training, a hybrid cluster-level and instance-level memory (HCL) is proposed to store cluster-level identity features and outlier instance features. To validate the model, it is compared with 12 existing unsupervised methods on two public datasets (Market-1501 and DukeMTMC-ReID), and ablation experiments examine the effects of Non-local, Aligned, and HCL. The comparative results show that LHFC achieves mAP of 84.4% on Market-1501 and 71.5% on DukeMTMC-ReID, improvements of 3.5% and 1.9% over CACL, the best-performing of the 12 compared methods. The ablation results show that Non-local, Aligned, and HCL each improve accuracy: introducing Non-local into ResNet50 helps extract more useful pedestrian feature information and better captures the spatial relationships between local features; the Aligned module effectively fuses the corresponding human body structure information; and HCL reduces the errors introduced by pseudo-labels in the later stages of training.
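The hybrid memory idea above (keep a centroid per cluster, but retain outlier instances instead of discarding them) can be sketched as a small data structure. This is a toy model loosely based on the description, with the common convention that label -1 marks un-clustered samples; the real HCL memory is more involved.

```python
import numpy as np

class HybridMemory:
    """Toy cluster-level + instance-level memory: clustered samples
    contribute a centroid, outliers are stored individually."""
    def __init__(self):
        self.cluster_feats = {}   # pseudo label -> centroid feature
        self.outlier_feats = []   # features of un-clustered instances

    def update(self, feats, labels):
        feats = np.asarray(feats, dtype=float)
        for lab in set(labels):
            members = feats[[i for i, l in enumerate(labels) if l == lab]]
            if lab == -1:                          # -1 marks outliers
                self.outlier_feats.extend(members)
            else:
                self.cluster_feats[lab] = members.mean(axis=0)

mem = HybridMemory()
mem.update([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]], labels=[0, 0, -1])
print(mem.cluster_feats[0])    # centroid of cluster 0
print(len(mem.outlier_feats))  # one retained outlier
```

A contrastive loss computed against both entries lets every sample, clustered or not, contribute a training signal.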
Abstract: Objective: Unsupervised person re-identification alleviates the high annotation cost of supervised methods, and unsupervised cross-domain adaptation is the most common re-identification scheme. Existing UDA (unsupervised domain adaptive) person re-identification methods tend to introduce pseudo-label noise during clustering and struggle to distinguish highly similar people. Method: To address these problems, based on the intra-class convergence, intra-class continuity, and inter-class dispersion of features, a cross-domain unsupervised person re-identification method based on neighbor optimization is proposed. A pre-trained model is first obtained on the source domain with supervision, and unsupervised training is then performed on the target domain. To strengthen the model's ability to discriminate highly similar pedestrians, a neighborhood adversarial loss is designed: each sample forms pairs with the other samples, and the pair with the strongest class certainty is set against the pair with the strongest uncertainty. To make intra-class sample features converge in the same direction, a feature continuity loss is designed: the feature-distance curve is center-normalized, pulling in the k-nearest feature distances while preserving the inherent differences of the curve. Results: Ablation experiments confirm the effectiveness of each part of the loss functions, and comparative experiments show that the proposed method outperforms existing methods, reaching Rank-1 and mean average precision (mAP) of 92.8% and 84.1% on Market-1501 (1501 identities dataset from Market) and 83.9% and 71.1% on DukeMTMC-reID (multi-target multi-camera person re-identification dataset from Duke University). Conclusion: The designed neighborhood adversarial loss and neighborhood continuity loss strengthen the model's ability to discriminate similar people, effectively improving re-identification performance.
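The center-normalization step in the feature-continuity idea can be sketched as a standard score transform of a sample's distance curve; this is one hypothetical reading of the description, not the paper's exact formula.

```python
import numpy as np

def center_normalize(dists):
    """Center-normalize a feature-distance curve: zero mean, unit scale,
    preserving the curve's relative ordering (its 'inherent differences')."""
    dists = np.asarray(dists, dtype=float)
    return (dists - dists.mean()) / (dists.std() + 1e-12)

d = np.array([1.0, 2.0, 3.0])   # distances to three other samples
z = center_normalize(d)
print(z)  # centered curve; ordering of the three distances is unchanged
```

A continuity loss could then penalize the normalized distances of the k nearest neighbors, pulling intra-class features together without flattening the whole curve.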
Funding: supported by the National Natural Science Foundation of China (No. 61071135) and the National Science and Technology Support Program (No. 2013BAK02B04).
Abstract: Person re-identification (person re-id) aims to match observations of pedestrians from different cameras. It is a challenging task in real-world surveillance systems and draws extensive attention from the community. Most existing methods are based on supervised learning, which requires a large amount of labeled data. In this paper, we develop a robust unsupervised learning approach for person re-id. We propose an improved Bag-of-Words (iBoW) model to describe and match pedestrians under different camera views. The proposed descriptor does not require any re-id labels and is robust against pedestrian variations. Experiments show that the proposed iBoW descriptor outperforms other unsupervised methods. By combining it with efficient metric learning algorithms, we obtain competitive accuracy compared to existing state-of-the-art methods on person re-identification benchmarks, including VIPeR, PRID450S, and Market1501.
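The basic Bag-of-Words step underlying descriptors like the iBoW above is to quantize local features against a codebook and describe an image by its visual-word histogram. The sketch below shows that generic step only; the paper's improvements are not reproduced here, and the codebook is a toy one.

```python
import numpy as np

def bow_descriptor(local_feats, vocab):
    """Nearest-codeword quantization followed by a normalized
    visual-word histogram (the generic BoW image descriptor)."""
    local_feats = np.asarray(local_feats, dtype=float)
    vocab = np.asarray(vocab, dtype=float)
    # Squared Euclidean distance from every local feature to every codeword.
    dists = ((local_feats[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    words = dists.argmin(axis=1)                   # nearest codeword per feature
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()                       # L1-normalized histogram

vocab = [[0.0, 0.0], [10.0, 10.0]]                 # toy 2-word codebook
feats = [[0.1, 0.2], [9.8, 10.1], [0.3, 0.1]]      # local features of one image
print(bow_descriptor(feats, vocab))                # ~[2/3, 1/3]
```

Two images can then be matched by comparing their histograms with any standard distance, which is what makes the descriptor label-free.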