The cross-modal person re-identification task aims to match visible and infrared images of the same individual. The main challenges in this field arise from significant modality differences between individuals and the lack of high-quality cross-modal correspondence methods. Existing approaches often attempt to establish modality correspondence by extracting shared features across different modalities. However, these methods tend to focus on local information extraction and fail to fully leverage the global identity information in the cross-modal features, resulting in limited correspondence accuracy and suboptimal matching performance. To address this issue, we propose a quadratic graph matching method designed to overcome the challenges posed by modality differences through precise cross-modal relationship alignment. This method transforms the cross-modal correspondence problem into a graph matching task and minimizes the matching cost using a center search mechanism. Building on this approach, we further design a block reasoning module to uncover latent relationships between person identities and optimize the modality correspondence results. The block strategy not only improves the efficiency of updating gallery images but also enhances matching accuracy while reducing computational load. Experimental results demonstrate that our proposed method outperforms state-of-the-art methods on the SYSU-MM01, RegDB, and RGBNT201 datasets, achieving excellent matching accuracy and robustness, thereby validating its effectiveness in cross-modal person re-identification.
To address the weak ability of convolutional neural networks to explicitly learn spatial invariance and the probabilistic loss of discriminative features caused by occlusion and background interference in pedestrian re-identification tasks, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the misalignment of pedestrian spatial features. Then the network is divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. The global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results are better than those of the compared methods, which verifies the effectiveness of the proposed method.
Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only minimal inter-class similarity in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. Initially, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the representation of vehicle features globally by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features. A graph convolutional network then amalgamates these features to understand the relationships among vehicle-related characteristics. Subsequently, we amalgamate feature maps from both the attention and graph-based branches for a more comprehensive representation of vehicle features. The framework then gauges feature similarities and ranks them, thus enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
The unsupervised vehicle re-identification task aims at identifying specific vehicles in surveillance videos without utilizing annotation information. Because vehicles resemble one another in appearance more closely than pedestrians do, pseudo-labels generated through clustering are ineffective in mitigating the impact of noise, and the gap between inter-class and intra-class feature distances has not been adequately improved. To address these issues, we design a dual contrastive learning method based on knowledge distillation. During each iteration, we utilize a teacher model to randomly partition the entire dataset into two sub-domains based on clustering pseudo-label categories. By conducting contrastive learning between the two student models, we extract more discernible vehicle identity cues to alleviate the problem of imbalanced data distribution. Subsequently, we propose a context-aware pseudo-label refinement strategy that leverages contextual features by progressively associating granularity information from different bottleneck blocks. To produce more trustworthy pseudo-labels and lessen noise interference during the clustering process, the context-aware scores are obtained by calculating the similarity between global features and contextual ones, and are subsequently added to the pseudo-label encoding process. Extensive experiments on publicly available datasets show that the proposed method achieves excellent performance in overcoming label noise and optimizing data distribution.
Unsupervised vehicle re-identification (Re-ID) methods have garnered widespread attention due to their potential in real-world traffic monitoring. However, existing unsupervised domain adaptation techniques often rely on pseudo-labels generated from the source domain, which struggle to effectively address the diversity and dynamic nature of real-world scenarios. Given the limited variety of common vehicle types, enhancing the model's generalization capability across these types is crucial. To this end, an innovative approach called meta-type generalization (MTG) is proposed. By dividing the training data into meta-train and meta-test sets based on vehicle type information, a novel gradient interaction computation strategy is designed to enhance the model's ability to learn type-invariant features. Integrated into the ResNet50 backbone, the MTG model achieves improvements of 4.50% and 12.04% on the VeRi-776 and VRAI datasets, respectively, compared with traditional unsupervised algorithms, and surpasses current state-of-the-art methods. This achievement holds promise for application in intelligent traffic systems, enabling more efficient urban traffic solutions.
The attention mechanism can extract salient features in images and has proven effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules have two shortcomings. On the one hand, they mostly use global average pooling to generate context descriptors without highlighting the guiding role of salient information in descriptor generation, which limits the representational ability of the final attention mask. On the other hand, the design of most attention modules is complicated, which greatly increases the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed in the SR block to further process the generated intermediate masks, both to introduce nonlinearity into the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the widely used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets; in particular, its mean Average Precision (mAP) on the Occluded-Duke dataset exceeds the state-of-the-art (SOTA) algorithm by 4.49%.
In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult due to the absence of intermediate state supervision, making smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the most suitable intermediate domain for the model to gradually adapt to the feature distribution changes from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. Our framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance, achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which uses an attention mechanism to enhance the model's focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. Our proposed method outperforms existing technologies in all common UDA re-ID tasks and improves the Mean Average Precision (mAP) by 2.3% in the Market to Duke task compared to the state-of-the-art (SOTA) methods.
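The maximum mean discrepancy mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel and the biased estimator; the abstract does not say which kernel or estimator the CFCM uses, nor how the term is combined with mutual information, so the names `rbf_kernel`, `mmd_squared`, and `gamma` are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Toy features (hypothetical values, not taken from the paper).
source_feats = [[0.0, 0.0], [1.0, 0.0]]
target_feats = [[5.0, 5.0], [6.0, 5.0]]
gap = mmd_squared(source_feats, target_feats)
```

A value near zero indicates that the two feature sets are hard to tell apart under the chosen kernel, which is the sense in which such a term can push source and target distributions toward alignment.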
Vehicle re-identification (ReID) aims to retrieve the target vehicle in an extensive image gallery through its appearances from various views in the cross-camera scenario. It has gradually become a core technology of intelligent transportation systems. Most existing vehicle re-identification models adopt the joint learning of global and local features. However, they directly use the extracted global features, resulting in insufficient feature expression. Moreover, local features are primarily obtained through advanced annotation and complex attention mechanisms, which require additional costs. To solve this issue, a multi-feature learning model with enhanced local attention for vehicle re-identification (MFELA) is proposed in this paper. The model consists of global and local branches. The global branch utilizes both middle-level and high-level semantic features of ResNet50 to enhance the global representation capability. In addition, multi-scale pooling operations are used to obtain multi-scale information. The local branch utilizes the proposed Region Batch Dropblock (RBD), which encourages the model to learn discriminative features for different local regions and simultaneously drops the same corresponding areas randomly in a batch during training to enhance the attention to local regions. Features from both branches are then combined to provide a more comprehensive and distinctive feature representation. Extensive experiments on the VeRi-776 and VehicleID datasets show that our method has excellent performance.
Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology in smart public facilities such as cities, campuses, and libraries. It aims to match pedestrians in visible light and infrared images for video surveillance, which poses a challenge in exploring cross-modal shared information accurately and efficiently. Therefore, multi-granularity feature learning methods have been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks, the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), incorporating the Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks. By incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features concerning body structure information. The Local Information Enhancement (LIE) module augments high-performance features in VI-ReID, and the multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID.
Person re-identification (Re-ID) is the task of finding images of a specific person across non-overlapping camera networks, and it has achieved many breakthroughs recently. However, it remains very challenging in adverse environmental conditions, especially in dark areas or at nighttime, due to the imaging limitations of a single visible light source. To handle this problem, we propose a novel deep red-green-blue (RGB) and thermal (RGBT) representation learning framework for single-modality RGB person Re-ID. Due to the lack of thermal data in prevalent RGB Re-ID datasets, we propose to use a generative adversarial network, trained on existing RGBT datasets, to translate labeled RGB person images into thermal infrared ones. The labeled RGB images and the synthetic thermal images make up a labeled RGBT training set, and we propose a cross-modal attention network to learn effective RGBT representations for person Re-ID in day and night by leveraging the complementary advantages of the RGB and thermal modalities. Extensive experiments on the Market1501, CUHK03, and DukeMTMC-reID datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on all of these person Re-ID datasets.
Person re-identification (re-id) involves matching a person across non-overlapping views with different poses, illuminations, and conditions. Visual attributes are understandable semantic information that helps address issues such as illumination changes, viewpoint variations, and occlusions. This paper proposes an end-to-end deep learning framework for attribute-based person re-id. In the feature representation stage of the framework, an improved convolutional neural network (CNN) model is designed to leverage the information contained in automatically detected attributes and learned low-dimensional CNN features. Moreover, an attribute classifier is trained on separate data, and its responses are incorporated into the training process of our person re-id model. The coupled clusters loss function is used in the training stage of the framework, which enhances the discriminability of both types of features. The combined features are mapped into Euclidean space, where the L2 distance between any two pedestrians can be calculated to determine whether they are the same person. Extensive experiments validate the advantages of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
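The final matching step described in the abstract above, comparing pedestrians by L2 distance in the embedding space, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `l2_distance` and `rank_gallery` and the toy embeddings are hypothetical, and the attribute-augmented CNN that would produce real features is not modeled.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_gallery(query, gallery):
    """Return gallery ids sorted from nearest to farthest from the query."""
    return sorted(gallery, key=lambda pid: l2_distance(query, gallery[pid]))

# Toy embeddings (hypothetical values, not taken from the paper).
gallery = {
    "id_a": [0.9, 0.1, 0.0],
    "id_b": [0.0, 1.0, 0.2],
    "id_c": [0.5, 0.5, 0.5],
}
query = [1.0, 0.0, 0.0]
ranking = rank_gallery(query, gallery)
```

In practice a decision threshold on the distance, or the top-ranked gallery entry, would be used to decide whether two images show the same person.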
Person re-identification (re-ID) aims to match images of the same pedestrian across different cameras. It plays an important role in the field of security and surveillance. Although it has been studied for many years, it is still considered an unsolved problem. Since the rise of deep learning, the accuracy of supervised person re-ID on public datasets has reached a very high level. However, these methods are difficult to apply to real-life scenarios because they require a large amount of labeled training data. Pedestrian identity labeling, especially cross-camera pedestrian identity labeling, is laborious and expensive. Why can we not apply the pre-trained model directly to an unseen camera network? Because of the domain bias between the source and target environments, the accuracy on the target dataset is always low. For example, a model trained in a mall needs to adapt to the new environment of an airport. Recently, several approaches have been proposed to solve this problem, including clustering-based methods, GAN-based methods, co-training methods, and unsupervised domain adaptation methods.
Person re-identification (re-id) on a robot platform is an important application for human-robot interaction (HRI), which aims at making the robot recognize surrounding persons in varying scenes. Although many effective methods have been proposed for surveillance re-id in recent years, re-id on a robot platform is still a novel unsolved problem. Most existing methods adopt supervised metric learning offline to improve accuracy. However, these methods cannot adapt to unknown scenes. To solve this problem, an online re-id framework is proposed. Considering that robots can afford to use high-resolution RGB-D sensors and that clear human faces may be captured, face information is used to update the metric model. First, the metric model is pre-trained offline using labeled data. Then, during the online stage, we use face information to mine incorrect body matching pairs, which are collected to update the metric model online. In addition, to make full use of both the appearance and skeleton information provided by RGB-D sensors, a novel feature funnel model (FFM) is proposed. Comparison studies show that our approach is more effective and adaptable to varying environments.
Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the massive personal data required for training contain private information, carry a significant risk of data leakage in cloud environments, and lead to significant communication overhead. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, MOON warmup is added to correct the local training of individual edge servers and improve the model's effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global models. In addition, we propose a lightweight person ReID network, named multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
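A model-contrastive loss of the kind referred to in the abstract above can be sketched as follows. The sketch follows the commonly published MOON formulation, which pulls the local model's representation toward the global model's and away from the previous local model's; the abstract does not give FRM's exact formula or its warmup schedule, so the function names, the temperature value, and the toy vectors are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Contrast the current local representation against the global model's
    representation (positive) and the previous local model's (negative)."""
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))

# Toy representations of one image (hypothetical values).
z_global = [1.0, 0.0]
z_prev = [-1.0, 0.0]
aligned = model_contrastive_loss([1.0, 0.0], z_global, z_prev)
drifted = model_contrastive_loss([-1.0, 0.0], z_global, z_prev)
```

The loss is small when the local representation agrees with the global model and large when it stays close to the previous local model, which is how such a term can correct local training on an edge server.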
Person re-identification (Re-ID) is a fundamental subject in the field of computer vision. Traditional person Re-ID methods have difficulty handling illumination, occlusion, and pose changes against complex backgrounds. Meanwhile, the introduction of deep learning has opened a new direction for person Re-ID research and has become a hot spot in this field. This study first reviews the traditional methods of person Re-ID, then focuses on related papers about different deep learning-based person Re-ID frameworks and discusses their advantages and disadvantages. Finally, the authors propose directions for further research, especially the prospects of person Re-ID methods based on deep learning.
Person re-ID is becoming increasingly popular in the field of modern surveillance. The purpose of person re-ID is to retrieve persons of interest in a non-overlapping multi-camera surveillance system. Due to the complexity of the surveillance scene, the person images captured by cameras often have problems such as size variation, rotation, occlusion, and illumination differences, which bring great challenges to the study of person re-ID. In recent years, studies based on deep learning have achieved great success in person re-ID. Improvements to backbone networks and a large number of studies on the influencing factors have greatly improved the accuracy of person re-ID. Recently, some studies have used GANs to tackle the domain adaptation task by transferring person images from the source domain to the style of the target domain, achieving state-of-the-art results in person re-ID.
Person re-identification is a prevalent technology deployed in intelligent surveillance. There have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera makes the resolution of the captured pedestrian images inconsistent. When low-resolution (LR) images in the query set are matched with high-resolution (HR) images in the gallery set, the performance of the pedestrian matching task degrades because critical pedestrian information is absent from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. First, we use the multi-resolution analysis principle of the wavelet transform to separately process the low-frequency and high-frequency regions of LR images, which restores the lost detail information of LR images. Then, we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR images and HR images to obtain pedestrian-invariant features at various resolutions. Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
As an emerging visual task, vehicle re-identification refers to the identification of the same vehicle across multiple cameras. Herein, we propose a novel vehicle re-identification method that uses an improved ResNet-50 architecture and utilizes the topology information of a surveillance network to rerank the final results. In the training stage, we apply several data augmentation approaches to expand our training data and increase their diversity in a cost-effective manner. We modify the original ResNet-50 architecture by adding non-local blocks to implement the attention mechanism and replacing part of the batch normalization operations with instance batch normalization. After obtaining preliminary results from the proposed model, we apply a reranking algorithm whose core function is to raise the similarity scores of all images on the path along which the vehicle is most likely to appear, thereby optimizing the final results. Compared with most existing state-of-the-art methods, our method is lighter, requires less data annotation, and offers competitive performance.
Person re-identification (Re-ID) is integral to intelligent monitoring systems. However, variability in viewing angles and illumination easily causes visual ambiguities, affecting the accuracy of person re-identification. An approach for person re-identification based on feature mapping space and sample determination is proposed. First, a weight fusion model, including the mean and maximum value of the horizontal occurrence in local features, is introduced into the mapping space to optimize local features. Then, a Gaussian distribution model with hierarchical mean and covariance of pixel features is introduced to enhance feature expression. Finally, considering the influence of sample size on metric learning performance, the appropriate metric learning method is selected by the sample determination method to further improve the performance of person re-identification. Experimental results on the VIPeR, PRID450S, and CUHK01 datasets demonstrate that the proposed method is better than traditional methods.
Visible-infrared person re-identification (VIPR) is a cross-modal retrieval task that searches for a target in a gallery captured by cameras of different spectrums. The severe challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images. To address this, this paper proposes a query related cluster (QRC) method for VIPR. First, an attention mechanism is used to calculate the similarity relation between a visible query and the infrared images with the same identity in the gallery. Second, the infrared images sharing the query's identity are aggregated using the similarity relation to form a dynamic clustering center corresponding to the query image. Third, a QRC loss function is designed to enlarge the similarity between the query image and its dynamic cluster center to achieve query related clustering, so as to compact the intra-class variations. Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can well characterize intra-class variations in VIPR. Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, achieving a 90.77% rank-1 identification rate on the RegDB dataset.
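The first two steps of the QRC description above can be sketched as an attention-weighted average. The sketch assumes dot-product similarity followed by a softmax, which is a common attention form but is not confirmed by the abstract; the names `softmax` and `dynamic_center` and the toy features are hypothetical, and the QRC loss itself is not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_center(query, same_id_infrared):
    """Aggregate same-identity infrared features into one center,
    weighting each feature by its attention similarity to the query."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in same_id_infrared]
    weights = softmax(scores)
    return [
        sum(w * feat[d] for w, feat in zip(weights, same_id_infrared))
        for d in range(len(query))
    ]

# Toy features (hypothetical values, not taken from the paper).
visible_query = [1.0, 0.0]
infrared_feats = [[1.0, 0.0], [0.0, 1.0]]
center = dynamic_center(visible_query, infrared_feats)
```

Because the weights depend on the query, two different queries of the same identity obtain different centers, which matches the abstract's point that each query has its own dynamic clustering center.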
Funding: Supported by the Foshan Science and Technology Innovation Team Project (No. FS0AA-KJ919-4402-0060) and the National Natural Science Foundation of China (No. 62263018).
Abstract: Convolutional neural networks have a weak ability to explicitly learn spatial invariance, and occlusion and background interference cause a probabilistic loss of discriminative features in pedestrian re-identification tasks. To address these problems, a person re-identification method combining spatial feature learning and multi-granularity feature fusion is proposed. First, an attention spatial transformation network (A-STN) is proposed to learn spatial features and solve the problem of misalignment of pedestrian spatial features. Then the network is divided into a global branch, a local coarse-grained fusion branch, and a local fine-grained fusion branch to extract pedestrian global features, coarse-grained fusion features, and fine-grained fusion features, respectively. The global branch enriches the global features by fusing different pooling features. The local coarse-grained fusion branch uses overlay pooling to enhance each local feature while learning the correlation between multi-granularity features. The local fine-grained fusion branch uses differential pooling to obtain differential features, which are fused with global features to learn the relationship between pedestrian local features and pedestrian global features. Finally, the proposed method is evaluated on three public datasets: Market1501, DukeMTMC-ReID, and CUHK03. The experimental results are better than those of the compared methods, which verifies the effectiveness of the proposed method.
Funding: Funded by the National Natural Science Foundation of China (grant number: 62172292).
Abstract: Vehicle re-identification involves matching images of vehicles across varying camera views. The diversity of camera locations along different roadways leads to significant intra-class variation and only minimal inter-class similarity in the collected vehicle images, which increases the complexity of re-identification tasks. To tackle these challenges, this study proposes AG-GCN (Attention-Guided Graph Convolutional Network), a novel framework integrating several pivotal components. First, AG-GCN embeds a lightweight attention module within the ResNet-50 structure to learn feature weights automatically, thereby improving the global representation of vehicle features by highlighting salient features and suppressing extraneous ones. Moreover, AG-GCN adopts a graph-based structure to encapsulate deep local features. A graph convolutional network then combines these features to capture the relationships among vehicle-related characteristics. Subsequently, feature maps from both the attention and graph-based branches are merged for a more comprehensive representation of vehicle features. The framework then measures feature similarities and ranks them, thus enhancing the accuracy of vehicle re-identification. Comprehensive qualitative and quantitative analyses on two publicly available datasets verify the efficacy of AG-GCN in addressing intra-class and inter-class variability issues.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62461037, 62076117 and 62166026; the Jiangxi Provincial Natural Science Foundation under Grant Nos. 20224BAB212011, 20232BAB202051, 20232BAB212008 and 20242BAB25078; and the Jiangxi Provincial Key Laboratory of Virtual Reality under Grant No. 2024SSY03151.
Abstract: The unsupervised vehicle re-identification task aims at identifying specific vehicles in surveillance videos without utilizing annotation information. Because vehicles are more similar in appearance than pedestrians, pseudo-labels generated through clustering are ineffective in mitigating the impact of noise, and the separation between inter-class and intra-class feature distances has not been adequately improved. To address these issues, we design a dual contrastive learning method based on knowledge distillation. During each iteration, we utilize a teacher model to randomly partition the entire dataset into two sub-domains based on clustering pseudo-label categories. By conducting contrastive learning between the two student models, we extract more discernible vehicle identity cues to alleviate the problem of imbalanced data distribution. Subsequently, we propose a context-aware pseudo-label refinement strategy that leverages contextual features by progressively associating granularity information from different bottleneck blocks. To produce more trustworthy pseudo-labels and lessen noise interference during the clustering process, context-aware scores are obtained by calculating the similarity between global features and contextual ones, and are subsequently added to the pseudo-label encoding process. Extensive experiments on publicly available datasets show that the proposed method achieves excellent performance in overcoming label noise and optimizing data distribution.
Funding: Supported by the National Natural Science Foundation of China (No. 61976098) and the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (No. 2022J06023).
Abstract: Unsupervised vehicle re-identification (Re-ID) methods have garnered widespread attention due to their potential in real-world traffic monitoring. However, existing unsupervised domain adaptation techniques often rely on pseudo-labels generated from the source domain, which struggle to effectively address the diversity and dynamic nature of real-world scenarios. Given the limited variety of common vehicle types, enhancing the model’s generalization capability across these types is crucial. To this end, an approach called meta-type generalization (MTG) is proposed. The training data are divided into meta-train and meta-test sets based on vehicle type information, and a novel gradient interaction computation strategy is designed to enhance the model’s ability to learn type-invariant features. Integrated into the ResNet50 backbone, the MTG model achieves improvements of 4.50% and 12.04% on the Veri-776 and VRAI datasets, respectively, compared with traditional unsupervised algorithms, and surpasses current state-of-the-art methods. This achievement holds promise for application in intelligent traffic systems, enabling more efficient urban traffic solutions.
Funding: Supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2022D01B186 and No. 2022D01B05).
Abstract: The attention mechanism can extract salient features in images and has been proved effective in improving the performance of person re-identification (Re-ID). However, most existing attention modules have two shortcomings. On the one hand, they mostly use global average pooling to generate context descriptors without highlighting the guiding role of salient information in descriptor generation, so the final attention mask has insufficient representation ability. On the other hand, the design of most attention modules is complicated, which greatly increases the computational cost of the model. To solve these problems, this paper proposes an attention module called the self-supervised recalibration (SR) block, which introduces both global and local information through adaptive weighted fusion to generate a more refined attention mask. In particular, a special "Squeeze-Excitation" (SE) unit is designed in the SR block to further process the generated intermediate masks, both to apply nonlinearities to the features and to constrain the resulting computation by controlling the number of channels. Furthermore, we combine the SR block with the commonly used ResNet-50 to construct an instantiation model and verify its effectiveness on multiple Re-ID datasets. In particular, the mean Average Precision (mAP) on the Occluded-Duke dataset exceeds that of the state-of-the-art (SOTA) algorithm by 4.49%.
Abstract: In Unsupervised Domain Adaptation (UDA) for person re-identification (re-ID), the primary challenge is reducing the distribution discrepancy between the source and target domains. This can be achieved by implicitly or explicitly constructing an appropriate intermediate domain to enhance recognition capability on the target domain. Implicit construction is difficult due to the absence of intermediate state supervision, making smooth knowledge transfer from the source to the target domain a challenge. To explicitly construct the most suitable intermediate domain for the model to gradually adapt to the feature distribution changes from the source to the target domain, we propose the Minimal Transfer Cost Framework (MTCF). MTCF considers all scenarios of the intermediate domain during the transfer process, ensuring smoother and more efficient domain alignment. Our framework mainly includes three modules: the Intermediate Domain Generator (IDG), the Cross-domain Feature Constraint Module (CFCM), and the Residual Channel Space Module (RCSM). First, the IDG is introduced to generate all possible intermediate domains, ensuring a smooth transition of knowledge from the source to the target domain. To reduce the cross-domain feature distribution discrepancy, we propose the CFCM, which quantifies the difficulty of knowledge transfer and ensures the diversity of intermediate domain features and their semantic relevance, achieving alignment between the source and target domains by incorporating mutual information and maximum mean discrepancy. We also design the RCSM, which utilizes an attention mechanism to enhance the model’s focus on person features in low-resolution images, improving the accuracy and efficiency of person re-ID. Our proposed method outperforms existing methods in all common UDA re-ID tasks and improves the mean Average Precision (mAP) by 2.3% in the Market-to-Duke task compared to state-of-the-art (SOTA) methods.
Funding: This work was supported, in part, by the National Natural Science Foundation of China under Grant Numbers 61502240, 61502096, 61304205, 61773219; in part, by the Natural Science Foundation of Jiangsu Province under Grant Numbers BK20201136, BK20191401; in part, by the Postgraduate Research & Practice Innovation Program of Jiangsu Province under Grant Number SJCX21_0363; and in part, by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD) fund.
Abstract: Vehicle re-identification (ReID) aims to retrieve a target vehicle from an extensive image gallery through its appearance from various views in cross-camera scenarios. It has gradually become a core technology of intelligent transportation systems. Most existing vehicle re-identification models adopt joint learning of global and local features. However, they use the extracted global features directly, resulting in insufficient feature expression. Moreover, local features are primarily obtained through advanced annotation and complex attention mechanisms, which incur additional costs. To solve these issues, a multi-feature learning model with enhanced local attention for vehicle re-identification (MFELA) is proposed in this paper. The model consists of global and local branches. The global branch utilizes both middle-level and high-level semantic features of ResNet50 to enhance the global representation capability. In addition, multi-scale pooling operations are used to obtain multi-scale information. The local branch utilizes the proposed Region Batch Dropblock (RBD), which encourages the model to learn discriminative features for different local regions and randomly drops the same corresponding areas across a batch during training to enhance attention to local regions. Features from both branches are then combined to provide a more comprehensive and distinctive feature representation. Extensive experiments on the VeRi-776 and VehicleID datasets show that the method performs well.
Funding: Supported in part by the National Natural Science Foundation of China under Grants 62177029, 62307025; in part by the Startup Foundation for Introducing Talent of Nanjing University of Posts and Communications under Grant NY221041; and in part by the General Project of The Natural Science Foundation of Jiangsu Higher Education Institution of China 22KJB520025, 23KJD580.
Abstract: Visible-infrared Cross-modality Person Re-identification (VI-ReID) is a critical technology in smart public facilities such as cities, campuses, and libraries. It aims to match pedestrians in visible light and infrared images for video surveillance, which poses the challenge of exploring cross-modal shared information accurately and efficiently. Therefore, multi-granularity feature learning methods have been applied in VI-ReID to extract potential multi-granularity semantic information related to pedestrian body structure attributes. However, existing research mainly uses traditional dual-stream fusion networks and overlooks the core of cross-modal learning networks, the fusion module. This paper introduces a novel network called the Augmented Deep Multi-Granularity Pose-Aware Feature Fusion Network (ADMPFF-Net), incorporating the Multi-Granularity Pose-Aware Feature Fusion (MPFF) module to generate discriminative representations. MPFF efficiently explores and learns global and local features with multi-level semantic information by inserting disentangling and duplicating blocks into the fusion module of the backbone network. ADMPFF-Net also provides a new perspective for designing multi-granularity learning networks. By incorporating the multi-granularity feature disentanglement (mGFD) and posture information segmentation (pIS) strategies, it extracts more representative features concerning body structure information. The Local Information Enhancement (LIE) module augments high-performance features in VI-ReID, and the multi-granularity joint loss supervises model training for objective feature learning. Experimental results on two public datasets show that ADMPFF-Net efficiently constructs pedestrian feature representations and enhances the accuracy of VI-ReID.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61976002, 61976003 and 61860206004), the Natural Science Foundation of Anhui Higher Education Institutions of China (No. KJ2019A0033), and the Open Project Program of the National Laboratory of Pattern Recognition (No. 201900046).
Abstract: Person re-identification (Re-ID) is the task of finding images of a specific person across a non-overlapping camera network, and it has achieved many breakthroughs recently. However, it remains very challenging in adverse environmental conditions, especially in dark areas or at nighttime, due to the imaging limitations of a single visible light source. To handle this problem, we propose a novel deep red green blue (RGB)-thermal (RGBT) representation learning framework for single-modality RGB person Re-ID. Due to the lack of thermal data in prevalent RGB Re-ID datasets, we propose to use a generative adversarial network, trained on existing RGBT datasets, to translate labeled RGB images of persons to thermal infrared ones. The labeled RGB images and the synthetic thermal images make up a labeled RGBT training set, and we propose a cross-modal attention network to learn effective RGBT representations for person Re-ID in day and night by leveraging the complementary advantages of the RGB and thermal modalities. Extensive experiments on the Market1501, CUHK03, and DukeMTMC-reID datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on all of these person Re-ID datasets.
Funding: Supported by the National Natural Science Foundation of China (61471154, 61876057) and the Fundamental Research Funds for Central Universities (JZ2018YYPY0287).
Abstract: Person re-identification (re-id) involves matching a person across non-overlapping views with different poses, illuminations, and conditions. Visual attributes are understandable semantic information that helps address issues including illumination changes, viewpoint variations, and occlusions. This paper proposes an end-to-end deep learning framework for attribute-based person re-id. In the feature representation stage of the framework, an improved convolutional neural network (CNN) model is designed to leverage the information contained in automatically detected attributes and learned low-dimensional CNN features. Moreover, an attribute classifier is trained on separate data, and its responses are included in the training process of the person re-id model. The coupled clusters loss function is used in the training stage of the framework, which enhances the discriminability of both types of features. The combined features are mapped into Euclidean space, where the L2 distance between any two pedestrians can be computed to determine whether they are the same person. Extensive experiments validate the advantages of the proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
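The final matching step described in the abstract above, comparing two pedestrians by the L2 distance between their embedded features, can be illustrated with a minimal Python sketch. The function names and the `threshold` value are hypothetical and chosen only for illustration; the abstract does not state how the decision threshold is set.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(a, b, threshold=0.5):
    """Declare a match when the embeddings are closer than a threshold.

    The threshold is an assumed, illustrative value; in practice it would
    be tuned on validation data.
    """
    return l2_distance(a, b) <= threshold
```

In a retrieval setting the same distance would typically be used to rank gallery images for a query rather than to make a hard yes/no decision.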
Abstract: Person re-identification (re-ID) aims to match images of the same pedestrian across different cameras. It plays an important role in the field of security and surveillance. Although it has been studied for many years, it is still considered an unsolved problem. Since the rise of deep learning, the accuracy of supervised person re-ID on public datasets has reached a very high level. However, these methods are difficult to apply to real-life scenarios because they require a large amount of labeled training data. Pedestrian identity labeling, especially cross-camera pedestrian identity labeling, is laborious and expensive. Why can we not apply a pre-trained model directly to an unseen camera network? Due to the domain bias between the source and target environments, the accuracy on the target dataset is always low. For example, a model trained in a mall needs to adapt to the new environment of an airport. Recently, several lines of research have been proposed to solve this problem, including clustering-based methods, GAN-based methods, co-training methods, and unsupervised domain adaptation methods.
Funding: This work is supported by the National Natural Science Foundation of China (NSFC, no. 61340046), the National High Technology Research and Development Programme of China (863 Programme, no. 2006AA04Z247), the Scientific and Technical Innovation Commission of Shenzhen Municipality (no. JCYJ20130331144631730), and the Specialized Research Fund for the Doctoral Programme of Higher Education (SRFDP, no. 20130001110011).
Abstract: Person re-identification (re-id) on a robot platform is an important application for human-robot interaction (HRI), which aims at making the robot recognize surrounding persons in varying scenes. Although many effective methods have been proposed for surveillance re-id in recent years, re-id on a robot platform is still a novel unsolved problem. Most existing methods use supervised metric learning offline to improve accuracy. However, these methods cannot adapt to unknown scenes. To solve this problem, an online re-id framework is proposed. Considering that robots can afford to use high-resolution RGB-D sensors and that clear human faces may be captured, face information is used to update the metric model. First, the metric model is pre-trained offline using labeled data. Then, during the online stage, face information is used to mine incorrect body matching pairs, which are collected to update the metric model online. In addition, to make full use of both the appearance and skeleton information provided by RGB-D sensors, a novel feature funnel model (FFM) is proposed. Comparison studies show that our approach is more effective and adaptable to varying environments.
Funding: Supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20211284 and the Financial and Science Technology Plan Project of Xinjiang Production and Construction Corps under Grant No. 2020DB005.
Abstract: Person re-identification (ReID) aims to recognize the same person in multiple images from different camera views. Training person ReID models is time-consuming and resource-intensive; thus, cloud computing is an appropriate model training solution. However, the massive personal data required for training contain private information, which carries a significant risk of data leakage in cloud environments and leads to significant communication overheads. This paper proposes a federated person ReID method with model-contrastive learning (MOON) in an edge-cloud environment, named FRM. Specifically, based on federated partial averaging, MOON warmup is added to correct the local training of individual edge servers and improve the model’s effectiveness by calculating and back-propagating a model-contrastive loss, which represents the similarity between local and global models. In addition, we propose a lightweight person ReID network, named multi-branch combined depth space network (MB-CDNet), to reduce the computing resource usage of the edge device when training and testing the person ReID model. MB-CDNet is a multi-branch version of the combined depth space network (CDNet). We add a part branch and a global branch on the basis of CDNet and introduce an attention pyramid to improve the performance of the model. The experimental results on open-access person ReID datasets demonstrate that FRM achieves better performance than existing baselines.
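To make the idea of a model-contrastive loss more concrete, the sketch below follows the form commonly given for MOON: the local representation is pulled toward the global model's representation and pushed away from the previous local model's representation. This is an assumption about the general technique; the abstract does not specify the exact loss used in FRM, and the function names, the `temperature` default, and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def model_contrastive_loss(z_local, z_global, z_prev, temperature=0.5):
    """Contrastive loss over model representations of the same input.

    z_local:  representation from the local model being trained
    z_global: representation from the current global model (positive)
    z_prev:   representation from the previous local model (negative)
    """
    pos = math.exp(cosine(z_local, z_global) / temperature)
    neg = math.exp(cosine(z_local, z_prev) / temperature)
    return -math.log(pos / (pos + neg))
```

The loss is small when the local representation agrees with the global one and large when it drifts back toward the previous local model, which is the correcting effect on local training that the abstract describes.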
Funding: Supported by the Natural Science Foundation of China (Nos. 61703119, 61573114), the Natural Science Fund of Heilongjiang Province of China (No. QC2017070), and the Fundamental Research Funds for the Central Universities of China (No. HEUCFM180405).
Abstract: Person re-identification (Re-ID) is a fundamental subject in the field of computer vision. Traditional person Re-ID methods have difficulty handling illumination, occlusion, and pose changes under complex backgrounds. Meanwhile, the introduction of deep learning has opened a new direction for person Re-ID research and has become a hot spot in this field. This study reviews the traditional methods of person Re-ID, then focuses on papers about different person Re-ID frameworks based on deep learning and discusses their advantages and disadvantages. Finally, it proposes directions for further research, especially the prospects of person Re-ID methods based on deep learning.
Abstract: Person re-ID is becoming increasingly popular in the field of modern surveillance. The purpose of person re-ID is to retrieve persons of interest in a non-overlapping multi-camera surveillance system. Due to the complexity of the surveillance scene, the person images captured by cameras often suffer from problems such as size variation, rotation, occlusion, and illumination differences, which bring great challenges to the study of person re-ID. In recent years, studies based on deep learning have achieved great success in person re-ID. The improvement of basic networks and a large number of studies on the influencing factors have greatly improved the accuracy of person re-ID. Recently, some studies have utilized GANs to tackle the domain adaptation task by transferring person images of the source domain to the style of the target domain, and have achieved state-of-the-art results in person re-ID.
Funding: Supported by the National Natural Science Foundation of China (61471154, 61876057) and the Key Research and Development Program of Anhui Province, Special Project of Strengthening Science and Technology Police (202004D07020012).
Abstract: Person re-identification is a prevalent technology deployed in intelligent surveillance. There have been remarkable achievements in person re-identification methods based on the assumption that all person images have a sufficiently high resolution, yet such models are not applicable to the open world. In the real world, the changing distance between pedestrians and the camera makes the resolution of the captured pedestrian images inconsistent. When low-resolution (LR) images in the query set are matched with high-resolution (HR) images in the gallery set, the performance of the pedestrian matching task degrades because critical pedestrian information is absent from the LR images. To address these issues, we present a dual-stream coupling network with wavelet transform (DSCWT) for the cross-resolution person re-identification task. First, we use the multi-resolution analysis principle of the wavelet transform to separately process the low-frequency and high-frequency regions of LR images, in order to restore their lost detail information. Then, we devise a residual knowledge constrained loss function that transfers knowledge between the two streams of LR images and HR images to obtain pedestrian-invariant features at various resolutions. Extensive qualitative and quantitative experiments across four benchmark datasets verify the superiority of the proposed approach.
Abstract: As an emerging visual task, vehicle re-identification refers to the identification of the same vehicle across multiple cameras. Herein, we propose a novel vehicle re-identification method that uses an improved ResNet-50 architecture and utilizes the topology information of a surveillance network to rerank the final results. In the training stage, we apply several data augmentation approaches to expand our training data and increase their diversity in a cost-effective manner. We modify the original ResNet-50 architecture by adding non-local blocks to implement the attention mechanism and replacing part of the batch normalization operations with instance batch normalization. After obtaining preliminary results from the proposed model, we apply the reranking algorithm, whose core function is to raise the similarity scores of all images on the path along which the vehicle is most likely to appear, in order to optimize the final results. Compared with most existing state-of-the-art methods, our method is lighter, requires less data annotation, and offers competitive performance.
Funding: Supported by the National Natural Science Foundation of China (No. 61976080), the Science and Technology Key Project of Science and Technology Department of Henan Province (No. 212102310298), the Innovation and Quality Improvement Project for Graduate Education of Henan University (No. SYL20010101), and the Academic Degrees & Graduate Education Reform Project of Henan Province (2021SJLX195Y).
Abstract: Person re-identification (Re-ID) is integral to intelligent monitoring systems. However, variability in viewing angles and illumination easily causes visual ambiguities, which affect the accuracy of person re-identification. An approach for person re-identification based on feature mapping space and sample determination is proposed. First, a weight fusion model, including the mean and maximum value of the horizontal occurrence in local features, is introduced into the mapping space to optimize local features. Then, a Gaussian distribution model with hierarchical mean and covariance of pixel features is introduced to enhance feature expression. Finally, considering the influence of sample size on metric learning performance, an appropriate metric learning method is selected by the sample determination method to further improve the performance of person re-identification. Experimental results on the VIPeR, PRID450S, and CUHK01 datasets demonstrate that the proposed method is better than traditional methods.
Funding: Supported by the National Natural Science Foundation of China (No. 61976098) and the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (No. 2022J06023).
Abstract: Visible-infrared person re-identification (VIPR) is a cross-modal retrieval task that searches for a target in a gallery captured by cameras of different spectrums. The severe challenge for VIPR is the large intra-class variation caused by the modal discrepancy between visible and infrared images. To address it, this paper proposes a query related cluster (QRC) method for VIPR. First, an attention mechanism is used to calculate the similarity relation between a visible query and the infrared images with the same identity in the gallery. Second, the infrared images sharing the query's identity are aggregated using the similarity relation to form a dynamic clustering center corresponding to the query image. Third, a QRC loss function is designed to enlarge the similarity between the query image and its dynamic cluster center to achieve query related clustering, so as to compact the intra-class variations. Consequently, in the proposed QRC method, each query has its own dynamic clustering center, which can well characterize intra-class variations in VIPR. Experimental results demonstrate that the proposed QRC method is superior to many state-of-the-art approaches, achieving a 90.77% rank-1 identification rate on the RegDB dataset.
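The three QRC steps above (attention-based similarity, aggregation into a dynamic center, and a loss that enlarges query-center similarity) can be sketched in plain Python. This is an illustrative reading of the abstract, not the paper's implementation: the dot-product attention, the softmax weighting, the `1 - cosine` loss form, and all function names are assumptions.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_center(query, infrared_feats):
    """Attention-weighted mean of same-identity infrared features.

    Weights come from a softmax over query-to-infrared dot products
    (an assumed attention form).
    """
    weights = softmax([dot(query, f) for f in infrared_feats])
    dim = len(query)
    return [sum(w * f[i] for w, f in zip(weights, infrared_feats))
            for i in range(dim)]


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def qrc_style_loss(query, infrared_feats):
    """Assumed loss: shrinks as the query aligns with its dynamic center."""
    center = dynamic_center(query, infrared_feats)
    return 1.0 - cosine(query, center)
```

Because the center is recomputed from each query's own attention weights, two queries of the same identity can have different centers, which is the per-query behaviour the abstract emphasizes.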