The primary goal of visible-infrared person re-identification (VI-ReID) is to match pedestrian photos obtained during the day and at night. The majority of existing methods simply generate auxiliary modalities to reduce the modality discrepancy for cross-modality matching. They capture modality-invariant representations but ignore the extraction of modality-specific representations that can aid in distinguishing among various identities of the same modality. To alleviate these issues, this work provides a novel specific and shared representations learning (SSRL) model for VI-ReID to learn modality-specific and modality-shared representations. We design a shared branch in SSRL to bridge the image-level gap and learn modality-shared representations, while a specific branch retains the discriminative information of visible images to learn modality-specific representations. In addition, we propose intra-class aggregation and inter-class separation learning strategies to optimize the distribution of feature embeddings at a fine-grained level. Extensive experimental results on two challenging benchmark datasets, SYSU-MM01 and RegDB, demonstrate the superior performance of SSRL over state-of-the-art methods.
The requirement for precise detection and recognition of target pedestrians in unprocessed real-world imagery drives the formulation of person search as an integrated framework that unifies pedestrian detection and person re-identification (Re-ID). However, the inherent discrepancy between the optimization objectives of coarse-grained localization in pedestrian detection and fine-grained discriminative learning in Re-ID, combined with the substantial performance degradation of Re-ID during joint training caused by the Faster R-CNN-based branch, constitutes a critical bottleneck for person search. In this work, we propose a cascaded person search model (SeqXt) based on SeqNet and ConvNeXt. It adopts a sequential end-to-end network as its core architecture and integrates the design logic of the two-step and one-step frameworks, incorporating the two-step method’s advantage in efficient subtask handling while preserving the one-step method’s efficiency in end-to-end training. Firstly, we utilize ConvNeXt-Base as the feature extraction module, which incorporates part of the design concept of the Transformer, enhances the consideration of global context information, and boosts feature discrimination through an implicit self-attention mechanism. Secondly, we introduce prototype-guided normalization, which leverages the archetype features of individual identities to calibrate the feature distribution and thereby prevents features from being overly inclined towards frequently occurring IDs, notably improving the intra-class compactness and inter-class separability of person identities. Finally, we put forward a loss function named the Dynamic Online Instance Matching Loss Function (DOIM), which employs the hard sample assistant method to adaptively update the lookup table (LUT) and the circular queue (CQ) and aims to further enhance the distinctiveness of features between classes. Experimental results on the public datasets CUHK-SYSU and PRW and the private dataset UESTC-PS show that the proposed method achieves state-of-the-art results.
Funding: Supported by the National Natural Science Foundation of China (No. 61976098) and the Natural Science Foundation for Outstanding Young Scholars of Fujian Province (No. 2022J06023).
Abstract: Unsupervised vehicle re-identification (Re-ID) methods have garnered widespread attention due to their potential in real-world traffic monitoring. However, existing unsupervised domain adaptation techniques often rely on pseudo-labels generated from the source domain, which struggle to effectively address the diversity and dynamic nature of real-world scenarios. Given the limited variety of common vehicle types, enhancing the model’s generalization capability across these types is crucial. To this end, an innovative approach called meta-type generalization (MTG) is proposed. The training data are divided into meta-train and meta-test sets based on vehicle type information, and a novel gradient interaction computation strategy is designed to enhance the model’s ability to learn type-invariant features. Integrated into the ResNet50 backbone, the MTG model achieves improvements of 4.50% and 12.04% on the Veri-776 and VRAI datasets, respectively, compared with traditional unsupervised algorithms, and surpasses current state-of-the-art methods. This achievement holds promise for application in intelligent traffic systems, enabling more efficient urban traffic solutions.
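The type-based split mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how types are assigned to the two sets, so holding out a random subset of types as meta-test is an assumption made here for illustration; the function name `split_by_type`, the `type` field, and the sample records are all hypothetical, and the gradient interaction step is not reproduced.

```python
import random


def split_by_type(samples, num_meta_test_types=1, seed=0):
    """Split samples so that meta-train and meta-test share no vehicle type.

    Assumption: each sample is a dict with a "type" key. The abstract only
    says the split is "based on vehicle type information"; holding out whole
    types is one plausible reading, not the authors' confirmed procedure.
    """
    types = sorted({s["type"] for s in samples})
    if not 0 < num_meta_test_types < len(types):
        raise ValueError("need at least one type on each side of the split")
    rng = random.Random(seed)  # seeded for reproducible splits
    held_out = set(rng.sample(types, num_meta_test_types))
    meta_train = [s for s in samples if s["type"] not in held_out]
    meta_test = [s for s in samples if s["type"] in held_out]
    return meta_train, meta_test, held_out


# Hypothetical toy data: identity labels plus a coarse vehicle type.
samples = [
    {"id": 1, "type": "sedan"},
    {"id": 2, "type": "sedan"},
    {"id": 3, "type": "suv"},
    {"id": 4, "type": "truck"},
    {"id": 5, "type": "truck"},
]
meta_train, meta_test, held_out = split_by_type(samples)
```

In an episodic training loop, such a split would typically be redrawn with a different seed each iteration so that every type is held out at some point.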
Funding: We acknowledge funding from the National Natural Science Foundation of China under Grant Nos. 62101213 and 62103165, and the Shandong Provincial Natural Science Foundation under Grant Nos. ZR2020QF107, ZR2020MF137, and ZR2021QF043.
Abstract: Video-based person re-identification (Re-ID), a subset of retrieval tasks, faces challenges like uncoordinated sample capturing, viewpoint variations, occlusions, cluttered backgrounds, and sequence uncertainties. Recent advancements in deep learning have significantly improved video-based person Re-ID, laying a solid foundation for further progress in the field. In order to enrich researchers’ insights into the latest research findings and prospective developments, we offer an extensive overview and meticulous analysis of contemporary video-based person Re-ID methodologies, with a specific emphasis on network architecture design and loss function design. Firstly, we introduce methods based on network architecture design and loss function design from multiple perspectives, and analyze the advantages and disadvantages of these methods. Furthermore, we provide a synthesis of prevalent datasets and key evaluation metrics utilized within this field to assist researchers in assessing methodological efficacy and establishing benchmarks for performance evaluation. Lastly, through a critical evaluation of the experimental outcomes derived from various methodologies across four prominent public datasets, we identify promising research avenues and offer valuable insights to steer future exploration and innovation in this vibrant and evolving field of video-based person Re-ID. This comprehensive analysis aims to equip researchers with the necessary knowledge and strategic foresight to navigate the complexities of video-based person Re-ID, fostering continued progress and breakthroughs in this challenging yet promising research domain.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61976002, 61976003, and 61860206004), the Natural Science Foundation of Anhui Higher Education Institutions of China (No. KJ2019A0033), and the Open Project Program of the National Laboratory of Pattern Recognition (No. 201900046).
Abstract: Person re-identification (Re-ID) is the task of finding images of a specific person across a non-overlapping camera network, and has achieved many breakthroughs recently. However, it remains very challenging in adverse environmental conditions, especially in dark areas or at nighttime, due to the imaging limitations of a single visible light source. To handle this problem, we propose a novel deep red green blue (RGB)-thermal (RGBT) representation learning framework for single-modality RGB person Re-ID. Due to the lack of thermal data in prevalent RGB Re-ID datasets, we propose to use a generative adversarial network, trained on existing RGBT datasets, to translate labeled RGB person images into thermal infrared ones. The labeled RGB images and the synthetic thermal images make up a labeled RGBT training set, and we propose a cross-modal attention network to learn effective RGBT representations for person Re-ID in day and night by leveraging the complementary advantages of the RGB and thermal modalities. Extensive experiments on the Market1501, CUHK03, and DukeMTMC-reID datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance on all of these person Re-ID datasets.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61471154 and 61876057) and the Fundamental Research Funds for Central Universities (No. JZ2018YYPY0287).
Abstract: Person re-identification (re-id) involves matching a person across non-overlapping views, with different poses, illuminations, and conditions. Visual attributes are understandable semantic information that helps mitigate issues such as illumination changes, viewpoint variations, and occlusions. This paper proposes an end-to-end deep learning framework for attribute-based person re-id. In the feature representation stage of the framework, an improved convolutional neural network (CNN) model is designed to leverage the information contained in automatically detected attributes and learned low-dimensional CNN features. Moreover, an attribute classifier is trained on separate data, and its responses are incorporated into the training process of our person re-id model. The coupled clusters loss function is used in the training stage of the framework, which enhances the discriminability of both types of features. The combined features are mapped into Euclidean space, where the L2 distance between any two pedestrians can be computed to determine whether they are the same person. Extensive experiments validate the advantages of our proposed framework over state-of-the-art competitors on contemporary challenging person re-id datasets.
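The final matching step described in this abstract, comparing two embedded pedestrians by L2 distance, can be sketched as follows. The feature vectors, the gallery keys, and the decision threshold of 0.5 are hypothetical values chosen for illustration; the abstract does not state how the match decision is thresholded.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def same_person(a, b, threshold=0.5):
    """Hypothetical decision rule: declare a match below a distance threshold."""
    return l2_distance(a, b) < threshold


# Toy embeddings standing in for the combined attribute + CNN features.
query = [0.0, 0.0, 1.0]
gallery = {
    "cam2_img7": [0.0, 0.1, 0.9],
    "cam3_img4": [1.0, 0.0, 0.0],
}

# Rank gallery images by increasing distance to the query.
ranked = sorted(gallery, key=lambda name: l2_distance(query, gallery[name]))
```

In retrieval-style evaluation only the ranking matters; a threshold is needed only when the system must answer "same or different" for a single pair.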
Abstract: Person re-identification (Re-ID) has achieved great progress in recent years. However, person Re-ID methods still suffer from missing body parts and occlusion, which make the learned representations less reliable. In this paper, we propose a robust coarse granularity part-level network (CGPN) for person Re-ID, which extracts robust regional features and integrates supervised global features for pedestrian images. CGPN offers a two-fold benefit toward higher accuracy for person Re-ID. On one hand, CGPN learns to extract effective regional features for pedestrian images. On the other hand, compared with extracting global features directly from the backbone network, CGPN learns to extract more accurate global features with a supervision strategy. The single model trained on three Re-ID datasets achieves state-of-the-art performance. In particular, on CUHK03, the most challenging Re-ID dataset, we obtain a top result of Rank-1/mean average precision (mAP) = 87.1%/83.6% without re-ranking.
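For readers unfamiliar with the two metrics reported above, the following sketch computes Rank-1 and mAP from ranked retrieval results. It is a simplified illustration: each query is represented only by a list of booleans marking whether the gallery item at each rank shares the query identity, and dataset-specific protocol details (such as excluding same-camera gallery images) are omitted. The toy rankings are hypothetical.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches: booleans in rank order, True where the gallery item
    has the same identity as the query.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def rank1_and_map(all_ranked_matches):
    """Return (Rank-1 accuracy, mean average precision) over all queries."""
    n = len(all_ranked_matches)
    rank1 = sum(1 for r in all_ranked_matches if r and r[0]) / n
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    return rank1, mean_ap


# Two toy queries: the first has its top result correct, the second does not.
queries = [
    [True, False, True],
    [False, True, False],
]
rank1, mean_ap = rank1_and_map(queries)
```

Rank-1 rewards only the top result, while mAP rewards ranking all correct matches early, which is why papers usually report both.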
Funding: Supported by the National Natural Science Foundation of China (No. 61976080), the Science and Technology Key Project of the Science and Technology Department of Henan Province (No. 212102310298), the Innovation and Quality Improvement Project for Graduate Education of Henan University (No. SYL20010101), and the Academic Degrees & Graduate Education Reform Project of Henan Province (No. 2021SJLX195Y).
Abstract: Person re-identification (Re-ID) is integral to intelligent monitoring systems. However, variability in viewing angles and illumination easily causes visual ambiguities, which reduce the accuracy of person re-identification. An approach for person re-identification based on feature mapping space and sample determination is proposed. First, a weight fusion model, including the mean and maximum value of the horizontal occurrence in local features, is introduced into the mapping space to optimize local features. Then, a Gaussian distribution model with hierarchical mean and covariance of pixel features is introduced to enhance feature expression. Finally, considering the influence of sample size on metric learning performance, an appropriate metric learning method is selected by the sample determination method to further improve the performance of person re-identification. Experimental results on the VIPeR, PRID450S, and CUHK01 datasets demonstrate that the proposed method outperforms traditional methods.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61771288 and 61701277) and the State Key Development Program of the 13th Five-Year Plan (No. 2017YFC0821601).
Abstract: Person re-IDentification (re-ID) is an important research topic in the computer vision community, with significance for a range of applications. Pedestrians are well-structured objects that can be partitioned, although detection errors cause slightly misaligned bounding boxes, which lead to mismatches. In this paper, we study the person re-identification performance of using variously designed pedestrian parts instead of the horizontal partitioning routine typically applied in previous hand-crafted part works, and thereby obtain more effective feature descriptors. Specifically, we benchmark the accuracy of individual part matching with discriminatively trained Convolutional Neural Network (CNN) descriptors on the Market-1501 dataset. We also investigate the complementarity among different parts using combination and ablation studies, and provide novel insights into this issue. Compared with the state of the art, our method yields a competitive accuracy rate when the best part combination is used on two large-scale datasets (Market-1501 and CUHK03) and one small-scale dataset (VIPeR).
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, and the Jiangxi Nos. 20224BAB212011, 20232BAB212008, and 20232BAB202051.
Abstract: Distinguishing identity-unrelated background information from discriminative identity information poses a challenge in unsupervised vehicle re-identification (Re-ID). Re-ID models suffer from varying degrees of background interference caused by continuous scene variations. The recently proposed segment anything model (SAM) has demonstrated exceptional performance in zero-shot segmentation tasks. Combining SAM with vehicle Re-ID models can achieve efficient separation of vehicle identity and background information. This paper proposes a method that combines SAM-driven masked autoencoder (MAE) pre-training and background-aware meta-learning for unsupervised vehicle Re-ID. The method consists of three sub-modules. First, the segmentation capacity of SAM is utilized to separate the vehicle identity region from the background. Because SAM cannot be robustly employed in exceptional situations, such as those with ambiguity or occlusion, a spatially constrained vehicle background segmentation method is presented for vehicle Re-ID downstream tasks to obtain accurate background segmentation results. Second, SAM-driven MAE pre-training utilizes these segmentation results to select patches belonging to the vehicle and to mask other patches, allowing MAE to learn identity-sensitive features in a self-supervised manner. Finally, we present a background-aware meta-learning method to fit varying degrees of background interference in different scenarios by combining different background region ratios. Our experiments demonstrate that the proposed method achieves state-of-the-art performance in reducing background interference variations.
Funding: Supported by a Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0008703, The Competency Development Program for Industry Specialist) and by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) support program (IITP-2022-2018-0-01799) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: Innovations in Internet of Everything (IoE)-enabled systems are driving a change in the settings where we interact in smart units, recognized globally as smart city environments. Intelligent video-surveillance systems are critical to increasing the security of these smart cities. More precisely, in today’s world of smart video surveillance, person re-identification (Re-ID) has gained increased attention from researchers. Various researchers have designed deep learning-based algorithms for person Re-ID because such algorithms have achieved substantial breakthroughs in computer vision problems. In this line of research, we designed an adaptive feature refinement-based deep learning architecture to conduct person Re-ID. The proposed architecture focuses on the inter-channel and inter-spatial relationships of features between images of the same individual taken from different camera viewpoints in order to learn spatial and channel attention. In addition, a spatial pyramid pooling layer is inserted to extract multiscale, fixed-dimension feature vectors irrespective of the size of the feature maps. The model’s effectiveness is validated on the CUHK01 and CUHK02 datasets. When compared with existing approaches, the approach presented in this paper achieves encouraging Rank 1 and Rank 5 scores of 24.6% and 54.8%, respectively.
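The property claimed for the spatial pyramid pooling layer, a fixed-length output regardless of feature map size, can be illustrated with a minimal single-channel sketch. The pyramid levels (1, 2, 4), the use of max pooling, and the bin-boundary rule are assumptions for illustration; the abstract does not specify the configuration used in the paper.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map into n x n bins for each pyramid level.

    feature_map: non-empty list of equal-length rows (one channel).
    Returns a flat list of length sum(n * n for n in levels), which does
    not depend on the height or width of the input.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor the start and ceil the end so every bin is non-empty.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# Two toy feature maps of different sizes yield vectors of the same length.
map_a = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6
map_b = [[r * 5 + c for c in range(5)] for r in range(7)]  # 7 x 5
vec_a = spatial_pyramid_pool(map_a)
vec_b = spatial_pyramid_pool(map_b)
```

With levels (1, 2, 4) each channel contributes 1 + 4 + 16 = 21 values, so a downstream fully connected layer sees the same input dimension for any image size.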
Abstract: As a vital vision task, person re-identification (Re-ID) aims to retrieve the same person across non-overlapping cameras. It is a very challenging task due to the presence of complex backgrounds, diverse illuminations and different perspectives. In this work, we integrate the advantages of convolutional neural networks (CNNs) and transformers, and propose a novel learning framework named convolutional multi-level transformer (CMT) for image-based person Re-ID. More specifically, we first propose a scale-aware feature enhancement (SFE) module to extract multi-scale local features from a pre-trained CNN backbone. Then, we introduce a part-aware transformer encoder (PTE) to further mine discriminative local information guided by global semantics. Finally, a deeply-supervised learning (DSL) technique is adopted to optimize the proposed CMT and improve its training efficiency. Extensive experiments on four large-scale Re-ID benchmarks demonstrate that our method performs favorably against several state-of-the-art methods.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62076117 and 62166026, the Jiangxi Key Laboratory of Smart City under Grant No. 20192BCD40002, and the Jiangxi Provincial Natural Science Foundation under Grant No. 20224BAB212011.
Abstract: Existing unsupervised person re-identification approaches fail to fully capture the fine-grained features of local regions, which can result in people with similar appearances but different identities being assigned the same label after clustering. The identity-independent information contained in different local regions leads to different levels of local noise. To address these challenges, joint training with local soft attention and dual cross-neighbor label smoothing (DCLS) is proposed in this study. First, the joint training is divided into global and local parts, whereby a soft attention mechanism is proposed for the local branch to accurately capture the subtle differences in local regions, which improves the ability of the re-identification model to identify a person's locally significant features. Second, DCLS is designed to progressively mitigate label noise in different local regions. DCLS uses global and local similarity metrics to semantically align the global and local regions of the person and further determines the proximity association between local regions through the cross information of neighboring regions, thereby achieving label smoothing of the global and local regions throughout the training process. In extensive experiments, the proposed method outperformed existing methods under unsupervised settings on several standard person re-identification datasets.
Funding: Supported by the National Key R&D Program of China (2022ZD0160605), the National Natural Science Foundation of China (61976002), the University Synergy Innovation Program of Anhui Province (GXXT-2022-036), the Natural Science Foundation of Anhui Province (No. 2208085J18), the National Natural Science Foundation of China under Grant (62106006), and the Natural Science Foundation of Anhui Higher Education Institution (No. 2022AH040014).
Abstract: The primary goal of visible-infrared person re-identification (VI-ReID) is to match pedestrian images captured during the day and at night. The majority of existing methods simply generate auxiliary modalities to reduce the modality discrepancy for cross-modality matching. They capture modality-invariant representations but ignore the extraction of modality-specific representations that can aid in distinguishing among various identities of the same modality. To alleviate these issues, this work provides a novel specific and shared representations learning (SSRL) model for VI-ReID to learn modality-specific and modality-shared representations. We design a shared branch in SSRL to bridge the image-level gap and learn modality-shared representations, while a specific branch retains the discriminative information of visible images to learn modality-specific representations. In addition, we propose intra-class aggregation and inter-class separation learning strategies to optimize the distribution of feature embeddings at a fine-grained level. Extensive experimental results on two challenging benchmark datasets, SYSU-MM01 and RegDB, demonstrate the superior performance of SSRL over state-of-the-art methods.
Funding: Supported by the major science and technology special projects of Xinjiang (No. 2024B03041) and the scientific and technological projects of Kashgar (No. KS2024024).
Abstract: The requirement for precise detection and recognition of target pedestrians in unprocessed real-world imagery drives the formulation of person search as an integrated framework that unifies pedestrian detection and person re-identification (Re-ID). However, two factors together constitute a critical bottleneck for person search: the inherent discrepancy between the optimization objectives of coarse-grained localization in pedestrian detection and fine-grained discriminative learning in Re-ID, and the substantial performance degradation of Re-ID during joint training caused by the Faster R-CNN-based branch. In this work, we propose a cascaded person search model (SeqXt) based on SeqNet and ConvNeXt that adopts a sequential end-to-end network as its core architecture. It integrates the design logic of the two-step and one-step frameworks, incorporating the two-step method's advantage in efficient subtask handling while preserving the one-step method's efficiency in end-to-end training. Firstly, we utilize ConvNeXt-Base as the feature extraction module, which incorporates part of the design concept of Transformer, enhances the consideration of global context information, and boosts feature discrimination through an implicit self-attention mechanism. Secondly, we introduce prototype-guided normalization, which leverages the archetype features of individual identities to calibrate the feature distribution and thereby prevents features from being overly inclined towards frequently occurring IDs, notably improving the intra-class compactness and inter-class separability of person identities. Finally, we put forward an innovative loss function named the Dynamic Online Instance Matching Loss Function (DOIM), which employs the hard sample assistant method to adaptively update the lookup table (LUT) and the circular queue (CQ) and aims to further enhance the distinctiveness of features between classes. Experimental results on the public datasets CUHK-SYSU and PRW and the private dataset UESTC-PS show that the proposed method achieves state-of-the-art results.
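The lookup table (LUT) and circular queue (CQ) named in the abstract above are the two memory structures of online instance matching losses. The sketch below shows only the commonly used baseline update: a momentum update of the LUT entry for a labeled identity, and a fixed-size queue for unlabeled features. It does not reproduce the paper's hard sample assistant rule, which the abstract does not specify. The class name `OimMemory` and the `momentum` value are hypothetical.

```python
import math
from collections import deque


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


class OimMemory:
    """Baseline LUT + CQ memory (assumed scheme, not the paper's DOIM rule)."""

    def __init__(self, num_ids, dim, queue_size, momentum=0.5):
        self.lut = [[0.0] * dim for _ in range(num_ids)]  # one row per labeled ID
        self.queue = deque(maxlen=queue_size)             # oldest entry drops first
        self.momentum = momentum

    def update(self, feature, identity=None):
        feature = l2_normalize(feature)
        if identity is None:
            # Unlabeled person: push into the circular queue.
            self.queue.append(feature)
            return
        # Labeled person: blend old and new features, then renormalize.
        old = self.lut[identity]
        mixed = [
            self.momentum * o + (1.0 - self.momentum) * f
            for o, f in zip(old, feature)
        ]
        self.lut[identity] = l2_normalize(mixed)


memory = OimMemory(num_ids=2, dim=2, queue_size=2)
memory.update([3.0, 4.0], identity=0)
for unlabeled in ([1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]):
    memory.update(unlabeled)
```

Using `deque(maxlen=...)` gives the circular behavior directly: once the queue is full, each new unlabeled feature evicts the oldest one.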