As a vital vision task, person re-identification (Re-ID) aims to retrieve the same person across non-overlapping cameras. The task is very challenging because of complex backgrounds, diverse illumination, and different perspectives. In this work, we integrate the advantages of convolutional neural networks (CNNs) and transformers, and propose a novel learning framework named convolutional multi-level transformer (CMT) for image-based person Re-ID. More specifically, we first propose a scale-aware feature enhancement (SFE) module to extract multi-scale local features from a pre-trained CNN backbone. Then, we introduce a part-aware transformer encoder (PTE) to further mine discriminative local information guided by global semantics. Finally, a deeply-supervised learning (DSL) technique is adopted to optimize the proposed CMT and improve its training efficiency. Extensive experiments on four large-scale Re-ID benchmarks demonstrate that our method performs favorably against several state-of-the-art methods.
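To make the retrieval formulation concrete, the following is a minimal sketch of how a Re-ID system might rank gallery images against a query once each image has been mapped to a feature vector. It does not implement CMT, SFE, PTE, or DSL. The function names `l2_normalize` and `rank_gallery`, the toy feature vectors, and the choice of cosine similarity as the matching score are all assumptions made for illustration; the abstract does not state which metric the authors use.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def rank_gallery(query_feat, gallery_feats):
    """Return gallery indices sorted from most to least similar to the query.

    Similarity is the cosine of the angle between feature vectors
    (an assumed metric, not one taken from the abstract).
    """
    q = l2_normalize(query_feat)
    scored = []
    for idx, feat in enumerate(gallery_feats):
        g = l2_normalize(feat)
        similarity = sum(a * b for a, b in zip(q, g))
        scored.append((similarity, idx))
    # Highest similarity first; ties are broken by the lower gallery index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored]


if __name__ == "__main__":
    # Hypothetical 2-D features; real Re-ID features are far higher-dimensional.
    query = [1.0, 0.0]
    gallery = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
    print(rank_gallery(query, gallery))
```

In a full pipeline, the feature vectors would come from the trained network, and the ranked list would be scored with the usual retrieval metrics on each benchmark.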