Abstract: This paper explores the Vision Transformer (ViT) backbone for Unsupervised Domain Adaptive (UDA) person Re-Identification (Re-ID). While some recent studies have validated ViT for supervised Re-ID, no study has yet used ViT for UDA Re-ID. We observe that the ViT structure provides a unique advantage for UDA Re-ID: it has a prompt (the learnable class token) at its bottom layer, which can be used to efficiently condition the deep model for the underlying domain. To exploit this advantage, we propose a novel two-stage UDA pipeline named Prompting And Tuning (PAT), which consists of a prompt learning stage and a subsequent fine-tuning stage. In the first stage, PAT roughly adapts the model from the source domain to the target domain by learning the prompts for the two domains. In the second stage, PAT fine-tunes the entire backbone for further adaptation to increase accuracy. Although both stages use pseudo labels for training, we show that they have different data preferences. Given these two preferences, prompt learning and fine-tuning integrate well with each other and jointly yield a competitive PAT method for UDA Re-ID.
Funding: This work was supported by the National Key Research and Development Program of China in the 13th Five-Year Plan (No. 2016YFB0801301) and in the 14th Five-Year Plan (Nos. 2021YFFO602103, 2021YFF0602102, and 20210Y1702).
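The two-stage schedule described in the abstract (learn per-domain prompts with the backbone frozen, then fine-tune everything on the target domain) can be illustrated with a deliberately tiny sketch. This is not the authors' implementation and not a ViT: the "backbone" is a single shared scalar weight, each "prompt" is a single scalar offset standing in for a learnable class token, and the target labels stand in for pseudo labels. All names (`predict`, `train`, `params`, the toy data) are hypothetical and chosen only for illustration.

```python
# Toy stand-in for a two-stage "prompt, then tune" schedule.
# Assumption: a linear model replaces the ViT; nothing here comes from the paper's code.

def predict(params, x, domain):
    # The backbone weight is shared; each domain has its own scalar prompt.
    return params["backbone"] * x + params["prompt"][domain]


def mse(params, data, domain):
    return sum((predict(params, x, domain) - y) ** 2 for x, y in data) / len(data)


def train(params, data, domain, tune_backbone, lr=0.1, steps=300):
    """Gradient descent on squared error.

    The prompt of `domain` is always updated; the shared backbone is
    updated only when `tune_backbone` is True.
    """
    n = len(data)
    for _ in range(steps):
        errs = [(x, predict(params, x, domain) - y) for x, y in data]
        grad_prompt = sum(2 * e for _, e in errs) / n
        grad_backbone = sum(2 * e * x for x, e in errs) / n
        params["prompt"][domain] -= lr * grad_prompt
        if tune_backbone:
            params["backbone"] -= lr * grad_backbone
    return params


xs = [-1.0, 0.0, 1.0, 2.0]
source_data = [(x, 1.0 * x + 1.0) for x in xs]  # toy labeled source domain
target_data = [(x, 1.5 * x + 3.0) for x in xs]  # toy target domain (labels play the role of pseudo labels)

# Backbone starts from a value that already fits the source slope.
params = {"backbone": 1.0, "prompt": {"source": 0.0, "target": 0.0}}

# Stage 1: prompt learning for both domains, backbone frozen.
train(params, source_data, "source", tune_backbone=False)
train(params, target_data, "target", tune_backbone=False)
backbone_after_stage1 = params["backbone"]
loss_stage1 = mse(params, target_data, "target")

# Stage 2: fine-tune the backbone (and target prompt) on the target domain.
train(params, target_data, "target", tune_backbone=True)
loss_stage2 = mse(params, target_data, "target")

print(f"target MSE after stage 1: {loss_stage1:.4f}")
print(f"target MSE after stage 2: {loss_stage2:.2e}")
```

In this toy setting the prompt alone can only shift the output, so stage 1 leaves a residual error on the target data that stage 2 removes by also moving the backbone. The sketch does not model the differing data preferences of the two stages that the abstract mentions.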