Funding: Supported in part by the National Science and Technology Major Project of China (No. 2022ZD0117901), and in part by the National Natural Science Foundation of China (Nos. 62373355, 62276256, and 62106260).
Abstract: Generalizable pedestrian attribute recognition (PAR) aims to learn a robust PAR model that can be directly applied to unknown distributions under varying illumination, viewpoints, and occlusions. This is an essential problem for real-world applications such as video surveillance and fashion search. In practice, when a trained PAR model is deployed in real-world scenarios, unseen target samples are fed into the model continuously in an online manner. Therefore, this paper proposes an efficient and flexible method, named AdaGPAR, for generalizable PAR (GPAR) via test-time adaptation (TTA), which adapts the trained model by exploiting unlabeled target samples online during the test phase. To the best of our knowledge, this is the first work to address GPAR from the perspective of TTA. Specifically, AdaGPAR gradually memorizes reliable target sample pairs (features and pseudo-labels) as prototypes during the test phase, and then makes predictions with a non-parametric classifier by computing the similarity between a target instance and the prototypes. However, since PAR is a multi-label classification task, using the same holistic feature of a pedestrian image as the prototype for multiple attributes is suboptimal. Therefore, an attribute localization branch is introduced to extract attribute-specific features, and two kinds of memory banks are constructed to cache the global and attribute-specific features simultaneously. In summary, AdaGPAR is training-free in the test phase and predicts multiple pedestrian attributes of target samples in an online manner, which makes it time-efficient and generalizable for real-world applications. Extensive experiments on the UPAR benchmark compare the proposed method with multiple baselines; the superior performance demonstrates that AdaGPAR improves the generalizability of a PAR model via TTA.
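To make the prototype mechanism concrete, below is a minimal Python sketch of the core idea: caching reliable (feature, pseudo-label) pairs online and classifying new instances by similarity to the cached prototypes. This is an illustration under stated assumptions, not the paper's implementation; the cosine similarity, the confidence threshold, the FIFO capacity, and the two-way softmax readout are choices made here, and the paper's two memory banks (global and attribute-specific) are collapsed into a single per-attribute bank.

```python
import numpy as np

class PrototypeMemory:
    """Sketch of a per-attribute prototype memory for test-time adaptation:
    reliable (feature, pseudo-label) pairs are cached online and queried
    with a non-parametric similarity classifier. All hyperparameters here
    (capacity, threshold, similarity) are illustrative assumptions."""

    def __init__(self, num_attrs: int, capacity: int = 64, tau: float = 0.9):
        self.num_attrs = num_attrs
        self.capacity = capacity      # max prototypes kept per attribute and label
        self.tau = tau                # confidence threshold for "reliable" samples
        # banks[a][y] holds features pseudo-labeled y (0/1) for attribute a
        self.banks = [{0: [], 1: []} for _ in range(num_attrs)]

    def update(self, feat: np.ndarray, probs: np.ndarray) -> None:
        """Cache the feature for every attribute whose prediction is
        confident enough; evict the oldest prototype when a bank is full."""
        feat = feat / (np.linalg.norm(feat) + 1e-12)  # unit norm for cosine sim
        for a in range(self.num_attrs):
            conf = max(probs[a], 1.0 - probs[a])      # confidence of binary decision
            if conf < self.tau:
                continue                              # unreliable sample: skip
            y = int(probs[a] > 0.5)
            bank = self.banks[a][y]
            if len(bank) >= self.capacity:
                bank.pop(0)                           # FIFO eviction
            bank.append(feat)

    def predict(self, feat: np.ndarray) -> np.ndarray:
        """Non-parametric prediction: per attribute, compare the query's best
        cosine similarity against positive vs. negative prototypes."""
        feat = feat / (np.linalg.norm(feat) + 1e-12)
        out = np.full(self.num_attrs, 0.5)            # fallback when banks are empty
        for a in range(self.num_attrs):
            sims = {}
            for y in (0, 1):
                bank = self.banks[a][y]
                if bank:
                    sims[y] = max(float(feat @ p) for p in bank)
            if len(sims) == 2:
                # softmax over the two best similarities -> P(attribute = 1)
                e0, e1 = np.exp(sims[0]), np.exp(sims[1])
                out[a] = e1 / (e0 + e1)
            elif sims:
                out[a] = float(next(iter(sims)))      # only one class cached so far
        return out
```

In use, a frozen source model would produce `feat` and `probs` for each incoming test image, and `update` and `predict` would be interleaved as the stream arrives, so no gradient step is ever taken at test time.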
Funding: Supported by the National Key R&D Program of China (No. 2018YFB1308000) and the Natural Science Foundation of Zhejiang Province (No. LY21F030018).
Abstract: Pedestrian attribute recognition in surveillance scenarios remains a challenging task due to the inaccurate localization of specific attributes. In this paper, we propose a novel view-attribute localization method based on attention (VALA), which utilizes view information to guide recognition toward specific attributes and an attention mechanism to localize the corresponding image regions. Concretely, a view prediction branch leverages view information to generate four view weights that represent the confidence of attributes from different views. The view weights are then fed back to compose view-specific attributes, which participate in and supervise deep feature extraction. To explore the spatial location of a view-attribute, regional attention is introduced to aggregate spatial information and encode inter-channel dependencies of the view feature. Subsequently, a fine attentive attribute-specific region is localized, and the regional attention yields regional weights for the view-attribute at different spatial locations. The final view-attribute recognition outcome is obtained by combining the view weights with the regional weights. Experiments on three widely used datasets (richly annotated pedestrian (RAP), richly annotated pedestrian v2 (RAPv2), and PA-100K) demonstrate the effectiveness of our approach compared with state-of-the-art methods.
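As a concrete illustration of how view weights and regional weights might be combined, here is a minimal PyTorch sketch. It is a hypothetical instantiation, not VALA itself: the layer shapes, the learnable view-attribute affinity matrix, and the simple product used to fuse view confidence with attention-pooled scores are all assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAttributeHead(nn.Module):
    """Hypothetical VALA-style head: a view branch predicts weights for four
    views, a regional-attention branch scores spatial locations per attribute,
    and the two cues are fused into final attribute scores."""

    def __init__(self, in_ch: int, num_attrs: int, num_views: int = 4):
        super().__init__()
        self.view_fc = nn.Linear(in_ch, num_views)               # view prediction branch
        self.attn = nn.Conv2d(in_ch, num_attrs, kernel_size=1)   # per-attribute spatial attention
        self.cls = nn.Conv2d(in_ch, num_attrs, kernel_size=1)    # per-location attribute logits
        # learnable view-attribute affinity (assumed; stands in for the
        # "view-attribute composition" described in the abstract)
        self.view_attr = nn.Parameter(torch.zeros(num_views, num_attrs))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) deep feature map of a pedestrian image
        view_w = F.softmax(self.view_fc(feat.mean(dim=(2, 3))), dim=1)  # (B, V) view weights
        attn = F.softmax(self.attn(feat).flatten(2), dim=2)             # (B, A, H*W) regional weights
        logits = self.cls(feat).flatten(2)                              # (B, A, H*W)
        regional = (attn * logits).sum(dim=2)                           # attention-pooled scores (B, A)
        view_conf = view_w @ torch.sigmoid(self.view_attr)              # per-attribute view confidence (B, A)
        return regional * view_conf                                     # fuse regional and view cues
```

One design point worth noting: the regional weights are normalized over spatial locations, so each attribute's score is a convex combination of per-location evidence, which is then scaled by how compatible that attribute is with the predicted view.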
Funding: Supported by the National Natural Science Foundation of China (No. 41874173).
Abstract: Pedestrian attribute recognition is often formulated as a multi-label image classification task. To make full use of attribute-related location information, a saliency guided self-attention network (SGSA-Net) was proposed to weakly supervise attribute localization without annotations of attribute-related regions. Saliency priors were integrated into the spatial attention module (SAM), and both channel-wise attention and spatial attention were introduced into the network. Moreover, a weighted binary cross-entropy loss (WCEL) function was employed to handle the imbalance of the training data. Extensive experiments on the richly annotated pedestrian (RAP) and pedestrian attribute (PETA) datasets demonstrated that SGSA-Net outperforms other state-of-the-art methods.
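The weighted binary cross-entropy loss (WCEL) mentioned above is straightforward to sketch. The exponential weighting rule below is a common choice for imbalanced attribute data (rare positives are up-weighted); the exact formula used by SGSA-Net may differ, so treat this as an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits: torch.Tensor,
                      targets: torch.Tensor,
                      pos_ratio: torch.Tensor) -> torch.Tensor:
    """Weighted BCE for imbalanced multi-label attributes (a sketch).

    logits:    (B, A) raw attribute scores from the network
    targets:   (B, A) binary labels in {0, 1}, as floats
    pos_ratio: (A,)   fraction of positive training samples per attribute
    """
    w_pos = torch.exp(1.0 - pos_ratio)   # rare positives get larger weight
    w_neg = torch.exp(pos_ratio)         # rare negatives get larger weight
    weight = targets * w_pos + (1.0 - targets) * w_neg   # (B, A) per-element weight
    return F.binary_cross_entropy_with_logits(logits, targets,
                                              weight=weight, reduction="mean")
```

Here `pos_ratio` would be estimated once from the training labels, for example as `train_targets.float().mean(dim=0)`.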