Vehicle re-identification (ReID) is a challenging task in intelligent transportation and urban surveillance systems because of variations in camera viewpoints, vehicle scales, and environmental conditions. Although recent transformer-based approaches have shown impressive performance by exploiting global dependencies, these models struggle with aspect-ratio distortions and may overlook fine-grained local attributes that are crucial for distinguishing visually similar vehicles. We introduce a framework based on Swin Transformers that addresses these challenges through three components. First, to improve feature robustness and maintain vehicle proportions, our Aspect Ratio-Aware Swin Transformer (AR-Swin) preserves the native aspect ratio via letterboxing, uses a non-square (16×8) patch-embedding stem, and keeps fixed 7×7 token windows. Second, we introduce a Dynamic Feature Fusion Network (DFFNet) that adaptively integrates global Swin features with local attribute embeddings, such as color and vehicle type, enabling more discriminative representations. Third, our Regional Attention Blocks incorporate regional masks into the transformer's windowed attention mechanism, effectively highlighting critical details such as manufacturer logos or lights. On VeRi-776, we obtain 82.55 mAP, 97.26 Rank-1, and 99.23 Rank-5; on VehicleID, we obtain 91.8 Rank-1 and 97.75 Rank-5. The design is drop-in compatible with Swin backbones and emphasizes robustness without increasing architectural complexity. Code: https://github.com/sft110/Swinvreid.
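To make the AR-Swin input geometry concrete, the Python sketch below computes a letterbox placement (one uniform scale factor plus padding) and the token grid that a 16×8 patch-embedding stem produces, then counts how many fixed 7×7 attention windows that grid splits into. It is not the authors' code: the 224×224 canvas, the reading of 16×8 as 16 px high by 8 px wide, and the helper names `letterbox` and `token_layout` are illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical helpers: canvas size and patch orientation are assumptions,
# not settings taken from the paper.


@dataclass(frozen=True)
class Letterbox:
    """Placement of a resized image inside a fixed canvas (pixels)."""
    scale: float
    new_w: int
    new_h: int
    pad_left: int
    pad_top: int
    pad_right: int
    pad_bottom: int


def letterbox(src_w: int, src_h: int, canvas_w: int, canvas_h: int) -> Letterbox:
    """Scale uniformly so the image fits the canvas, then pad the remainder.

    Using a single scale factor for both axes keeps the native aspect ratio;
    the leftover area is split as evenly as possible into padding.
    """
    scale = min(canvas_w / src_w, canvas_h / src_h)
    new_w = min(canvas_w, round(src_w * scale))
    new_h = min(canvas_h, round(src_h * scale))
    pad_x, pad_y = canvas_w - new_w, canvas_h - new_h
    return Letterbox(scale, new_w, new_h,
                     pad_x // 2, pad_y // 2,
                     pad_x - pad_x // 2, pad_y - pad_y // 2)


def token_layout(canvas_w: int, canvas_h: int,
                 patch_w: int, patch_h: int, window: int = 7):
    """Return (grid_h, grid_w, num_windows) for a non-overlapping
    patch_h x patch_w stem followed by fixed window x window attention."""
    if canvas_w % patch_w or canvas_h % patch_h:
        raise ValueError("canvas must be divisible by the patch size")
    grid_w, grid_h = canvas_w // patch_w, canvas_h // patch_h
    if grid_w % window or grid_h % window:
        raise ValueError("token grid must be divisible by the window size")
    return grid_h, grid_w, (grid_h // window) * (grid_w // window)
```

For example, `letterbox(300, 180, 224, 224)` scales a 300×180 crop to 224×134 and pads 45 px above and below, and `token_layout(224, 224, patch_w=8, patch_h=16)` returns `(14, 28, 8)`: a 14×28 token grid split into eight 7×7 windows. Because both axes share one scale factor, the vehicle's proportions survive; stretching the crop to fill the canvas would distort them instead.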
Funding: Supported by the SDAIA-KFUPM Joint Research Center of Artificial Intelligence, Deanship of Research, King Fahd University of Petroleum and Minerals, under Grant #CAI02562 (JRC-AI-RFP-17).