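Abstract: Underwater object detection is crucial for marine resource development and ecological monitoring, but the low contrast, color distortion, and complex background interference of underwater images make accurate detection very challenging. To overcome the limitations of traditional methods in feature extraction and small-object recognition, this paper proposes a novel detection model that deeply integrates Swin Transformer with the YOLO11 architecture (A Novel Detection Model with Deep Integration of Swin Transformer and YOLO11 Architectures, YOLO11-Swin). The model uses Swin Transformer as the backbone feature extraction network and exploits its hierarchical design and shifted-window self-attention to capture global contextual dependencies in the image, strengthening the representation of blurred and occluded objects. In the feature fusion stage, a Cross-layer Feature Aggregation (CFA) mechanism is designed that uses global pooling and adaptive weight computation to guide efficient information exchange among feature maps of different scales, addressing the semantic gap and scale mismatch in the feature pyramid. In addition, a Convolutional Block Attention Module (CBAM) is embedded at the output of each feature level; its serial channel and spatial attention submodules adaptively refine feature responses, highlighting object regions and suppressing background noise. To address the imbalance between positive and negative samples in underwater datasets, the model adopts Focal Loss as the classification loss, focusing training on hard samples and improving convergence speed and robustness. Experiments on the URPC dataset show that YOLO11-Swin reaches an mAP@50 of 75.54%, an improvement of 9.42% over the baseline YOLO11 model. In particular, for small objects (such as scallops), average precision (AP) improves by 10.16% and recall by 4.55%, demonstrating the effectiveness and advanced nature of the proposed model in complex underwater environments.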
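The abstract names Focal Loss as the classification loss but gives no hyperparameters. As a minimal sketch of the standard binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), the function below shows how the (1 - p_t)^gamma factor down-weights easy samples. The function name and the defaults gamma=2.0 and alpha=0.25 are assumptions taken from common practice, not values reported for YOLO11-Swin.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one binary prediction (illustrative sketch).

    p     : predicted probability of the positive class, in [0, 1]
    y     : ground-truth label, 1 (positive) or 0 (negative)
    gamma : focusing parameter; gamma=0 recovers weighted cross-entropy
    alpha : weight for the positive class; negatives get 1 - alpha
    The defaults are assumed, not taken from the paper.
    """
    # p_t is the probability the model assigns to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Clamp to avoid log(0) on fully wrong predictions.
    p_t = min(max(p_t, eps), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(f"p={p:.1f}  positive-sample loss={binary_focal_loss(p, 1):.6f}")
```

With gamma above zero, a positive sample predicted at 0.1 contributes far more loss than one predicted at 0.9, which is the "focus on hard samples" behavior the abstract refers to.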
Funding: Supported by SDAIA-KFUPM Joint Research Center of Artificial Intelligence, Deanship of Research, King Fahd University of Petroleum and Minerals, under Grant #CAI02562 (JRC-AI-RFP-17).
Abstract: Vehicle re-identification (ReID) is a challenging task in intelligent transportation and urban surveillance systems because of variations in camera viewpoints, vehicle scales, and environmental conditions. Although recent transformer-based approaches have shown impressive performance by exploiting global dependencies, these models struggle with aspect ratio distortions and may overlook fine-grained local attributes that are crucial for distinguishing visually similar vehicles. We introduce a framework based on Swin Transformers that addresses these challenges with three components. First, to improve feature robustness and maintain vehicle proportions, our Aspect Ratio-Aware Swin Transformer (AR-Swin) preserves the native ratio via letterboxing, uses a non-square (16×8) patch-embedding stem, and keeps fixed 7×7 token windows. Second, we introduce a Dynamic Feature Fusion Network (DFFNet) that adaptively integrates global Swin features with local attribute embeddings, such as color and vehicle type, enabling more discriminative representations. Third, our Regional Attention Blocks incorporate regional masks into the transformer's windowed attention mechanism, effectively highlighting critical details like manufacturer logos or lights. On VeRi-776, we obtain 82.55 mAP, 97.26 Rank-1, and 99.23 Rank-5, and on VehicleID we obtain 91.8 Rank-1 and 97.75 Rank-5. The design is drop-in for Swin backbones and emphasizes robustness without increasing architectural complexity. Code: https://github.com/sft110/Swinvreid.
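The letterboxing step mentioned for AR-Swin can be illustrated with plain arithmetic: scale the image uniformly until it fits the target canvas, then pad the remainder. The sketch below only computes the resized size and padding; the function name, the centered padding, and the example sizes are assumptions for illustration and are not taken from the linked repository.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Compute an aspect-ratio-preserving resize plus padding (sketch).

    Returns (new_w, new_h, (left, top, right, bottom)), where the padding
    values are in pixels. Centered padding is an assumption; an actual
    implementation may pad differently.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    # One scale factor for both axes keeps the vehicle's proportions.
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = min(dst_w, max(1, round(src_w * scale)))
    new_h = min(dst_h, max(1, round(src_h * scale)))
    # Whatever space is left on the canvas is filled with padding.
    pad_x, pad_y = dst_w - new_w, dst_h - new_h
    left, top = pad_x // 2, pad_y // 2
    return new_w, new_h, (left, top, pad_x - left, pad_y - top)


if __name__ == "__main__":
    # Hypothetical sizes: a 1920x1080 frame placed on a 384x384 canvas.
    print(letterbox_params(1920, 1080, 384, 384))
```

Because a single scale factor is applied to both axes, the resized content keeps its width-to-height ratio, unlike a direct resize to the target shape.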