
UAV-RGBT Multispectral Object Detection Dataset and Algorithm Benchmark
Abstract: Unmanned aerial vehicle (UAV)-based multispectral object detection, which uses both visible (RGB) and thermal infrared (T) images, enables all-day, all-weather target reconnaissance and has important applications in military and civilian fields. However, because data acquisition and processing are complex, few UAV-view RGB-T multispectral object detection datasets are publicly available, which to some extent limits research on and application of such algorithms. Meanwhile, UAV operating scenarios are complex and variable: flight altitude, speed, focal length, and background change rapidly, so targets in the captured images exhibit diverse scales, uneven (dense/sparse) distributions, and class imbalance, all of which make detection challenging. Furthermore, time-critical applications such as target reconnaissance and traffic monitoring require real-time detection at high accuracy, so the design of a UAV RGB-T object detector must balance accuracy against speed. To address these issues, this paper constructs UAV-RGBT, a large-scale UAV-view RGB-T multispectral image dataset that spans seasons and day-night cycles and covers multiple categories and scales. UAV-RGBT comprises 20 categories, 5117 pairs of RGB-T images, and over 110000 annotations, and is intended to advance research on UAV-view multispectral object detection algorithms. Moreover, based on the YOLOv8n model, this paper proposes the UAV-based dual-branch multispectral object detection (UAV-DMDet) model, which promotes deep fusion of multispectral features through a multi-modal cross-attention fusion module and a multi-modal feature decomposition-combination module, and achieves a good balance among parameter count, detection speed, and detection accuracy. Experimental results show that, on the UAV-RGBT dataset, UAV-DMDet improves mAP@0.5 over the single-modality YOLOv8n model by 3.61% and 11.03% for the RGB and T modalities, respectively, and improves mAP@0.5:0.95 by 0.84% and 6.76%, respectively. On the DroneVehicle dataset, UAV-DMDet outperforms the mainstream algorithm I2MDet by 2.66% in mAP@0.5 and 12.36% in mAP@0.5:0.95. In terms of speed, with 640×640 input images, UAV-DMDet reaches an FP32 inference speed of 31 frames/s on a single GeForce RTX 3090 GPU and an FP16 inference speed of 58 frames/s on a Huawei Ascend 710 processor, making it applicable to real-time UAV-view RGB-T multispectral object detection tasks.
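For readers unfamiliar with the mAP@0.5 and mAP@0.5:0.95 figures quoted in the abstract, the sketch below illustrates the IoU thresholds behind them. It follows the common COCO-style convention (a prediction counts as a match when its IoU with a ground-truth box reaches the threshold, and mAP@0.5:0.95 averages over thresholds from 0.50 to 0.95 in steps of 0.05). It is not the authors' evaluation code: the function names are hypothetical, and axis-aligned boxes in `(x1, y1, x2, y2)` form are an assumption made for simplicity.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 used by mAP@0.5:0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred_box, truth_box):
    """Thresholds at which pred_box would count as a match for truth_box."""
    overlap = iou(pred_box, truth_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A prediction that covers exactly half of a ground-truth box, for example, is a match at the 0.5 threshold only, which is why mAP@0.5:0.95 is the stricter of the two metrics.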
Authors: WANG Jin-zhong; DAI Shun; ZHANG Xiu-wei; TIAN Xue-tao; XING Yin-hui; WANG Fang; YIN Han-lin; ZHANG Yan-ning (School of Computer Science, Northwestern Polytechnical University, Xi’an, Shaanxi 710072, China; Xi’an ASN Technology Group Co., Ltd., Xi’an, Shaanxi 710065, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen, Guangdong 518063, China)
Source: Acta Electronica Sinica (PKU Core Journal), 2025, No. 3, pp. 686-704 (19 pages)
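Funding: National Natural Science Foundation of China (No.61971356); Natural Science Basic Research Program of Shaanxi Province (No.2024JC-DXWT-07, No.2024JCYBQN-0719); Key Research and Development Program of Shaanxi Province (No.2023-YBGY-012); Guangdong Basic and Applied Basic Research Foundation (No.2024A1515030186).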
Keywords: unmanned aerial vehicle (UAV); visible and thermal infrared (RGB-T) multispectral object detection; dataset; multi-modal feature fusion; YOLOv8