3D object detection algorithm based on multi-scale network and axial attention
Abstract: In 3D object detection, the detection accuracy for small targets such as pedestrians and cyclists remains low, which is a challenging problem for the perception systems of autonomous vehicles. To estimate the state of the surrounding environment accurately and thereby improve driving safety, a 3D object detection algorithm based on a multi-scale network and axial attention is proposed as an improvement of the Voxel R-CNN (Voxel Region-based Convolutional Neural Network) algorithm. Firstly, a multi-scale network and a Pixel-level Fusion Module (PFM) are constructed in the backbone network to obtain richer and more precise feature representations, enhancing the robustness and generalization of the algorithm in complex scenes. Secondly, an axial attention mechanism suited to features with 3D spatial dimensions is designed and applied to the multi-scale pooled features of each Region of Interest (RoI), so that local and global features are captured effectively while important information in the 3D spatial structure is preserved, improving the accuracy and efficiency of detection and classification. Finally, a Rotation-Decoupled Intersection over Union (RDIoU) method is incorporated into the regression and classification branches, enabling the network to learn more precise bounding boxes and addressing the alignment problem between classification and regression. Experimental results on the public KITTI dataset show that the proposed algorithm achieves a mean Average Precision (mAP) of 62.25% for pedestrians and 79.36% for cyclists, which is 4.02 and 3.15 percentage points higher, respectively, than the baseline Voxel R-CNN, demonstrating the effectiveness of the improved algorithm in detecting hard-to-perceive objects.
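The abstract does not spell out the design of the axial attention, so the following is only a generic sketch of the idea rather than the authors' module. It assumes an RoI feature grid of shape (D, H, W, C), single-head dot-product attention, random untrained projection matrices, and a residual connection after each axis; the names `attend_along_axis`, `axial_attention_3d`, `roi_feat`, and `params` are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(x, axis, wq, wk, wv):
    """Self-attention restricted to one spatial axis of x, shape (D, H, W, C)."""
    # Bring the chosen axis next to the channel axis: (..., L, C).
    xt = np.moveaxis(x, axis, -2)
    q, k, v = xt @ wq, xt @ wk, xt @ wv
    # Each voxel attends only to voxels on the same line along `axis`.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., L, L)
    out = softmax(scores, axis=-1) @ v                          # (..., L, C)
    # Restore the original (D, H, W, C) layout.
    return np.moveaxis(out, -2, axis)

def axial_attention_3d(x, params):
    """Attend along depth, height, and width in turn (assumed residual form)."""
    for axis, (wq, wk, wv) in zip((0, 1, 2), params):
        x = x + attend_along_axis(x, axis, wq, wk, wv)
    return x

rng = np.random.default_rng(0)
C = 8
roi_feat = rng.standard_normal((6, 6, 6, C))  # hypothetical 6x6x6 RoI grid
params = [tuple(rng.standard_normal((C, C)) * 0.1 for _ in range(3))
          for _ in range(3)]
out = axial_attention_3d(roi_feat, params)
print(out.shape)
```

For a grid with N = D·H·W voxels, full self-attention compares N² voxel pairs, whereas attending along each axis separately compares N·(D + H + W) pairs; for the 6×6×6 grid assumed here that is 46,656 versus 3,888.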
Authors: YAN Chengzhi; CHEN Ying; ZHONG Kai; GAO Han (School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China)
Source: Journal of Computer Applications (《计算机应用》; Peking University Core Journal), 2025, No. 8, pp. 2537-2545 (9 pages)
Funding: National Natural Science Foundation of China (61976140); Collaborative Innovation Fund of Shanghai Institute of Technology (XTCX2022-25).
Keywords: 3D object detection; multi-scale network; feature fusion; axial attention; loss function