The fusion of millimeter-wave radar and camera modalities is crucial for improving the accuracy and completeness of 3-dimensional (3D) object detection. Most existing methods extract features from each modality separately and conduct fusion with specifically designed modules, potentially resulting in information loss during modality transformation. To address this issue, we propose a novel framework for 3D object detection that iteratively updates radar and camera features through an interaction module. This module serves a dual purpose: it facilitates the fusion of multi-modal data while preserving the original features. Specifically, radar and image features are sampled and aggregated with a set of sparse 3D object queries, while the integrity of the original radar features is retained to prevent information loss. Additionally, an innovative radar augmentation technique named Radar Gaussian Expansion is proposed. This module allocates the radar measurements within each voxel to neighboring voxels according to a Gaussian distribution, reducing association errors during projection and enhancing detection accuracy. Our proposed framework offers a comprehensive solution to the fusion of radar and camera data, leading to greater accuracy and completeness in 3D object detection. On the nuScenes test benchmark, our camera-radar fusion method achieves state-of-the-art 3D object detection results with a 41.6% mean average precision and a 52.5% nuScenes detection score.
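The query-based interaction described above can be sketched as a simple cross-attention update. This is a minimal illustration, not the paper's implementation: the function name `interaction_step`, the flat token layouts, and the single-head dot-product attention are all assumptions; the key point it demonstrates is that the queries aggregate information from both modalities while the radar and image feature tensors themselves are never overwritten.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_step(queries, radar_feats, image_feats):
    """One iterative update: sparse 3D object queries attend to each modality.

    queries:     (Q, D) sparse object queries
    radar_feats: (R, D) radar feature tokens (read-only here, so no info loss)
    image_feats: (I, D) image feature tokens (read-only here)
    """
    d = queries.shape[1]
    for feats in (radar_feats, image_feats):
        # Scaled dot-product attention weights, shape (Q, N).
        attn = softmax(queries @ feats.T / np.sqrt(d))
        # Residual aggregation: queries absorb modality features.
        queries = queries + attn @ feats
    return queries  # original modality features remain intact
```

Stacking several such steps gives the iterative refinement the abstract describes: each round lets the queries sample fresh evidence from the unmodified radar and image features.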
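Radar Gaussian Expansion can likewise be sketched as spreading each voxel's radar measurement over its neighborhood with Gaussian weights. This is a hedged sketch, not the authors' code: the function name, the `sigma` and `radius` hyperparameters, and the wrap-around boundary handling of `np.roll` are illustrative assumptions; the weights are normalized so the total measurement mass is preserved.

```python
import numpy as np

def radar_gaussian_expansion(voxel_grid, sigma=1.0, radius=1):
    """Spread each voxel's radar measurement to neighbors with Gaussian weights.

    voxel_grid: (X, Y, Z) array of accumulated radar measurements.
    sigma, radius: illustrative hyperparameters controlling the spread.
    """
    # Enumerate all integer offsets in a (2*radius + 1)^3 neighborhood.
    offsets = [(dx, dy, dz)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               for dz in range(-radius, radius + 1)]
    weights = np.array([np.exp(-(dx*dx + dy*dy + dz*dz) / (2.0 * sigma**2))
                        for dx, dy, dz in offsets])
    weights /= weights.sum()  # normalize: total measurement mass is preserved

    expanded = np.zeros_like(voxel_grid, dtype=float)
    for (dx, dy, dz), w in zip(offsets, weights):
        # np.roll shifts the grid; boundary wrap-around is a simplification.
        expanded += w * np.roll(voxel_grid, shift=(dx, dy, dz), axis=(0, 1, 2))
    return expanded
```

The softened occupancy makes a radar return tolerant to small projection offsets: a camera pixel that lands one voxel away from the true return still picks up a nonzero radar response, which is how the expansion reduces association errors.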
Funding: This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (No. 62203289), the Shanghai Sailing Program (No. 22YF1413800), the National Natural Science Foundation of China (No. 52371371), and the Shanghai Municipal Natural Science Foundation (No. 21ZR1423300).