Funding: This work was funded by the National Key R&D Program of China (Grant No. 2022YFB4703400), the China Three Gorges Corporation (Grant No. 2324020012), the National Natural Science Foundation of China (Grant No. 62476080), the Jiangsu Province Natural Science Foundation (Grant No. BK20231186), and the Key Laboratory of Maritime Intelligent Network Information Technology of the Ministry of Education (EKLMIC202405).
Abstract: The inherent limitations of 2D object detection, such as inadequate spatial reasoning and susceptibility to environmental occlusions, pose significant risks to the safety and reliability of autonomous driving systems. To address these challenges, this paper proposes an enhanced 3D object detection framework (FastSECOND) based on an optimized SECOND architecture, designed to achieve rapid and accurate perception in autonomous driving scenarios. Key innovations include: (1) replacing the Rectified Linear Unit (ReLU) activation functions with the Gaussian Error Linear Unit (GELU) in the voxel feature encoding and region proposal network stages, leveraging partial convolution to balance computational efficiency and detection accuracy; (2) integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction in sparse data; and (3) introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function, which incorporates bounding-box geometric correlations to accelerate network convergence. While this study focuses exclusively on the Car category, with experiments conducted on the Car class of the KITTI dataset, future work will extend to other categories such as Pedestrian and Cyclist to evaluate the generalization capability of the proposed framework more comprehensively. Extensive experimental results demonstrate that the framework achieves a more effective trade-off between detection accuracy and speed: compared to the baseline SECOND model, it achieves a 21.9% relative improvement in 3D bounding-box detection accuracy on the hard subset while reducing inference time by 14 ms. These advancements underscore the framework's potential for real-time, high-precision perception in autonomous driving applications.
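For readers unfamiliar with the Focal-EIoU loss referenced in innovation (3), the sketch below shows the standard 2D axis-aligned formulation (IoU plus center-distance, width, and height penalties, re-weighted by IoU^γ). It is a minimal illustration only: the paper's geometry-aware extension to oriented 3D boxes is not described in the abstract and is not reproduced here, and all tensor names and default values are assumptions.

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """Focal-EIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.

    Minimal sketch of the standard 2D Focal-EIoU formulation; the paper's
    geometry-aware 3D variant for oriented boxes is not reproduced here.
    """
    # Intersection area
    lt = torch.max(pred[:, :2], target[:, :2])      # top-left of overlap
    rb = torch.min(pred[:, 2:], target[:, 2:])      # bottom-right of overlap
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    # Union and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Smallest enclosing box (for the normalizing terms)
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw = (c_rb[:, 0] - c_lt[:, 0]).clamp(min=eps)   # enclosing width
    ch = (c_rb[:, 1] - c_lt[:, 1]).clamp(min=eps)   # enclosing height

    # Center, width, and height distance penalties (the EIoU terms)
    pc = (pred[:, :2] + pred[:, 2:]) / 2
    tc = (target[:, :2] + target[:, 2:]) / 2
    rho_center = ((pc - tc) ** 2).sum(dim=1)
    rho_w = ((pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])) ** 2
    rho_h = ((pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])) ** 2

    eiou = 1 - iou + rho_center / (cw ** 2 + ch ** 2) + rho_w / cw ** 2 + rho_h / ch ** 2

    # Focal re-weighting by IoU^gamma focuses training on higher-quality boxes;
    # detaching the weight from the graph is a common implementation choice.
    return (iou.detach() ** gamma * eiou).mean()
```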
Funding: This work was supported by the Special Project on Basic Research of Frontier Leading Technology of Jiangsu Province, China (Grant No. BK20192004C).
Abstract: 3D pose transfer over unorganized point clouds is a challenging generation task that transfers a source's pose to a target shape while keeping the target's identity. Recent deep models have learned deformations and used the target's identity as a style to modulate either the combined features of the two shapes or the aligned vertices of the source shape. However, all operations in these models are point-wise and independent, ignoring the geometric information carried by the surface and structure of the input shapes. This disadvantage severely limits their generation and generalization capabilities. In this study, we propose a geometry-aware method based on a novel transformer autoencoder to solve this problem. An efficient self-attention mechanism, cross-covariance attention, is used throughout our framework to perceive the correlations between points at different distances. Specifically, the transformer encoder extracts the target shape's local geometry details for identity attributes and the source shape's global geometry structure for pose information. Our transformer decoder efficiently learns deformations and recovers identity properties by fusing and decoding the extracted features in a geometry-attentional manner, which requires neither correspondence information nor modulation steps. The experiments demonstrate that the geometry-aware method achieves state-of-the-art performance on the 3D pose transfer task. The implementation code and data are available at https://github.com/SEULSH/Geometry-Aware-3D-Pose-Transfer-Using-Transformer-Autoencoder.
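As a rough illustration of the cross-covariance attention mentioned above, the sketch below follows the common XCiT-style formulation, in which attention is computed between feature channels (a d x d map) rather than between points, so its cost grows linearly with the number of points. The head count, learnable temperature, and layer sizes here are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    """Cross-covariance attention (XCA) over point-wise features.

    Minimal sketch: channel-to-channel attention with a learnable per-head
    temperature; not the paper's exact layer configuration.
    """
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (batch, num_points, dim) point-wise features
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)   # each: (B, heads, head_dim, N)

        # Normalize along the point dimension so the channel-channel product
        # acts as a cross-covariance matrix between feature channels.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)

        attn = (q @ k.transpose(-2, -1)) * self.temperature   # (B, heads, head_dim, head_dim)
        attn = attn.softmax(dim=-1)

        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)  # back to (B, N, dim)
        return self.proj(out)

# Usage sketch on a batch of 2 point clouds with 1024 points and 64-dim features.
if __name__ == "__main__":
    feats = torch.randn(2, 1024, 64)
    xca = CrossCovarianceAttention(dim=64, num_heads=8)
    print(xca(feats).shape)   # torch.Size([2, 1024, 64])
```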