Abstract: As a cornerstone of applications such as autonomous driving, 3D urban perception is a burgeoning field of study. Enhancing the performance and robustness of these perception systems is crucial for ensuring the safety of next-generation autonomous vehicles. In this work, we introduce a novel neural scene representation called Street Detection Gaussians (SDGs), which redefines urban 3D perception through an integrated architecture unifying reconstruction and detection. At its core lies a dynamic Gaussian representation in which time-conditioned parameterization enables simultaneous modeling of static environments and dynamic objects through physically constrained Gaussian evolution. The framework's radar-enhanced perception module learns cross-modal correlations between sparse radar data and dense visual features, yielding a 22% reduction in occlusion errors compared to vision-only systems. A differentiable rendering pipeline back-propagates semantic detection losses through the entire 3D reconstruction process, enabling joint optimization of geometric and semantic fidelity. Evaluated on the Waymo Open Dataset and the KITTI dataset, the system achieves real-time performance (135 frames per second (FPS)), photorealistic quality (peak signal-to-noise ratio (PSNR) of 34.9 dB), and state-of-the-art detection accuracy (78.1% mean average precision (mAP)), demonstrating a 3.8× end-to-end improvement over existing hybrid approaches while enabling seamless integration with autonomous driving stacks.
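To make the "time-conditioned parameterization" concrete, the following is a minimal sketch of how a dynamic Gaussian's mean could evolve under a physically constrained motion model. This is an illustrative assumption, not the paper's actual parameterization: the abstract does not specify the form of the evolution, so a low-order polynomial motion model (position plus learnable velocity and acceleration terms) is used here as a stand-in, and the class and method names (`DynamicGaussian`, `mean_at`) are hypothetical.

```python
import numpy as np

class DynamicGaussian:
    """Hypothetical sketch of a time-conditioned Gaussian primitive.

    Each Gaussian has a static base mean mu0 and per-order motion
    coefficients (e.g. velocity, acceleration) that shift the mean
    over time: mu(t) = mu0 + c1*t + c2*t^2 + ...
    """

    def __init__(self, num_gaussians: int, motion_order: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Static base positions in 3D.
        self.mu0 = rng.normal(size=(num_gaussians, 3))
        # Motion coefficients, initialized to zero (a static scene
        # until the optimizer moves them).
        self.motion = np.zeros((num_gaussians, motion_order, 3))

    def mean_at(self, t: float) -> np.ndarray:
        # Powers of t for each motion order: [t, t^2, ...].
        powers = np.array([t ** (k + 1) for k in range(self.motion.shape[1])])
        # Sum the time-weighted motion terms onto the base mean.
        return self.mu0 + (self.motion * powers[None, :, None]).sum(axis=1)


# Usage: at t = 0 the means coincide with the static base positions,
# so the same representation covers static background and moving objects.
g = DynamicGaussian(num_gaussians=4)
means = g.mean_at(0.0)
```

In a full system these coefficients would be optimized jointly with the rendering and detection losses, so the time dependence stays consistent with observed object motion.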