As a cornerstone for applications such as autonomous driving, 3D urban perception is a burgeoning field of study. Enhancing the performance and robustness of these perception systems is crucial for ensuring the safety of next-generation autonomous vehicles. In this work, we introduce a novel neural scene representation called Street Detection Gaussians (SDGs), which redefines urban 3D perception through an integrated architecture unifying reconstruction and detection. At its core lies the dynamic Gaussian representation, where time-conditioned parameterization enables simultaneous modeling of static environments and dynamic objects through physically constrained Gaussian evolution. The framework's radar-enhanced perception module learns cross-modal correlations between sparse radar data and dense visual features, resulting in a 22% reduction in occlusion errors compared to vision-only systems. A breakthrough differentiable rendering pipeline back-propagates semantic detection losses throughout the entire 3D reconstruction process, enabling the optimization of both geometric and semantic fidelity. Evaluated on the Waymo Open Dataset and the KITTI Dataset, the system achieves real-time performance (135 Frames Per Second (FPS)), photorealistic quality (Peak Signal-to-Noise Ratio (PSNR) 34.9 dB), and state-of-the-art detection accuracy (78.1% Mean Average Precision (mAP)), demonstrating a 3.8× end-to-end improvement over existing hybrid approaches while enabling seamless integration with autonomous driving stacks.
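The abstract does not specify the exact form of the time-conditioned parameterization, but the core idea of a dynamic Gaussian can be sketched as a Gaussian whose mean evolves with time under a simple physical constraint (here, constant velocity). The class name, fields, and linear motion model below are illustrative assumptions, not the paper's actual formulation:

```python
import math
from dataclasses import dataclass

@dataclass
class DynamicGaussian:
    """Illustrative 3D Gaussian whose mean is conditioned on time.

    mu0:   mean position at t = 0, as (x, y, z)
    vel:   constant velocity, a minimal physical constraint on evolution
    scale: isotropic standard deviation
    """
    mu0: tuple
    vel: tuple
    scale: float

    def mean(self, t: float) -> tuple:
        # Linear, physically constrained motion: mu(t) = mu0 + v * t
        return tuple(m + v * t for m, v in zip(self.mu0, self.vel))

    def density(self, p: tuple, t: float) -> float:
        # Unnormalized isotropic Gaussian density at point p and time t
        mu = self.mean(t)
        sq = sum((pi - mi) ** 2 for pi, mi in zip(p, mu))
        return math.exp(-sq / (2.0 * self.scale ** 2))

# A Gaussian belonging to the static environment is simply the
# special case vel = (0, 0, 0); a dynamic object carries nonzero velocity.
g = DynamicGaussian(mu0=(0.0, 0.0, 0.0), vel=(1.0, 0.0, 0.0), scale=0.5)
```

Under this sketch, a single parameter set covers both static and dynamic scene content, which mirrors the abstract's claim of simultaneous modeling through one time-conditioned representation.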
Intelligent Transportation Systems (ITS) represent a cornerstone in modern traffic management, leveraging surveillance cameras as primary visual sensors to monitor road conditions. However, the fixed characteristics of public surveillance cameras, coupled with inherent image resolution limitations, pose significant challenges for Small Object Detection (SOD) in traffic surveillance. To address these challenges, this paper proposes Ghost-Attention YOLO (GA-YOLO), a lightweight model derived from YOLOv8 and specifically designed for traffic SOD. To enhance attention on small targets and critical features, a novel channel-spatial attention mechanism, termed Small-object Extend Attention (SEA), is introduced. In addition, the original C2f module is replaced with a more efficient Cross-Stage Partial (CSP) module, C3k2, to achieve improved feature processing at lower cost. Building upon these designs, a CSP-based Ghost Bottleneck with Attention (CGBA) module is further developed by integrating SEA into C3k2 and is deployed within the FPN–PAN network to strengthen feature extraction and fusion. Compared with the similar-scale baseline models YOLOv8n and YOLOv11n, GA-YOLO demonstrates clear performance advantages on the UA-DETRAC dataset. Specifically, GA-YOLO achieves over 3% improvements in precision and mAP@50, along with a 5.6% gain in mAP@50-95, while reducing the parameter count by nearly 10% and computational complexity by 0.5 GFLOPs compared with YOLOv8n. In addition, GA-YOLO outperforms YOLOv11n by 8.6% in precision and 3.2% in mAP@50-95. These results indicate that GA-YOLO effectively balances detection accuracy and computational efficiency. Furthermore, additional evaluations across varying occlusion levels and comparisons with representative detection models indicate the effectiveness and practicality of GA-YOLO for traffic-oriented SOD tasks.
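The abstract does not detail SEA's internals, but the general shape of a channel-attention gate of the kind SEA builds on can be sketched in pure Python: pool each channel to a single statistic, squash it through a sigmoid, and re-weight the channel. The function name and pooling choice below are hypothetical, not the actual SEA module:

```python
import math

def channel_attention(feature_map):
    """Minimal channel-attention sketch (illustrative, not the SEA module).

    feature_map: list of channels, each a 2D list (H x W) of floats.
    Each channel is re-weighted by a sigmoid of its global average,
    so channels with stronger average activation are emphasized.
    """
    weights = []
    for ch in feature_map:
        # Global average pooling over the H x W spatial extent
        avg = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        # Sigmoid gate maps the pooled statistic into (0, 1)
        weights.append(1.0 / (1.0 + math.exp(-avg)))
    # Scale every spatial position of a channel by that channel's weight
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(feature_map, weights)]
```

A full channel-spatial mechanism such as SEA would additionally compute a per-position spatial weight and typically use learned projections rather than a raw average, but the gating pattern (pool, squash, re-scale) is the same.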