Autonomous Driving Trajectory Planning Based on Spatial Enhanced VLM

Abstract: In intelligent driving tasks, the main technical challenge in using a vision large language model (VLM) for trajectory planning is how to perceive the surrounding world and handle complex tasks based on that information. Existing open-source VLMs lack spatial priors for driving scenes from pre-training, which leaves their understanding of spatial information markedly insufficient and makes them hard to apply directly to trajectory planning. To address this, an end-to-end trajectory planning framework with dual enhancement, "spatial question-answering fine-tuning + bird's-eye-view perception input", is proposed. In the first enhancement, a spatial question-answering fine-tuning dataset for driving scenes is built from the available dataset annotations, giving the 2B-parameter Qwen2-VL explicit spatial priors for obstacle category recognition, relative distance estimation, and scale estimation. In the second enhancement, dynamic bird's-eye-view (BEV) images are generated in real time from the surround-view cameras, completing a lightweight spatial reconstruction. Finally, the BEV image, the original surround-view frames, and the text instruction are fed together into the LoRA-fine-tuned VLM, which directly outputs a trajectory in a standardized format in question-answer form. The effectiveness of the method is verified on the nuScenes and NAVSIM datasets. The results indicate that the method plans trajectories well in real-world scenarios, is more consistent with the driving habits of human drivers, and generalizes across multiple scenarios.
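
The first enhancement, building spatial question-answer pairs from dataset annotations, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Box` record, the question templates, and the four coarse direction labels are all hypothetical, and the boxes are assumed to be already expressed in the ego-vehicle frame (x forward, y left, in metres).

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical annotation record for one object in the ego-vehicle frame."""
    category: str   # e.g. "car", "pedestrian"
    x: float        # metres ahead of the ego vehicle (forward is positive)
    y: float        # metres to the left of the ego vehicle (left is positive)
    length: float   # metres
    width: float    # metres
    height: float   # metres


def bearing(box: Box) -> str:
    """Map the ego-frame position to a coarse direction label (assumed scheme)."""
    angle = math.degrees(math.atan2(box.y, box.x))  # 0 = straight ahead, positive = left
    if -45 <= angle <= 45:
        return "front"
    if 45 < angle < 135:
        return "left"
    if -135 < angle < -45:
        return "right"
    return "rear"


def make_qa_pairs(box: Box) -> list:
    """Turn one annotated box into category, distance, and size QA pairs."""
    dist = math.hypot(box.x, box.y)  # planar distance from the ego vehicle
    where = bearing(box)
    return [
        {
            "question": f"What is the obstacle about {dist:.1f} m away toward the {where}?",
            "answer": box.category,
        },
        {
            "question": f"How far is the {box.category} toward the {where} from the ego vehicle?",
            "answer": f"{dist:.1f} m",
        },
        {
            "question": f"What are the approximate dimensions of the {box.category}?",
            "answer": f"{box.length:.1f} m x {box.width:.1f} m x {box.height:.1f} m",
        },
    ]


if __name__ == "__main__":
    for pair in make_qa_pairs(Box("car", 3.0, 4.0, 4.5, 1.8, 1.5)):
        print(pair["question"], "->", pair["answer"])
```

Each annotated object yields one pair per spatial attribute named in the abstract (category, relative distance, scale), so a fine-tuning set of this kind grows linearly with the number of annotated boxes per frame.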

Authors: JIANG Zheng-xin (蒋正信); NIU Ming-kui (牛铭奎); HAN Pei-lun (韩佩伦); GAO Bing-zhao (高炳钊) (Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai 201804, China; Zhejiang Founder Motor Co., Ltd., Shanghai 201806, China)

Affiliation: Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University

Source: China Journal of Highway and Transport (《中国公路学报》), 2026, No. 3, pp. 135-144 (10 pages)

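Funding: National Natural Science Foundation of China (62373289); Shanghai Automotive Industry Science and Technology Development Foundation (2407).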

Keywords: automotive engineering; autonomous driving; vision large language model (VLM); spatial enhancement; bird eye view

Classification number: U461.91 [Transportation Engineering]
