DriveMLM: aligning multi-modal large language models with behavioral planning states for autonomous driving

Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we explore the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between language decisions and vehicle control commands by standardizing the decision states according to an off-the-shelf motion planning module; (2) we employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system, which takes driving rules, user commands, and inputs from various sensors (e.g., camera, LiDAR) as input, makes driving decisions, and provides explanations; this model can be plugged into existing AD systems such as Autopilot and Apollo for closed-loop driving; and (3) we design an effective data engine to collect a dataset that includes decision states and the corresponding explanation annotations for model training and evaluation. Extensive experiments show that replacing the decision-making modules of Autopilot and Apollo with DriveMLM yields improvements of 3.2 and 4.7 points, respectively, on CARLA Town05 Long, demonstrating the effectiveness of our model. We hope this work can serve as a baseline for autonomous driving with LLMs.
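The first contribution, standardizing language decisions into states that an off-the-shelf motion planner can consume, can be illustrated with a minimal Python sketch. Everything named here is a hypothetical assumption for illustration rather than the paper's definition: the state sets (`SpeedDecision`, `PathDecision` and their members), the `speed=...; path=...; explanation=...` output format, and the conservative stop-and-follow-lane fallback.

```python
from dataclasses import dataclass
from enum import Enum


class SpeedDecision(Enum):
    # Hypothetical speed states; the exact set used by DriveMLM is assumed here.
    KEEP = "keep"
    ACCELERATE = "accelerate"
    DECELERATE = "decelerate"
    STOP = "stop"


class PathDecision(Enum):
    # Hypothetical path states, assumed to be ones a motion planner can execute.
    FOLLOW_LANE = "follow_lane"
    LEFT_CHANGE = "left_change"
    RIGHT_CHANGE = "right_change"


@dataclass
class DecisionState:
    speed: SpeedDecision
    path: PathDecision
    explanation: str


def parse_decision(llm_output: str) -> DecisionState:
    """Map an MLLM text answer to a standardized decision state.

    Assumes a hypothetical output format:
    "speed=<speed>; path=<path>; explanation=<free text>"
    """
    fields = {}
    # maxsplit=2 keeps any ';' inside the explanation text intact.
    for part in llm_output.split(";", 2):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    try:
        speed = SpeedDecision(fields["speed"].lower())
        path = PathDecision(fields["path"].lower())
    except (KeyError, ValueError):
        # Hypothetical fallback: never forward an unparseable answer to the planner.
        return DecisionState(SpeedDecision.STOP, PathDecision.FOLLOW_LANE,
                             "unparseable output")
    return DecisionState(speed, path, fields.get("explanation", ""))


state = parse_decision(
    "speed=decelerate; path=left_change; explanation=Slow car ahead, overtake on the left."
)
print(state.speed, state.path, state.explanation)
```

Restricting the model's output to a closed set of enumerated states is what lets a conventional planner act on it; the fallback branch is one possible way to keep a malformed language answer from becoming an arbitrary control command.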

Authors: Erfei Cui, Wenhai Wang, Zhiqi Li, Jiangwei Xie, Haoming Zou, Hanming Deng, Gen Luo, Lewei Lu, Xizhou Zhu, Jifeng Dai

Affiliation: Shanghai Jiao Tong University

Source: Visual Intelligence, 2025, Issue 1, pp. 350-364 (15 pages)

Funding: Supported by the National Key R&D Program of China (No. 2022ZD0161300) and the National Natural Science Foundation of China (Nos. U24A20325, 62321005 and 62376134).

Keywords: Autonomous driving; Multi-modal large language model; Motion planning; Closed-loop control

Classification numbers: U463.6 [Transportation Engineering]; TP18 [Mechanical Engineering]
