Funding: supported by the Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0100), the National Natural Science Foundation of China (No. 62088101), the Fundamental Research Funds for the Central Universities (No. 22120220642), and the Opening Project of the State Key Laboratory of Autonomous Intelligent Unmanned Systems (No. ZZKF2025-2-3).
Abstract: This paper investigates the potential of Vision-Language Models (VLMs) to enhance Human–Vehicle Interaction (HVI) in Autonomous Driving (AD) scenarios, particularly in interactions between vehicles and other traffic participants, with a focus on rationality and safety in external HVI. Leveraging recent advances in large language models, VLMs demonstrate remarkable capabilities in understanding real-world contexts and have generated significant interest in HVI applications. This paper provides an overview of AD, HVI, and VLMs, along with the historical context of large language model applications in HVI. The HVI discussed herein involves dynamic game processes encompassing perception and decision-making between vehicles and traffic participants, such as pedestrians. Furthermore, we examine the perceptual challenges associated with applying VLMs to HVI and compile relevant datasets. This research fills a gap in the existing literature by systematically analyzing the current status, challenges, and future opportunities of VLM applications in HVI. To advance VLM integration in AD, various implementation strategies are discussed. The findings highlight the potential of VLMs to transform HVI in AD, improving both passenger experience and driving safety. Overall, this study contributes to a comprehensive understanding of VLM applications in HVI and provides insights to guide future research and development.
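The "dynamic game processes" between a vehicle and a pedestrian mentioned in the abstract can be illustrated with a minimal one-shot game sketch. The action names, payoff values, and helper functions below are illustrative assumptions for exposition, not taken from the paper:

```python
# Minimal sketch: vehicle-pedestrian interaction as a one-shot game.
# All payoff values below are illustrative assumptions, not from the paper.

# Actions: vehicle chooses "yield" or "go"; pedestrian chooses "cross" or "wait".
# Payoffs are (vehicle, pedestrian); a collision ("go", "cross") is heavily penalized.
PAYOFFS = {
    ("yield", "cross"): (-1, 2),       # vehicle loses time, pedestrian crosses safely
    ("yield", "wait"):  (-1, -1),      # mutual hesitation, both lose time
    ("go",    "cross"): (-100, -100),  # collision: catastrophic for both
    ("go",    "wait"):  (2, -1),       # vehicle proceeds, pedestrian waits
}

VEHICLE_ACTIONS = ["yield", "go"]
PEDESTRIAN_ACTIONS = ["cross", "wait"]

def best_response(player, other_action):
    """Return the action maximizing this player's payoff
    against a fixed action of the other player."""
    if player == "vehicle":
        return max(VEHICLE_ACTIONS, key=lambda a: PAYOFFS[(a, other_action)][0])
    return max(PEDESTRIAN_ACTIONS, key=lambda a: PAYOFFS[(other_action, a)][1])

def pure_nash_equilibria():
    """Enumerate action pairs where each player is already best-responding."""
    return [
        (v, p)
        for v in VEHICLE_ACTIONS
        for p in PEDESTRIAN_ACTIONS
        if best_response("vehicle", p) == v and best_response("pedestrian", v) == p
    ]

print(pure_nash_equilibria())  # → [('yield', 'cross'), ('go', 'wait')]
```

The two equilibria capture the coordination problem the paper points at: either side proceeding is stable, so the agents must signal and infer intent (the perception and decision-making loop) to select one outcome safely.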