Funding: Supported by the National Natural Science Foundation of China (62173253, 52272374); the Research and Practice Project of New Engineering in Ordinary Undergraduate Universities in Guangxi Zhuang Autonomous Region (XGK202310); educational reform projects (JGT202302, JGKQ202309); and the 2024 Guangxi Collegiate Innovation and Entrepreneurship Training Project "Eye-Smart Driving-Fatigue Driving Monitoring and Warning System Based on Computer Vision" (Project No. S202410595158).
Abstract: Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a visual large language model (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus and fine-tuned specifically for distracted driving detection. It uses logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Our extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot learning contexts, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Because inference is computationally intensive, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.
Funding: Funded by the Natural Science Foundation of Jiangsu Province (Program BK20240699) and the National Natural Science Foundation of China (Program 62402228).
Abstract: This article proposes an adversarial attack method, AMA (Adaptive Multimodal Attack), which introduces an adaptive feedback mechanism that dynamically adjusts the perturbation strength. Specifically, AMA adjusts the perturbation amplitude based on task complexity and optimizes the perturbation direction in real time based on the gradient direction to enhance attack efficiency. Experimental results demonstrate that AMA raises attack success rates from approximately 78.95% to 89.56% on visual question answering and from 78.82% to 84.96% on visual reasoning tasks across representative vision-language benchmarks. These findings demonstrate AMA's superior attack efficiency and reveal the vulnerability of current visual language models to carefully crafted adversarial examples, underscoring the need to enhance their robustness.
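The abstract does not give AMA's update rules, so the following is only a generic illustration of the two ideas it names: a perturbation direction taken from the loss gradient, and a perturbation strength adjusted by feedback from the attack outcome. It is a minimal sketch on a toy logistic classifier, not the authors' method. All names (`adaptive_sign_attack`, `eps_init`, `growth`, and so on) and all numeric values are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_prob(w, b, x):
    """Probability of class 1 under a toy logistic model (assumed for illustration)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input: (p - y) * w."""
    p = predict_prob(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def adaptive_sign_attack(w, b, x, y, eps_init=0.05, eps_max=1.0,
                         growth=1.5, steps=10):
    """Iterative sign-gradient attack whose L-infinity budget grows on failure.

    Feedback loop (hypothetical): if the prediction is not flipped at the
    current budget eps, enlarge eps by `growth` and try again, up to eps_max.
    Returns (adversarial input, final budget, success flag).
    """
    eps = eps_init
    while True:
        x_adv = list(x)
        step = 2.0 * eps / steps
        for _ in range(steps):
            # Direction: follow the sign of the current loss gradient.
            g = loss_grad_wrt_input(w, b, x_adv, y)
            x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
            # Project back into the L-infinity ball of radius eps around x.
            x_adv = [min(max(xa, xi - eps), xi + eps)
                     for xa, xi in zip(x_adv, x)]
        predicted = 1 if predict_prob(w, b, x_adv) >= 0.5 else 0
        if predicted != y:
            return x_adv, eps, True
        if eps >= eps_max:
            return x_adv, eps, False
        # Strength: the attack failed, so increase the budget.
        eps = min(eps * growth, eps_max)


# Toy usage: weights, bias, and input are made-up values.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv, eps_used, success = adaptive_sign_attack(w, b, x, y)
print(success, eps_used, x_adv)
```

Growing the budget only after a failed attempt keeps the perturbation as small as this coarse schedule allows; a real multimodal attack would replace the toy model with the target vision-language model and its own loss.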
Abstract: XCD is a design-by-contract based architecture description language that supports modular specifications in terms of components and connectors (i.e., interaction protocols). XCD is supported by a translator that produces formal models in ProMeLa, the verification language of SPIN, which can then be formally analysed using SPIN's model checker. XCD is extended with a visual notation set called VXCD. VXCD extends UML's component diagram and adapts it to XCD's structure, contractual behaviour, and interaction protocol specifications. Visual VXCD specifications can be translated into textual XCD specifications for formal analysis. To illustrate VXCD, the well-known gas station system is used. The gas station system is specified contractually using VXCD's visual notation set and then formally analysed using SPIN's model checker for a number of properties, including deadlock and race conditions.
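To make the design-by-contract idea concrete, here is a minimal sketch in Python rather than in XCD or VXCD notation, neither of which is shown in the abstract. It attaches a precondition and a postcondition to each operation of a hypothetical gas station pump component. The `contract` decorator, the `Pump` class, and its `pay` and `pump` operations are assumptions made for illustration and do not reproduce the paper's specification.

```python
import functools


class ContractError(AssertionError):
    """Raised when a precondition or postcondition does not hold."""


def contract(requires, ensures):
    """Wrap a method with a precondition (requires) and a postcondition (ensures)."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args):
            if not requires(self, *args):
                raise ContractError(f"precondition of {method.__name__} violated")
            result = method(self, *args)
            if not ensures(self, result, *args):
                raise ContractError(f"postcondition of {method.__name__} violated")
            return result
        return wrapper
    return decorate


class Pump:
    """Hypothetical pump component: fuel may only be pumped after prepayment."""

    def __init__(self):
        self.prepaid = 0
        self.dispensed = 0

    @contract(requires=lambda self, amount: amount > 0,
              ensures=lambda self, result, amount: self.prepaid >= amount)
    def pay(self, amount):
        self.prepaid += amount

    @contract(requires=lambda self, amount: 0 < amount <= self.prepaid,
              ensures=lambda self, result, amount: self.prepaid >= 0)
    def pump(self, amount):
        self.prepaid -= amount
        self.dispensed += amount
        return self.dispensed


# Usage: a valid sequence, then a request that exceeds the remaining prepayment.
station_pump = Pump()
station_pump.pay(20)
station_pump.pump(15)
try:
    station_pump.pump(10)
except ContractError as err:
    print("rejected:", err)
```

Runtime checks like these only catch violations on the executions that actually occur. The approach described in the abstract instead translates the contracts into a formal model so that a model checker can explore all interleavings for properties such as deadlock.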