Journal Articles
4 articles found
1. DDC-Chat: Achieving accurate distracted driver classification through instruction tuning of visual language model
Authors: Chupei Liao, Kuoyi Lin. Journal of Safety Science and Resilience, 2025, Issue 2, pp. 250-264 (15 pages).
Driver behavior is a critical factor in road safety, highlighting the need for advanced methods in Distracted Driving Classification (DDC). In this study, we introduce DDC-Chat, a novel classification method based on a visual large language model (VLM). DDC-Chat is an interactive multimodal system built upon LLAVA-Plus, fine-tuned specifically for distracted driving detection. It uses logical reasoning chains to activate visual skills, including segmentation and pose detection, through end-to-end training. Furthermore, instruction tuning allows DDC-Chat to continuously incorporate new visual skills, enhancing its ability to classify distracted driving behavior. Extensive experiments demonstrate that DDC-Chat achieves state-of-the-art performance on public DDC datasets, surpassing previous benchmarks. In evaluations on the 100-Driver dataset, the model exhibits superior results in both zero-shot and few-shot settings, establishing it as a valuable tool for improving driving safety by accurately identifying driver distraction. Because inference is computationally intensive, DDC-Chat is optimized for deployment on remote servers, with data streamed from in-vehicle monitoring systems for real-time analysis.
Keywords: classifying distracted driving; visual language model; LLAVA-Plus; logical chain
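The abstract describes instruction tuning of a LLaVA-style model with chain-of-reasoning answers. As a rough illustration, here is a minimal Python sketch of what one such training sample might look like, assuming the widely used LLaVA "conversations" data format; the file name, class labels, and answer text are hypothetical, since the paper's actual schema is not shown in the abstract.

# One hypothetical instruction-tuning sample for the DDC task, in the
# common LLaVA "conversations" format (an assumption, not the paper's
# confirmed schema).
sample = {
    "image": "driver_0001.jpg",  # hypothetical in-vehicle camera frame
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nClassify the driver's behavior. Choose one of: "
                     "safe driving, texting, talking on phone, drinking, reaching behind.",
        },
        {
            "from": "gpt",
            # Chain-style answer: cite the visual evidence first, then the label.
            "value": "The driver's right hand holds a phone near the ear, "
                     "so the behavior is: talking on phone.",
        },
    ],
}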
2. KnowBench: Evaluating the Knowledge Alignment on Large Visual Language Models
Authors: Zheng Ma, Hao-Tian Yang, Jian-Bing Zhang, Jia-Jun Chen. Journal of Computer Science & Technology, 2025, Issue 5, pp. 1209-1219 (11 pages).
Large visual language models (LVLMs) have revolutionized the multimodal domain, demonstrating exceptional performance in tasks that require fusing visual and textual information. However, current evaluation benchmarks fail to adequately assess the knowledge alignment between images and text, focusing primarily on answer accuracy rather than the reasoning processes behind it. To address this gap and deepen the understanding of LVLMs' capabilities, we introduce KnowBench, a novel benchmark designed to assess the alignment of knowledge between images and text for LVLMs. KnowBench comprises 1081 image-question pairs, each with four options and four pieces of corresponding knowledge, across 11 major categories. We evaluate mainstream LVLMs on KnowBench, including proprietary models such as Gemini, Claude, and GPT, and open-source models such as LLaVA, Qwen-VL, and InternVL. Our experiments reveal a notable discrepancy between the models' ability to select correct answers and their ability to select the corresponding knowledge, whether the models are open-source or proprietary. This indicates that a significant gap remains in current LVLMs' knowledge alignment between images and text. Further analysis shows that model performance on KnowBench improves with increased parameters and version iterations, indicating that scaling laws have a significant impact on multimodal knowledge alignment and that model iteration by researchers also has a positive effect. We anticipate that KnowBench will foster the development of LVLMs and motivate researchers to build more reliable models. Our dataset is publicly available at https://doi.org/10.57760/sciencedb.29672.
Keywords: large visual language model (LVLM); knowledge alignment; image and text fusing; evaluation benchmark
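Based on the abstract's description of each benchmark item (an image-question pair with four options and four knowledge statements), here is a minimal Python sketch of how answer accuracy and knowledge accuracy might be scored separately; the item schema and the model interface are assumptions, not the benchmark's published API.

# A hypothetical KnowBench-style item and scorer; field names and the
# model.predict interface are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class KnowBenchItem:
    image_path: str
    question: str
    options: list[str]      # four answer options
    knowledge: list[str]    # four candidate knowledge statements
    answer_idx: int         # index of the correct option
    knowledge_idx: int      # index of the supporting knowledge

def score(items, model):
    """Return (answer accuracy, knowledge accuracy, joint accuracy)."""
    ans_ok = know_ok = both_ok = 0
    for it in items:
        # Assumed interface: the model picks an option index and a knowledge index.
        a, k = model.predict(it.image_path, it.question, it.options, it.knowledge)
        ans_ok += a == it.answer_idx
        know_ok += k == it.knowledge_idx
        both_ok += (a == it.answer_idx) and (k == it.knowledge_idx)
    n = len(items)
    return ans_ok / n, know_ok / n, both_ok / n

Scoring the two choices separately is what exposes the discrepancy the abstract reports: a model can be right about the answer while citing the wrong knowledge.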
3. AMA: Adaptive Multimodal Adversarial Attack with Dynamic Perturbation Optimization
Authors: Yufei Shi, Ziwen He, Teng Jin, Haochen Tong, Zhangjie Fu. Computer Modeling in Engineering & Sciences, 2025, Issue 8, pp. 1831-1848 (18 pages).
This article proposes an innovative adversarial attack method, AMA (Adaptive Multimodal Attack), which introduces an adaptive feedback mechanism by dynamically adjusting the perturbation strength. Specifically, AMA adjusts the perturbation amplitude based on task complexity and optimizes the perturbation direction in real time based on the gradient direction, enhancing attack efficiency. Experimental results demonstrate that AMA raises attack success rates from approximately 78.95% to 89.56% on visual question answering and from 78.82% to 84.96% on visual reasoning tasks across representative vision-language benchmarks. These findings demonstrate AMA's superior attack efficiency and reveal the vulnerability of current visual language models to carefully crafted adversarial examples, underscoring the need to enhance their robustness.
Keywords: adversarial attack; visual language model; black-box attack; adaptive multimodal attack; perturbation intensity
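The abstract describes adapting the perturbation amplitude to attack feedback while steering the direction by the gradient. Here is a minimal sketch of one such step, written as a generic gradient-sign update in PyTorch; the adaptation factors and the model/loss placeholders are assumptions, not the paper's exact procedure.

# A generic adaptive gradient-sign step in the spirit of AMA; the
# scaling factors 1.5/0.8 and the interfaces are illustrative only.
import torch

def adaptive_step(image, label, model, loss_fn, eps, attack_succeeded):
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), label)
    loss.backward()
    # Adapt the amplitude: push harder while the attack is failing,
    # back off once it succeeds.
    eps = eps * 0.8 if attack_succeeded else eps * 1.5
    # Optimize the direction along the gradient sign, then clamp to
    # the valid pixel range.
    adv = image + eps * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach(), eps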
4. Visual Specification and Analysis of Contract-Based Software Architectures
Authors: Mert Ozkaya. Journal of Computer Science & Technology (indexed in SCIE, EI, CSCD), 2017, Issue 5, pp. 1025-1043 (19 pages).
XCD is a design-by-contract based architecture description language that supports modular specifications in terms of components and connectors (i.e., interaction protocols). XCD is supported by a translator that produces formal models in SPIN's ProMeLa verification language, which can then be analysed formally using SPIN's model checker. XCD has been extended with a visual notation set called VXCD, which extends UML's component diagram and adapts it to XCD's structure, contractual behaviour, and interaction protocol specifications. Visual VXCD specifications can be translated into textual XCD specifications for formal analysis. To illustrate VXCD, the well-known gas station system is used: the system is specified contractually using VXCD's visual notation set and then analysed formally with SPIN's model checker for a number of properties, including deadlock and race conditions.
Keywords: architectural language; design-by-contract; visual modelling language; interaction protocol; formal analysis
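The toolchain the abstract describes (VXCD diagrams translated to textual XCD, then to ProMeLa, then checked with SPIN) can be driven programmatically. Here is a minimal Python sketch of the final verification step using SPIN's standard command-line workflow; the model file name is hypothetical.

# Standard SPIN usage: "spin -a" generates the pan.c verifier from a
# ProMeLa model, which is then compiled and run as an exhaustive check.
# The model file "gas_station.pml" is a hypothetical translator output.
import subprocess

subprocess.run(["spin", "-a", "gas_station.pml"], check=True)
subprocess.run(["gcc", "-O2", "-o", "pan", "pan.c"], check=True)
result = subprocess.run(["./pan"], capture_output=True, text=True)
print(result.stdout)  # reports deadlocks and assertion violations, if any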