This article proposes an innovative adversarial attack method,AMA(Adaptive Multimodal Attack),which introduces an adaptive feedback mechanism by dynamically adjusting the perturbation strength.Specifically,AMA adjusts...This article proposes an innovative adversarial attack method,AMA(Adaptive Multimodal Attack),which introduces an adaptive feedback mechanism by dynamically adjusting the perturbation strength.Specifically,AMA adjusts perturbation amplitude based on task complexity and optimizes the perturbation direction based on the gradient direction in real time to enhance attack efficiency.Experimental results demonstrate that AMA elevates attack success rates from approximately 78.95%to 89.56%on visual question answering and from78.82%to 84.96%on visual reasoning tasks across representative vision-language benchmarks.These findings demonstrate AMA’s superior attack efficiency and reveal the vulnerability of current visual language models to carefully crafted adversarial examples,underscoring the need to enhance their robustness.展开更多
基金funded by the Natural Science Foundation of Jiangsu Province(Program BK20240699)National Natural Science Foundation of China(Program 62402228).
文摘This article proposes an innovative adversarial attack method,AMA(Adaptive Multimodal Attack),which introduces an adaptive feedback mechanism by dynamically adjusting the perturbation strength.Specifically,AMA adjusts perturbation amplitude based on task complexity and optimizes the perturbation direction based on the gradient direction in real time to enhance attack efficiency.Experimental results demonstrate that AMA elevates attack success rates from approximately 78.95%to 89.56%on visual question answering and from78.82%to 84.96%on visual reasoning tasks across representative vision-language benchmarks.These findings demonstrate AMA’s superior attack efficiency and reveal the vulnerability of current visual language models to carefully crafted adversarial examples,underscoring the need to enhance their robustness.