Abstract: In recent years, large vision-language models (VLMs) have achieved significant breakthroughs in cross-modal understanding and generation. However, the safety issues arising from their multimodal interactions have become prominent. VLMs are vulnerable to jailbreak attacks, in which attackers craft carefully designed prompts to bypass safety mechanisms and induce the models to generate harmful content. To address this, we investigate the alignment between visual inputs and task execution, uncovering locality defects and attention biases in VLMs. Based on these findings, we propose VOTI, a novel jailbreak framework that leverages visual obfuscation and task induction. VOTI subtly embeds malicious keywords within neutral image layouts to evade detection and breaks harmful queries down into a sequence of subtasks. This approach disperses malicious intent across modalities, exploiting VLMs' over-reliance on local visual cues and their fragility in multi-step reasoning to bypass global safety mechanisms. Implemented as an automated framework, VOTI integrates large language models as red-team assistants to generate and iteratively optimize jailbreak strategies. Extensive experiments across seven mainstream VLMs demonstrate VOTI's effectiveness, achieving a 73.46% attack success rate on GPT-4o-mini. These results reveal critical vulnerabilities in VLMs and highlight the urgent need for more robust defenses and improved multimodal alignment.
Funding: Beijing Key Laboratory of Behavior and Mental Health, Peking University, China.
Abstract: The widespread application of large language models (LLMs) has highlighted new security challenges and ethical concerns, attracting significant academic and societal attention. Analysis of the security vulnerabilities of LLMs and of their misuse in cybercrime reveals that their advanced text-generation capabilities pose serious threats to personal privacy, data security, and information integrity. In addition, the effectiveness of current LLM-based defense strategies is reviewed and evaluated. This paper examines the social implications of LLMs and proposes future directions for strengthening their security applications and ethical governance, aiming to inform the development of the field.