Abstract: This is a news item from Guatemala, in Latin America. A pair of beautiful twin sisters "rescued" dozens of criminals from a heavily guarded prison, an outrageous act! Just as this writer was wondering what could drive two young women to risk their lives on a "jailbreak," the third paragraph of the article gave away the secret: to free their kidnapper boyfriends who were serving life sentences there.
Abstract: In recent years, large vision-language models (VLMs) have achieved significant breakthroughs in cross-modal understanding and generation. However, the safety issues arising from their multimodal interactions have become prominent. VLMs are vulnerable to jailbreak attacks, where attackers craft carefully designed prompts to bypass safety mechanisms, leading the models to generate harmful content. To address this, we investigate the alignment between visual inputs and task execution, uncovering locality defects and attention biases in VLMs. Based on these findings, we propose VOTI, a novel jailbreak framework leveraging visual obfuscation and task induction. VOTI subtly embeds malicious keywords within neutral image layouts to evade detection and breaks down harmful queries into a sequence of subtasks. This approach disperses malicious intent across modalities, exploiting VLMs' over-reliance on local visual cues and their fragility in multi-step reasoning to bypass global safety mechanisms. Implemented as an automated framework, VOTI integrates large language models as red-team assistants to generate and iteratively optimize jailbreak strategies. Extensive experiments across seven mainstream VLMs demonstrate VOTI's effectiveness, achieving a 73.46% attack success rate on GPT-4o-mini. These results reveal critical vulnerabilities in VLMs, highlighting the urgent need for robust defenses and improved multimodal alignment.
Abstract: This review paper explores advanced methods for prompting Large Language Models (LLMs) into generating objectionable or unintended behaviors through adversarial prompt injection attacks. We examine a series of novel projects, such as HOUYI, Robustly Aligned LLM (RA-LLM), StruQ, and Virtual Prompt Injection, that compel LLMs to produce affirmative responses to harmful queries. Several new benchmarks, such as PromptBench, AdvBench, AttackEval, INJECAGENT, and Robustness Suite, have been created to evaluate the performance and resilience of LLMs against these adversarial attacks. Results show significant success rates in misleading models such as Vicuna-7B, LLaMA-2-7B-Chat, GPT-3.5, and GPT-4. The review highlights limitations in existing defense mechanisms and proposes future directions for enhancing LLM alignment and safety protocols, including the concept of LLM Self-Defense. Our study emphasizes the need for improved robustness in LLMs, which will potentially shape the future of Artificial Intelligence (AI)-driven applications and security protocols. Understanding the vulnerabilities of LLMs is crucial for developing effective defenses against adversarial prompt injection attacks. This paper proposes a systematic classification framework covering various types of prompt injection attacks and defenses. We also survey a broad spectrum of state-of-the-art attack methods (such as HouYi and Virtual Prompt Injection) alongside advanced defense mechanisms (such as RA-LLM, StruQ, and LLM Self-Defense), providing critical insights into vulnerabilities and robustness. Finally, we integrate and compare results from multiple recent benchmarks, including PromptBench, INJECAGENT, and BIPIA.
Abstract: The release of Apple’s iPhone was one of the most intensively publicized product launches in the history of mobile devices. While the iPhone wowed users with its exciting design and features, it also angered many for not allowing the installation of third-party applications and for working exclusively with AT&T wireless service (in the US). Besides the US, the iPhone was sold in only a few other selected countries. Software attacks were developed to overcome both limitations. The development and subsequent evaluation of those attacks revealed several vulnerabilities in iPhone security. In this paper, we examine some of the attacks developed for the iPhone as a way of investigating the iPhone’s security structure. We also analyze the security holes that have been discovered and make suggestions for improving iPhone security.