OpenAI’s new GPT-4V version supports image uploading, which brings a new attack path, making large language models (LLM) vulnerable to multi-modal injection image attacks. Attackers can embed commands, malicious scripts, and code in images, which the model then complies with.
Multimodal prompt injection image attacks can leak data, redirect queries, generate error messages, and execute more complex scripts to redefine how LLM interprets data. They can repurpose LLMs to ignore previously erected security guardrails and execute commands that could compromise the organization, posing threats ranging from fraud to operational sabotage.
All businesses that use LLM as part of their workflow face difficulties, but those that use LLM as the core of their business for image analysis and classification face the greatest risk. Attackers utilizing a variety of techniques can quickly change how images are interpreted and classified, leading to more confusing results. When LLM's prompts are overwritten, malicious commands and execution scripts are more likely to be ignored. Attackers can commit fraud and operational sabotage by embedding commands in a series of images uploaded to LLM, and can also facilitate social engineering attacks
Images are an attack vector that LLM cannot defend againstEnterprise-owned In the case of private LLM, least privilege access must be adopted as a core network security policy
Simon Willison recently explained in detail in a blog post why GPT-4V has become the main avenue for prompt injection attacks, and pointed out that LLM Fundamentally gullible. Blog post link: https://simonwillison.net/2023/Oct/14/multi-modal-prompt-injection/
Willison shows how to hijack autonomous artificial intelligence agents, such as Auto-GPT, through prompt injection. He explained in detail a simple visual hint injection example that started with embedding a command in a single image and gradually developed into a visual hint injection penetration attack. Paul Ekwere, senior manager of data analytics and artificial intelligence at BDO in the UK, said : "Injection attacks pose a serious threat to the security and reliability of LLM, especially to vision-based models that process images or videos. These models are widely used in areas such as face recognition, autonomous driving, medical diagnosis, and monitoring."
OpenAI currently does not have a solution for multi-modal prompt injection image attacks, and users and enterprises can only rely on themselves. A blog post on the NVIDIA developer website (https://developer.nvidia.com/blog/mitigating-stored-prompt-injection-attacks-against-llm-applications/) provides some suggestions, including for all data storage and The system enforces minimum privilege access
How the multi-modal prompt injection image attack worksWhat needs to be rewritten is: 2. Improve the platform architecture and separate user input from system logic
The purpose should be to eliminate the risk of user input directly affecting LLM code and data. Any image cues need to be handled so as not to impact internal logic or workflow.
Use a multi-stage processing workflow to identify malicious attacks
We can build a multi-stage process to catch image-based attacks early to better manage this threat
4. Customizing defense prompts to prevent jailbreaking
Jailbreaking is a common prompt engineering technique used to mislead LLM into performing illegal actions. Attaching prompts to malicious-looking image inputs helps protect LLM. . However, researchers warn that advanced attacks can still bypass this approach.
As more and more LLMs transition to multi-modal models, images have become the latest threat vector that attackers can rely on. Used to bypass and redefine protection measures. Image-based attacks vary in severity, from simple commands to more complex attack scenarios designed to cause industrial damage and spread widespread misinformation
This article is sourced from: https: //venturebeat.com/security/why-gpt-4-is-vulnerable-to-multimodal-prompt-injection-image-attacks/. For reprint, please indicate the source
The above is the detailed content of Why is GPT-4P vulnerable to multi-modal hint injection image attacks?. For more information, please follow other related articles on the PHP Chinese website!