AI Prompt Injection Exploration

In this project, students explore how images can be used to override the prompt instructions in an AI language model.

Medium

1 Hour

High School

Middle School

Project Description

In this project, you’ll learn about “Prompt Injections” and how they can trick AI language models into doing things they shouldn’t, like leaking data or performing unauthorized actions. We’ll talk about ways to make AI interactions safer, such as checking inputs carefully and keeping AI models up-to-date to recognize harmful prompts.

So what is prompt injection? The Open Worldwide Application Security Project Foundation has put out the following description, common vulnerabilities, preventions, and scenarios:

Description:
Prompt injections involve bypassing filters or manipulating an AI language model using carefully crafted prompts that make the model ignore previous instructions or perform unintended actions. These vulnerabilities can lead to unintended consequences, including data leakage, unauthorized access, or other security breaches.

Common Prompt Injection Vulnerabilities:
- Crafting prompts that manipulate the AI language model into revealing sensitive information.
- Bypassing filters or restrictions by using specific language patterns or tokens.
- Exploiting weaknesses in the AI language model’s tokenization or encoding mechanisms.
- Misleading the AI language model to perform unintended actions by providing misleading context.

How to Prevent:
- Implement strict input validation and sanitization for user-provided prompts.
- Use context-aware filtering and output encoding to prevent prompt manipulation.
- Regularly update and fine-tune the AI language model to improve its understanding of malicious inputs and edge cases.
- Monitor and log AI language model interactions to detect and analyze potential prompt injection attempts.

Example Attack Scenarios:
- Scenario #1: An attacker crafts a prompt that tricks the AI language model into revealing sensitive information, such as user credentials or internal system details, by making the model think the request is legitimate.
- Scenario #2: A malicious user bypasses a content filter by using specific language patterns, tokens, or encoding mechanisms that the AI language model fails to recognize as restricted content, allowing the user to perform actions that should be blocked.

source: OWASP

In this project, we’ll specifically be looking at how images can be used to manipulate the responses given by Google’s Gemini.

Project Overview

Here is an outline of the project activities:

What are Prompt Injections?

Image Prompt Injection

Prompt Injection Reflection

AI Prompt Injection Exploration

Project Description

Project Overview

Products

Use Cases

Platform

Curriculum

PD

Programming Languages

Resources

Company

Products

Platform

PD

Resources

Use Cases

Curriculum

Programming Languages

Company

Products

Computer Science Curriculum

Customizable K-12 Computer Science Curriculum

AI Prompt Injection Exploration

Project Description

Project Overview