The return of the jailbreak: Prompt injection 101

Tuesday
 
26
 
November
3:50 pm
 - 
4:30 pm

Speakers

Welbie Chan

Welbie Chan

AI Security Architect
Accenture

Synopsis

Large Language Models (LLM) have emerged as a new breed of enabling technology, comparable to the advent of the Mobile Phone or the Internet. With the ability to mimic human cognitive labour and problem solving, Generative AI is continually expanding its use cases to integrate technology in our lives more deeply than ever. Whilst LLMs have provided incredible utility, they harbor a critical vulnerability that demands the attention of every security professional and LLM developer - Prompt Injection.

This session will provide listeners with a practical understanding of how Prompt Injections are executed, the taxonomy behind these exploits, and how to prevent these risks in production systems.

The session will begin by establishing a baseline understanding of Large Language Models mechanics at relevant to the key concepts needed to understand Prompt Injection. A brief and diagrammatic explanation of prompts, vectors, transformers and LLMs architectures will be introduced.

Then, an introduction to Prompt Injection and its two attack vectors, Direct and Indirect Prompt Injection. Direct prompt injection describes directly inputting malicious prompts into LLM chat windows, whereas Indirect prompt injection employs malicious inputs from external systems, links or attachments.

A live demonstration of an active exploits will be run to illustrate the severity of Prompt Injection in live attacks (against demo environments for research purposes only). This demonstration will break down the taxonomy of the exploit and how it circumvents existing security controls in place. A segment with crowd participation to allow voting on pre-defined Jailbreak options will be included.

We will explore key security controls to reduce Prompt Injection impacting real world systems. Next-generation content filtering mechanisms like LLM firewalls that can monitor and block malicious inputs, outputs and files processed by AI systems with context-aware capabilities. Enhanced logging and monitoring to observe AI-human interactions. Designing AI systems with robust trust boundaries between users, system interfaces and environments.

To wrap up security controls, an overview of AI Red Teaming techniques will be provided. This will showcase open-source tools (i.e Microsoft PyRIT) that can automate adversarial testing to proactively detect security vulnerabilities during the LLM Operations Lifecycle.

Expanding beyond architecture and technical controls, we will also share anti-patterns for secure LLM design and development. This will illustrate the common but counterproductive solutions in LLM system design that introduce recurring vulnerabilities in LLMs. This will include practices like the ‘Mega Prompt’, excessive focus on a single, overly complex prompt to steer functionality and secure the LLM.

The final topic will close the session with an outlook on the current and future state of Prompt Injection risks in the real world to emphasise the severity of insecure Large Language Models.

The fight to secure Generative AI systems demands a proactive approach. Let's move beyond just detection. Integrate AI Red Teaming activities into your AI deployment lifecycle to proactively identify and address vulnerabilities. By anticipating threats, we can ensure our defences stay ahead of the curve.

Acknowledgement of Country

We acknowledge the traditional owners and custodians of country throughout Australia and acknowledge their continuing connection to land, waters and community. We pay our respects to the people, the cultures and the elders past, present and emerging.

Acknowledgement of Country