Protecting and jailbreaking LLM systems: Insights gained from a wargame

Wednesday
 
27
 
November
10:30 am
 - 
11:10 am

Speakers

Pedram Hayati

Pedram Hayati

CEO & Founder
SecDim

Synopsis

LLMs currently struggle to consistently adhere to instructions in the presence of jailbreak (prompt injection) attacks. This poses a significant challenge to their utilisation in various applications. To assess the efficacy of defensive measures for AI applications against prompt injections, we conducted a novel research experiment.

We created an online wargame where participants were tasked with safeguarding their AI applications from revealing their secret phrases while simultaneously attempting to compromise other players' apps to extract theirs. In the event of a successful exfiltration of a secret phrase, the compromised app was removed from the game. The player then had an opportunity to enhance the security of their AI app and rejoin the competition.

Our experiment revealed that every app was exploited, indicating how the defensive strategies employed to protect the apps exhibited vulnerabilities. This supports our hypothesis that constructing a secure LLM app remains an ongoing challenge, partly due to a limited understanding of the underlying prompt injection mechanism.

This talk presents the outcomes of our ongoing attack and defence wargame and highlights the inherent challenges associated with securing LLM based apps.

Acknowledgement of Country

We acknowledge the traditional owners and custodians of country throughout Australia and acknowledge their continuing connection to land, waters and community. We pay our respects to the people, the cultures and the elders past, present and emerging.

Acknowledgement of Country