Speakers
Synopsis
LLMs currently struggle to consistently adhere to instructions in the presence of jailbreak (prompt injection) attacks. This poses a significant challenge to their utilisation in various applications. To assess the efficacy of defensive measures for AI applications against prompt injections, we conducted a novel research experiment.
We created an online wargame where participants were tasked with safeguarding their AI applications from revealing their secret phrases while simultaneously attempting to compromise other players' apps to extract theirs. In the event of a successful exfiltration of a secret phrase, the compromised app was removed from the game. The player then had an opportunity to enhance the security of their AI app and rejoin the competition.
Our experiment revealed that every app was exploited, indicating how the defensive strategies employed to protect the apps exhibited vulnerabilities. This supports our hypothesis that constructing a secure LLM app remains an ongoing challenge, partly due to a limited understanding of the underlying prompt injection mechanism.
This talk presents the outcomes of our ongoing attack and defence wargame and highlights the inherent challenges associated with securing LLM based apps.