Bypassing AI Security Controls with Prompt Formatting
2025-07-01 , Room 2

In this talk we will present the prompt formatting technique, which we used to reliably bypass the Sensitive Information Filter functionality within Bedrock Guardrails, a service used to secure AI systems in AWS. Sensitive Information Filters are used by Guardrails to prevent Bedrock AI systems from returning sensitive information to users, such as Names and Email Addresses. By instructing the AI model to return data using programmatic, SQL-like queries, the returned data was modified sufficiently to bypass this security control, similar to WAF evasion. We have also developed a system prompt to help AWS customers mitigate this bypass, which we will discuss during the talk.

My name is Nathan Kirk, and I’m a Director at NR Labs (https://nrlabs.com/), a cybersecurity consulting startup. I have over a decade of experience with penetration testing, mostly focused on hardware and web applications. Before NR Labs, I was a Senior Consultant at Mandiant working with their Offensive Services division, and a Director at Hilton, where I helped build their penetration testing and Bug Bounty programs.