2025-08-04 –, Firenze
Prompt injection remains one of the most critical and under-addressed vulnerabilities in LLM applications. Despite its growing impact, most developers still rely on ad hoc, manual methods to evaluate and secure system prompts, often missing subtle weaknesses that attackers can exploit. Prompt Hardener is an open source toolkit that automates the evaluation, hardening, and adversarial testing of system prompts using the LLM itself. It applies modern prompt hardening techniques such as spotlighting, signed prompts, rule reinforcement, and structured output to improve prompt resilience. The tool also performs injection testing with categorized payloads that simulate real world threats, including system prompt leaking and improper output handling based on OWASP Top 10 for LLM Applications 2025. It is mainly intended for use by LLM application developers and security engineers at business companies for evaluating, improving, and testing system prompts for their LLM applications. In this talk, we will also give a live demo of how to strengthen system prompts using the Prompt Hardener CLI mode and Web UI. Join us to learn how to strengthen your system prompts.
As LLMs become foundational components of modern applications, prompt security has emerged as a critical concern. Developers often rely on handcrafted system prompts without testing how they behave under adversarial conditions. While multiple techniques exist to harden prompts as part of a layered defense strategy, there is no unified way to apply and evaluate them systematically.
Prompt Hardener addresses this by automating both refinement and validation of system prompts. Using the LLM itself, it performs structured evaluations based on predefined criteria and applies improvements using layered security strategies:
- Spotlighting: Visually separates untrusted user inputs using tagging and encoding
- Signed prompt: Adds embedded markers to delineate trusted instructions
- Rule reinforcement: Repeats and reasserts behavioral boundaries
- Structured output: Enforces strict, parseable response formats
You can check the details of each hardening techniques from here.
After hardening, the tool performs automated injection testing with a corpus of categorized payloads that simulate common attack scenarios. These include prompt leaking, improper output handling, tool enumeration, and function call hijacking. These are basically based on OWASP Top 10 for LLM Applications 2025 but also including other modern attacks. The results are summarized in JSON and visualized in HTML reports, making it easy for LLM application developers and security engineer to measure resilience.
You can check the examples of using Prompt Hardener to improve and test various system prompts from here.
A simple Gradio UI allows non CLI users to access the full pipeline: input prompts, evaluate and harden them, and run attack simulations with just a few types and clicks.
By the end of this talk, attendees will understand how to:
- Identify prompt weaknesses before deployment
- Apply defense-in-depth techniques to prompts
- Validate the effectiveness of defenses with attack simulations
- Integrate prompt security testing into their CI pipelines or red team workflows
GitHub URL: https://github.com/cybozu/prompt-hardener