Prompt Hardener - Automatically Evaluating and Securing LLM System Prompts :: Security BSides Las Vegas 2025

Prompt Hardener - Automatically Evaluating and Securing LLM System Prompts
.ical

2025-08-04 15:00–15:25, Firenze

Prompt injection remains one of the most critical and under-addressed vulnerabilities in LLM applications. Despite its growing impact, most developers still rely on ad hoc, manual methods to evaluate and secure system prompts, often missing subtle weaknesses that attackers can exploit. Prompt Hardener is an open source toolkit that automates the evaluation, hardening, and adversarial testing of system prompts using the LLM itself. It applies modern prompt hardening techniques such as spotlighting, random sequence enclosure, instruction defense, and role consistency to improve prompt resilience. The tool also performs injection testing with categorized payloads that simulate real world threats, including system prompt leaking and improper output handling based on OWASP Top 10 for LLM Applications 2025. It is mainly intended for use by LLM application developers and security engineers at business companies for evaluating, improving, and testing system prompts for their LLM applications. In this talk, we will also give a live demo of how to strengthen system prompts using the Prompt Hardener CLI mode and Web UI. Join us to learn how to strengthen your system prompts.

As LLMs become foundational components of modern applications, prompt security has emerged as a critical concern. Developers often rely on handcrafted system prompts without testing how they behave under adversarial conditions. While multiple techniques exist to harden prompts as part of a layered defense strategy, there is no unified way to apply and evaluate them systematically.

Prompt Hardener addresses this by automating both refinement and validation of system prompts. Using the LLM itself, it performs structured evaluations based on predefined criteria and applies improvements using layered security strategies:

Spotlighting: Explicitly marks and isolates all user-controlled input using tags and special characters to prevent injection
Random Sequence Enclosure: Encloses trusted system instructions in unpredictable tags, ensuring only those are followed and not leaked
Instruction Defense: Instructs the model to ignore new instructions, persona switching, or attempts to reveal/modify system prompts
Role Consistency: Ensures each message role (system, user, assistant) is preserved and not mixed, preventing role confusion attacks

You can check the details of each hardening techniques from here.

After hardening, the tool performs automated injection testing with a corpus of categorized payloads that simulate common attack scenarios. These include prompt leaking, improper output handling, tool enumeration, and function call hijacking. These are basically based on OWASP Top 10 for LLM Applications 2025 but also including other modern attacks. The results are summarized in JSON and visualized in HTML reports, making it easy for LLM application developers and security engineer to measure resilience.

You can check the examples of using Prompt Hardener to improve and test various system prompts from here.

A simple Gradio UI allows non CLI users to access the full pipeline: input prompts, evaluate and harden them, and run attack simulations with just a few types and clicks.

By the end of this talk, attendees will understand how to:

Identify prompt weaknesses before deployment
Apply defense-in-depth techniques to prompts
Validate the effectiveness of defenses with attack simulations
Integrate prompt security testing into their CI pipelines or red team workflows

GitHub URL: https://github.com/cybozu/prompt-hardener

Krity Kharbanda

Krity is a dedicated cybersecurity professional with a strong foundation in application security, data analysis, and machine learning. As an Application Security Engineer at ServiceNow, she leverages her diverse experience and research background to enhance security practices. Beyond her technical role, Krity serves as the Community & Development Lead at Breaking Barriers Women in Cybersecurity (BBWIC), a nonprofit dedicated to empowering women in the field. Her work reflects a deep commitment to both advancing cybersecurity and fostering inclusive community growth, making her a passionate advocate for innovation, collaboration, and leadership in the industry.

Junki Yuasa

Junki Yuasa (@melonattacker) is a security engineer at Cybozu, Inc., specializing in vulnerability assessment and threat analysis. In recent years, he has focused on AI security, developing security tools and conducting bug hunting for LLM applications. He is also a member of the SECCON Beginners organizing team.

Yoshiki Kitamura

Yoshiki Kitamura is a security engineer at Cybozu, Inc., where he focuses on web security and designing optimal security frameworks for the organization. He is also a member of the internal PSIRT (Product Security Incident Response Team), conducting vulnerability testings and handling security issues to ensure the safety and reliability of Cybozu’s services.

Prompt Hardener - Automatically Evaluating and Securing LLM System Prompts .ical 2025-08-04 15:00–15:25, Firenze

Prompt Hardener - Automatically Evaluating and Securing LLM System Prompts
.ical

2025-08-04 15:00–15:25, Firenze