2025年12月5日 –, Main Stream 語言: English
Large Language Models are trained to produce helpful, safe answers—but what happens when someone tries to make them misbehave? Malicious users can manipulate prompts to generate unsafe content like hate speech or violent instructions. Every time we add an LLM to an app, we open up that risk.
In this talk, I’ll show how to use Python to red-team an LLM-powered app: simulating hundreds of bad actors to see how the system holds up. We’ll explore public datasets of adversarial prompts, and use the open-source pyrit package to obfuscate attacks with strategies like base-64 encoding and Caesar cipher. Finally, we’ll evaluate which attacks succeed, using another LLM to score the results.
Attendees will walk away with a practical understanding of how to stress-test their own apps and a toolkit for keeping them safer against trolls.
Pamela Fox is a human that loves to learn, teach, and create. She's currently a Cloud Advocate in Python at Microsoft, where she helps developers use Python with the many Azure offerings.
On the teaching front, Pamela has taught computer science at UC Berkeley and volunteered in bay area classrooms as part of the TEALS, GirlsWhoCode, and CoderDojo organizations. She also started the SF chapter of GirlDevelopIt, where she taught dozens of web development workshops.
Pamela's been in the tech industry for 15 years now, starting with her first role at Google as one of their first developer advocates. She went on to be an early full-stack engineer at Coursera and spent many years after at Khan Academy, both as an engineer and the creator of the computer programming content.
