BSides Cape Town 2025

Weaponizing AI for Red Teaming
2025-12-06, Track 3

Traditional red team operations depend on rigid scripts and brittle automation, forcing operators to spend more time managing tools than testing strategy. This talk explores how Large Language Model (LLM)-driven agents transform red teaming by combining reasoning, memory, and tool execution. We’ll show how agents can adapt in real time: scanning, interpreting results, pulling live vulnerability intel, and autonomously exploiting targets. Attendees will see a live demonstration of an AI-powered red team agent completing a full attack chain, and walk away with a clear view of the opportunities of AI in offensive security.


A better-formatted outline and the demo are available here: https://tryhackme.notion.site/Weaponizing-AI-for-Red-Teaming-256abddf65c780de9dfdd2c2c1fd5730

  1. LLM & Agent Fundamentals

What is an LLM? A neural model trained to predict text. Strong at language tasks, but limited on its own: it cannot execute tools or access live data.
From LLM to Agent: By adding reasoning loops (the ReAct pattern), tool calls, and memory, an LLM becomes an agent that can solve problems step by step (a minimal loop is sketched at the end of this section).
Agent features for red teaming:
Task distribution (separate agents for recon, exploitation, reporting).
Shared memory (results passed between agents).
Adaptive planning (choosing the next tool based on prior results).
Why ChatGPT/Copilot alone falls short: they cannot run tools, cannot save state, have outdated CVE knowledge, and can hallucinate commands.
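
To make the ReAct idea concrete, here is a minimal agent loop in plain Python. It is a sketch only: the llm callable and the stubbed port_scan tool are placeholders, not part of any real framework.

```python
# Minimal ReAct-style loop (sketch): the model alternates between reasoning
# and acting, and each tool observation is fed back into the conversation.
# llm is assumed to be a callable that returns the model's next message.
import json

def run_tool(name: str, args: dict) -> str:
    """Dispatch a tool call; only a stubbed port scanner is registered here."""
    tools = {"port_scan": lambda a: f"open ports on {a['target']}: 22, 80"}
    return tools[name](args)

def react_loop(llm, task: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = llm(history)                       # model reasons, then acts
        history.append({"role": "assistant", "content": reply})
        if reply.startswith("FINAL:"):             # model decided it is done
            return reply.removeprefix("FINAL:").strip()
        call = json.loads(reply)                   # e.g. {"tool": "port_scan", "args": {...}}
        observation = run_tool(call["tool"], call["args"])
        history.append({"role": "user", "content": f"Observation: {observation}"})
    return "step budget exhausted"
```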

  2. LangChain & Multi-Agent Ecosystem

LangChain as orchestrator: Connects prompts, tools, and memory, enabling agents to act.
LangGraph for more advanced workflows:
Handles conditional flows (retry on failure, branch logic).
Supports persistent agents with state across sessions.
Allows loops, such as an exploit agent retrying until success or exhaustion (sketched below).
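
As an illustration of such a loop, the sketch below wires a single exploit node into a LangGraph retry cycle. The graph wiring uses LangGraph's API; the node body and the ExploitState fields are illustrative stubs.

```python
# Sketch: a LangGraph node that re-runs until it reports success or the
# attempt budget is spent. A real node would fire an exploit here.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class ExploitState(TypedDict):
    target: str
    attempts: int
    success: bool

def exploit(state: ExploitState) -> ExploitState:
    # Stub: pretend the third attempt lands.
    return {**state, "attempts": state["attempts"] + 1, "success": state["attempts"] >= 2}

def should_retry(state: ExploitState) -> str:
    return "done" if state["success"] or state["attempts"] >= 5 else "retry"

graph = StateGraph(ExploitState)
graph.add_node("exploit", exploit)
graph.set_entry_point("exploit")
graph.add_conditional_edges("exploit", should_retry, {"retry": "exploit", "done": END})
result = graph.compile().invoke({"target": "10.10.10.5", "attempts": 0, "success": False})
```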
Agents aligned with red team phases:
Recon Agent: runs scans (Nmap) and parses output (see the tool sketch after this list).
Vulnerability Agent: checks services against CVEs (Nuclei or RAG).
Exploitation Agent: selects and runs Metasploit or payloads.
Reporting Agent: converts logs into human-readable findings.
Shared memory/state: passes discoveries from one phase to the next.
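
One way the Recon Agent's Nmap capability could be exposed is as a LangChain tool. The sketch below wraps a basic service scan with the @tool decorator; the flags and lack of parsing are deliberately minimal, not the demo's actual configuration.

```python
# Sketch: Nmap wrapped as a LangChain tool the Recon Agent can call.
import subprocess
from langchain_core.tools import tool

@tool
def nmap_scan(target: str) -> str:
    """Run a service/version scan against a single host and return raw output."""
    result = subprocess.run(
        ["nmap", "-sV", "--top-ports", "100", target],
        capture_output=True, text=True, timeout=300,
    )
    return result.stdout

# The tool can then be bound to a model, e.g. llm.bind_tools([nmap_scan]).
```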

  3. Retrieval-Augmented Generation (RAG) in Security

Challenge: CVEs and exploits change too quickly for static models.
Solution (RAG): Keep the LLM static but connect it to external data sources.
How it works: Service and version info from scans is used to query a database of CVEs/exploits; the retrieved text is fed to the LLM before it decides its next action.
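
A minimal version of this retrieval step might look like the following, using LangChain's FAISS integration. The two-entry corpus is a toy stand-in for a maintained CVE/exploit database.

```python
# Sketch: embed CVE notes once, query with a service banner from a scan,
# and prepend the hits to the prompt so the LLM's next action is grounded.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

cve_notes = [
    "CVE-2021-41773: Apache httpd 2.4.49 path traversal leading to RCE.",
    "CVE-2017-0144: SMBv1 remote code execution (EternalBlue).",
]
store = FAISS.from_texts(cve_notes, OpenAIEmbeddings())

banner = "Apache httpd 2.4.49 on port 80"            # produced by the Recon Agent
hits = store.similarity_search(banner, k=2)
context = "\n".join(doc.page_content for doc in hits)
prompt = f"Known issues:\n{context}\n\nDecide the next action for: {banner}"
```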
Advantages over fine-tuning:
No retraining required.
New data instantly usable.
Fewer hallucinations since answers are grounded in retrieved content.
Security relevance: Ensures agents know the most recent exploits before acting.

  4. Benefits of Agent-Based Red Teaming

Multi-stage automation that adapts to tool output.
Validation chains reduce false positives: one agent checks another’s results (a minimal check is sketched after this list).
Natural language summaries accelerate reporting.
Multiple agent workflows can run at the same time.
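
A validation chain can be as small as one extra model call. In this hypothetical helper, verify_llm stands in for any chat-model call that returns text.

```python
# Sketch: a second model call cross-checks a finding against the raw
# evidence before it is allowed into the report.
def validate_finding(verify_llm, finding: str, evidence: str) -> bool:
    verdict = verify_llm(
        f"Finding: {finding}\nEvidence:\n{evidence}\n"
        "Answer CONFIRMED or FALSE_POSITIVE only."
    )
    return "CONFIRMED" in verdict.upper()
```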

  5. Hands-On Lab / Demonstration

Scenario: Live demo of a LangChain red team agent performing an attack chain in a sandbox.
Workflow:
Recon Agent runs Nmap dynamically.
Vuln Agent parses services, uses Nuclei templates, discovers CVEs.
Exploit Agent picks and launches a Metasploit module.
Report Agent creates a structured summary (target, vuln, exploit, result).
Key points:
All commands generated dynamically.
Memory persists across agents (see the state sketch below).
Safe demo in a CTF-style environment.
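
Putting the pieces together, here is a sketch of how the demo's shared state might flow Recon → Vuln → Exploit → Report. The node bodies are stubs standing in for the real agents described above.

```python
# Sketch: one state dict flows through all four agents, so each phase
# builds on the last and the Report Agent sees the full chain.
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class ChainState(TypedDict):
    target: str
    services: List[str]   # filled by the Recon Agent
    cves: List[str]       # filled by the Vuln Agent
    report: str           # written by the Report Agent

def recon(s: ChainState):   return {"services": ["Apache httpd 2.4.49"]}
def vuln(s: ChainState):    return {"cves": ["CVE-2021-41773"]}
def exploit(s: ChainState): return {}   # would launch a Metasploit module here
def report(s: ChainState):  return {"report": f"{s['target']}: {s['cves']}"}

g = StateGraph(ChainState)
for name, fn in [("recon", recon), ("vuln", vuln), ("exploit", exploit), ("report", report)]:
    g.add_node(name, fn)
g.set_entry_point("recon")
for a, b in [("recon", "vuln"), ("vuln", "exploit"), ("exploit", "report")]:
    g.add_edge(a, b)
g.add_edge("report", END)
final_state = g.compile().invoke({"target": "10.10.10.5", "services": [], "cves": [], "report": ""})
```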

Andrea Brosio is a Security Researcher and Senior Content Engineer at TryHackMe, specializing in red teaming, malware development, and offensive security. With prior experience as a Bug Hunter and Red Team Operator, he combines real-world adversarial expertise with a passion for creating engaging cybersecurity training.