Security BSides Las Vegas 2025

Inside the Open-Source Kill Chain: How LLMs Helped Catch Lazarus and Stop a Crypto Backdoor
2025-08-05, Florentine A

This talk presents findings from a multi-year research project exploring how LLMs can be used in real-world threat detection across the open-source software supply chain. By applying LLMs to analyze large public datasets like changelogs, package metadata, and behavioral signals, we uncovered over 900 undisclosed vulnerabilities, including high-severity issues in popular packages like Axios, as well as thousands of malicious packages published to public registries. This includes intercepting a live operation by North Korea’s Lazarus Group and preventing a backdoor from being shipped in the official Ripple (XRP) cryptocurrency SDK.

The talk also introduces the concept of the open-source kill chain, mapping how attackers abuse trust in public ecosystems to gain access, deliver payloads, and persist undetected.

Attendees will learn how out-of-the-box frontier LLMs like GPT-4 can be used today to augment traditional vulnerability discovery, identify patterns in attacker behavior, and assist in threat triage at scale. The talk is grounded in operational examples, focused on reproducible techniques, and offers a current view into how APTs and malware authors are actively exploiting the open-source ecosystem.


This talk presents findings from a multi-year research project that applied Large Language Models (LLMs) to real-world threat detection in the open-source software ecosystem. Rather than theorizing about AI’s future role in security, this work focuses on practical applications—showing how LLMs can be deployed today to detect vulnerabilities and malware that bypass traditional scanners, rulesets, and threat feeds.

The project centered around two key threat surfaces:
- Silently patched vulnerabilities in popular open-source libraries
- Malware published to package registries such as NPM and PyPI

LLM Pipeline: Silent Patch Detection
The first LLM pipeline was designed to analyze changelogs across thousands of open-source projects to identify security fixes that were shipped but never disclosed (a practice often referred to as "silent patching"). This pipeline involved two stages:

LLM 1: Changelog Standardization and Parsing
- Changelogs vary wildly in structure, format, and tone: often written in markdown, HTML, or plaintext, and hosted on GitHub, docs sites, or even in PDFs. We used an LLM to extract, standardize, and structure this unstructured data into a consistent schema. This model also flagged ambiguous or security-relevant language (e.g., “stability fix”, “edge case resolved”) that would be easily overlooked by regex or keyword rules.
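
To make this concrete, here is a minimal sketch of what such a standardization stage might look like using the OpenAI Python SDK; the schema fields and prompt wording are illustrative assumptions, not the talk's actual pipeline.

```python
import json

from openai import OpenAI  # assumes the openai Python SDK is installed

client = OpenAI()

# Illustrative schema; the talk does not publish the real one.
EXTRACTION_PROMPT = """Extract every entry from the changelog below into a JSON array.
Each entry must have: "version", "date" (or null), "summary", and
"security_relevant_language": a list of phrases like "stability fix" or
"edge case resolved" that could hide a security patch. Return only JSON."""

def standardize_changelog(raw_changelog: str) -> list[dict]:
    """Normalize an arbitrary-format changelog into a consistent schema."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": EXTRACTION_PROMPT},
            {"role": "user", "content": raw_changelog},
        ],
        temperature=0,  # parsing, not creative writing
    )
    return json.loads(response.choices[0].message.content)
```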

LLM 2: Patch Classification
- The parsed changelog entries were then passed to a second model trained to classify whether a given commit or entry was likely to contain a security fix, even if no security keywords were used. The model was tuned to be sensitive to euphemistic phrasing and changelog norms. High-confidence results were sent to human reviewers who reverse-engineered the patch to confirm and rate severity.
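
A sketch of the classification stage under the same assumptions; the confidence cutoff and the queue_for_human_review hand-off are hypothetical placeholders for the human-review step described above.

```python
import json

from openai import OpenAI

client = OpenAI()

def queue_for_human_review(entry: dict, verdict: dict) -> None:
    # Hypothetical hand-off; in practice this would feed a reviewer dashboard.
    print(f"REVIEW {entry.get('version')}: {verdict['rationale']}")

def classify_entry(entry: dict) -> dict:
    """Decide whether a standardized changelog entry hides a security fix."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You review changelog entries for undisclosed security fixes. "
                "Euphemisms like 'hardened input handling' count. Respond only with "
                'JSON: {"likely_security_fix": bool, "confidence": float, "rationale": str}'
            )},
            {"role": "user", "content": json.dumps(entry)},
        ],
        temperature=0,
    )
    verdict = json.loads(response.choices[0].message.content)
    if verdict["likely_security_fix"] and verdict["confidence"] >= 0.8:  # illustrative cutoff
        queue_for_human_review(entry, verdict)
    return verdict
```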

Findings:
This system uncovered over 900 silently patched vulnerabilities, many in major packages like Axios, Apache ECharts, and Chainlit.
- 67% never received a CVE and never appeared in any vulnerability database
- 25% were rated high or critical severity
- Examples included a critical path traversal bug, stored XSS, and a prototype pollution issue exploitable via browser inputs.
- These vulnerabilities would have gone completely undetected by CVE-based tools

LLM Pipeline: Malware Detection in Registries
The second LLM-based detection pipeline was used to scan all newly published and updated packages on public registries, primarily NPM and PyPI.

LLM 1: Metadata Anomaly Detection
- This model ingested human-written data such as README files, descriptions, contributor metadata, and author behavior. It was trained to identify inconsistencies, abnormal phrasing, typosquatting patterns, and red flags in descriptions (e.g., tools posing as SDKs whose descriptions use unrelated language, or package names mimicking popular libraries paired with low-quality documentation).
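
A minimal sketch of how such a metadata pass might be wired up, again using the OpenAI SDK; the field names and the red-flag list in the prompt are assumptions for illustration.

```python
import json

from openai import OpenAI

client = OpenAI()

def score_metadata(pkg: dict) -> dict:
    """Ask the model to spot red flags in human-written package metadata."""
    system = (
        "You inspect npm/PyPI package metadata for supply-chain red flags: "
        "names typosquatting popular libraries, READMEs that do not match the "
        "claimed functionality, brand-new authors publishing 'SDKs', and so on. "
        'Respond only with JSON: {"suspicious": bool, "signals": [str]}'
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": json.dumps({
                "name": pkg["name"],
                "description": pkg.get("description", ""),
                "readme": pkg.get("readme", "")[:8000],  # truncate very long READMEs
                "author": pkg.get("author", {}),
            })},
        ],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)
```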

LLM 2: Orchestration and Triage
- The second LLM acted as an orchestrator of static scanning tools. We captured over 30 weighted indicators by running various static scans on the code. The LLM then combined these indicators with the signals from the metadata model to decide whether to flag the package immediately as malware or escalate it to a human researcher.
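
A simplified sketch of that orchestration step, assuming the static scanners emit named, weighted indicators; the verdict labels and JSON layout are illustrative, not the production design.

```python
import json

from openai import OpenAI

client = OpenAI()

def triage(package: str, static_hits: dict[str, float], metadata_verdict: dict) -> str:
    """Hand weighted static indicators plus metadata signals to the LLM for a verdict."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": (
                "You triage newly published npm/PyPI packages. Given weighted "
                "static-analysis indicators and metadata-model signals, answer "
                "with exactly one word: MALWARE, ESCALATE, or BENIGN."
            )},
            {"role": "user", "content": json.dumps({
                "package": package,
                "static_indicators": static_hits,      # e.g. {"eval_of_decoded_string": 0.9}
                "metadata_signals": metadata_verdict,  # output of the first model
            })},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```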

Findings:
- Over 600 malicious packages were discovered in a single month (March 2025).
- Detection time averaged 5 minutes post-publish, compared to 10+ days for OpenSSF.

Most common techniques included (a simplified indicator sketch follows the list):
- Encoded payloads decoded at runtime
- Time-delayed execution using setTimeout()
- Clipboard hijackers and credential stealers
- Obfuscated C2 infrastructure, often hidden in build scripts
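
For illustration, a few indicator patterns of the kind a static scan might emit for those techniques; real scanners work at the AST level and combine far more signals than these toy regexes.

```python
import re

# Toy indicator patterns for the techniques above; names and weights are illustrative.
JS_INDICATORS = {
    "encoded_payload": re.compile(r"Buffer\.from\([^)]+,\s*['\"]base64['\"]\)|atob\("),
    "delayed_exec": re.compile(r"setTimeout\s*\(\s*(?:function\b|\(\)\s*=>)"),
    "clipboard_access": re.compile(r"clipboardy|navigator\.clipboard|pbpaste"),
    "dynamic_eval": re.compile(r"\beval\s*\(|new Function\s*\("),
}

def scan_source(source: str) -> set[str]:
    """Return the names of indicators that fire on a package's JavaScript source."""
    return {name for name, pattern in JS_INDICATORS.items() if pattern.search(source)}
```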

Notable Case Studies
Lazarus Group NPM Campaign
- The pipeline flagged a malicious package (react-html2pdf.js) uploaded to NPM containing obfuscated code and an embedded C2 call. We observed the attacker—later attributed to Lazarus Group—re-uploading new variants every 10 minutes, likely debugging live. We reported the campaign before a functional version was deployed.

Ripple SDK Backdoor
- A malicious version of the official Ripple SDK (@xrplf/xrpl) was published using a compromised maintainer token. It included a Node.js-only backdoor that connected to an external C2 server and stole private crypto keys. Detection occurred within minutes, and coordination with the Ripple and NPM teams prevented what could have been a catastrophic impact on the crypto community.

Rand-User-Agent RAT Supply Chain Campaign
- In this campaign, a popular NPM package was compromised via a stolen dev token, and a Remote Access Trojan (RAT) was injected into the project. The malware sent outbound C2 traffic using a randomized User-Agent string to evade common detection heuristics and proxy logs. It also used system-profiling logic to avoid execution in CI/CD environments. No other database had flagged it even 10 days after the malicious contribution.

This talk provides a deep technical look into how LLMs can assist in detecting real threats. It also focuses on how this research can be replicated using currently available out-of-the-box frontier models like GPT-4.

Mackenzie is a security researcher and advocate with a passion for code security. He is the former CTO and founder of Conpago, where he learned firsthand the importance of building secure applications. Today, Mackenzie works for Aikido Security to help developers and DevOps engineers build secure systems. He also shares his knowledge as a contributor to many technology publications like Dark Reading, Financial Times, and Security Boulevard and was featured as an expert in the documentary “Logins aus dem Darknet” (EN: Logins from the Darknet).
