Security BSides Las Vegas 2024

The Fault in Our Metrics: Rethinking How We Measure Detection & Response
2024-08-06, Florentine A

Your metrics are boring and dangerous. Recycled slides with meaningless counts of alerts, incidents, true and false positives… SNOOZE. Even worse, they’re motivating your team to distort the truth and subvert progress. This talk is your wake-up call to rethink your detection & response metrics.

Metrics tell a story. But before we can describe the effectiveness of our capabilities, our audience first needs to grasp what modern detection & response is and its value. So, how do we tell that story, especially to leadership?

Measurements help us get results. But if you’re advocating for faster response times, you might be encouraging your team to make hasty decisions that lead to increased risk. So, how do we find a set of measurements, both qualitative and quantitative, that incentivizes progress and serves as a north star for modern detection & response?

At the end of this talk, you’ll walk away with a practical framework for developing your own metrics, a new maturity model for measuring detection & response capabilities, data gathering techniques that tell a convincing story using micro-purple testing, and lots of visual examples of metrics that won’t put your audience to sleep.


What’s new in this talk?

Detection and response programs often fail to get the funding and support they need to succeed because the current industry standard for metrics does not tell the story of what detection and response does, how it provides value, and how data across security operations, threat intelligence, detection coverage, threat hunting, and incidents should shape decisions that reduce risk.

This talk presents a new approach to detection and response metrics. I propose moving away from measuring effectiveness solely through quantitative indicators, such as event counts, which are common in security operations centers and legacy detection and response programs. I introduce a new maturity model – the Threat Detection and Response Maturity Model (TDRMM) – for measuring detection and response capabilities. I provide a practical methodology for using micro-purple testing – tests that validate detection logic, analysis procedures, and response processes – to measure overall visibility into threats. Finally, I walk the audience through a new framework – the SAVER framework – for developing and improving detection and response metrics, along with visual examples, to help the audience develop and present their own metrics across teams and leadership.

Key takeaways

  1. A new maturity model that helps tell the story of modern detection and response, the value it provides, and how your current capabilities measure up against your goal state.

  2. A framework for developing your own detection and response metrics with visual examples and practical advice on how to strategically move to these modern metrics when change is hard and leadership hates surprises.

  3. Methods to measure and prioritize threat coverage with micro-purple testing – tests that validate detection logic, analysis procedures, and response processes – without the fatigue that “100% MITRE ATT&CK coverage” brings.

Who will enjoy this talk?

  • A CISO who wants to better understand what modern detection and response metrics should look like and how to include them in their overall program metrics.
  • Managers and directors who present detection and response metrics to leadership and the rest of their organization.
  • Engineers and analysts who are tired of their work being misrepresented with sad, unmotivating metrics.
  • Anyone interested in learning more about detection and response.

Outline

1. Introduction

I will share a personal story of how bad metrics have led to bad decisions and motivated unproductive behaviors. Then I will present the key takeaways:
A) a new maturity model, the Threat Detection and Response Maturity Model, to describe and measure detection and response capabilities
B) a new framework, the SAVER Framework, for developing and improving detection and response metrics, along with visual examples that can be used today to present metrics across teams and leadership
C) a practical methodology for prioritizing detection engineering and determining threat coverage by utilizing micro-purple tests and the MITRE ATT&CK Framework

2. Background and Motivation

This will provide the context for why I believe blue teams have been doing detection and response metrics all wrong and how it’s prevented them from getting the support and funding needed to succeed. I will discuss how relying solely on quantitative indicators like event counts not only fails to tell the story of the value a detection and response program brings to an organization, but also incorrectly incentivizes people to focus on specific detections and events instead of the overall effectiveness of the program. I will also provide the required terminology and background research regarding metrics and measurements.

3. Five Terrible Mistakes I’ve Made When Creating Metrics

This will walk the audience through the following five mistakes (with examples) I’ve made when creating metrics:
A) Losing sight of the goal
B) Using quantities that lack controls
C) Thinking proxy metrics were bad
D) Not adjusting to the altitude
E) Asking “Why?” instead of “How?”

4. Using the TDR Maturity Model to Measure our Program

We will begin our journey into modern metrics by telling the story of what we do and why it’s important. I will introduce the Threat Detection and Response Maturity Model (TDRMM), a maturity model I created to measure detection and response across three pillars (Observability, Proactive Threat Detection, and Rapid Response) and 15 capabilities. I will provide examples of how to use the maturity model to visualize current and target states and how to present them to leadership.
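
To make the idea of scoring a maturity model concrete, here is a minimal sketch in Python. The three pillar names come from the talk; the capability names, the 0–5 scale, and the structure itself are illustrative assumptions, not the actual TDRMM.

    # Minimal sketch of scoring a maturity model like the TDRMM.
    # Pillar names come from the talk; capabilities and the 0-5 scale are invented.
    from dataclasses import dataclass
    from statistics import mean

    @dataclass
    class Capability:
        name: str
        current: int   # assessed maturity today (hypothetical 0-5 scale)
        target: int    # maturity level the program is aiming for

    pillars = {
        "Observability": [
            Capability("Log coverage of critical assets", current=2, target=4),
            Capability("Telemetry quality and retention", current=3, target=4),
        ],
        "Proactive Threat Detection": [
            Capability("Detection engineering lifecycle", current=2, target=5),
            Capability("Threat hunting cadence", current=1, target=3),
        ],
        "Rapid Response": [
            Capability("Runbook completeness", current=3, target=4),
            Capability("Containment automation", current=1, target=4),
        ],
    }

    # Summarize current vs. target maturity per pillar for a leadership view.
    for pillar, capabilities in pillars.items():
        current = mean(c.current for c in capabilities)
        target = mean(c.target for c in capabilities)
        print(f"{pillar}: current {current:.1f} / target {target:.1f}")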

5. Using Micro-Purple Testing to Measure Detection Coverage

Next, I will introduce the concept of measuring and prioritizing threat coverage with micro-purple testing – tests that validate detection logic, analysis procedures, and response processes. I will walk the audience through a methodology that combines external threat intel, internal incident trends, and organizational security risks to prioritize threat techniques from the MITRE ATT&CK Framework. I will show how to implement this testing and how to use the results to measure which types of threats can (and cannot) be detected. I will also discuss the problems and challenges that a “100% MITRE ATT&CK coverage” mindset brings, and how to scope micro-purple testing in a way that’s actionable today.
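
As one way such tests might be structured, here is a minimal sketch in Python. The ATT&CK technique ID is real, but the test structure, runner, and detection check are hypothetical placeholders rather than a real tool or the talk’s exact methodology.

    # Minimal sketch of a micro-purple test: a small, scoped exercise that
    # validates one piece of detection logic and its response process.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class MicroPurpleTest:
        technique_id: str                     # MITRE ATT&CK technique, e.g. "T1059.001"
        description: str
        execute: Callable[[], None]           # small, safe simulation of the behavior
        detection_fired: Callable[[], bool]   # did the expected alert appear?

    def run_tests(tests: list[MicroPurpleTest]) -> dict[str, bool]:
        """Run each scoped test and record detection coverage per technique."""
        coverage = {}
        for test in tests:
            test.execute()                    # perform the benign simulation
            coverage[test.technique_id] = test.detection_fired()
        return coverage

    # Example usage with stubbed simulation and detection checks.
    tests = [
        MicroPurpleTest(
            technique_id="T1059.001",
            description="Encoded PowerShell command execution",
            execute=lambda: None,             # placeholder for the simulated action
            detection_fired=lambda: True,     # placeholder for querying the SIEM
        ),
    ]
    print(run_tests(tests))                   # {'T1059.001': True}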

6. Using the SAVER Framework to build metrics that matter

Continuing, I will introduce the SAVER framework, a framework I created for building better detection and response metrics. I will walk the audience through its five categories:

A) Streamlined: Efficiency, accuracy, and automation of the SOC. For example, how much time is spent on manual versus automated triage, and how often the automation leads to incorrect conclusions.
B) Awareness: Context and intelligence about existing and emerging threats, vulnerabilities, and risks. For example, how complete the threat model is for the environments being protected, and how threat intelligence sourcing increases or decreases the associated risks.
C) Vigilance: Visibility and detection coverage for known threats. For example, the percentage of MITRE ATT&CK techniques that can be investigated, detected, and responded to.
D) Exploration: Proactive investigations that expand our awareness and vigilance. For example, the discovery of gaps in current protections and illumination of new threats to the organization using threat hunting.
E) Readiness: How prepared are we for the next big incident? For example, the speed, accuracy, and completeness of runbooks across the organization.

I will then revisit the mistakes from section 3 and improve each metric by expressing it in the following structure (a minimal sketch follows the list):

A) Outcome: What is the goal of measuring this metric?
B) Question: What question does this metric answer?
C) Category: Which SAVER category does this metric fall under?
D) Metric control: How do we control this metric today?
E) Risk reward: What risks could this measurement reward?
F) Data requirements: What data and sample size is required?
G) Effort cost: How much new effort is needed to improve this metric?
H) Metric cost: How much time does it cost to collect this metric?
I) Metric expiration: When is this metric not needed anymore?
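
As a purely illustrative sketch of how a single metric could be captured in this structure, here is a hypothetical Python definition. The category and field names mirror the framework above; the example metric and its values are invented for illustration and are not from the talk’s materials.

    # Illustrative sketch: capturing one metric in the SAVER structure.
    # Field names mirror the structure above; the example values are invented.
    from dataclasses import dataclass
    from enum import Enum

    class SaverCategory(Enum):
        STREAMLINED = "Streamlined"
        AWARENESS = "Awareness"
        VIGILANCE = "Vigilance"
        EXPLORATION = "Exploration"
        READINESS = "Readiness"

    @dataclass
    class Metric:
        outcome: str             # goal of measuring this metric
        question: str            # question the metric answers
        category: SaverCategory  # which SAVER category it falls under
        metric_control: str      # how we control this metric today
        risk_reward: str         # risky behavior the measurement could reward
        data_requirements: str   # data and sample size required
        effort_cost: str         # new effort needed to improve the metric
        metric_cost: str         # time it costs to collect the metric
        metric_expiration: str   # when the metric is no longer needed

    triage_automation = Metric(
        outcome="Reduce analyst time spent on repetitive triage",
        question="What share of alerts are triaged automatically, and how often is that wrong?",
        category=SaverCategory.STREAMLINED,
        metric_control="Tune automation playbooks and alert routing",
        risk_reward="Over-automation that closes true positives without review",
        data_requirements="90 days of alert dispositions tagged with triage source",
        effort_cost="Tag each alert disposition as manual or automated",
        metric_cost="One query per reporting period",
        metric_expiration="When automated triage accuracy is stable above target",
    )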

For each of these categories, I’ll discuss the signals needed and how to collect them. We’ll cover how to avoid the common pitfalls of metric selection: choosing metrics that can be measured today but influence future outcomes, ensuring metrics can actually be affected by the team, and balancing the set of metrics so it aligns with the overall goal. And finally, we’ll go through many examples of data visualizations showing how to present these metrics. The result is a set of measurements that leads to better decisions, adds qualitative context to quantitative indicators, and incentivizes the team toward progress.

7. Putting it all together

I will discuss how to strategically move to these new types of metrics, considering that change is hard and leadership hates surprises. I will give examples of transitional metrics and how to describe the shift to these more modern metrics. Finally, this section will provide a moment of bliss where we reflect on how we used to present metrics, and how going forward the audience is now empowered to tell the story of their detection and response program with the TDRMM, describe threat coverage and priorities using targeted micro-purple testing, and build actionable metrics using the SAVER framework.

Allyn Stott is a senior staff engineer at Airbnb where he works on the infosec technology leadership team. He spends most of his time working on enterprise security, threat detection, and incident response. Over the past decade, he has built and run detection and response programs at companies including Delta Dental of California, MZ, and Palantir. Red team tears are his testimonials.

In the late evenings, after his toddler ceases all antics for the day, Allyn writes a semi-regular, exclusive security newsletter. This morning espresso shot can be served directly to your inbox by subscribing at meoward.co.

Allyn has previously presented at Black Hat Europe, Black Hat Asia, Kernelcon, The Diana Initiative, Texas Cyber Summit, and BSides events around the world. He received his Master’s in High Tech Crime Investigation from The George Washington University as part of the Department of Defense Information Assurance Scholarship Program.