Security BSides Las Vegas 2024

BOLABuster: Harnessing LLMs for Automating BOLA Detection
2024-08-07 , Florentine A

Broken Object Level Authorization (BOLA) poses severe threats to modern APIs and web applications. It ranks as the top risk in the OWASP API Security Top 10 and is a regularly reported vulnerability in the HackerOne Top 10. However, automatically identifying BOLAs is challenging due to application complexity, the wide range of input parameters, and the stateful nature of modern web applications.

To overcome these issues, we leverage LLMs' reasoning and generative capabilities to automate tasks such as understanding application logic, revealing endpoint dependencies, generating test cases, and interpreting results. This AI-backed method, coupled with heuristics, enables full-scale automated BOLA detection. We dub this research BOLABuster.

Despite being in its early stages, BOLABuster has exposed multiple vulnerabilities in open-source projects. Notably, we submitted 15 CVEs for a single project, some of which lead to critical privilege escalation. Our latest disclosed vulnerability, CVE-2024-1313, was a BOLA vulnerability in Grafana, an open-source platform with over 20 million users. When benchmarked against other state-of-the-art fuzzing tools, BOLABuster sends fewer than 1% of the API requests to detect a BOLA.

In this talk, we'll share the methodology and lessons from our research. Join us to learn about our AI journey and explore a novel approach to vulnerability research.


Broken Object Level Authorization (BOLA)
- What is BOLA
- Motivation for the research: The popularity and impact of BOLA vulnerabilities

The challenges of automating BOLA hunting
- Application complexity: Modern API applications often have complex authorization logic involving multiple endpoints, resources, and user roles. Understanding the application's architecture, data flow, and business logic is crucial for identifying BOLA, making it a challenging task.
- Stateful property: Most contemporary web applications are stateful, where each API call can alter the application's state and influence other API calls' responses. Therefore, interacting with these endpoints in sequences that align with the application logic is crucial.
- Lack of vulnerability indicators: Because BOLA is a logic flaw, triggering a vulnerable endpoint produces no error message. The input and response of a successful exploit resemble those of a benign API request, making it difficult to interpret the output and determine whether the endpoint is vulnerable.
- Context-sensitive inputs: BOLA testing involves manipulating input parameter values. Only specific values for the correct parameters can trigger the vulnerability. Therefore, successful BOLA detection relies on identifying parameters that reference sensitive user data and supplying those parameters with existing values, a task that is difficult for traditional fuzzing techniques.
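The last two challenges can be illustrated with a toy example. The sketch below is not taken from BOLABuster; the invoice store, user names, and handler functions are all hypothetical. It shows a handler that authenticates the caller but never checks ownership, so an exploit returns the same status as a benign request, while a randomly guessed ID only produces a 404.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: dict


# Hypothetical data store: invoice id -> record with its owner.
INVOICES = {
    101: {"owner": "alice", "total": 40},
    102: {"owner": "bob", "total": 75},
}


def get_invoice_vulnerable(current_user: str, invoice_id: int) -> Response:
    """Knows who the caller is, but never compares it with the invoice owner."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None:
        return Response(404, {"error": "not found"})
    return Response(200, invoice)


def get_invoice_fixed(current_user: str, invoice_id: int) -> Response:
    """Same handler with an object-level authorization check added."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice["owner"] != current_user:
        return Response(404, {"error": "not found"})
    return Response(200, invoice)


if __name__ == "__main__":
    benign = get_invoice_vulnerable("alice", 101)    # alice reads her own invoice
    exploit = get_invoice_vulnerable("alice", 102)   # alice reads bob's invoice
    guess = get_invoice_vulnerable("alice", 999)     # a fuzzer-style random id
    print(benign.status, exploit.status, guess.status)  # prints "200 200 404"
```

Both the benign call and the exploit return 200 with a well-formed body, so status codes alone cannot separate them. Only the existing ID 102 exposes the flaw, which is why a tester needs valid object IDs that belong to another user.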

The use of LLMs for automating BOLA hunting
- Identifying Potentially Vulnerable Endpoints: This stage involves identifying endpoints potentially susceptible to BOLA, typically authenticated endpoints operating on user-specific data. Examples include updating a user's email, retrieving an invoice, creating a post, or deleting a comment. AI aids in analyzing the functionalities and parameters of each API endpoint to determine those that operate on or return sensitive user data.
- Uncovering Endpoint Relationships and Creating Test Plans: This stage involves analyzing application logic and workflow to reveal relationships between API endpoints. Given the stateful nature of modern web applications, understanding the context and prerequisites of each API endpoint is crucial for meaningful and effective testing. AI assists in analyzing endpoint relationships and grouping relevant ones, generating a test plan that includes one or multiple test cases. Each test case consists of a sequence of API calls starting with logging in to retrieve authentication secrets and ending with a call to a potentially vulnerable endpoint.
- Executing Plans and Analyzing Responses: This stage involves executing test plans against the API server and analyzing the responses to determine whether an endpoint is vulnerable to BOLA. For each test case, AI generates an executable bash script that interacts with the API server and involves at least two different authenticated users, with one user attempting to access another user's data. User registration, login, and token refresh are automated to ensure uninterrupted test plan execution. AI then analyzes the log and response of each test case, and a human verifies any endpoint deemed vulnerable.
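The shape of a single test case can be sketched as follows. This is a minimal illustration under stated assumptions, not BOLABuster's implementation: the talk describes AI-generated bash scripts run against a real server and AI-based log analysis, whereas this sketch uses Python, an in-memory stand-in server (`ToyApi`), and a simple rule (`looks_like_bola`) in place of the LLM verdict. All class, function, and endpoint names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    user: str                      # test user who sends the request
    method: str
    path: str                      # may contain {placeholders} from earlier steps
    save_as: Optional[str] = None  # store the response "id" under this name


class ToyApi:
    """In-memory stand-in for an API server; GET lacks an ownership check."""

    def __init__(self):
        self.tokens, self.notes, self.next_id = {}, {}, 1

    def login(self, user):
        token = f"token-{user}"
        self.tokens[token] = user
        return token

    def request(self, token, method, path):
        user = self.tokens.get(token)
        if user is None:
            return 401, {}
        if method == "POST" and path == "/notes":
            note = {"id": self.next_id, "owner": user}
            self.notes[self.next_id] = note
            self.next_id += 1
            return 201, note
        match = re.fullmatch(r"/notes/(\d+)", path)
        if method == "GET" and match:
            note = self.notes.get(int(match.group(1)))
            # Deliberate flaw: the note is returned without comparing owner to user.
            return (200, note) if note else (404, {})
        return 404, {}


def run_test_case(api, steps):
    """Log in each user on first use, run the steps in order, and keep a log."""
    tokens, variables, log = {}, {}, []
    for step in steps:
        if step.user not in tokens:
            tokens[step.user] = api.login(step.user)
        path = step.path.format(**variables)  # fill ids from earlier responses
        status, body = api.request(tokens[step.user], step.method, path)
        if step.save_as and "id" in body:
            variables[step.save_as] = body["id"]
        log.append((step.user, step.method, path, status, body))
    return log


def looks_like_bola(log, victim):
    """Stand-in verdict: another user got a success response with victim data."""
    user, _, _, status, body = log[-1]
    return user != victim and 200 <= status < 300 and body.get("owner") == victim


if __name__ == "__main__":
    steps = [
        Step("victim", "POST", "/notes", save_as="note_id"),  # prerequisite call
        Step("attacker", "GET", "/notes/{note_id}"),          # cross-user access
    ]
    log = run_test_case(ToyApi(), steps)
    print(looks_like_bola(log, victim="victim"))  # prints "True"
```

The prerequisite step exists only to produce a valid object ID for the final call, which reflects why the order of API calls matters in a stateful application.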

Evaluation
- Testing Apps with Known BOLA: We evaluated our methodology's performance and accuracy using three open-source projects with known BOLA vulnerabilities. We compared BOLABuster’s performance with RESTler, a widely used open-source API fuzzer. BOLABuster successfully identified all known BOLA endpoints in these projects, while RESTler, using its default configurations, failed to detect any BOLA vulnerabilities. On average, BOLABuster makes fewer than 1% of the API requests to a target server, yet uncovers more vulnerabilities than RESTler.

Discovery of 16 new CVEs
- Hunting BOLA in real-world applications: Although our methodology is still undergoing continuous refinement, we have discovered and reported numerous previously unknown BOLA vulnerabilities in open-source projects, some leading to critical privilege escalation. Our latest disclosed vulnerability was CVE-2024-1313, a BOLA vulnerability in Grafana, which is used by 20+ million users. We will provide an overview of a few sample findings and outline the discovery process. Our primary objective is to expedite and scale up the testing process to enhance the security of open-source applications.

Lessons learned from AI partnership
- Not all LLMs are equal: We observed that BOLABuster’s performance varies with the reasoning capabilities of LLMs. While designed to be LLM-agnostic, pairing it with early-generation LLMs resulted in accuracy degradation. However, as AI technology advances, our tool's accuracy and speed will improve with new models.
- Triumph and failure: Our research taught us that tasks with heuristic solutions should not be tackled using AI. Despite LLMs' versatility, their inherent uncertainty and tendency to hallucinate make them less efficient and precise than heuristic solutions. Simple tasks should be left to simple solutions.

Remaining Challenges
- Spec quality: BOLABuster’s performance heavily depends on the accuracy of the web app’s OpenAPI specifications. The quality of API specifications varies significantly between applications. Adhering to conventions, such as specifying the security scope for paths, indicating "required" keywords for parameters, and avoiding circular references in components, greatly facilitates the generation of valid test cases.
- Cost: The cost of utilizing generative AIs remains a barrier to scaling our solution. State-of-the-art models that provide the best results are usually costly. Even self-hosted LLMs require powerful GPUs and CPUs to handle large data volumes and processing tasks. However, we anticipate that cost overhead will decrease as technology evolves.
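The spec conventions listed under "Spec quality" lend themselves to a heuristic check. The sketch below is an assumption about how such a check might look, not part of BOLABuster: `find_spec_issues` and the sample `SPEC` are hypothetical, and the checks cover only missing security requirements, parameters without an explicit "required" keyword, and circular local references among component schemas.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}
PREFIX = "#/components/schemas/"


def _refs(node):
    """Collect schema names referenced through local $ref values inside node."""
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str) and value.startswith(PREFIX):
                found.add(value[len(PREFIX):])
            else:
                found |= _refs(value)
    elif isinstance(node, list):
        for item in node:
            found |= _refs(item)
    return found


def _in_cycle(graph, start):
    """Return True if following references from start leads back to start."""
    stack, seen = list(graph.get(start, ())), set()
    while stack:
        name = stack.pop()
        if name == start:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(graph.get(name, ()))
    return False


def find_spec_issues(spec):
    """Return human-readable findings for an OpenAPI document given as a dict."""
    issues = []
    has_global_security = "security" in spec
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue
            label = f"{method.upper()} {path}"
            if not has_global_security and "security" not in op:
                issues.append(f"{label}: no security requirement")
            for param in op.get("parameters", []):
                if "$ref" not in param and "required" not in param:
                    issues.append(f"{label}: parameter '{param.get('name')}' omits 'required'")
    schemas = spec.get("components", {}).get("schemas", {})
    graph = {name: _refs(schema) for name, schema in schemas.items()}
    for name in schemas:
        if _in_cycle(graph, name):
            issues.append(f"components.schemas.{name}: circular $ref")
    return issues


# Hypothetical spec fragment that violates all three conventions.
SPEC = {
    "paths": {"/invoices/{id}": {"get": {"parameters": [{"name": "id", "in": "path"}]}}},
    "components": {"schemas": {
        "User": {"properties": {"team": {"$ref": PREFIX + "Team"}}},
        "Team": {"properties": {"owner": {"$ref": PREFIX + "User"}}},
        "Invoice": {"properties": {"total": {"type": "number"}}},
    }},
}

if __name__ == "__main__":
    for issue in find_spec_issues(SPEC):
        print(issue)
```

A check like this is deterministic and cheap to run, so it fits the lesson above that tasks with heuristic solutions need not involve an LLM.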

Jay Chen is a Cloud Security Researcher with Prisma Cloud and Unit 42 at Palo Alto Networks. He has extensive research experience in cloud security. In his role at Palo Alto Networks, he focuses on investigating the vulnerabilities, design flaws, and adversarial TTPs in cloud-native technologies such as containers and public cloud services. He works to develop methodologies for identifying and remediating security gaps in public clouds and works to protect Prisma Cloud customers from threats.

In previous roles, he has researched mobile cloud security and distributed storage security. Jay has authored 25+ academic and industrial papers.

Ravid is a Senior Security Researcher with more than 6 years of hands-on experience in the Application & API Security field. Holding a bachelor's degree in Information Systems with a specialization in Cyber, Ravid brings an innovative attitude to the table while researching different aspects of the AppSec world. He’s eager to experience, experiment, and learn something new every day. In his free time, Ravid likes to travel, exercise, and have a good time with friends and family.
