NeuroCTI - a custom LLM for CTI - benchmarking, successes, failures and lessons learned (updates) :: hack.lu 2024

NeuroCTI - a custom LLM for CTI - benchmarking, successes, failures and lessons learned (updates)

2024-10-22 09:30–10:00, Europe - Main Room

LLMs turn out to be highly practical for summarising and extracting information from unstructured Cyber Threat Intelligence (CTI) reports. However, most models were not trained specifically to understand CTI. We will present a custom LLM, fine-tuned for CTI purposes. Of course, that only makes sense if there is a CTI text benchmark dataset to measure it against. Building both the model and the benchmark is a challenging journey, with setbacks guaranteed. We will share our findings.

(This is an update from the FIRSTCON24 talk)

Over the last year, many CTI practitioners and companies have experimented with LLMs for extracting information from unstructured CTI reports. Often, the dream is to automate the analyst's job: correctly identify and copy & paste TTPs, threat actors and relationships from the report, then convert them into STIX.
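To make the "convert into STIX" step concrete, here is a minimal sketch of how already-extracted names could be wrapped into a STIX 2.1 bundle using only the Python standard library. The helper names (`stix_object`, `to_bundle`) and the sample actor and techniques are hypothetical and not taken from the talk; a real pipeline would probably use a dedicated STIX library and richer object properties.

```python
import json
import uuid
from datetime import datetime, timezone


def stix_object(obj_type, **fields):
    """Build a minimal STIX 2.1 object as a plain dict (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": obj_type,
        "spec_version": "2.1",
        "id": f"{obj_type}--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        **fields,
    }


def to_bundle(actor_name, technique_names):
    """Link one threat actor to the techniques an extractor attributed to it."""
    actor = stix_object("threat-actor", name=actor_name)
    objects = [actor]
    for name in technique_names:
        pattern = stix_object("attack-pattern", name=name)
        uses = stix_object(
            "relationship",
            relationship_type="uses",
            source_ref=actor["id"],
            target_ref=pattern["id"],
        )
        objects += [pattern, uses]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


# Made-up extraction result, used only for illustration.
bundle = to_bundle("Example Actor", ["Spearphishing Attachment", "PowerShell"])
print(json.dumps(bundle, indent=2))
```

The hard part the talk addresses is upstream of this sketch: getting the actor, the TTPs and their relationships out of free text correctly in the first place.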

Alas, off-the-shelf LLMs often fail at this task (although GPT-4-turbo was already fairly good at submission time). There is another caveat: IT security requirements often demand that data remains on-premise, or at least on a virtual server that is fully and exclusively under the control of the organisation's IT team. For that we need local LLMs (as opposed to cloud-based SaaS/FaaS solutions such as openai.com's API). But how can we achieve good results with local LLMs? Can we beat OpenAI?

To address the CTI text summarisation and information extraction problem, we

propose an open source CTI LLM benchmark dataset which can be used to compare different LLMs and prompts,
present a fine-tuned custom CTI LLM ("neuroCTI"),
evaluate it (as well as other LLMs) against the benchmark dataset and
finally, serve the model via ollama and integrate it with MISP.
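As an illustration of what comparing different LLMs against a benchmark can look like, the sketch below scores extracted items against a gold annotation using set-based precision, recall and F1. The abstract does not say which metric the benchmark actually uses, so the metric, the `score` function, the model names and the sample data are all assumptions made for this example.

```python
def score(predicted, gold):
    """Return (precision, recall, f1) for two collections of extracted items."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    tp = len(pred & ref)  # items found by the model that are also in the gold set
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Made-up gold annotation and model outputs for a single report.
gold = ["T1566.001", "T1059.001", "T1071.001"]
outputs = {
    "model-a": ["T1566.001", "T1059.001"],
    "model-b": ["T1566.001", "T1105", "T1071.001", "T1027"],
}

for name, predicted in outputs.items():
    p, r, f = score(predicted, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching on normalised strings is the simplest possible choice; free-text fields such as summaries or actor aliases would need a more forgiving comparison.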

The model is freely available for local deployments.
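For a local deployment, a model served by ollama can be queried over its HTTP API without any data leaving the machine. The sketch below assumes ollama is listening on its default port and that the model has been registered under the name `neurocti`; that name and the prompt wording are placeholders, not details confirmed by the talk.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # ollama's default local endpoint


def build_request(report_text, model="neurocti"):
    """Build the JSON body for ollama's /api/generate endpoint.

    The model name and the prompt are assumptions made for this example.
    """
    prompt = (
        "List the threat actors and TTPs mentioned in the following report.\n\n"
        + report_text
    )
    return {"model": model, "prompt": prompt, "stream": False}


def extract(report_text, model="neurocti", url=OLLAMA_URL):
    """Send the report to a local ollama server and return the generated text."""
    body = json.dumps(build_request(report_text, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Calling `extract(report_text)` requires a running ollama server with the model already pulled or created; `build_request` can be inspected on its own without one.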
