Automating Dark Web CTI Reports with RAG Insight for MISP Sharing
10-22, 14:30–15:00 (Europe/Luxembourg), Europe - Main Room

Organizations often do not learn right away that their data has been compromised and put up for sale online. Our objective is to shorten the time between data being exposed on the internet and that exposure being detected. The dark web is a primary marketplace for trading personal information, and it can be accessed safely only through the Tor Browser. This talk focuses on monitoring major trading forums on the dark web and demonstrates a web scraping method designed specifically for dark web sites. Using data harvested from these sites, we trained a BERT classification model that sorts transaction posts into five types of data leak, so the leak type of each post can be identified quickly.

Further, we use Retrieval-Augmented Generation (RAG): the dark web data is vectorized so that mainstream large language models can answer the questions cybersecurity analysts care about while privacy is maintained. This lets researchers analyze dark web data effectively. Finally, the data collected from the dark web is formatted as STIX (Structured Threat Information Expression) and integrated into MISP (Malware Information Sharing Platform) to automate the generation of Cyber Threat Intelligence (CTI) reports. This methodology improves the timeliness and accuracy of threat detection and supports more efficient, proactive cybersecurity management.
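To make the retrieval step of RAG concrete, here is a minimal sketch using only the Python standard library. It is not the speaker's implementation: the bag-of-words `embed` function is a toy stand-in for a learned embedding model, the posts in `POSTS` are invented for illustration, and `build_prompt` only assembles the text that would be sent to a language model (no model is called).

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: term counts. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, posts, k=2):
    """Return the k posts most similar to the query."""
    q = embed(query)
    return sorted(posts, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def build_prompt(query, context):
    """Assemble the retrieved posts and the analyst's question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"

# Invented example posts (hypothetical, not real forum data).
POSTS = [
    "Selling customer database of example-shop, 200k emails and password hashes",
    "Fresh credit card dumps with CVV, bulk discount",
    "VPN access to corporate network, admin credentials included",
]

query = "Which posts offer credit card data?"
top = retrieve(query, POSTS, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Only the retrieved posts are placed in the prompt, so the language model sees a small, relevant slice of the collection rather than all of it.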


This talk will cover the following topics:
Introduction to dark web forums
Dark web crawler
BERT classification
Retrieval-Augmented Generation (RAG): introduction and application
Dark web CTI case study
STIX format CTI
MISP for sharing CTI
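
As an illustration of the STIX and MISP topics above, the sketch below wraps one classified post in a STIX 2.1 bundle using only the Python standard library. It is an assumption-laden example, not the pipeline presented in the talk: the function names, the `credential-leak` label and the victim name are hypothetical, and the bundle holds just an `identity` and a `report` object with their required properties.

```python
import json
import uuid
from datetime import datetime, timezone

def stix_id(object_type):
    """STIX identifiers have the form '<type>--<UUID>'."""
    return f"{object_type}--{uuid.uuid4()}"

def now_stix():
    """UTC timestamp in the STIX format, e.g. 2024-01-01T12:00:00.000Z."""
    now = datetime.now(timezone.utc)
    return now.isoformat(timespec="milliseconds").replace("+00:00", "Z")

def post_to_bundle(title, body, leak_type, victim_name):
    """Wrap one forum post in a bundle: a victim identity plus a report about it."""
    ts = now_stix()
    victim = {
        "type": "identity", "spec_version": "2.1", "id": stix_id("identity"),
        "created": ts, "modified": ts,
        "name": victim_name, "identity_class": "organization",
    }
    report = {
        "type": "report", "spec_version": "2.1", "id": stix_id("report"),
        "created": ts, "modified": ts,
        "name": title, "description": body, "published": ts,
        "labels": [leak_type],             # hypothetical classifier output
        "object_refs": [victim["id"]],     # the report points at the victim
    }
    return {"type": "bundle", "id": stix_id("bundle"), "objects": [victim, report]}

# Hypothetical values for illustration only.
bundle = post_to_bundle(
    title="Database for sale on a trading forum",
    body="Post advertises 200k emails and password hashes.",
    leak_type="credential-leak",
    victim_name="Example Shop",
)
print(json.dumps(bundle, indent=2))
```

A JSON bundle of this shape is what a STIX import into MISP would consume; the upload itself depends on the MISP instance and is not shown here.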

Shing-Li (Yuki) Hung is a cybersecurity researcher at CyCraft and a graduate of National Tsing Hua University, Taiwan. His research focuses primarily on the analysis of dark web intelligence and on applying deep learning models in the cybersecurity field. He has also been a visiting researcher at the National Institute of Information and Communications Technology (NICT) in Japan. Yuki's research has been presented at conferences such as HITCON and PyCon TW. He is also a co-author of the cybersecurity resource website https://sectools.tw.