PyConDE & PyData Berlin 2024

Using LLMs to Create Knowledge Graphs From a Large Corpus of Parliamentary Debates
2024-04-22, B05-B06

Large Language Models (LLMs) have proven to be incredibly powerful on a range of tasks. They do, however, have certain limitations when the input context becomes very large. Solutions such as Retrieval Augmented Generation (RAG) do a great job of providing context from custom data without retraining any models, but they too have limitations, especially when the context is spread out over many documents. Consider the question “Which projects has person X worked on?”. The information required to answer it may be spread out over hundreds of documents, making it difficult for an LLM alone to answer. One way to overcome this issue is to use an LLM as an entity extraction tool, extracting entities and relationships from documents and loading that data into a structured format such as a knowledge graph. In this talk, I will demonstrate this process on a dataset of parliamentary debates, showing how downstream analytics becomes more intuitive and feasible.
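As a rough illustration of the extraction step, the sketch below asks an LLM for (subject, relation, object) triples and merges them into a graph. The model name, prompt, and use of networkx are assumptions for illustration only; this is not the pipeline presented in the talk.

```python
# Minimal sketch of LLM-based entity/relationship extraction into a knowledge graph.
# Model, prompt and graph library are illustrative assumptions.
import json

import networkx as nx
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

PROMPT = (
    "Extract entities and relationships from the text below. "
    'Respond with JSON of the form {{"triples": '
    '[{{"subject": ..., "relation": ..., "object": ...}}]}}.'
    "\n\nText:\n{text}"
)


def extract_triples(text: str) -> list[dict]:
    """Ask the LLM for structured (subject, relation, object) triples."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    payload = json.loads(response.choices[0].message.content)
    return payload.get("triples", [])


def build_graph(documents: list[str]) -> nx.MultiDiGraph:
    """Merge triples from all documents into one directed multigraph."""
    graph = nx.MultiDiGraph()
    for doc in documents:
        for t in extract_triples(doc):
            graph.add_edge(t["subject"], t["object"], relation=t["relation"])
    return graph
```

Once the triples from all documents are merged into one graph, a question like “Which projects has person X worked on?” becomes a lookup of X's outgoing edges rather than a search across hundreds of documents.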


In this talk, I will walk through how I implemented a solution for creating knowledge graphs using LLMs and why this approach can be powerful.

Agenda:
- Limitations of LLMs and RAG for specific tasks
- Knowledge graph (KG) basics
- Creating KGs using LLMs
- Dataset and use case: official parliamentary debates
- Practical experience in creating an LLM-based pipeline
- Retrieving data using natural language, i.e. Text2SQL (see the sketch after this agenda)
- Future work
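
As a toy illustration of the Text2SQL retrieval step, the sketch below assumes the extracted triples have been loaded into a SQLite table and asks an LLM to write the query. The schema, model choice, and prompt are my assumptions, not the talk's implementation.

```python
# Minimal Text2SQL sketch: translate a natural-language question into SQL over a
# hypothetical table of extracted triples. Schema, model and prompt are assumed.
import sqlite3

from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

SCHEMA = "CREATE TABLE triples (subject TEXT, relation TEXT, object TEXT);"


def answer(question: str, db_path: str = "debates.db") -> list[tuple]:
    """Ask the LLM for a SQL query answering the question, then execute it."""
    prompt = (
        f"Given this SQLite schema:\n{SCHEMA}\n"
        f"Write one SQL SELECT statement (and nothing else) that answers: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": prompt}],
    )
    sql = response.choices[0].message.content.strip()
    if sql.startswith("```"):
        # strip a possible markdown code fence around the query
        sql = sql.strip("`").removeprefix("sql").strip()
    # NOTE: model-generated SQL should be validated before running it for real
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()


# Example usage: answer("Which projects has person X worked on?")
```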


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Novice

Abstract as a tweet (X) or toot (Mastodon):

This talk demonstrates how we can intuitively analyze political debates using knowledge graphs built with LLMs.

Usman is a Machine Learning Engineer working for Xebia Data, with an interest in graph theory, low-level machine learning frameworks, and the bridge between research and real-world implementation.