PyConDE & PyData Berlin 2024

Build an AI Document Inquiry Chat with Offline LLMs
2024-04-22, A05-A06

As we descend from the peak of the hype cycle around Large Language Models (LLMs), chat-based document inquiry systems have emerged as a high-value practical use case. Retrieval-Augmented Generation (RAG) is a technique for supplying LLMs with relevant context and external information (retrieved from vector storage), thus making them more powerful and accurate.

In this hands-on tutorial, we’ll dive into RAG by creating a personal chat app that accurately answers questions about your selected documents. We’ll use a new OSS project called Ragna that provides a friendly Python and REST API, designed for this particular use case. We’ll test the effectiveness of different LLMs and vector databases, including an offline LLM (i.e., local LLM) running on GPUs on the cloud machines provided to you. And we’ll conclude by demonstrating how to quickly build personal or company-level chat-based document interrogation systems.


The ability to ask natural language questions and get relevant and accurate answers from a large corpus of documents can fundamentally transform organizations and make institutional knowledge accessible. Foundation models like OpenAI’s GPT-4 provide powerful capabilities, but using them directly to answer questions about a collection of documents presents accuracy-related limitations. Retrieval-augmented generation (RAG) is the leading approach to enhancing the capabilities and usability of Large Language Models.

In this tutorial, we will learn to use RAG to build document-inquiry chat systems using different commercial and locally running LLMs. The topics we’ll cover include:

  • Introduction to RAG, how it works and interacts with LLMs, and Ragna - a framework for RAG orchestration
  • Creating a basic chat function that uses popular LLMs (like GPT) to answer questions about your documents, using a Python API in Jupyter Notebooks
  • Optimizing the chat through experiments with different LLMs, vector databases, context windows, and more
  • Running a local LLM on GPUs on the provided platform, and comparing its performance to commercial LLMs
  • Walkthrough of the REST API for building web apps and user interfaces, and exploration of the built-in (Panel-based) web application

By the end of this tutorial, you will have an understanding of the fundamental components that form a RAG system, and practical knowledge of open source tools that can help you or your organization explore and build your own applications. This tutorial is designed to enable enthusiasts in our community to explore an interesting topic using some beginner-friendly Python libraries.
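
As a rough illustration of the retrieve-then-generate idea behind RAG, here is a minimal sketch using only the Python standard library. It is not Ragna's API: the function names (embed, cosine, retrieve, build_prompt) are hypothetical, and a bag-of-words count vector stands in for the learned embeddings and vector database a real system would use. The resulting prompt would then be sent to whichever LLM you choose.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k document chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical document chunks; a real app would split uploaded files.
chunks = [
    "Ragna is an open source RAG orchestration framework.",
    "Bokeh is a visualization library for Python.",
    "Dask scales Python workloads across clusters.",
]

prompt = build_prompt("What is Ragna?", chunks, k=1)
print(prompt)
```

Swapping the toy embed function for a real embedding model, and the in-memory list for a vector database, gives the same overall shape as the systems explored in the tutorial.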

Expected audience expertise: Domain:

Novice

Expected audience expertise: Python:

Intermediate

Abstract as a tweet (X) or toot (Mastodon):

In this hands-on tutorial, we'll build an LLM-powered document inquiry chat application that uses Retrieval-Augmented Generation (RAG) for more accurate results. We'll test different LLMs, run an offline LLM on GPUs, and demonstrate a fully functional web app.

Public link to supporting material, e.g. videos, Github, etc.:

This is a new tutorial, which is yet to be created. However, a subset of the topics has been presented in a 20-30 minute talk: https://github.com/Quansight/ragna-presentations

Pavithra Eswaramoorthy is a Developer Advocate at Quansight, where she works to improve the developer experience and community engagement for several open source projects in the PyData community. Currently, she maintains the Bokeh visualization library, and contributes to the Nebari (adjacent to the Jupyter community), conda-store (part of the conda ecosystem), and Ragna (RAG orchestration framework) projects. Pavithra has been involved in the open source community for over 5 years, notably as a maintainer of the Dask library and an administrator for Wikimedia’s OSS programs. In her spare time, she enjoys a good book and hot coffee. :)

Philip is a Senior Software Engineer at Quansight. His recent work has focused on Ragna (https://ragna.chat), an OSS RAG orchestration framework.