PyConDE & PyData Berlin 2024

Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations
2024-04-22, B05-B06

Are you secretly a spy and/or passionate about open-source? Maybe you don't trust a cloud-hosted service with your highly classified information, or perhaps you like to build things for yourself. In this light-hearted talk, you will learn how to make a real-time on-device GenAI-powered application that can live transcribe and summarize conversations without internet access, using open-source components.

Our journey begins with an introduction to open-source LLMs and the latest trends in running GenAI tools on your own hardware. We will then build our application step by step: first a live-streaming voice-to-text transcription pipeline, then an LLM-based conversation summarization layer, all presented in a Streamlit frontend, with conversation summaries sent to a lightweight Django API backend for storage.
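To make the shape of that pipeline concrete, here is a minimal sketch of how the stages might be wired together, assuming each stage can be treated as a plain function. The class name ConversationPipeline and its methods are hypothetical, and the lambdas in the usage example are stand-ins: in the real application, transcribe would wrap a Whisper model, summarize an LLM call, and store a request to the Django API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConversationPipeline:
    """Hypothetical orchestrator: transcribe -> accumulate -> summarize -> store.

    Each stage is injected as a plain callable, so real components
    (a speech-to-text model, an LLM, an HTTP client for the backend)
    can be swapped in without changing the orchestration logic.
    """

    transcribe: Callable[[bytes], str]  # audio chunk -> text (stand-in for speech-to-text)
    summarize: Callable[[str], str]  # full transcript -> summary (stand-in for an LLM)
    store: Callable[[str, str], None]  # (transcript, summary) -> persist (stand-in for an API call)
    transcript: List[str] = field(default_factory=list)

    def feed(self, audio_chunk: bytes) -> str:
        """Transcribe one chunk of live audio and append any text to the running transcript."""
        text = self.transcribe(audio_chunk).strip()
        if text:  # skip silent chunks that produce no text
            self.transcript.append(text)
        return text

    def finish(self) -> str:
        """Summarize the whole conversation, hand both to storage, and return the summary."""
        full_text = " ".join(self.transcript)
        summary = self.summarize(full_text)
        self.store(full_text, summary)
        return summary


# Usage with stand-in stages (no real models involved):
saved = []
pipeline = ConversationPipeline(
    transcribe=lambda chunk: chunk.decode("utf-8"),  # pretend the audio bytes are already text
    summarize=lambda text: text.split(". ")[0] + ".",  # toy "summary": the first sentence
    store=lambda transcript, summary: saved.append((transcript, summary)),
)
for chunk in [b"Rendezvous at noon.", b"  ", b"Bring the documents."]:
    pipeline.feed(chunk)
print(pipeline.finish())
```

Keeping the stages as injected callables is one possible design choice: it lets each layer (transcription, summarization, storage) be developed and demoed independently, which suits building the application up piece by piece.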

This talk is tailored for Python enthusiasts and requires no ML expertise. By seeing a practical demo come together piece by piece, attendees will gain a deeper understanding of how to build their own complex Generative AI applications and will be encouraged to imagine what they could build for themselves with on-device computation in real-world scenarios.


This light-hearted talk aims to introduce the audience to the latest trends and possibilities for building GenAI applications from open-source components. Here's why this matters:

  • Cloud-hosted SaaS tools are often not an option for highly sensitive information.
  • Good open-source alternatives exist for most GenAI tasks; the more people who use them, the more they will thrive.
  • Commercial tools cater to common use cases, but developers can build personalized tools that are highly specialized for their own bespoke needs.

During this talk, we will build a real-time conversation pipeline with transcription, summarization and topic analysis layers, using open-source Python libraries throughout, including a Streamlit frontend and a Django API backend. The primary focus is to demonstrate how simple it can be to build complex LLM-based applications, and the talk is specifically tailored for attendees who have a basic understanding of Python but may not have prior experience with LLMs.

We'll explore a variety of tools*, starting with Whisper for accurate live transcription, covering its capabilities and how to integrate it with Streamlit. Next, we'll discuss LangChain + llama.cpp + Llama-2 for efficient summarization and topic analysis, highlighting their performance on standard hardware such as a MacBook Pro. For the web API, Django will be our framework of choice, providing a robust and scalable solution for storing and displaying our conversation transcripts and summaries. We will also demonstrate how easily additional tools can be integrated into our workflow, for example using the Chroma vector database to build a simple semantic search function.
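The idea behind the semantic search feature (store one vector per conversation, then return the stored conversations closest to a query) can be pictured with a toy in-memory equivalent. This sketch is not Chroma's API: TinyVectorStore, embed and cosine are hypothetical names, and simple bag-of-words vectors stand in for the learned embeddings a real vector database would use, so this version matches shared words rather than meaning.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: keeps one vector per document and ranks by similarity."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[str, Counter]] = {}

    def add(self, doc_id: str, text: str) -> None:
        """Store the original text alongside its vector under the given id."""
        self._docs[doc_id] = (text, embed(text))

    def query(self, text: str, n_results: int = 1) -> List[str]:
        """Return the ids of the n_results documents most similar to the query text."""
        q = embed(text)
        ranked = sorted(self._docs, key=lambda d: cosine(q, self._docs[d][1]), reverse=True)
        return ranked[:n_results]


# Usage: index two conversation summaries, then search them.
store = TinyVectorStore()
store.add("call-1", "Quarterly budget meeting with the finance team.")
store.add("call-2", "The drop point is the old bridge by the river.")
print(store.query("Where is the drop point?"))
```

Swapping embed for a sentence-embedding model, and the dictionary for a persistent vector database, is what turns this keyword-overlap toy into genuine semantic search.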

Expect plenty of Python code and some fun live demos, with the code on GitHub for attendees to try at home. This demo covers only a small fraction of what the modern open-source AI landscape makes possible, but it will leave attendees with a sense that building complex LLM-powered applications that solve real-world problems has never been this easy.

* The exact tools presented may differ from those mentioned here because this landscape is evolving rapidly. The goal is to give attendees state-of-the-art content that is fully up to date as of April 2024.


Expected audience expertise: Domain

Novice

Expected audience expertise: Python

Intermediate

Abstract as a tweet (X) or toot (Mastodon)

🕵️ Calling all Spythonistas: Do you need a live speech transcription and summarization "secret agent" that works offline by running on your own hardware? Learn about the latest trends in open-source GenAI tools and how to build your own in this light-hearted talk.

Public link to supporting material, e.g. videos, Github, etc.

I will provide a GitHub link for the talk's code if the talk is selected.

John Sandall is the CEO and Principal Data Scientist at Coefficient.

His experience in data science and software engineering spans multiple industries and applications, and his passion for the power of data extends far beyond his work for Coefficient’s clients. In April 2017 he created SixFifty to predict the UK General Election using open data and advanced modelling techniques. His previous experience includes serving as Lead Data Scientist at YPlan, business analytics at Apple, genomics research at Imperial College London, building an ed-tech startup at Knodium, developing strategy and technological infrastructure for the international non-profit startup STIR Education, and losing sleep to many hackathons along the way.

John is also a co-organiser of PyData London, co-founded Humble Data in 2019 to promote diversity in data science through a programme of free bootcamps, and in 2020 was a Committee Chair for the PyData Global Conference. He is currently a Fellow of Newspeak House with interests in open data, AI ethics and promoting diversity in tech.