PyConDE & PyData Berlin 2024

John Sandall

John Sandall is the CEO and Principal Data Scientist at Coefficient.

His experience in data science and software engineering spans multiple industries and applications, and his passion for the power of data extends far beyond his work for Coefficient’s clients. In April 2017 he created SixFifty in order to predict the UK General Election using open data and advanced modelling techniques. Previous experience includes Lead Data Scientist at YPlan, business analytics at Apple, genomics research at Imperial College London, building an ed-tech startup at Knodium, developing strategy & technological infrastructure for international non-profit startup STIR Education, and losing sleep to many hackathons along the way.

John is also a co-organiser of PyData London, co-founded Humble Data in 2019 to promote diversity in data science through a programme of free bootcamps, and in 2020 was a Committee Chair for the PyData Global Conference. He is currently a Fellow of Newspeak House with interests in open data, AI ethics and promoting diversity in tech.


X / Twitter handle

@john_sandall

Github

https://github.com/john-sandall/

LinkedIn

https://www.linkedin.com/in/johnsandall/


Session

04-22
14:35
30min
Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations
John Sandall

Are you secretly a spy and/or passionate about open-source? Maybe you don't trust a cloud-hosted service with your highly classified information, or perhaps you like to build things for yourself. In this light-hearted talk, you will learn how to make a real-time on-device GenAI-powered application that can live transcribe and summarize conversations without internet access, using open-source components.

Our journey begins with an introduction to open-source LLMs and the latest trends in running GenAI tools on your own hardware. We will build up our application step-by-step, first creating a live streaming voice-to-text transcription pipeline, then an LLM-based conversation summarization layer, presented within a Streamlit frontend, with conversation summaries sent to a lightweight Django API backend for storage.

This talk is tailored for Python enthusiasts and requires no ML expertise. By seeing a practical demo come together piece by piece, attendees will gain a deeper understanding of how to build their own complex Generative AI applications and be pushed to imagine what they could make for themselves using on-device computation in real-world scenarios.

PyData: Generative AI
B05-B06