PyConDE & PyData Berlin 2024

Building Professional Voice AI with Vocode
04-23, 11:40–12:10 (Europe/Berlin), B07-B08

Dive into the world of AI voice agents with Vocode, the leading framework for creating interactive, voice-based AI assistants. In this talk, we'll explore how Vocode integrates speech-to-text, response generation, and speech synthesis APIs to create agents that not only speak but also understand and adapt to the nuances of human conversation. We'll discuss the challenges of teaching these agents the etiquette of real conversations, such as knowing when to pause, when not to interrupt, and when to conclude an interaction. Plus, we'll showcase Vocode's LLM function-calling feature through a practical example: real-time appointment booking. Join us to uncover the secrets behind building AI voice agents that are as engaging and efficient as they are innovative.


The open-source AI package Vocode (https://github.com/vocodedev/vocode-python) has been a leading framework for building AI voice agents since May 2023. These agents are the interactive voices on the other end of the phone, ready to assist with various tasks. My journey with Vocode began in August, while I was developing a commercial platform for no-code creation of voice agents built on Vocode's capabilities.
This presentation delves into the intricacies of Vocode. It's not just about voice; it's about crafting an experience. The framework integrates external APIs for speech-to-text conversion, Large Language Model (LLM) response generation, and speech synthesis. But the real challenge lies in the nuances of human conversation: teaching the bot to pause when interrupted, not to speak over others, and to recognize the natural end of a conversation. These subtleties are what make interactions with Vocode agents feel remarkably human.
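As a rough illustration of the pipeline described above, the following minimal sketch chains transcription, response generation, and synthesis for a single turn, and stops speaking as soon as the caller starts talking again. It does not use Vocode's API: `run_turn`, `Turn`, and the `fake_*` stand-ins are hypothetical names invented for this example, and a real agent would stream audio asynchronously rather than work sentence by sentence.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Turn:
    """Record of one conversational turn (hypothetical structure)."""
    heard: str
    spoken: List[str] = field(default_factory=list)
    interrupted: bool = False


def run_turn(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    respond: Callable[[str], Iterable[str]],
    synthesize: Callable[[str], str],
    user_is_speaking: Callable[[], bool],
) -> Turn:
    """Speech-to-text, then LLM response, then speech synthesis."""
    turn = Turn(heard=transcribe(audio))
    for sentence in respond(turn.heard):
        # Check for an interruption before each sentence so the
        # agent never talks over the caller.
        if user_is_speaking():
            turn.interrupted = True
            break
        turn.spoken.append(synthesize(sentence))
    return turn


# Stand-ins for the external APIs (assumptions for this example only).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> Iterable[str]:
    yield "Sure."
    yield "When suits you?"


def fake_synthesize(sentence: str) -> str:
    return f"<audio:{sentence}>"


if __name__ == "__main__":
    turn = run_turn(
        b"I need a plumber",
        fake_transcribe,
        fake_respond,
        fake_synthesize,
        user_is_speaking=lambda: False,
    )
    print(turn)
```

Checking for an interruption between sentences is the simplest possible policy; it is shown here only to make the "pause when interrupted" idea concrete.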
A significant part of this talk will focus on the LLM function-calling feature of Vocode, particularly in real-time tasks like booking appointments. Imagine a scenario where you're speaking to 'Jane', a virtual plumber, to schedule a visit. The interaction feels real, with the bot understanding and responding to changes in appointment preferences, such as switching from a suggested time of "tomorrow at 9 am" to a more suitable slot "next month".
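To make the function-calling idea concrete, here is a minimal sketch of how a booking function might be described to an LLM and how the model's call might be dispatched. It is an assumption-laden illustration, not Vocode's implementation: the schema follows the JSON Schema style commonly used for LLM function calling, and `book_appointment`, `dispatch`, `HANDLERS`, and the in-memory `booked` calendar are hypothetical names made up for this example.

```python
import json
from datetime import datetime

# Description of the function handed to the LLM (hypothetical example).
BOOK_APPOINTMENT_SCHEMA = {
    "name": "book_appointment",
    "description": "Book a plumber visit at the requested start time.",
    "parameters": {
        "type": "object",
        "properties": {
            "start": {
                "type": "string",
                "description": "Start time in ISO 8601 format",
            },
        },
        "required": ["start"],
    },
}

# In-memory calendar standing in for a real booking system.
booked = set()


def book_appointment(start: str) -> dict:
    """Reserve a slot unless it is already taken."""
    slot = datetime.fromisoformat(start)
    if slot in booked:
        return {"ok": False, "reason": "slot taken"}
    booked.add(slot)
    return {"ok": True, "start": slot.isoformat()}


HANDLERS = {"book_appointment": book_appointment}


def dispatch(call: dict) -> str:
    """Run the function the model asked for and return a JSON result.

    `call` is assumed to look like
    {"name": "...", "arguments": "<JSON-encoded string>"}.
    """
    handler = HANDLERS.get(call["name"])
    if handler is None:
        return json.dumps({"ok": False, "reason": "unknown function"})
    args = json.loads(call["arguments"])
    return json.dumps(handler(**args))


if __name__ == "__main__":
    call = {
        "name": "book_appointment",
        "arguments": '{"start": "2024-04-24T09:00:00"}',
    }
    print(dispatch(call))
```

The JSON result would be passed back to the model, which can then confirm the booking aloud or, if the slot is taken, offer the caller another time.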
This talk aims to share insights and practical knowledge about building and refining AI voice agents, making them more than just voices on a call but rather engaging, interactive entities capable of performing complex tasks with ease and human-like finesse.


Expected audience expertise: Domain

Novice

Expected audience expertise: Python

Novice

Abstract as a tweet (X) or toot (Mastodon)

Meet Vocode, an open-source framework for AI voice agents. We'll cover its integration of speech APIs, LLMs, and conversation etiquette in real-world applications. #OpenSource #AI #VoiceAgents

Public link to supporting material, e.g. videos, GitHub, etc.

https://github.com/vocodedev/vocode-python

See also: slides (779.2 KB)

Lev Konstantinovskiy is the Head of Engineering at Synthflow, a Berlin start-up that specialises in AI voice agents. A long time ago, he maintained gensim, a Python natural language processing library.