JuliaCon 2025

Enhancing Deterministic Voice Control with LLM Interaction
2025-07-24, Main Room 2

The approach presented here bridges a gap in human-AI interaction, bringing LLM intelligence to small, everyday tasks where traditional chat interfaces are too cumbersome. It combines deterministic voice control with offline execution of distilled LLMs, offering privacy, efficiency, and performance. The LLM output can be converted to audio via text-to-speech, enabling almost human-like interaction. Implemented in JustSayIt.jl, it lets anyone define or program voice-enhanced LLM interactions tailored to their needs.


Large Language Models (LLMs) are rapidly advancing, with increasingly cost-effective training and inference, and the emergence of powerful, compact models such as distilled variants of DeepSeek. These models can now be executed efficiently on commodity hardware, opening new avenues for offline applications. Operating offline offers significant advantages, most notably maintaining data privacy by ensuring all information processing remains local to the user's device. Furthermore, offline execution reduces latency, enabling efficient processing of small, routine tasks within a fraction of a second. One compelling use case is the development of voice assistants that leverage LLMs for intelligence while maintaining strict control over system operations.

While LLM-based AI agents can reason, generate text, and even execute predefined actions, granting them unrestricted control over a personal computer poses safety risks due to potential hallucinations and ambiguous user inputs. To address this, we present an approach that integrates deterministic voice control with LLM-enhanced interactions, enabling safe and efficient offline voice assistants. This approach is implemented in the Julia package JustSayIt.jl. This talk focuses on how the initially purely deterministic voice control application (see the paper [1], the 2022 talk [2], and the 2023 talk [3]) has been enhanced with LLM technology.

JustSayIt.jl operates entirely offline, utilizing locally installed LLMs via Ollama, with model management and interaction handled through Python packages such as ollama, accessed via PyCall. LLM interactions are streamlined by incorporating dynamic context such as selected text and clipboard content. This enables, for example, near-instantaneous text summaries or translations via single-word voice commands. More complex tasks are enabled by directly forwarding free speech to the LLM, and the LLM output can be instantly converted to audio output using a text-to-speech engine. As a result, almost human-like interaction with the LLM is possible.
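
To make this pipeline concrete, the sketch below shows how a clipboard-summary command could be wired up with PyCall and the ollama Python package. It is a minimal sketch, not JustSayIt.jl's actual API: the function name `summarize_clipboard`, the model tag, and the prompt are assumptions, and the shape of the response object may vary across ollama package versions.

```julia
# Minimal sketch, not JustSayIt.jl's actual API: `summarize_clipboard`,
# the model tag and the prompt are illustrative assumptions.
using PyCall
using InteractiveUtils: clipboard

const ollama = pyimport("ollama")  # locally installed ollama Python package

function summarize_clipboard()
    text = clipboard()  # dynamic context: current clipboard content
    resp = ollama.chat(model="deepseek-r1:1.5b",  # any locally pulled model
                       messages=[Dict("role"    => "user",
                                      "content" => "Summarize briefly:\n" * text)])
    return resp["message"]["content"]  # response shape may vary across ollama versions
end

println(summarize_clipboard())
```

Binding such a function to a single-word voice command then yields the near-instantaneous summaries and translations described above.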

For speech recognition, JustSayIt.jl employs a dual approach: lightweight, low-latency constrained speech recognition using Vosk for deterministic commands and high-accuracy free speech recognition via a faster reimplementation of OpenAI's Whisper for natural language interactions. The system automatically switches between these engines to optimize responsiveness and accuracy, ensuring seamless voice control and LLM interaction without performance bottlenecks.
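
The dual-engine idea can be sketched as follows with PyCall. This is illustrative, not JustSayIt.jl's internals: the command list, model names, and function names are assumptions. Vosk accepts a JSON list of phrases as a grammar to constrain recognition, while faster-whisper transcribes unconstrained speech.

```julia
# Illustrative sketch of the dual-engine approach, not JustSayIt.jl internals:
# Vosk with a restricted grammar for low-latency commands, faster-whisper for
# free speech. Command list, model names and function names are assumptions.
using PyCall

const vosk           = pyimport("vosk")
const faster_whisper = pyimport("faster_whisper")

const COMMANDS = ["summarize", "translate", "type", "stop"]
const grammar  = "[\"" * join(COMMANDS, "\",\"") * "\"]"  # JSON list constraining Vosk

const vosk_model = vosk.Model("vosk-model-small-en-us-0.15")        # local model folder
const recognizer = vosk.KaldiRecognizer(vosk_model, 16000, grammar)
const whisper    = faster_whisper.WhisperModel("base.en", compute_type="int8")

"Constrained recognition: fast, deterministic matching against COMMANDS."
function recognize_command(audio_chunk::Vector{UInt8})
    recognizer.AcceptWaveform(pybytes(audio_chunk))
    return recognizer.Result()  # JSON string with the recognized command
end

"Free speech recognition: higher accuracy at higher latency."
function recognize_free_speech(wav_path::AbstractString)
    segments, _ = whisper.transcribe(wav_path)
    return join((strip(seg.text) for seg in segments), " ")
end
```

A dispatcher would route audio to `recognize_command` by default and fall back to `recognize_free_speech` only when a command requests free-form input, which is what keeps the common path low-latency.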

The approach presented here bridges a crucial gap in human-AI interaction by enabling the seamless integration of LLM intelligence into small, everyday tasks where traditional LLM-chat interactions are overly cumbersome and time-consuming. By combining deterministic, low-latency voice control with LLM capabilities and leveraging offline execution of distilled LLM models, this solution ensures privacy, efficiency, and performance. With limitless customizability and programmability, JustSayIt.jl empowers users to create their ideal, AI-powered voice assistant tailored to their individual needs.

References:

[1] Omlin, S. (2024). JustSayIt.jl: A Fresh Approach to Open Source Voice Assistant Development. The Proceedings of the JuliaCon Conferences, 6(66), 121. https://doi.org/10.21105/jcon.00121

[2] Omlin, S. (2022). JustSayIt.jl: A Fresh Approach to Open Source Voice Assistant Development. JuliaCon 2022. https://www.youtube.com/watch?v=W7oQb7pLc04

[3] Omlin, S. (2023). Quick Assembly of Personalized Voice Assistants with JustSayIt. JuliaCon 2023. https://www.youtube.com/watch?v=_gpH-mkrdGM

Computational Scientist | Responsible for Julia computing, Swiss National Supercomputing Centre (CSCS), ETH Zurich
