JuliaCon 2023

Quick Assembly of Personalized Voice Assistants with JustSayIt
2023-07-26, 32-123

We present an approach to quickly assemble fully personalized voice assistants with JustSayIt.jl. To assemble a voice assistant, it is sufficient to define a dictionary with command names as keys and objects representing actions as values. Objects of type Cmd, for example, will automatically open the corresponding application. To define application-specific commands - a key feature for voice assistants - a command dictionary can simply be tied to the Cmd-object triggering the application.


Creating a feature-complete voice assistant for desktop computing is practically impossible, because it would require supporting every piece of software that exists, including every small application written by individuals. Moreover, the way computers are used varies strongly from one user to another, making personalization indispensable for voice assistants. We address these challenges by empowering users themselves to quickly assemble the voice assistant they desire.

JustSayIt.jl enables users to quickly assemble fully personalized voice assistants. One only needs to define a regular Julia dictionary with command names as keys and objects representing actions as values. The object type determines the kind of action taken at runtime. For example, if the object is a Function, it will be called; if it is a Tuple of keyboard keys representing a keyboard shortcut, the keys will be pressed; and if it is a Cmd, the corresponding application will be opened. Furthermore, the object can also be an array of action objects, representing a sequence of commands. In addition, it is trivial to define application-specific commands, which is key to effective voice control: it is sufficient to create a dictionary with the application-specific commands and tie it to the Cmd-object triggering the application. JustSayIt.jl then automatically takes care of activating the application-specific commands when the application is opened and of deactivating them again as soon as another application is opened.

Activating and deactivating commands requires adapting the speech recognizers used, and the transition between recognizers is a challenge when it must happen within a contiguously spoken word group; JustSayIt.jl solves this challenge by dynamically generating those recognizers as a function of the word-group context whenever this is beneficial for recognition accuracy.
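As a rough sketch of what such an assembly could look like, consider the following. The start entry point and the Key constants are taken from typical JustSayIt.jl usage but are not confirmed by this abstract, and the way the application-specific dictionary is tied to the Cmd is shown with a hypothetical syntax; only the dictionary-of-actions structure itself is described above.

    using JustSayIt   # illustrative sketch; exact entry point and key constants may differ

    # Hypothetical application-specific commands for a text editor, tied to the Cmd
    # that launches it; they would be active only while that application is in use.
    editor_commands = Dict(
        "save"  => (Key.ctrl, 's'),     # keyboard shortcut: the keys are pressed
        "close" => (Key.ctrl, 'w'),
    )

    commands = Dict(
        "terminal"       => `gnome-terminal`,                    # Cmd: the application is opened
        "editor"         => (`gedit`, editor_commands),          # Cmd with tied app-specific commands (hypothetical syntax)
        "hello"          => () -> println("Hello!"),             # Function: it is called
        "save and close" => [(Key.ctrl, 's'), (Key.ctrl, 'w')],  # array: a sequence of actions
    )

    start(commands=commands)   # assumed entry point launching the assembled assistant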

The action semantics associated with the object types in the command dictionaries, combined with the possibility to define command sequences on the fly and application-specific commands without effort, results in an unprecedented, highly expressive, and effective way to assemble personalized voice assistants. Naturally, it is trivial to share the command dictionaries defining the assembly of a voice assistant, and the JustSayItRecipes.jl repository provides a platform for doing so. As a result, JustSayIt.jl empowers the worldwide open source community to collaboratively shape each user's personal daily assistant.

Computational Scientist responsible for Julia computing at the Swiss National Supercomputing Centre, ETH Zurich
