PyCon DE & PyData 2026

Making bad CLIs fun with Small Language Models
2026-04-16, Platinum [2nd Floor]

Command Line Interfaces (CLIs) offer an efficient and powerful way to interact with software, but poorly designed interfaces can be incredibly frustrating. Complicated parameter names and unconventional formats can turn using a great tool into a burdensome experience.

Large Language Models (LLMs) seem like a great solution to this problem as they can easily add a natural-language interface to any CLI. However, LLMs can introduce their own challenges, such as requiring API keys or high-performance GPUs. In this talk, I'll demonstrate a method for creating natural-language interfaces for any CLI using fine-tuned Small Language Models. These models are lightweight enough to be run directly on laptops or even smartphones.

We'll explore the process of generating synthetic data, fine-tuning models, and evaluating their performance using both an in-house CLI and a well-known open-source package as examples.


I've often had to rely on a poorly designed home-grown CLI, and I kept getting frustrated because I constantly forgot its argument names and allowed values. Large Language Models (LLMs) initially looked like an ideal fix, but their limitations quickly became evident, which pointed to the need for a more efficient approach.

To begin, we'll look at what makes CLIs hard to use and why LLMs fall short of fixing these problems. Next, we'll examine how to generate synthetic data tailored to any CLI, whether it's proprietary or open source. Then, I'll show you how to use this synthetic dataset to fine-tune a Small Language Model on your laptop or in the cloud. We will use the smallest variant of Google's Gemma 3 models, which has just 270 million parameters, to translate natural-language instructions into CLI commands.
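
As a rough illustration of the synthetic-data step, here is a minimal sketch that builds instruction/command pairs from templates. Everything in it is an assumption made for the example: the `forecast-cli` tool, its `--store-id` and `--horizon-days` flags, the store names, and the phrasing templates are hypothetical, and the talk may generate its data in a different way.

```python
import itertools
import json
import random

# Hypothetical CLI, invented for this sketch:
#   forecast-cli run --store-id <store> --horizon-days <days>
STORES = ["berlin", "hamburg", "munich"]
HORIZONS = [7, 14, 28]

# Hypothetical natural-language phrasings a user might type.
TEMPLATES = [
    "forecast the next {days} days for the {store} store",
    "run a {days}-day forecast for {store}",
    "I need demand predictions for {store} covering {days} days",
]


def make_examples(seed=0):
    """Return one prompt/completion pair per (store, horizon) combination."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    examples = []
    for store, days in itertools.product(STORES, HORIZONS):
        template = rng.choice(TEMPLATES)
        examples.append(
            {
                "prompt": template.format(store=store, days=days),
                "completion": (
                    f"forecast-cli run --store-id {store} --horizon-days {days}"
                ),
            }
        )
    rng.shuffle(examples)
    return examples


def to_jsonl(examples):
    """Serialize the pairs as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(example) for example in examples)


if __name__ == "__main__":
    print(to_jsonl(make_examples()))
```

A file in this prompt/completion shape is a common input for supervised fine-tuning tools, although the exact field names a given trainer expects may differ.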

Lastly, I'll share benchmark results showing that these models run smoothly on a variety of machines without API keys or GPUs, which makes them practical to deploy.
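
One simple way to score a model's output is exact-match accuracy on tokenized commands. The sketch below is an assumption about how such a check could look, not the evaluation used in the talk; the function names are made up for the example. It uses the standard library's `shlex.split`, so that differences in quoting style or whitespace do not count as errors.

```python
import shlex


def normalize(command):
    """Split a command into tokens the way a POSIX shell would."""
    return shlex.split(command)


def exact_match_accuracy(predictions, references):
    """Fraction of predicted commands whose tokens equal the reference's."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = 0
    for predicted, expected in zip(predictions, references):
        try:
            if normalize(predicted) == normalize(expected):
                hits += 1
        except ValueError:
            # shlex raises ValueError on unbalanced quotes; such a pair
            # is counted as a miss.
            pass
    return hits / len(references)
```

Note that this metric is strict: two commands that pass the same flags in a different order count as a mismatch, so it tends to underestimate how often the generated command would actually work.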


Expected audience expertise in your talk's domain: Novice
Expected audience expertise in Python: Novice

Moritz Bauer is a Senior Data Scientist at Blue Yonder, where he currently develops software for demand forecasting. In a previous career, he obtained a Ph.D. in high-energy particle physics and contributed research to the Belle II flavor physics experiment at KEK.

While demand forecasting works very well without language models, he can't escape the fascination of modern AI and is always looking for excuses to spend some time in this domain.