PyCon DE & PyData 2025

Building Serverless Python AI skills as WASM components
2025-04-23, Ferrum

Frameworks like llama-stack and LangChain enable quick prototyping of generative AI applications. However, companies often struggle to deploy these applications into production quickly. This talk explores the design of a Python SDK that enables the development of AI skills in Python and their compilation into WebAssembly (WASM) components, targeting a specific host runtime that offers interfaces for interacting with LLMs and associated tooling.


Why is it so hard for companies to get their AI skills into production quickly?

This talk is about building an SDK that enables the development of production-ready AI skills in Python that run as serverless functions within a WASM runtime and interact with LLMs via a WIT (WebAssembly Interface Type) world. On a less technical note, we will explore how to design an SDK that offers a streamlined development experience for AI skills.
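To make this concrete, below is a minimal sketch of what authoring such a skill could look like. Every name in it (the `Csi` host interface, the model identifier, the `run` entry point) is an illustrative assumption rather than the SDK's confirmed API:

```python
from dataclasses import dataclass
from typing import Protocol


class Csi(Protocol):
    """Host interface the WASM runtime provides to a skill (illustrative)."""

    def complete(self, model: str, prompt: str) -> str: ...


@dataclass
class Input:
    topic: str


@dataclass
class Output:
    haiku: str


def run(csi: Csi, input: Input) -> Output:
    """The skill itself: plain Python logic against the host interface."""
    completion = csi.complete(
        model="llama-3.1-8b-instruct",
        prompt=f"Write a haiku about {input.topic}.",
    )
    return Output(haiku=completion.strip())
```

The skill is an ordinary Python function: it receives the host interface as an argument instead of importing it, which is what makes everything that follows possible.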

We will explore the implications for testability, traceability, and the evaluation of AI logic. How can software engineering best practices, such as separation of concerns and modularity, be applied to the design of AI applications?
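Because the skill depends only on that Protocol, for example, a hand-written stub is enough to unit-test the prompt logic without a model, a runtime, or a compilation step. A sketch, reusing the hypothetical `run` and `Input` from above:

```python
class StubCsi:
    """Canned host implementation: records prompts, returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: list[str] = []

    def complete(self, model: str, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def test_prompt_mentions_topic() -> None:
    stub = StubCsi(reply="Leaves drift on still air")
    output = run(stub, Input(topic="autumn"))  # run/Input from the sketch above
    assert "autumn" in stub.prompts[0]
    assert output.haiku == "Leaves drift on still air"
```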

Fred Brooks' excellent essay, No Silver Bullet, distinguishes between accidental complexity and essential complexity. This talk will explore how an SDK for AI skills can reduce accidental complexity during development and deployment, providing developers with a focused environment for innovating prompts and retrieval strategies.

How can a WIT world that supports running AI applications be designed, and how can bindings to such a world be generated and consumed in a Python module? We will examine abstractions that allow local testing and debugging without the compilation step by encapsulating the WIT host interface behind a Protocol.
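One possible shape for the Python side, assuming a bindings generator such as componentize-py produces a `wit_world` package (the module path and function names here are hypothetical):

```python
from typing import Protocol


class Csi(Protocol):
    """Skill-facing host interface; mirrors the WIT world's imports."""

    def complete(self, model: str, prompt: str) -> str: ...


class WasmCsi:
    """Adapter over generated WIT bindings; used only inside the component."""

    def complete(self, model: str, prompt: str) -> str:
        # Imported lazily so this module also loads outside the WASM runtime,
        # where the generated bindings package does not exist.
        from wit_world.imports import inference  # hypothetical generated module

        return inference.complete(model, prompt)
```

Skills type-check against `Csi` alone; `WasmCsi` is injected inside the component, while the stub above (or a development implementation that calls the runtime remotely) is injected locally, so nothing has to be compiled to WASM until deployment.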

The talk also covers the benefits of running AI skills as WASM components: when compiling a Python module to WebAssembly, the Python interpreter is bundled into the compiled component. Although this results in longer start-up times than components written in compiled languages like Rust, it provides a key advantage: the interpreter can securely execute Python code generated by an LLM within a highly restricted environment with no network or file-system access.
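As an illustration of that pattern (names are again hypothetical, and a real skill would validate the model's output before executing it):

```python
from typing import Protocol


class Csi(Protocol):
    def complete(self, model: str, prompt: str) -> str: ...


def run_generated_code(csi: Csi, task: str) -> str:
    """Ask the LLM for Python and execute it inside this component.

    Isolation comes from the WASM sandbox, not from Python itself: if the
    host grants the component no network or file-system capabilities, the
    generated code cannot reach them, whatever it attempts.
    """
    code = csi.complete(
        model="llama-3.1-8b-instruct",
        prompt=f"Write Python that computes {task} and assigns it to `result`.",
    )
    namespace: dict[str, object] = {}
    exec(code, namespace)  # tolerable only because the sandbox grants no capabilities
    return str(namespace.get("result"))
```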

Key takeaways include a foundational understanding of WASM and WIT and of how they interface with Python. You will gain insight into the challenges of deploying AI skills into production and discover how testing, tracing, and evaluation can simplify this process.


Expected audience expertise (Domain): Intermediate

Expected audience expertise (Python): Intermediate

Public link to supporting material (e.g. videos, GitHub): https://github.com/Aleph-Alpha/pharia-kernel-sdk-py/

I currently work as a Software Engineer at Aleph Alpha. Before joining Aleph Alpha, I founded WeGlide and worked on error back-propagation in Spiking Neural Networks. I enjoy spending time outdoors and flying gliders.