2026-03-11 –, Main Hall
Large language models are now a common tool for writing code and exploring ideas, but using them on HPC systems can still be a challenge. To make this simpler, I built an interactive Open OnDemand application that lets users access a local LLM directly inside Jupyter Notebook using the Jupyter AI extension. Behind the scenes, user requests are routed through a load balancer to a pool of vLLM inference servers running on Intel GPUs, which serve open-source models up to 70B parameters.
In this talk, I’ll walk through how the system works, why we built it, and how it gives researchers easy, reliable access to LLM assistance without relying on external cloud services.
The talk will demonstrate the capabilities of the application: debugging programs, and assisting with writing code for simple tasks. I will then walk through the system design and the challenges of creating an application like this.