GPU-Accelerated Containers on Apple Silicon with libkrun and podman machine
Advances in AI have made it possible for users to run machine learning models locally on their desktops. However, due to the heterogeneous software stacks for AI accelerators, these models can be difficult to run efficiently. For example, on macOS with Apple Silicon, a user can build llama.cpp with its Metal backend and offload the inference work to the M-series GPU. Running a model this way, however, gives the whole desktop a thorough workout in the process. In an effort to control the resources and scope of such workloads, is it possible to run GPU-accelerated applications from containers, even on macOS?
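As a rough sketch of what this looks like (build flags and binary names vary across llama.cpp releases, and the model path here is a placeholder), one might build and run llama.cpp with its Metal backend as follows:

    # Fetch and build llama.cpp; on Apple Silicon the Metal backend
    # is enabled by default in recent releases.
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build && cmake --build build --config Release

    # Offload all model layers to the Apple GPU (-ngl sets the number
    # of layers to offload; older releases name the binary "main").
    ./build/bin/llama-cli -m ./models/model.gguf -ngl 99 -p "Hello"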
As containers are primarily a Linux paradigm, using them on macOS implies virtualization. Tools such as podman machine accomplish this by running containers inside a Linux virtual machine. Recently, libkrun was accepted as a hypervisor backend for podman machine (i.e., the hypervisor that runs the Linux virtual machine hosting the containers). The latest enhancements to libkrun on macOS allow users to run workloads with Apple GPU acceleration.
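For example (assuming a recent podman release on macOS, where the provider setting is part of the containers.conf machine configuration), selecting libkrun as the podman machine provider looks roughly like this:

    # One-off: choose the libkrun provider via an environment variable
    # when creating and starting the machine.
    export CONTAINERS_MACHINE_PROVIDER=libkrun
    podman machine init
    podman machine start

    # Persistent alternative: set the provider in
    # ~/.config/containers/containers.conf
    # [machine]
    #   provider = "libkrun"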
In this talk, we will discuss how podman machine and libkrun work together, with the help of a new project, krunkit, to make this possible. The talk will conclude with a demonstration of an AI workload offloaded to the host GPU on macOS. Attendees will leave with a better understanding of how to leverage podman machine and libkrun to make the most of their hardware when running AI workloads in containers on macOS.
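As a hint of what such a demonstration might involve (a sketch only: the container image name is hypothetical, and the exact device path depends on the guest configuration, though libkrun's virtio-gpu device typically surfaces under /dev/dri inside the VM), running a GPU-accelerated inference container could look like:

    # Pass the guest's GPU device through to the container so that
    # inference can be offloaded to the Apple GPU via Vulkan (Venus).
    podman run --rm -it \
        --device /dev/dri \
        -v ./models:/models \
        quay.io/example/llama-cpp-vulkan:latest \
        llama-cli -m /models/model.gguf -ngl 99 -p "Hello"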