KVM Forum 2025

09:00
09:00
10min
Keynote (KVM)
Room 1
09:15
09:15
30min
Preserving VFIO PCI Devices During Kernel Live Updates
Vipin Sharma

Typically, updating a host kernel requires live-migrating virtual machines (VMs) to other hosts. However, this approach isn't feasible for VMs that rely on GPUs or for large-scale large language model (LLM) training clusters spread across numerous hosts, where migration is complex and disruptive.

To address this challenge, Google is developing a Live Update mechanism [1]. This feature allows devices assigned to VMs or the Virtual Machine Monitor (VMM) via VFIO (Virtual Function I/O) to remain operational even as the host transitions to a new kernel using Kexec.

VFIO PCI device preservation is the key enabling technology here. It ensures that a PCI device can continue its direct memory access (DMA) and interrupt operations without being reset while the host kernel undergoes a Kexec-based update. Achieving this requires significant modifications to the VFIO, IOMMU (Input/Output Memory Management Unit), and PCI subsystems.

This talk will delve into Google's approach to preserving VFIO PCI devices during live kernel updates and the challenges encountered during its development.

[1] https://lore.kernel.org/lkml/20250515182322.117840-1-pasha.tatashin@soleen.com/

Room 1
09:15
30min
Single-binary: Unify QEMU system binaries per target architecture
Pierrick Bouvier

QEMU has historically been designed around a separate binary for each target. Nowadays, with the advent of heterogeneous systems, this has become a barrier to emulating them. As a first step, we have been reworking the QEMU architecture to allow building multiple targets into the same binary. This is what we call the 'single binary'.

In this presentation, we'll introduce the approach we chose and the challenges we met on the road, from the build system to target code, going through a wide range of QEMU subsystems. Finally, we'll give a status update on this project and outline our next steps.

Room 2
09:45
09:45
30min
Upstreaming NVIDIA vGPU Support: Architecture, Implementation, and Roadmap
Zhi Wang

NVIDIA vGPU technology brings high-performance GPU capabilities to virtualized environments, supporting a wide range of workloads - from graphics-intensive virtual desktops to AI and data science applications. Enabling GPU resource sharing or exclusive assignment on physical GPUs deployed in cloud or enterprise data centers combines the performance benefits of NVIDIA hardware with the flexibility and manageability of virtualization.

Moving upstream, we propose a software architecture based on SR-IOV, where each vGPU is represented by a PCI Virtual Function (VF) managed through the standard Linux VFIO framework. The NVIDIA vGPU VFIO driver, implemented as a VFIO variant driver, exposes standard userspace interfaces and supports critical features such as vGPU type selection, runtime creation and teardown of vGPU instances, and live migration. Underneath, the VFIO driver interacts with NVKM, a core driver responsible for managing the hardware. The architectural goal is for NVKM to serve the DRM driver for host graphics, other NVIDIA GPU use cases, and the VFIO driver for vGPU.

Attendees will gain insight into the design architecture and upstream changes. We will also share our upstream roadmap and areas where community input is most needed.

Room 1
09:45
30min
virtual secure boot in 2025 -- the confidential computing edition
Gerd Hoffmann

Roughly ten years ago, secure boot support for virtual machines made
its debut. It was available for the x86 architecture and the q35
machine type, built on SMM emulation in qemu and the kernel,
essentially following what physical hardware does.

Since then the world has moved forward, raising a number of
challenges for secure boot support.

  • confidential computing - SEV-ES, SEV-SNP and TDX are by design
    incompatible with SMM emulation, because the host has no access to
    guest register state (which is needed to emulate the SMM context
    switch).

  • aarch64 platform - el3 aka secure world emulation (roughly
    comparable to SMM mode) is unlikely to happen anytime soon.

  • riscv64 platform - similar to aarch64 (except it's named supervisor
    mode there).

  • CONFIG_KVM_SMM - kvm support for SMM emulation is now optional;
    proposed by Google at KVM Forum to reduce kvm complexity, this was
    merged in 2022.

This talk will discuss how secure boot can be supported without
depending on SMM emulation, and it will present the work in various
projects (tianocore edk2, qemu, coconut svsm) to make that happen.

Room 2
10:15
10:15
30min
NVIDIA vGPU Support on Grace Blackwell Superchip: Architecture, Design, Upstreaming Status
Ankit Agrawal

The NVIDIA Grace Blackwell Superchip is a high-performance, ARM-based server platform designed for datacenter applications. It features a unified, cache-coherent memory subsystem that optimizes CPU-GPU interactions, facilitating efficient resource allocation. The system enables coherent memory access between the CPU and GPU via an NVLINK-based chip-to-chip interconnect, providing a unified memory view and allocation control at the OS level. GPU memory poison errors are managed through CPU firmware, while Address Translation Services (ATS) support allows a shared virtual address space between CPU and GPU.

NVIDIA vGPU extends these advanced capabilities to virtualized environments, enabling multi-tenancy and efficient GPU resource sharing across multiple virtual machines (VMs). Leveraging Multi-Instance GPU (MIG), vGPU partitions GPUs into secure instances for independent VM assignment. Additionally, vSMMU and PASID support ensure process isolation within virtualized environments.

This presentation explores the system architecture of Grace Blackwell, detailing the design and implementation of vGPU to support these new platform-specific features. We will also discuss the status of the ongoing upstreaming efforts.

Room 1
10:15
30min
The State of QEMU WebAssembly Port
Kohei Tokunaga

QEMU's system emulator has recently merged initial support for Emscripten-based cross-compilation to WebAssembly (Wasm) in its 32-bit TCI mode. Since Wasm is a binary format widely supported by modern browsers, this enhancement enables QEMU to run directly within the browser, opening up new use cases such as web-based playgrounds.

In this talk, Kohei will discuss this feature and its implementation. He'll also share the current status of ongoing discussions, including support for 64-bit guests, a Wasm-based TCG backend, and broader device support.

Room 2
10:45
10:45
30min
Coffee break
Room 1
10:45
30min
Coffee break
Room 2
11:15
11:15
30min
Improving Windows Hypervisor-Protected Code Integrity (HVCI) Performance on KVM
Jon Kohler, Sergey Dyasli

Enabling Windows HVCI on KVM currently poses significant performance challenges due to missing hardware acceleration enablement. This talk will briefly cover the value of HVCI, why Microsoft wants this enabled by default in Windows 11 and Server 2025, and provide details on our proposed KVM improvements to leverage hardware acceleration from both Intel and AMD.

Hardware acceleration support already exists in the form of both Intel Mode Based Execute Control (MBEC) and AMD Guest Mode Execute Trap (GMET). Exposing these processor capabilities requires targeted modifications to the KVM MMU and vendor CPU feature enablement code. In addition to implementation details, we'll provide detailed performance benchmarks of the current state and the observed performance improvements.

Room 1
11:15
30min
The next generation QEMU functional testing framework
Thomas Huth, Daniel Berrange

In the course of the past year, the functional tests of the QEMU project have been completely rewritten: instead of using the Avocado test runner and its libraries, the tests have been adapted to the meson test runner, with newly implemented, more lightweight library functions. This talk will show why this huge effort was made, and discuss the hurdles and design decisions we took on the way to the final goal.

Room 2
11:45
11:45
30min
Hybrid KVM/Hyper-V guest
Mickaël Salaün

Virtual Secure Mode (VSM) is a Hyper-V mechanism to enforce restrictions on a VM (VTL0) thanks to a dedicated sidecar VM (VTL1). This enables guest kernels to drop privileges and limit attackers' ability to gain full kernel privileges.

KVM is gaining VSM support with the Hyper-V emulation layer. We're working on creating a hybrid KVM guest that could use some Hyper-V hypercalls, especially those related to VSM. We'd like to talk about our approach to creating this hybrid guest.

Room 1
11:45
30min
Making io_uring pervasive in QEMU
Stefan Hajnoczi

In 2019 Linux introduced io_uring as an asynchronous I/O interface that minimizes system call overhead. Since then io_uring has expanded beyond file I/O to become a general-purpose asynchronous system call interface. This presentation discusses recent changes and the next steps for QEMU's io_uring support.

As more Linux kernel features are exposed through io_uring, QEMU components will increasingly need to call it. This led to the development of the new QEMU aio_add_sqe() API that allows custom io_uring operations to be submitted and integrates with QEMU's event loop.

Making io_uring accessible in the event loop also led to enabling io_uring-based file descriptor monitoring in QEMU's event loop. Instead of using ppoll(2) or epoll(7) to wait for events, io_uring can drive the whole event loop.

Come find out about the challenges and performance of these changes, as well as use cases for io_uring in QEMU. This talk is for developers interested in using io_uring themselves in QEMU, as well as anyone who wants to learn more generally about how applications can take advantage of io_uring.
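
To make the fd-monitoring idea concrete, here is a minimal liburing sketch (generic POSIX code, not QEMU's actual aio_add_sqe() integration, whose exact shape is out of scope here): readiness requests are submitted as poll SQEs, and one blocking wait on the completion queue replaces ppoll(2)/epoll(7).

```c
/* Minimal liburing sketch of fd monitoring without ppoll(2)/epoll(7).
 * Generic illustration; QEMU's aio_add_sqe() integration differs. */
#include <liburing.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>

static void watch_fd(struct io_uring *ring, int fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, fd, POLLIN);       /* one-shot readiness */
    io_uring_sqe_set_data(sqe, (void *)(intptr_t)fd);
}

int event_loop(int *fds, int nfds)
{
    struct io_uring ring;
    struct io_uring_cqe *cqe;

    if (io_uring_queue_init(64, &ring, 0) < 0)
        return -1;
    for (int i = 0; i < nfds; i++)
        watch_fd(&ring, fds[i]);
    io_uring_submit(&ring);

    for (;;) {
        if (io_uring_wait_cqe(&ring, &cqe) < 0)    /* single blocking call */
            break;
        int fd = (int)(intptr_t)io_uring_cqe_get_data(cqe);
        printf("fd %d ready\n", fd);               /* dispatch handler here */
        io_uring_cqe_seen(&ring, cqe);
        watch_fd(&ring, fd);                       /* re-arm one-shot poll */
        io_uring_submit(&ring);
    }
    io_uring_queue_exit(&ring);
    return 0;
}
```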

Room 2
12:15
12:15
90min
Lunch
Room 1
12:15
90min
Lunch
Room 2
13:45
13:45
30min
NeVer again: the last KVM/arm64 rewrite?
Marc Zyngier

Nested Virtualisation (NV) support for KVM/arm64 is expected to go live in Linux v6.16, should everything work according to plan.

Although an initial patch series had been maintained out of tree since 2017, its level of complexity was too high (and admittedly quality too low) to be seriously considered a merge candidate.

It took some effort to significantly refactor KVM/arm64 to a point where the NV support would be maintainable by drastically reducing its complexity, while ensuring the changes would benefit non-NV setups. It also took time for the architecture to reach a point where supporting NV in KVM was actually worth the effort.

This isn't the first time KVM/arm64 has undergone a major redesign. But this instance radically changes the way new architectural features are introduced to the hypervisor. This has been achieved in part by using the ARM Architecture Machine Readable Specification (AARCHMRS), which was recently released under a permissive license. This allowed the modelling of a sizeable chunk of architectural behaviour. Not only does this ensure compliance with the specification, it also helps find issues with it.

This talk will describe why such a formalism was needed, how it has been put to good use, what other challenges were tackled to get to this point, and what remains to be done.

Room 1
13:45
30min
rust-vmm: updates, adoption, and future directions
Stefano Garzarella, Patrick Roy, Ruoqing He

It has been several years since the last rust-vmm update at KVM Forum, but the community has continued to grow. Our goal remains the same: to provide reusable Rust crates that make it easier and faster to build virtualization solutions.

This talk will present the main progress and achievements from the past few years. It reviews how rust-vmm crates integrate into projects such as Firecracker, Cloud Hypervisor, libkrun, and virtiofsd. We will cover recent work supporting new architectures like RISC-V and additional operating systems. The talk will also discuss plans to consolidate all crates into a single monorepo to simplify development and releases. Finally, we will review the support for virtio and vhost-user devices that can be used by any VMM.

Room 2
14:15
14:15
30min
IOMMU in rust-vmm, and new FUSE+VDUSE use cases
Hanna Czenczek, Eugenio Pérez

We’ll give an overview of the IOMMU model in vhost-user and efforts to integrate support into the rust-vmm ecosystem. Doing so requires changes to the memory model and implementing the vhost-user protocol part, so it is an effort across various crates in the ecosystem, from vm-memory up to vhost-user-backend.

Presenting these changes and why they’re necessary will also give general insight into how all of these crates even work together in the first place, which we hope will serve as a good introduction to the ecosystem.

Adding IOMMU capabilities to these crates also enables interesting use cases, especially related to VDUSE exposed through vhost vDPA and virtio vDPA. This allows exposing vhost-user devices to containers through a vhost-user-to-VDUSE bridge.

Talking about combining virtiofs and VDUSE, there is another interesting combination: exposing FUSE filesystems through VDUSE. Again, this allows the existing (and varied!) ecosystem of FUSE apps to be exposed to containers and VMs, without the need to modify the FUSE app, the guest, or the containerized app.

Room 2
14:15
30min
Parallel vCPU onlining for arm64
Will Deacon

CONFIG_HOTPLUG_PARALLEL was introduced to the kernel to enable parallel booting of CPUs, primarily to accelerate the application of microcode updates on x86. However, much of the logic driving the onlining is implemented in core code and so this talk will cover the grotty details of enabling it for arm64 and reveal whether or not it can accelerate the onlining of vCPUs under KVM.

Room 1
14:45
14:45
30min
Optimizing vPMU on ARM
Colton Lewis

KVM's current vPMU implementation on ARM traps and emulates the PMU in its entirety. This is a significant source of overhead for any use of performance monitoring capabilities inside a guest.

This talk will explain my work over the past several months to improve the matter [1]. Relying on modern ARM CPU features such as PMUv3 and FGT (fine-grained traps), it becomes possible to selectively untrap the most common PMU registers and features, allowing guests direct hardware access that cuts the overhead and significantly improves performance. A more detailed explanation with some notable performance improvements can be found in my cover letter on the kvmarm mailing list.

[1] https://lore.kernel.org/kvmarm/20250602192702.2125115-1-coltonlewis@google.com/

Room 1
14:45
30min
Virtio 2025 state of the union
Michael S. Tsirkin

A lot has happened in virtio land in the last year - new faces, new devices, new drivers, new functionality.
There's new work on testing, and a lot more!
This will give an overview of where we are and what to expect in 2026 and beyond.

Room 2
15:15
15:15
30min
Rust firmware for EFI direct kernel boot on mach-virt/arm64
Ard Biesheuvel

Superfast boot is important for micro-VMs, and this is usually accomplished by booting the kernel directly from the VMM, rather than going through the usual firmware and bootloader. EFI is typically avoided in these cases, as it has a reputation for being slow and buggy on x86.

On arm64, the situation is a bit different: without firmware, the kernel is entered with MMU and caches disabled, which poses its own set of problems. And without EFI, accessing ACPI and SMBIOS tables is problematic as well.

This talk describes an alternative proposal for doing direct kernel boot on arm64 virtual machines: a minimal re-implementation of EFI in Rust, tightly coupled with QEMU, to boot the guest kernel in EFI mode with all caching and memory protections enabled from reset. I will explain why it is faster and more secure, and results in less maintenance overhead than the non-firmware case.

Room 1
15:15
30min
Towards new migration protocol with unified channels
Prasad Pandit

QEMU live migration moves a running virtual machine from one host to another. While the basic concept of live migration is fairly simple, there is a lot of complexity in the current implementation, which has evolved over many years with different features added at different times to serve specific migration needs, while migration as a whole lost its coherence as one unit. Consequently, we now have limitations: TCP connections (aka channels) are uni-directional, they come up and shut down asynchronously while migration is running, multifd migrates only RAM state, Postcopy cannot use multifd channels, etc.

To make it all work in practice, additional coordination is required between QEMU and a management layer like libvirtd(8). Features available in QEMU (e.g. postcopy-preempt) may not be usable from the virsh(1)/libvirtd(8) side, because these tools need to be taught to handle the new features.

In this session, we'll look at these implementation details and discuss possible way(s) to improve things through a robust migration protocol which could accommodate all of the current requirements and allow for future enhancements, while keeping the overall architecture simple and intuitive.

Room 2
15:45
15:45
30min
Coffee break
Room 1
15:45
30min
Coffee break
Room 2
16:15
16:15
30min
BoF sessions
Room 1
16:15
30min
BoF sessions
Room 2
09:00
09:00
10min
Keynote (QEMU)
Room 1
09:15
09:15
30min
Automatic Frontend Generation for RISC-V Extensions
Anton Johansson

QEMU is an extremely useful tool during testing and development of new architectures, yet adding support for new targets is error-prone and incurs a significant entry cost in learning QEMU internals, especially when keeping up with an evolving ISA specification.

We present our methodology for rapidly implementing and testing Qualcomm's qc_iu set of RISC-V extensions in the absence of a compiler toolchain. As a first step, C++ code and later LLVM IR was produced from instruction definitions provided by riscv-unified-db. Secondly, the LLVM-based helper-to-tcg tool was used to generate TCG implementations for 143 of 172 instructions. Usage of helper-to-tcg enables an emulator-in-the-loop process of designing instruction set extensions, good for rapid prototyping, validation and design space exploration.

Automatic generation of per-instruction tests covering memory operations, branches, and corner cases was accomplished with the LLVM-IR-based symbolic execution engine KLEE. All in all, 289 tests were generated covering 143 instructions, for each version of the ISA specification. This proved incredibly useful in finding bugs in the original instruction definitions.

This is a follow-up to our 2023 KVM Forum talk, where we successfully applied helper-to-tcg to the Hexagon frontend. Since then, the tool has evolved significantly, allowing it to be applied in more general settings.

Room 1
09:15
30min
guest_memfd: Unmapped Potential
Fuad Tabba, Ackerley Tng

The guest_memfd interface was introduced to support hardware-based confidential computing by creating guest memory that is neither mappable by the host nor accessible to host userspace, offering protection against a compromised or buggy host. While effective for its initial purpose, this strict isolation prevents its use for a broader set of virtualization use cases and limits adoption by non-confidential guests. It also lacks the ability to convert memory between private and shared states in place, which introduces unnecessary work when used to provide memory for software-based confidential computing solutions like pKVM [1]. Furthermore, this design makes adding huge page support difficult without incurring significant memory overhead [2].

This presentation will cover new developments, expected to be merged upstream before the conference [3], that extend the capabilities of guest_memfd and move it from a specialized feature toward a universal API for KVM guest memory. The core of this effort involves carefully allowing guest_memfd-backed memory to be mapped in the host under specific, controlled conditions, which unlocks several new capabilities. We will present the mechanism that enables guest_memfd to back standard, non-confidential VMs, which allows additional hardening against potential host-side transient execution attacks.

Building on this foundation, we will give an overview of the ongoing development to support in-place conversion between private and shared pages within a single guest_memfd region [4]. This is a key requirement for software-based confidential computing solutions and also serves as the enabling technology for efficient huge page support. The talk will explain how these extensions work together to make guest_memfd a more flexible and powerful tool for managing guest memory, paving the way for it to become the primary memory backing interface for all guests in KVM.

[1] https://lpc.events/event/18/contributions/1758/
[2] https://lpc.events/event/18/contributions/1764/
[3] https://lore.kernel.org/all/20250605153800.557144-1-tabba@google.com/
[4] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com/
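
For background, here is a minimal sketch of the guest_memfd UAPI that shipped in Linux 6.8 (error handling omitted); the mappable shared mode and in-place conversion discussed above are newer extensions built on top of this:

```c
/* Minimal sketch: create a guest_memfd and bind it to a KVM memslot.
 * Linux 6.8+ UAPI; error handling omitted. The shared/mappable mode
 * and in-place conversion discussed above build on top of this. */
#include <linux/kvm.h>
#include <sys/ioctl.h>

int bind_guest_memfd(int vm_fd, __u64 gpa, __u64 size)
{
    struct kvm_create_guest_memfd gmem = { .size = size };
    int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);

    struct kvm_userspace_memory_region2 region = {
        .slot = 0,
        .flags = KVM_MEM_GUEST_MEMFD,       /* private memory from gmem_fd */
        .guest_phys_addr = gpa,
        .memory_size = size,
        /* .userspace_addr would supply the host mapping for shared pages */
        .guest_memfd = gmem_fd,
        .guest_memfd_offset = 0,
    };
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);
}
```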

Room 2
09:45
09:45
30min
Lorelei: Enable QEMU to leverage native shared libraries
Ziyang Zhang

We extend QEMU's translation flow so that when control flow reaches a call to a dynamic library function in the executable, the call can be redirected to a native dynamic library of the same version and correctly return after executing the expected procedure. Based on our test data from several common applications, QEMU emulation can run several or even tens of times faster with this hybrid-execution scheme, and it exhibits FPS improvements in GUI applications.

Room 1
09:45
30min
guest_memfd for Non-Confidential VMs and Spectre Protection
Patrick Roy

guest_memfd, introduced in Linux 6.8, receives a lot of attention in the context of confidential computing, with KVM support for Intel TDX, AMD SNP, ARM CCA and pKVM being built on top of it, where guest_memfd manages the VM’s encrypted/private memory. However, its design as “guest-first” memory also makes it attractive for traditional, non-confidential VMs that wish to enjoy additional hardening against Spectre-style transient execution issues.

In this talk, we cover how guest_memfd with support for shared memory [1] can be used to run non-confidential VMs solely backed by guest_memfd. We further explore how this mode can be extended by removing direct map entries for guest_memfd folios [2], protecting guest memory from ~60% of Spectre-like transient execution issues, and how we plan to utilize this functionality in the Firecracker VMM.

Room 2
10:15
10:15
30min
QEMU Time Control Redefined: What’s the Time, Mr. Wolf?
Mahmoud Kamel, Alwalid Salama, Mark Burton

Historically, QEMU has supported two distinct methods for timekeeping: traditional wall-clock time and instruction counting via the icount mode. While icount enables deterministic simulation by advancing time based on the number of instructions executed, it comes with notable limitations. Chief among them is the loss of multithreaded execution — icount disables MTTCG, forcing QEMU to run all CPUs on a single thread. This drastically reduces simulation speed and introduces ambiguity when interpreting instruction counts across multiple CPUs.

The fundamental problem is this: icount provides a raw instruction count across all CPUs, not on a per-CPU basis. Until now, there’s been no way to derive meaningful time metrics from icount in a multi-core context (whether multithreaded or not).

A New Approach: TCG Plugin API to the Rescue
Enter the new TCG plugin API. While QEMU ships with a basic example of instruction-based timing using this API, it oversimplifies the problem. This talk introduces a more advanced, practical approach that uses the TCG plugin API to redefine QEMU's time model for better realism and scalability.

Key Mechanism: Independent Per-CPU Time and Global Time Coordination
The proposed mechanism leverages two key features of the TCG plugin API:
- Scoreboards: To track execution progress across CPUs
- Timeouts: To trigger plugin callbacks after a CPU executes a certain number of instructions

Each virtual CPU (vCPU) maintains its own local clock, which increments based on:
- A configured instruction rate (insn_per_second)
- The number of instructions it executes (quantum_insn)

Meanwhile, the global QEMU time is coordinated through a concept called the active token. The vCPU holding the active token is responsible for advancing global time. As vCPUs hit their instruction quantum (end_of_quantum) or go idle, they update their local clocks. If the token-holding vCPU goes idle, the plugin designates the next most active vCPU to take over time progression.

Advantages of This Model
- Realistic Instruction-Based Timing: Time progresses according to the activity of the most active vCPU, not a summed instruction count.
- Multithreaded Support: Each vCPU can be treated independently, maintaining MTTCG compatibility.
- Idle-Aware Timekeeping: Idle vCPUs are excluded from time advancement. If all CPUs go idle, the system smoothly reverts to wall-clock time.
- Modular and Extendable: Implemented as a plugin (icount_plugin), this mechanism cleanly integrates into QEMU without core architectural changes.
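
To make this concrete, here is a stripped-down sketch of such a plugin, loosely modeled on QEMU's contrib/plugins/ips.c and using the upstream plugin API (scoreboards plus the time-control handle); the active-token arbitration described above is elided, and the naive nanosecond scaling is illustrative only.

```c
/* Sketch of a per-vCPU virtual clock plugin, loosely modeled on QEMU's
 * contrib/plugins/ips.c. Scoreboards count instructions per vCPU; the
 * time-control handle advances the global QEMU clock. The active-token
 * arbitration described above is elided. */
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static qemu_plugin_u64 insn_count;             /* per-vCPU local clock */
static const void *time_handle;
static uint64_t insn_per_second = 1000000000;  /* configured rate */

static void tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    /* Inline-add the TB's instruction count to this vCPU's counter. */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, insn_count,
        qemu_plugin_tb_n_insns(tb));
}

/* The token-holding vCPU would call this from its end-of-quantum or
 * idle callback (registration not shown) to advance global time. */
static void advance_clock(unsigned int vcpu_index)
{
    uint64_t insns = qemu_plugin_u64_get(insn_count, vcpu_index);
    qemu_plugin_update_ns(time_handle,
                          insns * 1000000000 / insn_per_second);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    insn_count = qemu_plugin_scoreboard_u64(
        qemu_plugin_scoreboard_new(sizeof(uint64_t)));
    time_handle = qemu_plugin_request_time_control();
    qemu_plugin_register_vcpu_tb_trans_cb(id, tb_trans);
    (void)advance_clock;   /* hooked up in the full plugin */
    return 0;
}
```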

Room 1
10:15
30min
RISC-V pKVM
Radim Krčmář

The RISC-V pKVM (Protected KVM) draws its name and core design ideas from the Arm pKVM, enabling confidential virtual machines by leveraging "existing" RISC-V hypervisor extensions.

The talk first describes how the initialization process deprivileges Linux into a virtual machine, ensuring that pKVM is executing exclusively in the hypervisor mode. With the untrusted part of the system securely isolated, the discussion shifts to the binary interfaces that cross architectural boundaries to enable confidential virtual machines: userspace ABI, guest SBI, and hypervisor SBI.

The hypervisor SBI is reframed as an internal kernel API, giving it flexibility without the burden of compatibility. Another important reason to develop pKVM was the potential for code reuse with both the RISC-V KVM and other Protected KVM solutions -- the talk explores the extent to which the potential has been fulfilled, and why pKVM is not written in Rust (yet?).

And for those who enjoy waiting for RISC-V, the talk will tease a different KVM-based solution utilizing upcoming ISA extensions.

Room 2
10:45
10:45
30min
Coffee break
Room 1
10:45
30min
Coffee break
Room 2
11:15
11:15
30min
From C to a Rust interface, brick by brick
Zhao Liu, Paolo Bonzini

QEMU's Rust adventure began with direct use of the C-style interfaces generated by bindgen: this first prototype, merged not long after last year's KVM Forum, focused on build system integration and set the stage for a long journey of creating safe Rust interfaces for QEMU. In this talk we will explain the process of distilling the invariants that were required and promised by QEMU's C code, and how we mapped them to concepts such as interior mutability and smart pointers. We will present the path followed over the past year, how we converted QEMU's HPET device model to readable Rust code, and how various ideas and components from the Rust ecosystem help bridge the Rust and C codebases.

Room 1
11:15
30min
Libkrun Meets ARM Confidential Computing Architecture — No Hardware Required (for Now ;))
Matias Vara Larsen

Libkrun is a lightweight virtual machine monitor written in Rust, used in contexts like Podman to securely run workloads in micro-VMs. In this talk, we present our ongoing work to bring support for ARM's Confidential Computing Architecture (CCA) to libkrun. Confidential computing enables strong isolation between the guest and the host by encrypting memory and CPU state, preventing the host from inspecting or modifying sensitive data. CCA, along with AMD SEV-SNP and Intel TDX, extends this model to the ARM world. Memory is encrypted, access violations trigger exceptions, and attestation mechanisms let guests verify they are running in a trusted environment. To develop this support, we’ve built on top of ARM’s FVP simulator, which allows us to test and iterate rapidly. While guest-side support for CCA is already upstreamed, kernel support (KVM) is still under review. We’ll walk through the design, the integration with virtee/cca, and demonstrate how libkrun can already launch a confidential ARM guest. Finally, we’ll cover what’s left — particularly attestation — and where we go from here.

Room 2
11:45
11:45
30min
Physical memory allocation constraints for Confidential Computing guests
Quentin Perret

Running confidential computing (CoCo) payloads on arm64 mobile platforms presents unique challenges due to a wide spectrum of hardware constraints and vastly different power/performance characteristics. Some devices feature non-translating Stage-2 IOMMUs or IOMMUs with reduced addressing capabilities, while others have constraints stemming from their TrustZone implementation. Furthermore, many are very sensitive to Stage-2 page-table fragmentation, whether on the CPU side, DMA side, or both. The emergence of CoCo in the mobile space also brings new use-cases with demanding power and performance requirements.

In this talk, we will first detail these specific problems, explaining how mobile hardware nuances impact the deployment of confidential computing. Secondly, we will formulate a proposal on how to approach these challenges. A core part of the proposal involves physical memory allocation constraints on the memory backing CoCo guests as well as hypervisor data structures. We believe many of these issues can be significantly mitigated through this approach. This session will initiate a discussion on the best way to express these allocation constraints, ideally by extending existing infrastructure such as guest_memfd and dmabuf.

Room 2
11:45
30min
Rust in QEMU: strengths and challenges
Manos Pitsidianakis

QEMU 9.2.0 was released in December 2024 with experimental Rust support. By now the strengths and challenges of writing QEMU code in Rust have become more apparent. This talk will summarize the whys and hows of Rust for QEMU development, tailored for people who are interested in getting involved.

We will briefly go over the following topics:

  • We know Rust has memory safety. How much memory safety is achievable in QEMU and why (not)?

Machines and devices need to interact with internal APIs and Rust is no exception. This means we cannot make fundamental assumptions that make our lives easier, such as "the code that calls my function obeys the same static analysis my code does". The borrow checker is not immediately valuable when C code does not utilise it.

We can create abstractions that leverage safe code: for example, we can ensure exclusive access with locks/interior mutability or by simply trusting API contracts. We will demystify the unsafe keyword and understand how it helps enable safe code.

  • What are the steps for onboarding a device/subsystem to Rust?

We will show how to declare code to Meson, how to use external dependencies, how to generate the required C bindings with bindgen and, most importantly, how to write safe wrappers for them in the qemu-api crate.

  • Rust language and ecosystem idioms/practices/abstractions we can use.

Rust can heavily utilize boilerplate code generation with procedural and declarative macros: we can use them to safely bridge our code with QEMU APIs, such as qdev.

Rust can statically check that invariants hold: we will see how to do that with exhaustive pattern matching, strong typing, dead-code lints, state machines, and the builder pattern.

  • Potential future work ideas (such as QEMU internals and not just device models, async + executors, custom QEMU-specific lints, etc)

If time allows, we will also show a simple "Hello world" device implementation.

Room 1
12:15
12:15
90min
Lunch
Room 1
12:15
90min
Lunch
Room 2
13:45
13:45
30min
Exploring VM placement strategies for chiplet architectures
Shaju Abraham, Het Gala, Shivam Kumar, Soham Ghosh, Gulshan Gabel

Modern processors are increasingly adopting chiplet-based architectures that distribute CPU cores across multiple chiplets, each containing one or more core complexes (CCX) with a shared last-level cache. Inter-chiplet communication penalties can significantly degrade workload performance. While the current Linux scheduler has NUMA awareness and NUCA (non-uniform cache access) awareness through its scheduling domain hierarchy, it lacks adequate consideration for the significantly higher inter-CCX communication penalties inherent in chiplet architectures. This leads to suboptimal VM placement and therefore, degraded performance.

This talk proposes an enhanced VM scheduling framework designed for chiplet-based processors.
1. The framework utilizes lightweight monitoring techniques, such as hardware performance counters, to monitor cache efficiency, memory access patterns and inter-chiplet communication metrics. These insights help formulate informed policies for VM placement and vCPU group migrations.
2. The framework implements intelligent VM vCPU group placement strategies that optimize the initial allocation of vCPU groups and their associated contexts, such as vhost and other datapath threads, across chiplet boundaries. The algorithm balances maximizing chiplet locality against minimizing intra-cache contention. We also study the dynamic behaviour of VM placements in overloaded cases.
3. These performance measurements guide the assignment of optimal virtual topologies to guests, improving performance through chiplet-locality-aware decisions starting at the guest scheduler level.
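
As a flavour of the lightweight monitoring in point 1, the sketch below samples last-level cache misses for a single vCPU thread via perf_event_open(2); the event choice and sampling window are illustrative placeholders, not the framework's actual policy.

```c
/* Sketch: sample LLC misses for one vCPU thread with perf_event_open(2).
 * Illustrative only; event choice and window are placeholders. */
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

uint64_t llc_misses(pid_t vcpu_tid, unsigned int window_us)
{
    struct perf_event_attr attr = {
        .type = PERF_TYPE_HARDWARE,
        .size = sizeof(attr),
        .config = PERF_COUNT_HW_CACHE_MISSES,   /* last-level cache misses */
        .disabled = 1,
    };
    int fd = perf_event_open(&attr, vcpu_tid, -1, -1, 0);
    uint64_t count = 0;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    usleep(window_us);                          /* sampling window */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    read(fd, &count, sizeof(count));
    close(fd);
    return count;
}
```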

Room 1
13:45
30min
Towards Reliable Timekeeping in COCONUT-SVSM
Vaishali Thakkar

COCONUT-SVSM currently lacks a reliable, monotonic timer source, which is essential for supporting trusted services. In SEV-SNP guests, the TSC is the only trusted time source, but its actual frequency may slightly differ from the nominal P0 frequency due to spread spectrum clocking. This small deviation can lead to clock drift over time. This talk explores adding SecureTSC support to COCONUT-SVSM to establish a safe timer foundation, and proposes integrating the KVM clock to improve accuracy -- inviting discussion on how best to ensure reliable timekeeping in SVSM.

Room 2
14:15
14:15
30min
COCONUT SVSM: From Persistent State to New Trusted Services
Oliver Steffen, Stefano Garzarella

Following last year’s presentation on persistent state in COCONUT SVSM, a platform for delivering secure and trusted services to Confidential Virtual Machines (CVMs), this talk will highlight the progress made in implementing key services such as a stateful vTPM and a UEFI variable store. We’ll also discuss upcoming features under consideration, including a secure console, log buffering, enhanced debugging capabilities, and support for live migration. If you’re interested in these features or have ideas for additional services, we invite you to join the discussion.

Room 2
14:15
30min
GiantVM: A Many-to-one Virtualization System Built Atop the QEMU/KVM Hypervisor
Xiong Tianlei, stx

We propose GiantVM, a many-to-one virtualization framework built atop QEMU/KVM. GiantVM consolidates multiple physical servers into a unified virtual machine. We extend QEMU/KVM to enable inter-machine forwarding of I/O and interrupts, allowing multiple physical machines to communicate over the network and aggregate CPU and I/O resources. In addition, we implement a distributed shared memory protocol by leveraging the EPT to support memory synchronization across machines.

Our implementation is based on QEMU 9.0 and the Linux 6.6 LTS kernel, and is capable of successfully running operating systems such as Ubuntu. We are currently exploring further enhancements leveraging the emerging Compute Express Link (CXL) interconnect.

Room 1
14:45
14:45
30min
Attesting Confidential Devices and Provisioning Secure Workload Identities with Trustee
Tobin Feldman-Fitzthum

Trustee is an attestation and resource management service for confidential guests. This talk will cover a year of Trustee development and highlight the features that are on the horizon. The two most significant areas of development and discussion are attesting CVMs with confidential devices attached to them and provisioning identities to confidential guests. While these topics have been a stumbling block in the past, we have made big steps forward. For confidential devices, the first iteration of Trustee support allows us to attest confidential VMs that have devices like the NVIDIA H100 attached via cold-plug. This talk will describe how this is implemented and show the plan for generalizing this to TDISP devices.

The second area, confidential identity, is one of the most subtle parts of confidential computing. This talk will clarify why it is so difficult to reason about the identity of a confidential guest and show how we are finally adding an identity system to Trustee.

Room 2
14:45
30min
Shadow ioeventfd: Accelerating MMIO in vfio-user with Kernel-Assisted Dispatch
Thanos Makatos, John Levon

Efficient handling of MMIO and doorbell updates is essential for achieving high performance in virtualized I/O. The proposed Linux ioregionfd interface reduces context switches and overhead by enabling direct, file-descriptor-based dispatch of MMIO operations, bypassing the traditional need to exit to userspace (typically QEMU, via the KVM_RUN loop). In this talk, we present a different approach inspired by this idea, tentatively called shadow ioeventfd, implemented in the vfio-user protocol. Shadow ioeventfd introduces a shared memory region, separate from the guest-visible BAR, allowing guest writes to be handled entirely within the kernel using eventfd signaling. We discuss how we implemented this in libvfio-user with minimal changes to QEMU and the kernel, and how it integrates with SPDK-based NVMe emulation. We also share performance results demonstrating significant improvements in latency and CPU utilization, up to 200%, compared to traditional userspace emulation; this is especially important for Windows guests, which lack shadow doorbell support.
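
For background, the sketch below shows the existing building blocks the approach combines: a KVM ioeventfd turns a guest doorbell write into an eventfd signal with no exit to userspace, and the backend then reads the latest doorbell value from a shared shadow page. The shadow-page kernel plumbing itself is the new proposal and is only mocked here.

```c
/* Building blocks behind shadow ioeventfd (illustrative sketch only).
 * A KVM ioeventfd makes a guest doorbell write signal an eventfd with
 * no exit to userspace; the shared "shadow" page, which is the new
 * proposal and merely mocked here, lets the backend recover the value. */
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

int register_doorbell(int vm_fd, uint64_t doorbell_gpa)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    struct kvm_ioeventfd ioev = {
        .addr = doorbell_gpa,   /* guest-visible doorbell in the BAR */
        .len  = 4,
        .fd   = efd,
    };
    ioctl(vm_fd, KVM_IOEVENTFD, &ioev);  /* doorbell writes now signal efd */
    return efd;
}

void drain_doorbell(int efd, volatile uint32_t *shadow)
{
    uint64_t n;
    while (read(efd, &n, sizeof(n)) == sizeof(n)) {
        /* A plain ioeventfd discards the written value; the shadow page
         * lets the backend see the most recent doorbell (e.g. SQ tail). */
        uint32_t tail = *shadow;
        (void)tail;   /* ...kick the NVMe emulation with the new tail... */
    }
}
```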

Room 1
15:15
15:15
30min
Coffee break
Room 1
15:15
30min
Coffee break
Room 2
15:45
15:45
30min
Arm and QEMU cpu models - where are we right now?
Cornelia Huck, Sebastian Ott

We previously talked about Arm cpu models at KVM Forum 2023, so now is a good time to summarize the progress we have made so far, where thought is still needed, and how we can continue.

We will demonstrate examples of what is already working (and what is not) with the code available as of today, where the main gaps and points of contention are, and what could be possible directions: how the QEMU command line should be modeled, how the needs of management software such as libvirt could be met, and which combinations of systems are actually reasonable to focus on.

Join this talk to hear about guests being moved between different machines, fun with debug registers and other hard to virtualize registers, and how we can try to make Arm not completely different from x86.

Room 1
15:45
30min
Shared device assignment: the groundwork of direct I/O in confidential VMs
Chenyi Qiang

Shared device assignment, also known as bounce-buffer device assignment, refers to the capability of assigning a hardware PCI device to a confidential VM such that the device can issue DMA to shared/unprotected memory. This can improve I/O performance of confidential VMs, offering benefits similar to those of normal VMs.

In addition to serving as a transitional solution before Trusted Execution Environment (TEE) I/O, which allows the device to issue DMA to private memory, shared device assignment lays the groundwork for a comprehensive TEE I/O implementation. For instance, some TEE I/O technologies (like TDX Connect) rely on the ability to manage devices using shared memory during initialization and in error recovery scenarios.

In this session, we will introduce the basic support for shared device assignment. Additionally, we will clarify future expansion directions, starting with the relationship to some ongoing projects. This includes handling partial unmap situations through support for cut mapping in IOMMUFD, and changes to the conversion path brought about by the new guest_memfd in-place conversion work. Furthermore, the RamDiscardManager framework used in the basic QEMU implementation lacks scalability; in the future, to support more functionality and state management (like virtio-mem or live migration in confidential VMs), a new framework will be necessary.

Room 2
16:15
16:15
30min
Supporting SEV firmware hotload in KVM
Ashish Kalra, David Kaplan

SEV firmware can be updated dynamically while SNP guests are running, which cloud providers want in order to provide better service to their customers when performing security or functionality updates. This talk focuses on the changes needed within Linux/KVM to provide this support, including the ability to update or roll back firmware, and the effects on attestation reporting. Additionally, new versions of firmware can bring new feature support, and this talk will discuss how this is identified and how it can be taken advantage of.

Room 2
16:15
30min
Windows on Arm on QEMU/KVM: Challenges and Solutions
Akihiko Odaki

Microsoft released an RTM build of Windows on Arm last year on their website, and Linaro provides instructions for running it on QEMU/KVM. Now we can run Windows on Arm on QEMU/KVM flawlessly, or can we?

Despite a basic configuration working with TCG, experiments on Asahi Linux revealed that the reliability and functionality of a Windows VM on Arm were far from on par with Windows on x64 or Linux on Arm. Key issues included:
- QEMU and KVM struggled with PMU (Performance Monitoring Unit) emulation, a critical requirement for Windows.
- The virtio-gpu graphics driver, essential for features like high and variable display resolution, frequently crashed.
- The SPICE guest agent, necessary for features such as clipboard sharing, failed to function.

These hurdles necessitated multiple patches to update the entire virtualization stack. This presentation will demonstrate how these changes not only enhance the Windows on Arm experience but also improve Windows guest and Arm virtualization experiences overall. Lastly, I'll share insights gained from bringing up such an exotic platform and discuss future work.

Room 1
16:45
16:45
15min
Closing session
Room 1
16:45
15min
Closing session
Room 2