2024-09-23, Hall A+B
Cloud computing, with its flexible resource allocation and large-scale data storage, provides an integrated underlying platform for the widespread application of AI, including large-scale model training and inference. However, unlike traditional applications, AI workloads rely heavily on heterogeneous computing, and running them on top of virtualization introduces new issues and challenges, including:
1. PCIe P2P communication efficiency between GPUs, or between GPUs and RDMA NICs, is crucial for large-scale model training and inference. In virtualized environments, however, P2P traffic suffers severe performance degradation because the IOMMU is enabled and P2P TLPs are redirected through it.
2. High-precision (millisecond-level) monitoring agents are commonly deployed inside VMs to track metrics such as PCIe bandwidth and network bandwidth. We found that traditional PMU virtualization cannot fully meet these monitoring needs, and the agents themselves trigger a large number of VMEXITs due to frequent PIO and RDPMC operations (see the sketch after this list).
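To make the overhead concrete, here is a minimal sketch (not the actual agent) of the kind of sampling loop such monitors run: it reads a hardware counter with the RDPMC instruction once per millisecond. The counter index is a placeholder, and on KVM with an emulated vPMU each RDPMC typically traps to the hypervisor, which is exactly the VMEXIT cost described above.

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Read a PMU counter with RDPMC (x86): ECX selects the
     * counter, EDX:EAX returns its value. User-space RDPMC
     * requires CR4.PCE to be set (on Linux, enabled via the
     * perf rdpmc mechanism); otherwise it faults. */
    static inline uint64_t rdpmc(uint32_t counter)
    {
        uint32_t lo, hi;
        __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
        const uint32_t counter = 0;  /* placeholder: first GP counter */
        uint64_t prev = rdpmc(counter);

        for (;;) {
            usleep(1000);            /* millisecond-level sampling */
            uint64_t now = rdpmc(counter);
            /* With an emulated vPMU, every rdpmc above is a trap;
             * at 1 kHz per counter per vCPU this adds up quickly.
             * With core/uncore PMU passthrough it runs natively. */
            printf("delta: %llu\n", (unsigned long long)(now - prev));
            prev = now;
        }
    }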
To address these challenges, this talk proposes a set of solutions, such as preventing P2P TLPs from being redirected to the IOMMU and passing core and uncore PMUs through to the guest, to bridge the AI-infrastructure gap between virtualized and bare-metal environments.
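As a hedged illustration of the first point: whether P2P TLPs are forced upstream toward the root complex and IOMMU is governed by the PCIe ACS capability on the switch ports along the path. The sketch below walks a device's extended capability list through sysfs config space and reports the relevant ACS control bits; the device address is a placeholder, and the actual mitigation presented in the talk (where and how redirection is avoided) may differ.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    #define ACS_CAP_ID   0x000d  /* PCIe ACS extended capability ID */
    #define ACS_CTRL_OFF 0x06    /* ACS Control register, 16-bit */

    /* ACS Control bits that redirect P2P TLPs up to the root/IOMMU,
     * plus the bit that lets ATS-translated P2P requests go direct. */
    #define ACS_P2P_REQ_REDIR  (1 << 2)
    #define ACS_P2P_CMPL_REDIR (1 << 3)
    #define ACS_DIRECT_TRANS   (1 << 6)

    int main(void)
    {
        /* Placeholder BDF: substitute the switch downstream port above
         * your GPU or RDMA NIC. Extended config space needs root. */
        const char *path = "/sys/bus/pci/devices/0000:00:01.0/config";
        int fd = open(path, O_RDONLY);
        if (fd < 0) { perror(path); return 1; }

        /* Walk the extended capability list starting at 0x100:
         * header bits [15:0] = cap ID, [31:20] = next pointer. */
        for (off_t off = 0x100; off; ) {
            uint32_t hdr;
            if (pread(fd, &hdr, 4, off) != 4 || hdr == 0)
                break;
            if ((hdr & 0xffff) == ACS_CAP_ID) {
                uint16_t ctrl;
                if (pread(fd, &ctrl, 2, off + ACS_CTRL_OFF) != 2)
                    break;
                printf("ACS ctrl=0x%04x req-redir=%d cmpl-redir=%d "
                       "direct-translated=%d\n", ctrl,
                       !!(ctrl & ACS_P2P_REQ_REDIR),
                       !!(ctrl & ACS_P2P_CMPL_REDIR),
                       !!(ctrl & ACS_DIRECT_TRANS));
            }
            off = hdr >> 20;
        }
        close(fd);
        return 0;
    }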
I am currently working at ByteDance, with a primary focus on GPU virtualization and GPU driver development.
Virtualization Engineer, ByteDance.