2024-11-16, LT9
Language: English
With ever-growing data sizes and increasingly complex data science workflows, high-performance computing has become crucial for data scientists tackling real-world problems. Attendees will learn to leverage RAPIDS projects on GPUs to accelerate and scale up scikit-learn and XGBoost model training workflows.
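As a taste of what this looks like in practice, below is a minimal sketch of a GPU-accelerated scikit-learn-style workflow. It relies on the fact that cuML mirrors the scikit-learn estimator API; the dataset and hyperparameters are illustrative assumptions, not examples from the talk.

```python
# Minimal sketch: cuML mirrors the scikit-learn estimator API, so porting
# an existing training script to the GPU can be as small as swapping an
# import. Dataset and hyperparameters here are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# from sklearn.ensemble import RandomForestClassifier  # CPU version
from cuml.ensemble import RandomForestClassifier       # GPU drop-in

X, y = make_classification(n_samples=100_000, n_features=40, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, max_depth=16)
clf.fit(X_train, y_train)  # trains on the GPU
print("accuracy:", clf.score(X_test, y_test))
```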
This talk will explore GPU acceleration beyond deep learning models and provide an overview of GPU-accelerated data science workflows. Python’s rich ecosystem has made it one of the most popular programming languages today. RAPIDS offers a suite of open-source Python libraries and primitives that accelerate core data science libraries, including pandas, scikit-learn, and NetworkX, without requiring any code changes. Additionally, the latest XGBoost releases integrate with RAPIDS to deliver a fully GPU-accelerated model training experience. We will demonstrate how to build a GPU-accelerated end-to-end pipeline for training scikit-learn and XGBoost models, highlighting the significant speedups achieved across a range of scikit-learn estimators. We will then delve into new features that facilitate scaling XGBoost on the latest NVIDIA Grace Hopper superchip to handle large datasets, discuss implementation details, and share our experience with GPU acceleration. Finally, we will outline our roadmap for future developments.
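For the XGBoost side of the pipeline, the sketch below shows how GPU training is typically enabled in recent releases. It assumes XGBoost 2.0 or newer (where the `device` parameter selects GPU training) and a CUDA-capable GPU; the synthetic data and parameter values are illustrative assumptions rather than details from the talk.

```python
# Sketch of GPU-accelerated XGBoost training. Assumes xgboost>=2.0 and a
# CUDA GPU; the synthetic data and parameters are illustrative assumptions.
import cupy as cp
import xgboost as xgb

# Generate training data directly on the GPU so no host-device copy is needed.
X = cp.random.rand(1_000_000, 50, dtype=cp.float32)
y = (cp.random.rand(1_000_000) > 0.5).astype(cp.float32)

# QuantileDMatrix pre-bins the data, reducing memory use for large datasets.
dtrain = xgb.QuantileDMatrix(X, label=y)

params = {
    "device": "cuda",        # selects GPU training in xgboost>=2.0
    "tree_method": "hist",
    "objective": "binary:logistic",
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```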
Engineer, RAPIDS, NVIDIA