EuroSciPy 2024

Jacob Tomlinson

Jacob Tomlinson is a senior Python software engineer at NVIDIA with a focus on deployment tooling for distributed systems. His work involves maintaining open source projects including RAPIDS and Dask. RAPIDS is a suite of GPU-accelerated open source Python tools that mimic APIs from the PyData stack, including those of NumPy, pandas and scikit-learn. Dask provides advanced parallelism for analytics with out-of-core computation, lazy evaluation and distributed execution of the PyData stack. He also tinkers with the open source Kubernetes Python framework kr8s in his spare time. Jacob volunteers with the local tech community group Tech Exeter and lives in Exeter, UK.


Institute / Company

NVIDIA

Homepage

https://jacobtomlinson.dev

Twitter handle

@_jacobtomlinson

GitHub / GitLab

https://github.com/jacobtomlinson


Session

08-29
13:20
30min
Accelerating Python on HPC with Dask
Jacob Tomlinson

Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.
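
As a rough illustration of the task-based primitives mentioned above, here is a minimal sketch using `dask.delayed`. The function names `square` and `add_all` are hypothetical and chosen only for this example; it runs on the local threaded scheduler and is not taken from the talk itself.

```python
from dask import delayed


# Hypothetical example functions; wrapping them with `delayed` makes each
# call build a node in a lazy task graph instead of running immediately.
@delayed
def square(x):
    return x * x


@delayed
def add_all(values):
    return sum(values)


# Build the graph: five independent `square` tasks feeding one `add_all` task.
squares = [square(i) for i in range(5)]
total = add_all(squares)

# Nothing has executed yet. `compute` runs the graph; the threaded scheduler
# is used here so the sketch works on a single machine.
result = total.compute(scheduler="threads")
print(result)
```

The same graph could in principle be run on a cluster by creating a `dask.distributed` client before calling `compute`, though how that cluster is deployed depends on the environment.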

In this session we will dive into the many ways to deploy Dask workloads on HPC, and how to choose the right method for your workload. Then we will dig into the accelerated side of Dask: how you can leverage GPUs with RAPIDS and Dask-CUDA, and use UCX to take advantage of accelerated networking such as InfiniBand and NVLink.

High Performance Computing
Room 6