PyCon DE & PyData 2026

Claudia Comito

I work in the Large-Scale Data Science division at the Jülich Supercomputing Centre (JSC), and I lead the development of Heat, an open-source distributed tensor framework designed for high-performance data analytics. My work focuses on scaling scientific Python applications across multi-node, multi-GPU clusters.

My background is in astrophysics. I joined JSC in 2018 to co-design distributed analytics for scientific domains including aerospace and Earth system modeling. Since 2021, I have led the Heat project, focusing on technical user support, community growth, and project dissemination.


Session

04-15
17:35
30min
Heat: scaling the Python scientific stack to HPC systems
Claudia Comito, Thomas Saupe

Python’s scientific stack (NumPy/SciPy) is often confined to single-node execution. When datasets exceed local memory, researchers face a steep learning curve, typically choosing between complex manual distribution and the overhead of task-parallel frameworks.

In this talk, we introduce Heat, an open-source distributed tensor framework designed to bring high-performance computing (HPC) capabilities to the scientific Python ecosystem. Built on PyTorch and mpi4py, Heat implements a data-parallel model that allows users to process massive datasets across multi-node, multi-GPU clusters (including AMD GPUs) with minimal code changes.
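To make the data-parallel idea concrete, here is a minimal sketch in plain Python. It is not Heat's API or implementation; the helper names `block_bounds` and `distributed_sum` are hypothetical and the processes are simulated in a loop. It assumes a near-even block split along the first axis, where each process reduces its own chunk and the partial results are then combined, which in a real MPI program would be a collective reduction.

```python
def block_bounds(n_rows, n_procs, rank):
    """Return the [start, stop) row range owned by `rank` in a near-even block split."""
    base, extra = divmod(n_rows, n_procs)
    # The first `extra` ranks each hold one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_sum(values, n_procs):
    """Simulate a data-parallel sum: local reductions first, then one global combine."""
    partials = []
    for rank in range(n_procs):
        start, stop = block_bounds(len(values), n_procs, rank)
        partials.append(sum(values[start:stop]))  # local work, no communication
    # Combining the partials stands in for a collective reduction across processes.
    return sum(partials)


data = list(range(10))
chunks = [block_bounds(len(data), 4, r) for r in range(4)]
total = distributed_sum(data, 4)
print(chunks)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(total)   # 45
```

Each simulated process touches only its own slice, so the memory needed per process shrinks as more processes are added.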

We will discuss the design and architecture enabling "transparent distribution":

  • Heat’s distributed n-dimensional array for data partitioning and communication under the hood;
  • The synergy of PyTorch as a high-performance compute engine and MPI for efficient, low-latency communication;
  • Scaling efficiency, encompassing both strong and weak scaling for memory-intensive operations;
  • Fundamental building blocks, from linear algebra to machine learning, re-implemented for distributed memory.
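For readers unfamiliar with the strong and weak scaling mentioned in the list above, the following sketch uses the standard textbook definitions of parallel efficiency. The function names are hypothetical and the timings are invented for illustration; they are not measurements of Heat.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed total problem size: the ideal runtime on p processes is t1 / p."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Problem size grows in proportion to p: the ideal runtime stays at t1."""
    return t1 / tp


# Hypothetical timings in seconds, chosen only to illustrate the formulas.
strong = strong_scaling_efficiency(t1=120.0, tp=10.0, p=16)
weak = weak_scaling_efficiency(t1=30.0, tp=40.0)
print(f"strong scaling efficiency: {strong:.2f}")  # 0.75
print(f"weak scaling efficiency:   {weak:.2f}")    # 0.75
```

An efficiency of 1.0 is ideal in both cases. For memory-bound workloads, weak scaling is often the more relevant measure, since the goal is to fit a larger dataset rather than to finish a fixed one faster.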

Attendees will learn how to leverage the cumulative RAM of supercomputers without leaving the familiar NumPy-like interface, effectively removing the "memory wall" for large-scale scientific analytics.

PyData: PyData & Scientific Libraries Stack
Europium [3rd Floor]