PyCon DE & PyData 2026

Heat: scaling the Python scientific stack to HPC systems
Europium [3rd Floor]

Python’s scientific stack (NumPy/SciPy) is often confined to single-node execution. When datasets exceed local memory, researchers face a steep learning curve, typically choosing between complex manual distribution and the overhead of task-parallel frameworks.

In this talk, we introduce Heat, an open-source distributed tensor framework designed to bring high-performance computing (HPC) capabilities to the scientific Python ecosystem. Built on PyTorch and mpi4py, Heat implements a data-parallel model that allows users to process massive datasets across multi-node, multi-GPU clusters (including AMD GPUs) with minimal code changes.

We will discuss the design and architecture enabling "transparent distribution":

  • Heat’s distributed n-dimensional array for data partitioning and communication under the hood;
  • The synergy of PyTorch as a high-performance compute engine and MPI for efficient, low-latency communication;
  • Scaling efficiency, encompassing both strong and weak scaling for memory-intensive operations;
  • Fundamental building blocks—from linear algebra to machine learning—re-implemented for distributed memory.

Attendees will learn how to leverage the cumulative RAM of supercomputers without leaving the familiar NumPy-like interface, effectively removing the "memory wall" for large-scale scientific analytics.


Memory bottleneck in scientific computing (4 minutes)
- Limitations of single-node libraries
- Complexity of existing workarounds: trade-offs between manual MPI programming (high developer effort) and task-parallel frameworks (runtime scheduling overhead)
- The data-parallel alternative: performing uniform operations on distributed slices of a global tensor.
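The data-parallel idea above can be sketched in plain NumPy, with the ranks simulated in a single process (this is an illustration of the concept, not Heat code):

```python
import numpy as np

# A "global" array split row-wise across 4 simulated ranks. In a real
# data-parallel run, each rank holds only its own slice in memory.
global_a = np.arange(12.0).reshape(6, 2)
local_slices = np.array_split(global_a, 4, axis=0)  # one slice per "rank"

# Each rank applies the same (uniform) operation to its local slice ...
local_sums = [np.square(s).sum() for s in local_slices]

# ... and a single reduction (an MPI allreduce in practice) combines them.
global_sum = sum(local_sums)

assert global_sum == np.square(global_a).sum()
```

Because every rank runs identical code on its own slice, the program reads like ordinary single-node NumPy; only the partitioning and the final reduction differ.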

Architecture and implementation (8 minutes)
- The DNDarray structure: Technical breakdown of the distributed n-dimensional array, which provides a global logical view while managing local physical storage across MPI ranks.
- The split axis concept: How data is partitioned along specific dimensions (e.g., rows or columns) to optimize communication for different mathematical operations.
- Backend synergy:
  - PyTorch as the compute engine for high-performance local tensor operations and GPU acceleration.
  - mpi4py for communication in cluster environments.
- Hardware interoperability: Transparent execution across CPUs and GPUs, including NVIDIA (CUDA) and AMD (ROCm) accelerators.
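The split-axis partitioning described above can be sketched with a small helper. This shows one common even-chunking scheme (earlier ranks absorb the remainder) purely for illustration; `local_shape` is a hypothetical function, not Heat's actual internal API:

```python
# Sketch of split-axis chunking: a global shape is divided as evenly as
# possible along one axis, while all other axes stay whole on every rank.
def local_shape(gshape, split, rank, nprocs):
    """Local chunk shape for one rank; earlier ranks get the remainder."""
    n = gshape[split]
    base, rem = divmod(n, nprocs)
    shape = list(gshape)
    shape[split] = base + (1 if rank < rem else 0)
    return tuple(shape)

gshape = (10, 4)
# split=0: rows are distributed, columns stay whole on every rank
print([local_shape(gshape, 0, r, 3) for r in range(3)])
# → [(4, 4), (3, 4), (3, 4)]
# split=1: columns are distributed instead
print([local_shape(gshape, 1, r, 3) for r in range(3)])
# → [(10, 2), (10, 1), (10, 1)]
```

Choosing the split axis to match an operation's access pattern (e.g., row-split for row-wise reductions) is what keeps communication costs low.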

Algorithmic building blocks for distributed memory (8 minutes)
- Communication-aware linear algebra: Distributed matrix-matrix multiplication and its communication costs. Advanced matrix decomposition methods, such as hierarchical SVD (hSVD) and randomized SVD, for massive datasets.
- Scalable machine learning and statistics: k-means clustering and Principal Component Analysis (PCA) on distributed arrays.
- Temporal analysis using Dynamic Mode Decomposition (DMD) on large-scale scientific data like global wind speeds.
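The linear algebra underlying DMD fits in a short sketch. Below is a single-node, plain-NumPy exact-DMD toy on synthetic dynamics with known eigenvalues; it illustrates the algorithm only, not Heat's distributed implementation:

```python
import numpy as np

# Toy dynamics x_{k+1} = A x_k with known eigenvalues 0.9 and 0.5,
# which DMD should recover from the snapshot data alone.
A = np.diag([0.9, 0.5])
snapshots = np.empty((2, 21))
snapshots[:, 0] = [1.0, 1.0]
for k in range(20):
    snapshots[:, k + 1] = A @ snapshots[:, k]

X, Y = snapshots[:, :-1], snapshots[:, 1:]        # shifted snapshot pairs
U, s, Vt = np.linalg.svd(X, full_matrices=False)  # low-rank basis of X
A_tilde = U.T @ Y @ Vt.T @ np.diag(1.0 / s)       # A projected onto basis
eigvals = np.linalg.eigvals(A_tilde)              # DMD eigenvalues

print(np.sort(eigvals.real))  # → approximately [0.5, 0.9]
```

In the distributed setting, the snapshot matrix is far too large for one node; the same pipeline then relies on the distributed SVD and matrix products discussed above.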

Performance and scaling efficiency (7 minutes)
- Scaling methodologies: strong scaling (speedup for a fixed problem size) and weak scaling (efficiency as both problem size and resources grow).
- Memory wall removal: Utilizing the cumulative RAM of many cluster nodes to process datasets that are otherwise impossible to load.
- Case studies: Reviewing performance results from large-scale runs.
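The two scaling methodologies above reduce to simple efficiency formulas. The timings below are made-up numbers for illustration, not measured Heat results:

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed problem size: ideal runtime on p processes is t1 / p."""
    return t1 / (p * tp)

def weak_scaling_efficiency(t1, tp):
    """Problem size grows with p: ideal runtime stays constant."""
    return t1 / tp

# Hypothetical: 100 s on 1 node drops to 15 s on 8 nodes (strong scaling)
print(strong_scaling_efficiency(100.0, 15.0, 8))  # → ~0.83
# Hypothetical: 100 s on 1 node, 120 s on 8 nodes at 8x the data (weak)
print(weak_scaling_efficiency(100.0, 120.0))      # → ~0.83
```

An efficiency near 1.0 means communication overhead is negligible; for memory-bound operations, weak scaling is usually the more relevant metric, since the point of distribution is to grow the problem with the resources.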

Summary and project roadmap (3 minutes)
- Key takeaways
- Upcoming features
- Open-source community


Expected audience expertise in your talk's domain: Intermediate
Expected audience expertise in Python: Intermediate
Public link to supporting material, e.g. videos, Github:

https://github.com/helmholtz-analytics/heat

I work in the Large-Scale Data Science division at the Jülich Supercomputing Centre (JSC), and I lead the development of Heat, an open-source distributed tensor framework designed for high-performance data analytics. My work focuses on scaling scientific Python applications across multi-node, multi-GPU clusters.

My background is in astrophysics; I joined JSC in 2018 to co-design distributed analytics for scientific domains including aerospace and Earth system modeling. Since 2021, I have led the Heat project, focusing on technical user support, community growth, and project dissemination.