Johanna Goergen PyCon DE & PyData 2026

Johanna Goergen

I'm a Staff Research Data Engineer in the Research Department of DeepL, working on platform-level tooling for scaling data pipelines to petabyte scale. I have been part of the initiative to adopt Rust in critical components used for model training, and I'm looking forward to sharing this experience with you.

GitHub: https://github.com/jo-migo

Session

04-15

15:00

45min

Scaling Data Processing for Training Workloads at DeepL Research with Rust

Jonas Dedden, Johanna Goergen

This talk details how we used Rust to resolve a number of resource-utilization inefficiencies while scaling data pre-processing to petabyte scale, enabling next-generation model training at DeepL. Among other measures, this involved developing an internal library for interacting with Parquet files in a memory-efficient manner.

Topics include:
• Convincing you to love Rust for its memory safety
• Comparing C++ and Rust ecosystems for Python library development
• Diving into Python-Rust interoperability
• Convincing you to love Rust for its user-friendly (yes, actually!) language features
• Providing a high-level overview of the continuously growing impact that Rust is having on the Arrow and data engineering ecosystem

General: Rust

Platinum [2nd Floor]
