Johanna Goergen
I'm a Staff Research Data Engineer in the Research Department of DeepL, working on platform-level tooling that scales data pipelines to petabytes. I have been part of the initiative to adopt Rust in critical components used for model training, and I'm looking forward to sharing this experience with you.
Session
This talk details how we used Rust to resolve a number of resource-utilization inefficiencies while scaling data pre-processing to petabyte scale, enabling next-generation model training at DeepL. Among other measures, we developed an internal library for working with Parquet files in a memory-efficient way.
Topics include:
• Convincing you to love Rust for its memory safety
• Comparing C++ and Rust ecosystems for Python library development
• Diving into Python-Rust interoperability
• Convincing you to love Rust for its user-friendly (yes, actually!) language features
• Providing a high-level overview of Rust's growing impact on the Arrow and data engineering ecosystems