Jenya Bogacheva
Jenya’s journey into Python started in an unexpected place — studying African cultures and languages — and eventually led her to a Master’s in Machine Learning & Data Science. Today, she works on flight delay prediction at a company building tech for the aviation industry ✈️.
A digital nomad currently based in the pine forests of the Vietnamese highlands, she finds joy in birdwatching, doing yoga, and chasing the perfect cup of coffee ☕.
Session
Sometimes you inherit a clunky pipeline. Sometimes an LLM writes one for you. Either way, you’re stuck with something slow, memory-hungry, and hard to scale.
This talk is about what happens next — how to turn a naive tabular data pipeline into something fast, efficient, and scalable. You’ll get a guided tour through a zoo of optimization techniques: reducing algorithmic complexity, minimizing memory usage, improving I/O throughput, and swapping in Polars — a fast, Rust-based DataFrame library — in place of Pandas (for reasons beyond just hype). By walking through a real-world example step by step, you’ll see how each change makes an impact — and come away with a sharper eye for spotting similar bottlenecks or inefficiencies in your own pipelines.
The walkthrough is grounded in a real-world ML feature engineering task from the aviation industry. But in the spirit of spring, we’ll swap baggage belts for bird feeders and reframe the problem through a birdwatcher’s lens: instead of tracking airport operations, we’ll count the sparrows and mynas visiting my backyard feeder.