What Do You Get When You Combine a Snake and a Squirrel? - Building a Python Data Pipeline with Apache Flink
2021-02-20, 15:35–16:05 (US/Pacific), Interactive Track

Any symbiotic relationship between very different creatures has unique challenges, but it can result in something even more powerful than the sum of its parts. Combining Python with Apache Flink, particularly for Machine Learning, has its complications, but it can also produce an incredibly fast, portable, scalable, and highly flexible data pipeline.

This talk covers the structure and technical features of a Python-Flink pipeline. It also covers getting started and, more importantly, addressing the common mistakes and hurdles of building one. This includes which features to use and how to leverage the strengths of each framework for your specific use case. For instance, when would you use regular Python, and when would you want to use PyFlink? Are there cases where you would NOT want to use some of the abstraction or automation tools available for these frameworks?

Attendees will leave this talk with an introduction to working with Apache Flink from Python, along with pragmatic tips and tricks for a smoother, faster, more enjoyable (because this should be fun!) dive into this symbiotic relationship.

This talk is geared towards those who are new to Flink, but it is applicable to anyone with beginner to advanced Python experience.


Getting started with stream processing in Python can be hard enough, and the more complex the streaming framework, the more intimidating it can be. It can be challenging to understand when to use (or not use) certain features with such a combination. However, each complication brings with it a level of flexibility and robustness, which lends itself particularly well to Machine Learning and many other common Python use cases.

Having seen firsthand what this combination can do, and remembering how long it can take to get there without some guidance, I get a huge amount of joy from helping people avoid these hurdles (and the feeling of being overwhelmed) so that they can simply have fun with their stronger, faster data pipelines.

Caito is a Developer Advocate for Ververica (founded by the original creators of Apache Flink), representing the US region, and is based in Portland, Oregon. Previously, she was a software engineer at a data analytics company, and she loves geeking out about metrics and stream processing. Outside of tech, Caito enjoys woodworking and construction, dance, running, and terrible puns.