Data DAGs with lineage for fun and for profit
07-07, 10:00–10:45 (US/Pacific), NYC Meetup [Session starts: Tuesday 07.07 12pm (Tuesday 07.07 9am PDT)]

Let’s be honest: many of us don’t consider data lineage cool. But what if lineage let you write less boilerplate and less code, while at the same time making your data scientists, your auditors, your management and, well, everyone happier? What if you could write DAGs that are part task-based and part data-based?


Lineage support has been incubating in Airflow for a while. It was buggy and not very easy to use. Still, for a lot of reasons, it is really cool to have data lineage available. One of those reasons is that it can make writing DAGs a lot easier. Recently a lot of development has gone into improving lineage support and making it much easier, or even transparent, to use. In this talk I will focus on what we have in mind and evangelize data lineage, but also gather feedback from the audience on where we should take it next.

Bolke de Bruin is VP of Apache Airflow and CTO of Wholesale Banking Advanced Analytics at ING. Bolke is passionate about embedding new ideas in the Wholesale Banking organization and strives to make Wholesale Banking more data-driven. Before joining ING in 2008, Bolke worked at the 2004 Summer and 2006 Winter Olympic Games, managing the technology, communication and data requirements for all news & media feeds at two large event locations. Bolke has also run his own startup commercializing multi-touch technology. In his spare time, Bolke is a guest lecturer at the University of Amsterdam and a fun father to Mattia (6) and Timo (2), and can be found surfing, obstacle running (Ever done a 15 km Mud Run? – www.obstakels.com) or taking in a museum when the opportunity arises.

This speaker also appears in: