What would the Django of data pipelines look like?
2025-09-09, Room B

Django gives a Web programmer utilities, functional classes, and places to put things. Complex data pipelines could use the same kinds of structure. Data engineers deserve a good developer experience too, and many folks do both Web and data work. The phaser open source library has abstractions for data pipelines, column operations, phases and checkpoints, and many of its principles are applicable even without using phaser.

When you are acquiring data for multiple models in your db:
* Are your data transformations testable? Can they be remixed?
* Can you debug a pipeline that has already run?
* Is your code re-usable when you need to add a new data source or handle a different format?
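
To make the first question concrete, here is a minimal sketch in plain Python of transformations written as small functions over one row at a time, so each can be unit-tested alone and recombined in a different order. This is not phaser's API; the names `strip_whitespace`, `lowercase_email`, and `apply_steps` are hypothetical and exist only for illustration.

```python
from typing import Callable

Row = dict
Step = Callable[[Row], Row]


def strip_whitespace(row: Row) -> Row:
    """Trim surrounding whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def lowercase_email(row: Row) -> Row:
    """Lowercase the 'email' field if it is present and a string."""
    out = dict(row)
    if isinstance(out.get("email"), str):
        out["email"] = out["email"].lower()
    return out


def apply_steps(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Run each row through the given steps, in order."""
    result = []
    for row in rows:
        for step in steps:
            row = step(row)
        result.append(row)
    return result


# Steps are ordinary values, so a new data source can reuse a different mix.
cleaned = apply_steps(
    [{"name": " Ada ", "email": " ADA@Example.com "}],
    [strip_whitespace, lowercase_email],
)
```

Because each step takes a row and returns a new one without touching shared state, a test needs nothing more than an input dictionary and an expected output dictionary.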

By sharing what we’ve learned building data pipelines and building phaser, we hope to spread not only knowledge about this library, but also useful principles, abstractions and architecture to make data pipelines more readable, robust, maintainable and re-usable.


This talk will introduce the phaser open source library, but also teach principles that can help organize data transformations and data pipelines. It's hard to carve out time to build your own utilities for a data pipeline, and it's hard to even know which patterns are effective for modularizing once code gets complicated. Let's talk about why this investment is important and how it could be less costly.

Topics will include:
* Organizing data pipelines into more phases than just ETL, and why
* Using checkpoints and logging to be able to debug pipeline breakdowns after they occur
* Making data transformation code testable and maintainable
* Supporting a team or rotating contributors to data pipeline code
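
As a rough illustration of the phases and checkpoints topics, the sketch below runs a list of named phases and writes the rows to a JSON file after each one, so the intermediate state can be inspected after the run has finished. It is a plain-Python sketch under stated assumptions, not phaser's actual API: `run_pipeline`, the phase functions, and the checkpoint file naming are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(rows, phases, checkpoint_dir):
    """Run (name, function) phases in order, saving rows after each phase."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for index, (name, phase) in enumerate(phases, start=1):
        rows = phase(rows)
        # Hypothetical naming scheme: a numbered file per phase.
        path = checkpoint_dir / f"{index:02d}_{name}.json"
        path.write_text(json.dumps(rows, indent=2))
    return rows


def drop_blank_names(rows):
    """Remove rows whose 'name' is missing or only whitespace."""
    return [r for r in rows if r.get("name", "").strip()]


def add_name_length(rows):
    """Add a derived 'name_length' column."""
    return [{**r, "name_length": len(r["name"])} for r in rows]


workdir = tempfile.mkdtemp()
final_rows = run_pipeline(
    [{"name": "Ada"}, {"name": "  "}],
    [("drop_blank_names", drop_blank_names), ("add_name_length", add_name_length)],
    workdir,
)

# After the run, any checkpoint can be reloaded to see what a phase produced.
saved = json.loads((Path(workdir) / "01_drop_blank_names.json").read_text())
```

If a later phase fails or produces surprising output, the last good checkpoint shows exactly what that phase received, which is often enough to reproduce the problem in a test.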

Lisa Dusseault is the CTO of the non-profit Data Transfer Initiative, supporting consumer data portability across tech platforms. With a dual career in standards and startups, she brings both idealism and pragmatism. On the startup side, Lisa was CTO of Compaas and ShareTheVisit and VPEng of Klutch. On the standards side, she co-authored CalDAV, updated WebDAV, was chair of the XMPP and IMAPExt working groups, and spent four years as Area Director shepherding new Applications area work at the IETF.