Lisa Dusseault
Lisa Dusseault is the CTO of the non-profit Data Transfer Initiative, which supports consumer data portability across tech platforms. With a dual career in standards and startups, she brings both idealism and pragmatism. On the startup side, Lisa was CTO of Compaas and ShareTheVisit and VP of Engineering at Klutch. On the standards side, she co-authored CalDAV, updated WebDAV, chaired the XMPP and IMAPExt working groups, and spent four years as an IETF Applications Area Director, shepherding new work in that area.
Session
Django gives a Web programmer utilities, functional classes, and places to put things. Complex data pipelines could use the same kind of structure. Data engineers deserve a good developer experience too, and many folks do both Web and data work. The open source phaser library provides abstractions for data pipelines, column operations, phases and checkpoints, and many of its principles are applicable even without using phaser.
When you are acquiring data for multiple models in your database:
* Are your data transformations testable? Can they be remixed?
* Can you debug a pipeline that has already run?
* Is your code reusable when you need to add a new data source or handle a different format?
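The questions above point at three principles that do not depend on any particular library: small column operations that can be tested and remixed on their own, named phases that apply them, and a checkpoint written after each phase so a finished run can be inspected later. The following is a minimal sketch of those ideas in plain Python, assuming CSV-style rows of string values. The names `Phase`, `run_pipeline`, `write_checkpoint`, `strip_whitespace` and `lowercase` are hypothetical illustrations, not phaser's actual API.

```python
# Hypothetical sketch of the principles only; this is NOT phaser's API.
import csv
import tempfile
from pathlib import Path
from typing import Callable

Row = dict[str, str]
# A column operation is a small pure function on one value, so it can be
# unit-tested in isolation and reused across pipelines and data sources.
ColumnOp = Callable[[str], str]


def strip_whitespace(value: str) -> str:
    return value.strip()


def lowercase(value: str) -> str:
    return value.lower()


class Phase:
    """A named step that applies column operations to every row."""

    def __init__(self, name: str, column_ops: dict[str, list[ColumnOp]]):
        self.name = name
        self.column_ops = column_ops

    def run(self, rows: list[Row]) -> list[Row]:
        result = []
        for row in rows:
            new_row = dict(row)  # copy, so earlier phases' output is untouched
            for column, ops in self.column_ops.items():
                if column in new_row:
                    for op in ops:
                        new_row[column] = op(new_row[column])
            result.append(new_row)
        return result


def write_checkpoint(rows: list[Row], path: Path) -> None:
    """Save a phase's output so a completed run can be debugged afterwards."""
    fieldnames = list(rows[0].keys()) if rows else []
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(rows: list[Row], phases: list[Phase], checkpoint_dir: Path) -> list[Row]:
    """Run each phase in order, writing a numbered checkpoint after each one."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    for i, phase in enumerate(phases):
        rows = phase.run(rows)
        write_checkpoint(rows, checkpoint_dir / f"{i:02d}-{phase.name}.csv")
    return rows


if __name__ == "__main__":
    raw = [{"email": "  Alice@Example.COM "}, {"email": "bob@example.com"}]
    phases = [
        Phase("clean", {"email": [strip_whitespace]}),
        Phase("normalize", {"email": [lowercase]}),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(raw, phases, Path(tmp)))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Because each phase writes its own checkpoint file, you can open `00-clean.csv` after the run to see exactly what the data looked like before normalization. Adding a new source or format then means composing existing column operations into new phases rather than rewriting the pipeline.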
By sharing what we’ve learned from building data pipelines and building phaser, we hope to spread not only knowledge about this library but also useful principles, abstractions and architecture that make data pipelines more readable, robust, maintainable and reusable.