Orchestrating Data Pipelines in Python: From Generation to Quality PyConES 2025

2025-10-17 15:40–17:20, Workshop 05, B45 C109
Language: English

Working with data goes far beyond simply generating it. It involves tracking its origin, maintaining its integrity, and selecting the right tools for each stage of your workflow. With the rapid evolution of data tools, staying current can be challenging. Fortunately, Python offers a robust and accessible collection of tools, libraries, and frameworks that can make your life easier.

In this workshop, we’ll introduce Dagster, a Python-based orchestration framework designed specifically to help manage data assets. Dagster provides native support for metadata, lineage, and versioning, and includes a powerful UI that brings clarity and structure to your workflows. We’ll also explore how you can integrate orchestration workflows with other popular Python libraries, such as pandas, Pandera, and Soda-core, to create efficient, end-to-end pipelines.

Whether you're experienced in data pipelining or are simply curious about learning more, this session will cover how to:

Manage orchestration and asset definitions within a unified repository
Use pandas to define and transform data assets
Apply Pandera to enforce data contracts and catch schema issues early
Integrate automated quality control for ongoing data quality monitoring and management

By the end of our session, you’ll walk away with a practical understanding of how these open source tools can be used together to help you build more maintainable data pipelines within a Python-native environment.

Topic: Data Science and Data Engineering (analytics, visualization, pipelines, data engineering, notebooks...)
Additional topics: none
Proposal level: Intermediate (it is necessary to understand the related basics to go into detail)

Daniel Ortiz

Daniel Ortiz is a Senior Software Engineer at Bloomberg, where he extensively uses Python and a broad range of data technologies to build scalable systems for orchestration, analytics, and workflow automation.

He has a background spanning full-stack application development and deep experience in data infrastructure and architecture. He enjoys working across the stack, from back-end systems to user-facing components, and is strongly focused on delivering maintainable, high-impact solutions.

Daniel holds a bachelor’s degree in computer science from the University of Toronto and a master’s degree in applied computing from the University of London.

Juan Aragón

Juan Aragón is a Senior Software Engineer at Bloomberg, where he focuses on building data pipelines that ensure quality and reliability.

In more than 13 years of software development, Juan has worked across storage systems, large-scale infrastructure, and data engineering. He has strong expertise in C++ and Python. This is his second appearance at PyConES, reflecting his commitment to sharing knowledge with the engineering community.

Juan holds a degree in software engineering and a postgraduate qualification in advanced computer engineering from the University of Extremadura.
