Juan Aragón

Juan Aragón is a Senior Software Engineer at Bloomberg, where he focuses on building data pipelines that ensure quality and reliability.

In more than 13 years of software development, Juan has worked across storage systems, large-scale infrastructure, and data engineering. He has strong expertise in C++ and Python. This is his second appearance at PyConES, reflecting his commitment to sharing knowledge with the engineering community.

Juan holds a degree in software engineering and a postgraduate qualification in advanced computer engineering from the University of Extremadura.


Session

17/10
16:00
120 minutes
Orchestrating Data Pipelines in Python: From Generation to Quality
Daniel Ortiz, Juan Aragón

Working with data goes far beyond simply generating it. It involves tracking its origin, maintaining its integrity, and selecting the right tools for each stage of your workflow. With the rapid evolution of data tools, staying current can be challenging. Fortunately, Python offers a robust and accessible collection of tools, libraries, and frameworks that can make your life easier.

In this workshop, we’ll introduce Dagster, a Python-based orchestration framework designed specifically to help manage data assets. Dagster provides native support for metadata, lineage, and versioning, and includes a powerful UI that brings clarity and structure to your workflows. We’ll also explore how you can integrate orchestration workflows with other popular Python libraries, such as pandas, Pandera, and Soda-core, to create efficient, end-to-end pipelines.

Whether you're experienced in data pipelining or are simply curious about learning more, this session will cover how to:

  • Manage orchestration and asset definitions within a unified repository
  • Use pandas to define and transform data assets
  • Apply Pandera to enforce data contracts and catch schema issues early
  • Integrate automated quality checks for ongoing data quality monitoring and management

By the end of our session, you’ll walk away with a practical understanding of how these open-source tools can be used together to help you build more maintainable data pipelines within a Python-native environment.

Data Science and Data Engineering
Workshop 05, E45 A109