PyCon DE & PyData 2026

Joshua Finger

Joshua is a Data Engineer at inovex GmbH dedicated to building robust, scalable data products. Drawing on his background as a Full Stack Software Engineer, he applies rigorous software engineering principles to ensure every data solution is high-quality, maintainable, and efficient.


Session

04-14
12:25
45min
Fight your garbage data: implementation of a pythonic data quality monitoring framework in PySpark
Rostislaw Krassow, Joshua Finger

The timeless phrase “garbage in, garbage out” matters more than ever with the growing use of non-deterministic generative neural networks, which amplify the effects of poor data quality. This presentation describes Data Quality Monitor, a tool that brings transparency to data quality and helps drive real improvements.

In the talk, we'll cover what defines a successful data quality monitoring solution and share findings from our initial evaluation of available open-source frameworks. Next, we'll showcase our implementation based on DQX. DQX is a lightweight, open-source framework for performing row-level data quality checks programmatically, with business rules organized in manageable YAML files. DQX, originally developed by Databricks Labs, integrates seamlessly with PySpark, making it easy and affordable to run data quality checks within our IoT data lake. Finally, we will discuss the organizational processes and structures required to effectively respond to data quality issues.
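
To make the idea of row-level checks driven by declarative rules more concrete, here is a minimal sketch in plain Python. It is not DQX's API and it does not use PySpark: the rule layout, the check functions (`is_not_null`, `is_in_range`), the column names, and the `_failed_checks` field are all hypothetical, chosen only to illustrate how rules kept as data (for example, loaded from a YAML file) could be applied to each row.

```python
from typing import Any

# Hypothetical check functions; each one inspects a single value from a row.
def is_not_null(value: Any) -> bool:
    return value is not None

def is_in_range(value: Any, min_limit: float, max_limit: float) -> bool:
    return value is not None and min_limit <= value <= max_limit

# Registry mapping rule function names to their implementations.
CHECK_FUNCTIONS = {"is_not_null": is_not_null, "is_in_range": is_in_range}

# Rules as they might look after loading a YAML file (assumed layout).
RULES = [
    {"name": "device_id_present", "column": "device_id",
     "function": "is_not_null", "arguments": {}},
    {"name": "temperature_plausible", "column": "temperature_c",
     "function": "is_in_range",
     "arguments": {"min_limit": -40, "max_limit": 125}},
]

def check_row(row: dict, rules: list) -> list:
    """Return the names of all rules that this row violates."""
    failed = []
    for rule in rules:
        check = CHECK_FUNCTIONS[rule["function"]]
        if not check(row.get(rule["column"]), **rule["arguments"]):
            failed.append(rule["name"])
    return failed

def split_rows(rows: list, rules: list) -> tuple:
    """Separate rows that pass every rule from rows that fail at least one."""
    valid, quarantined = [], []
    for row in rows:
        failed = check_row(row, rules)
        if failed:
            # Keep the failing row and record which checks it violated.
            quarantined.append({**row, "_failed_checks": failed})
        else:
            valid.append(row)
    return valid, quarantined

# Made-up IoT readings, for illustration only.
rows = [
    {"device_id": "sensor-1", "temperature_c": 21.5},
    {"device_id": None, "temperature_c": 19.0},
    {"device_id": "sensor-3", "temperature_c": 900.0},
]

valid, quarantined = split_rows(rows, RULES)
```

Keeping the rules as data rather than code means a business rule can be added or adjusted without touching the checking logic. In a Spark-based setup the same idea would typically be expressed as column expressions evaluated across a DataFrame rather than as a Python loop.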

PyData: Data Handling & Data Engineering
Helium [3rd Floor]