PyConDE & PyData Berlin 2024

Unleashing Confidence in SQL Development through Unit Testing
04-23, 10:30–11:00 (Europe/Berlin), B07-B08

As the landscape of data-driven applications expands, the need for robust SQL development practices becomes increasingly critical. This conference talk addresses the challenges faced by data teams in maintaining and evolving complex SQL models for their Data Warehouses, and shows how unit testing can play a vital role in ensuring data quality.

We will delve into the significance of SQL unit testing, highlighting its ability to quickly validate modeling logic and making sure that modifications do not break existing behavior. With the ease of mind of an automatically verified SQL logic, changes to existing data models can be shipped with confidence, ultimately contributing to faster deployment cycles.

Get detailed insights on the structure and functionality of Lotum’s SQL unit testing framework, built in Python using pytest and tailored for BigQuery. With Lotum processing millions of events from mobile games every day, explore how this robust framework allows for efficient testing, ensuring the accuracy of the SQL logic. Learn how test cases with small sets of static mock data can be defined effortlessly so that they help pinpoint potential code errors easily.


The conventional approach to data model development frequently involves a repetitive cycle: crafting a query, executing it, examining a portion of the result, and iterating through the process with each subsequent query modification. This method becomes particularly challenging when dealing with the evolution of mature, extensively-used data models, where multiple developers collaborate without sufficient testing. In such scenarios, the iterative nature of this process poses significant risks, potentially leading to overlooked errors and compromised data quality.

The talk showcases the tangible benefits of having a well-designed unit testing framework, providing ease of mind to developers working collaboratively on the same model, and enabling the early detection of hard-to-spot errors before deployment.

During the development of new data models and during the integration of new data sources, the absence of large amounts of production data makes verification of the model outputs difficult - clearly defined tests for scenarios not yet observed in production play a crucial role in overcoming this hurdle. SQL unit testing becomes especially relevant when refactoring existing data models and can be very helpful to ensure the logic is unchanged, even for edge cases.

I outline the requirements for an effective SQL unit testing framework, emphasizing the use of the database or query engine to verify SQL statement correctness without persisting any data in the database. The presented framework supports the definition of atomic test cases, where each test case consists of minimal input datasets and expected output datasets and it is verified if the output of the query when run on the defined inputs matches the expected output.

The practical implementation of a SQL unit testing framework will be shared in detail, by giving insights into Lotum’s pytest-based SQL unit testing framework and demonstrating how a test case for a SQL statement with mock data can be built effortlessly with minimal code redundancy.

Internal workings of the framework will be explained, including the mechanics to define and run a unit test: By injecting mock data into an existing SQL statement, replacing references to production tables by the injected mock data, and executing the resulting fully-static statements in the query engine, the framework evaluates the transformed data against expected outputs. This way, the correctness of the query can be verified on a case-by-case basis without manually modifying the query code itself.

Attendees will leave the session with a deep understanding of the importance of SQL unit testing, equipped with insights into building an effective framework, defining test cases, and ensuring data model robustness. The talk provides a roadmap for data teams to embrace a test-driven development approach, enhancing code quality, and fostering a culture of confident SQL development.


Abstract as a tweet (X) or toot (Mastodon)

Confidently ship changes to your SQL data model by validating logic with a SQL unit testing framework. Our framework, powered by pytest, ensures robust deployments, making data model evolution a breeze.

Expected audience expertise: Domain

Intermediate

Expected audience expertise: Python

Intermediate

I'm an experienced technical leader with expertise in Data Science and Data Engineering. For over 20 years I have been designing and implementing data-intensive applications end-to-end, from data ingestion to deployment and have build solutions which generate insight from data using statistical analysis and machine learning. My passion is building user-friendly, high-performance and cost-efficient data platforms.