PyData London 2026

Test-Driven Data Analysis
2026-06-05, Hardwick Hub

Test-Driven Data Analysis is a methodology for reducing errors in data and data analysis. It is also an open-source Python package that supports key aspects of the methodology. This tutorial will provide hands-on experience using the library to validate data and to write tests (manually or automatically) for analytical processes. It will also highlight approaches to avoiding errors in specific areas not amenable to software support.


Test-Driven Data Analysis is a methodology for reducing errors in data and data analysis, and also an open-source Python package (tdda) for supporting key aspects of the methodology. This tutorial will provide hands-on experience using the library to:

  • generate constraints characterising data in data frames automatically;
  • validate data using previously generated constraints;
  • test structured data resulting from analyses in data frames;
  • test unstructured data from analyses, typically in text files and graphical form.

The tutorial will also highlight other libraries that can be used for similar purposes.
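The first two activities listed above, discovering constraints from data automatically and later validating new data against them, can be illustrated with a toy example. The following is a minimal, standard-library-only sketch of the idea, not the tdda library's implementation or API: the functions `discover_constraints` and `verify_constraints`, the constraint keys and the `MAX_ALLOWED_VALUES` threshold are all hypothetical, and a plain dict of lists stands in for a pandas or polars DataFrame. In tdda itself, this role is played by functions such as `discover_df` and `verify_df` in `tdda.constraints`; see tdda.readthedocs.io for what they actually discover and check.

```python
import json

# Illustrative only: a toy, standard-library version of constraint discovery and
# verification. It is NOT the tdda library's code or API; the function names,
# constraint keys and the threshold below are hypothetical.

MAX_ALLOWED_VALUES = 20  # assumed cut-off for recording a column's allowed string values


def _numeric(values):
    """Return the int/float values, excluding bools (which are ints in Python)."""
    return [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]


def discover_constraints(columns):
    """Infer simple per-column constraints from a column-oriented table.

    `columns` maps each column name to a list of values; None marks a missing value.
    """
    constraints = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        c = {
            "type": sorted({type(v).__name__ for v in present}),
            "allow_nulls": len(present) < len(values),
            "no_duplicates": len(set(present)) == len(present),
        }
        numeric = _numeric(present)
        if present and len(numeric) == len(present):
            # Purely numeric column: record the observed range.
            c["min"], c["max"] = min(numeric), max(numeric)
        if present and all(isinstance(v, str) for v in present):
            # Low-cardinality string column: record the observed values.
            distinct = sorted(set(present))
            if len(distinct) <= MAX_ALLOWED_VALUES:
                c["allowed_values"] = distinct
        constraints[name] = c
    return constraints


def verify_constraints(columns, constraints):
    """Check a table against stored constraints.

    Returns a list of (column, constraint, message) tuples; an empty list means all passed.
    """
    failures = []
    for name, c in constraints.items():
        if name not in columns:
            failures.append((name, "exists", "column is missing"))
            continue
        values = columns[name]
        present = [v for v in values if v is not None]
        unexpected = {type(v).__name__ for v in present} - set(c["type"])
        if unexpected:
            failures.append((name, "type", f"unexpected types {sorted(unexpected)}"))
        if not c["allow_nulls"] and len(present) < len(values):
            failures.append((name, "allow_nulls", "contains missing values"))
        if c["no_duplicates"] and len(set(present)) < len(present):
            failures.append((name, "no_duplicates", "contains duplicate values"))
        numeric = _numeric(present)
        if "min" in c and numeric and min(numeric) < c["min"]:
            failures.append((name, "min", f"{min(numeric)} < {c['min']}"))
        if "max" in c and numeric and max(numeric) > c["max"]:
            failures.append((name, "max", f"{max(numeric)} > {c['max']}"))
        if "allowed_values" in c:
            bad = sorted({v for v in present if isinstance(v, str)} - set(c["allowed_values"]))
            if bad:
                failures.append((name, "allowed_values", f"unexpected values {bad}"))
    return failures


# Discover constraints from reference data, store them as JSON, then verify a new batch.
reference = {
    "id": [1, 2, 3, 4],
    "colour": ["red", "green", "red", None],
    "score": [0.5, 1.5, 2.0, 0.0],
}
stored = json.dumps(discover_constraints(reference), indent=2)

new_batch = {
    "id": [5, 6, 6],
    "colour": ["red", "blue", "green"],
    "score": [1.0, -3.0, 2.5],
}
for failure in verify_constraints(new_batch, json.loads(stored)):
    print(failure)
```

Running this prints five failures for the new batch: a duplicated `id`, an `id` above the reference maximum, the previously unseen colour `blue`, and scores both below and above the reference range. Storing the constraints as JSON separates the discovery step from later verification, so the same constraints can be reused on each new batch of data.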

It will also discuss a taxonomy of errors arising during analysis and highlight approaches to reducing those errors, including the use of 22 TDDA-focused checklists.

The major error categories that will be considered are:

  • errors of interpretation (of formulation and of communication),
  • errors of implementation,
  • errors of process,
  • errors of applicability, and
  • errors of judgement.

ATTENDEES

No prior experience is required, but it would be helpful to have the tdda library installed and to have some familiarity with DataFrames in polars or pandas. If you want hands-on experience during the tutorial, follow the installation instructions for tdda at tdda.readthedocs.io.

If the installation works, you should be able to run the tdda command. If you change to a directory where you are happy to put data, the command

 tdda examples all

will download all the data used in the tutorial into subdirectories.

Wifi is available at the conference, but if you do this ahead of the tutorial, you will compete with fewer people for bandwidth and will have more chance to check that everything works before you need the library.

Nick Radcliffe has used Python since around 2005 (starting with Python 2.1, in the form of Jython) and has been doing what we now call Data Science since around 1986. He is a Visiting Professor in the Maths Department (Operations Research) at the University of Edinburgh and runs Stochastic Solutions Limited, a consulting and software company working in Data Science. Since around 2015, Nick has been developing the ideas of test-driven data analysis (TDDA), an approach to the quality of data and analytical processes inspired by test-driven development (TDD). The open-source Python TDDA library (for which he is the lead developer) provides support for test-driven data analysis in those areas where software can help.

Nick has previously co-authored two books: one on sustainability for WWF, and one on a (defunct) Python online tag-based social database called Fluidinfo. By the time of this conference, his latest book, Test-Driven Data Analysis (CRC Press), should be available.