PyCon UK 2022

OpenSAFELY: a Python-powered response to the COVID pandemic
09-18, 12:00–12:30 (Europe/London), Assembly Room

OpenSAFELY is a secure, transparent, open-source platform, built in Python, in response to the COVID pandemic. It provides almost real-time analysis of millions of electronic health records. We will discuss privacy, security and research goals achieved by a cross-disciplinary team of researchers, epidemiologists and developers working together.


OpenSAFELY is a secure, transparent, open-source software platform for analysis of electronic health records (EHR) data, built in Python.

OpenSAFELY was built as a rapid response to the Covid-19 pandemic, to provide near-real-time analysis of millions of NHS records covering Covid-related data such as hospitalisations, deaths, risk factors and vaccinations.

In this talk, we will tell the story of how OpenSAFELY came to be, the privacy, security and research goals it addresses with software, and how research best practices can be achieved by a cross-disciplinary team of researchers, epidemiologists and software developers working alongside each other.

We will describe the particular challenges that face researchers working with potentially identifiable patient data, and how the inherent risks of such data can be mitigated.

A key concept is the idea of pseudonymised data, i.e. data where a patient's immediately identifiable information (name, exact address, date of birth, etc.) has been obfuscated. While this sort of data has been superficially anonymised, it can be quite easy to re-identify individuals from the information that remains. For example, a malicious intruder might easily find and read Boris Johnson’s entire health record by searching for a patient who matches information that is openly available on his Wikipedia page: his age, the dates he tested positive and was admitted to hospital for Covid-19, and the fact that he lives in London.
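The re-identification risk described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `PseudonymisedRecord` fields, the `matching` helper and the five records are invented, and none of it reflects OpenSAFELY's actual data model or code. It only shows how combining a few publicly known facts can narrow a pseudonymised dataset down to a single person.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PseudonymisedRecord:
    """Hypothetical record: direct identifiers replaced by an opaque ID."""
    patient_id: str      # pseudonym, not a name or NHS number
    birth_year: int
    region: str
    positive_test: date
    admitted: date


# Entirely fictional records, invented for this example.
RECORDS = [
    PseudonymisedRecord("p-001", 1970, "London", date(2021, 1, 4), date(2021, 1, 10)),
    PseudonymisedRecord("p-002", 1970, "London", date(2021, 1, 4), date(2021, 1, 12)),
    PseudonymisedRecord("p-003", 1985, "London", date(2021, 1, 4), date(2021, 1, 10)),
    PseudonymisedRecord("p-004", 1970, "Leeds", date(2021, 1, 4), date(2021, 1, 10)),
    PseudonymisedRecord("p-005", 1970, "London", date(2021, 2, 1), date(2021, 2, 9)),
]


def matching(records, **known):
    """Return the records whose fields equal every publicly known fact."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in known.items())
    ]


# Each additional public fact shrinks the set of candidate patients.
facts = {}
candidate_counts = []
for field, value in [
    ("region", "London"),
    ("birth_year", 1970),
    ("positive_test", date(2021, 1, 4)),
    ("admitted", date(2021, 1, 10)),
]:
    facts[field] = value
    candidate_counts.append(len(matching(RECORDS, **facts)))

print(candidate_counts)
print(matching(RECORDS, **facts)[0].patient_id)
```

With this toy data, the candidate set shrinks with each added fact until one record remains, even though no name, address or exact date of birth was ever stored.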

OpenSAFELY is a set of tools that aim to allow analysis of EHR data while adhering to these principles; it aims to substantially exceed, by design, the current requirements on securing sensitive healthcare data. OpenSAFELY does not move patient data outside of the secure environments where it already resides: instead, trusted analysts can run large-scale computation across pseudonymised patient records in situ, and in near-real-time.

OpenSAFELY contains a range of flexible, pragmatic, but broadly standardised tools that users work with to convert raw patient data into “research ready” datasets, and to then execute code across those datasets. We will discuss the architecture of the platform, some of the tools we chose to use, and reasons for the design decisions made.


Is your proposal suitable for beginners? – yes
See also: Slides (1.4 MB)

I am a software developer at the Bennett Institute for Applied Data Science, working on the OpenSAFELY platform.

bennett.ox.ac.uk | opensafely.org

Hi folks

I am a software engineer at the Bennett Institute, working on OpenSAFELY.

I have been building things with Python for a good while, with a focus on infrastructure and tooling. A long-time Linux user, I previously worked at Canonical on Ubuntu for 9 years.

An even longer time ago I did a PhD in AI cloud scheduling, which explains why you might see me drinking a lot.