PyData Boston 2025

Processing large JSON files without running out of memory
2025-12-10, Itamar Turner-Trauring

If you need to process a large JSON file in Python, it’s very easy to run out of memory while loading the data, leading to a super-slow run time or out-of-memory crashes. If you're running in the cloud, you can get a machine with more memory, but that means higher costs. How can you process these large files?

In this talk you'll learn:

  • How to measure memory usage (a quick code sketch follows this list).
  • Some of the reasons why loading JSON uses so much memory.
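
To make the first bullet concrete, here is one common way to check peak memory usage from inside a Python process, using the standard library's resource module (available on Linux and macOS, not Windows); the file name large.json is just a placeholder:

    import json
    import resource
    import sys

    def peak_memory_mb():
        """Peak resident memory of this process, in mebibytes.

        ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
        """
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if sys.platform == "darwin":
            rss /= 1024  # macOS reports bytes; convert to KiB
        return rss / 1024  # KiB -> MiB

    with open("large.json") as f:  # placeholder file name
        data = json.load(f)

    print(f"Peak memory: {peak_memory_mb():.1f} MiB")

The talk may use different measurement tooling; this is simply the quickest way to get a number you can compare across runs.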

Then, you'll learn some of the solutions for this problem:

  • Using a more efficient in-memory representation.
  • Only loading the subset of the data you need.
  • Streaming parsing, which can parse arbitrarily large files with a fixed amount of memory.
  • Using a different file format, like JSON Lines. (Both of these are sketched in code after this list.)
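
As a taste of the last two bullets, here is a minimal sketch of streaming parsing with the third-party ijson library, and of reading a JSON Lines file one record at a time. The file names and the process() helper are placeholders, and the ijson example assumes the top-level JSON value is an array:

    import json

    import ijson  # third-party streaming parser: pip install ijson

    def process(record):
        """Placeholder for whatever per-record work you actually do."""
        pass

    # Streaming parsing: iterate over the elements of a top-level JSON array
    # without ever loading the whole document into memory at once.
    with open("large.json", "rb") as f:
        for record in ijson.items(f, "item"):
            process(record)

    # JSON Lines: one JSON document per line, so each record can be parsed
    # and discarded independently.
    with open("large.jsonl") as f:
        for line in f:
            process(json.loads(line))

Both approaches keep memory usage roughly flat regardless of file size, as long as individual records are small, which is the point of the techniques above.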

Along the way, we'll also measure the runtime speed of these solutions and the various libraries we'll be using.
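
The timing itself doesn't need anything fancy; a pattern like the following, with time.perf_counter() wrapped around whichever loading strategy is being compared, is enough (large.json is again a placeholder):

    import json
    import time

    start = time.perf_counter()
    with open("large.json") as f:
        data = json.load(f)  # swap in whichever loading approach you're comparing
    elapsed = time.perf_counter() - start
    print(f"Loaded in {elapsed:.2f} seconds")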


Prior Knowledge Expected: No previous knowledge expected

Itamar Turner-Trauring is a consultant and writes about Python performance at https://pythonspeed.com/. He helps companies maintain open source software and speed up their data processing code.

In his spare time he is a volunteer with Cambridge Bicycle Safety and writes about Cambridge local politics at Let's Change Cambridge.