Answering local questions on big datasets with RangeExtractor.jl JuliaCon 2025

2025-07-23 10:50–11:00, Lawrence Room 104 - Function Room

Our world today is defined by big data; the output of a single satellite orbit is larger than your laptop's hard drive. The canonical way to analyze "big earth observation datasets" has always been to throw them on a cluster and let the job run overnight. But what if it didn't have to be?

With RangeExtractor.jl, you can run queries over huge gridded datasets on a laptop, without storing the data locally. Loading and processing are batched by chunks, using either the dataset's default chunking or a chunking scheme supplied by the user.

RangeExtractor.jl is a Julia package that is meant to accelerate localized computations over global, large (~1TB to ~1PB), chunked datasets. It does this through multithreading, asynchronous data downloads, and intelligent query-splitting over chunk boundaries.
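
To make "query-splitting over chunk boundaries" concrete, here is a minimal sketch in plain Julia. It is not RangeExtractor.jl's API or implementation: the helper names `split_by_chunks` and `split_query` are hypothetical, the chunk grid is assumed to be regular, and a small in-memory matrix stands in for a remote dataset. The idea shown is only that a query range can be cut into per-chunk pieces, each piece reduced on its own, and the partial results combined.

```julia
# Hypothetical helper (not part of RangeExtractor.jl): split a 1-based index
# range into the sub-ranges that fall inside each chunk of a regular chunk
# grid with `chunksize` elements per chunk along this axis.
function split_by_chunks(r::UnitRange{Int}, chunksize::Int)
    pieces = Pair{Int,UnitRange{Int}}[]
    isempty(r) && return pieces
    first_chunk = fld1(first(r), chunksize)   # index of the chunk holding first(r)
    last_chunk = fld1(last(r), chunksize)     # index of the chunk holding last(r)
    for c in first_chunk:last_chunk
        chunk_start = (c - 1) * chunksize + 1
        chunk_stop = c * chunksize
        # Clip the chunk's extent to the requested range.
        push!(pieces, c => max(first(r), chunk_start):min(last(r), chunk_stop))
    end
    return pieces
end

# Hypothetical helper: split a 2D query into one piece per chunk it touches.
# Each element is (chunk_row, chunk_col) => (row_range, col_range).
function split_query(rows::UnitRange{Int}, cols::UnitRange{Int},
                     chunksize::Tuple{Int,Int})
    row_pieces = split_by_chunks(rows, chunksize[1])
    col_pieces = split_by_chunks(cols, chunksize[2])
    return [(ci, cj) => (ri, rj) for (ci, ri) in row_pieces, (cj, rj) in col_pieces]
end

# Stand-in for a chunked remote array: an 8x8 matrix with assumed 4x4 chunks.
data = reshape(collect(1.0:64.0), 8, 8)
pieces = split_query(3:6, 2:7, (4, 4))

# Reduce each per-chunk piece independently, then combine the partial results.
partial_minima = map(pieces) do p
    rows, cols = p.second
    minimum(view(data, rows, cols))
end
result = minimum(partial_minima)
```

Because each piece touches exactly one chunk, the pieces could in principle be fetched and reduced independently, for example on separate threads, before the final combine step. This works directly for reductions such as a minimum or a sum; statistics that are not simple reductions need a suitable combine function.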

In this talk, I'll touch on the case for RangeExtractor, how it's designed, and what performance bottlenecks remain to be addressed.
I'll also talk about the two principal use cases that motivated RangeExtractor: (a) extracting the lowest elevation of each glacier for all ~200,000 glaciers in the world, over a 30-meter resolution elevation grid, and (b) computing elevation statistics over the whole of Greenland. Both of these questions will be answered on my 14-inch MacBook, without pre-downloading any big data.

Anshul Singhvi

JuliaGeo collaborator and author of GeoMakie.jl, and contributor to DocumenterVitepress.jl!
