2025-07-23, Main Room 3
Our world today is defined by big data; the output of a single satellite orbit is larger than your laptop's hard drive. The canonical way to analyze "big earth observation datasets" has always been to throw them on a cluster and let the job run overnight. But what if it didn't have to be?
With RangeExtractor.jl, you can run queries over huge gridded datasets on a laptop, without storing the data locally. Loading and processing are batched by chunk, using either the dataset's own chunking or a chunk size supplied by the user.
RangeExtractor.jl is a Julia package designed to accelerate localized computations over large (~1 TB to ~1 PB), chunked global datasets. It does this through multithreading, asynchronous data downloads, and intelligent splitting of queries across chunk boundaries.
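To make query-splitting over chunk boundaries concrete, here is a minimal sketch in Python. It is not RangeExtractor.jl's implementation or API: the function names `split_axis`, `split_query`, and `group_by_chunk` are hypothetical. It also assumes a regular grid with a fixed chunk size per axis and queries given as half-open index ranges. Each query box is cut at chunk edges, and the pieces are grouped by the chunk they fall in. That way, each chunk only needs to be fetched once, however many queries touch it.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sketch, not RangeExtractor.jl's API.
# A half-open index range [start, stop) along one axis.
Range = tuple[int, int]


def split_axis(start: int, stop: int, chunk: int) -> list[Range]:
    """Cut [start, stop) at every multiple of `chunk` that falls inside it."""
    pieces = []
    lo = start
    while lo < stop:
        boundary = (lo // chunk + 1) * chunk  # first chunk edge after `lo`
        hi = min(boundary, stop)
        pieces.append((lo, hi))
        lo = hi
    return pieces


def split_query(box: list[Range], chunks: list[int]) -> list[tuple[Range, ...]]:
    """Split an N-dimensional query box into sub-boxes that each fit in one chunk."""
    per_axis = [split_axis(s, e, c) for (s, e), c in zip(box, chunks)]
    # Every combination of per-axis pieces is one sub-box.
    return list(product(*per_axis))


def group_by_chunk(queries: dict, chunks: list[int]) -> dict:
    """Map each chunk index to the (query id, sub-box) pieces that lie in it."""
    batches = defaultdict(list)
    for qid, box in queries.items():
        for sub in split_query(box, chunks):
            # A sub-box lies in exactly one chunk, identified by its lower corner.
            chunk_idx = tuple(lo // c for (lo, _), c in zip(sub, chunks))
            batches[chunk_idx].append((qid, sub))
    return dict(batches)
```

For example, with 10×10 chunks, `group_by_chunk({"a": [(3, 25), (5, 8)], "b": [(12, 18), (0, 15)]}, [10, 10])` puts pieces of both queries into chunk `(1, 0)`, so that chunk is read once and serves both. In a real system, each batch could then be handed to its own thread or task. That is where multithreading and asynchronous downloads would come in.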
In this talk, I'll cover the case for RangeExtractor, how it is designed, and which performance bottlenecks remain to be addressed.
I'll also discuss the two principal use cases that motivated RangeExtractor: (a) extracting the lowest elevation of each of the ~200,000 glaciers in the world from a 30-meter-resolution elevation grid, and (b) computing elevation statistics over the whole of Greenland. Both questions will be answered on my 14-inch MacBook, without pre-downloading any big data.
JuliaGeo collaborator, author of GeoMakie.jl, and contributor to DocumenterVitepress.jl!