27.03.2025 – 11:00-11:10 (Africa/Abidjan), Poster (Zelt)
PostgreSQL with the PostGIS extension is often considered the standard solution for geospatial data processing. However, compute costs grow with data volume, and vertical scaling quickly becomes expensive. Distributed processing frameworks, in contrast, allow for horizontal scaling. In this talk, we present our experience with Apache Spark, an open-source framework designed for high-volume data processing. We will show its benefits and highlight the challenges we faced during the implementation.
We are the data team of viadukt, a start-up that aims to accelerate the energy-efficient modernization of buildings across Germany. To this end, we set up a comprehensive geospatial database of building-related data and developed a processing pipeline that integrates both federal and open-source datasets. Initially, PostgreSQL with the PostGIS extension met our needs while we focused on North Rhine-Westphalia. But as our scope expanded nationwide, significant challenges arose.
PostgreSQL required costly vertical scaling, even though we needed high processing power only occasionally, which left us with an underutilized yet expensive setup. Alternatively, we could have downsized our machines whenever no heavy processing was running. However, this would have required either downtime of our database instance, which we wanted to avoid, or complex mechanisms to prevent such downtime.
To address these issues, we implemented a solution based on Apache Spark and Apache Sedona that met our key criteria: cost-effectiveness, independent and flexible scaling of storage and compute, consistent performance, and an easy migration path. In this talk, we will walk through the challenges we encountered during the migration and the solutions we implemented.
Jannis completed a bachelor’s degree in geography, followed by a master’s in nature conservation and landscape ecology. During his master’s and PhD at Research Centre Jülich, he specialized in soil hydrology using geophysical methods, including particle-ray physics, and taught himself scripting languages. Now, as a geoinformatician at viadukt, he is expanding his skills toward spatial data science.