Dr. Jannis Jakobi
Jannis completed a bachelor’s in geography, followed by a master’s in nature conservation and landscape ecology. During his master’s and PhD at Research Centre Jülich, he specialized in soil hydrology using geophysical methods, including particle-ray physics, and taught himself scripting languages. Now a geoinformatician at viadukt, he is expanding his skills and moving into spatial data science.
Talks
PostgreSQL is often considered a standard solution for geospatial data processing. However, compute costs grow with data volume, and vertical scaling quickly becomes expensive. In contrast, distributed processing frameworks allow for horizontal scaling. In this talk, we will present our experience with Apache Spark, an open-source framework designed for high-volume data processing. We will show the benefits and highlight the challenges we faced during the implementation.
Efficient retrieval of geospatial data is crucial but presents scaling challenges. During our transition from PostgreSQL to Apache Spark, we encountered limitations in spatial indexing. While PostgreSQL’s indexing supports efficient queries, it does not translate directly to Spark. The transition required us to develop new strategies for managing and querying spatial data effectively. In this talk, we’ll share the challenges we faced and the solutions we implemented to address them.