2026-08-12, Room 3
Spatial machine learning has become increasingly crucial for environmental prediction tasks. Yet current workflows in R and Python face challenges when scaling to high‑resolution, national‑level mapping and when integrating modern uncertainty‑aware methods. In this talk, I present a new Julia‑based spatial machine learning framework for digital soil mapping, focusing on national soil organic carbon (SOC) prediction in Estonia. The approach combines Random Forest models, stacked meta‑learning, and conformal prediction through the MLJ ecosystem, and ports the IGEO7 discrete global grid system (DGGS) to Julia to impose a hierarchical spatial structure.
This approach targets persistent issues in spatial ML, such as autocorrelation, multi‑scale dependencies, and computational efficiency. It implements DGGS‑based multi‑resolution covariate aggregation, spatially aware cross‑validation, Shapley values, and area‑of‑applicability (AOA) assessment using the Dissimilarity Index method. Initial results demonstrate improved spatial fidelity, scalable high-resolution prediction, and more transparent communication of uncertainty.
This work showcases how Julia’s speed and composability enable a modern, reproducible, and scalable approach to spatial machine learning, compared with what conventional Python/R workflows currently offer.
Digital soil mapping increasingly relies on spatial machine learning techniques that must balance predictive accuracy, spatial fidelity, and computational scalability. Recent research in the Python and R ecosystems highlights advances in explicit spatial structure, multi‑scale context, and robust uncertainty quantification. In our experiments with large‑area, high‑resolution prediction tasks, however, Python workflows proved inefficient. Building on earlier national‑scale experiments in Estonia and leveraging the increasingly mature MLJ ecosystem, this project explores a fully Julia‑based spatial ML pipeline to improve the modelling of soil organic carbon (SOC).
The workflow integrates three components:
- Random Forest (RF) models implemented via MLJ / DecisionTree for nonlinear, interaction‑rich prediction;
- Stacked meta‑learning to combine predictions across multiple model families and spatial resolutions, using DGGS-based neighbourhood kernels and parent-cell relationships to model spatial structure and teleconnections;
- Conformal prediction to generate calibrated, spatially explicit uncertainty intervals.
A key innovation is the integration of the IGEO7 discrete global grid system (DGGS). DGGS are increasingly used, knowingly or unknowingly (e.g., HEALPix, H3). IGEO7 originates from the DGGRID tool (Sahr, K.). We made it available to Julia as the fundamental spatial scaffold (https://github.com/allixender/DggridRunner.jl; a native CxxWrap binding is still under development). Several core functionalities, such as Z7-indexing-based neighbourhoods, are now natively implemented in Julia (https://github.com/allixender/Z7.jl), as is the required authalic conversion from the spherical DGGS to the WGS84 ellipsoid. IGEO7’s equal‑area, multi‑resolution hierarchy provides a principled alternative to traditional spatial ML approaches based on k‑nearest neighbours, coordinate distances, or buffer‑based metrics. Environmental covariates (climate, terrain, land cover, and Estsoil profile data) are aggregated at several DGGS resolutions, allowing the models to capture hierarchical spatial dependencies similar to the multi‑mesh graph structures proposed in current spatial ML research (e.g., Google GraphCast).
Variable preparation follows the literature in emphasising the predictors most relevant to SOC modelling, including terrain and geomorphological indices (e.g., via Geomorphometry.jl) and remote sensing indices such as NDVI. These predictors form the basis for both model training and the dissimilarity index (DI) used in the area of applicability (AOA) framework, which is an increasingly standard requirement in spatial ML for assessing extrapolation risk.
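As a rough illustration of the dissimilarity index idea, the sketch below computes a DI value for new locations as the importance-weighted distance to the nearest training sample in standardised predictor space, divided by the mean pairwise distance among training samples. It is a simplified, language-neutral sketch in plain Python, not the Julia implementation used in this project: the function names and the toy data are hypothetical, and the derivation of the AOA threshold from the training data's own DI values is omitted.

```python
import math
from statistics import mean, pstdev


def make_scaler(train, weights):
    """Build a function that standardises a predictor vector with the
    training means/standard deviations and applies importance weights."""
    cols = list(zip(*train))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant predictors

    def scale(x):
        return [w * (v - m) / s for v, m, s, w in zip(x, mus, sds, weights)]

    return scale


def dissimilarity_index(train, new_points, weights):
    """DI = distance to the nearest training sample / mean pairwise
    distance among training samples (all in weighted, scaled space)."""
    scale = make_scaler(train, weights)
    t = [scale(x) for x in train]
    pairwise = [math.dist(a, b) for i, a in enumerate(t) for b in t[i + 1:]]
    d_bar = mean(pairwise)
    return [min(math.dist(scale(x), a) for a in t) / d_bar for x in new_points]


# Toy example: two predictors with equal (hypothetical) importance weights.
train = [[0, 0], [1, 0], [0, 1], [1, 1]]
di = dissimilarity_index(train, [[0, 0], [0.5, 0.5], [5, 5]], weights=[1, 1])
```

A point that coincides with a training sample gets a DI of zero, while a point far outside the sampled predictor range gets a large DI and would fall outside the area of applicability once a threshold is applied.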
Training samples are drawn within the DGGS structure to ensure spatial representativeness, and spatial cross‑validation is performed using DGGS‑consistent blocking schemes to counter overoptimistic estimates caused by autocorrelation. The resulting models can generate predictions at multiple IGEO7 levels, supporting both fine‑resolution mapping and aggregated, scalable national‑level assessments.
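The blocking idea can be sketched as follows: every sample carries the identifier of a coarser-resolution parent cell, and all samples sharing a parent are kept in the same fold, so that spatially close samples never appear on both sides of a train/test split. This is a minimal plain-Python sketch under stated assumptions: `block_ids` stands in for parent cell identifiers (here arbitrary strings, not real IGEO7/Z7 indices), and the greedy fold balancing is an illustrative choice rather than the scheme used in the project.

```python
from collections import defaultdict


def blocked_folds(block_ids, n_folds):
    """Return a list of (train_idx, test_idx) pairs in which all samples
    of one block (e.g. one coarse parent cell) share the same fold."""
    by_block = defaultdict(list)
    for i, b in enumerate(block_ids):
        by_block[b].append(i)

    # Greedy balancing: place the largest blocks first, each into the
    # fold that currently holds the fewest samples.
    folds = [[] for _ in range(n_folds)]
    for b in sorted(by_block, key=lambda b: (-len(by_block[b]), b)):
        min(folds, key=len).extend(by_block[b])

    splits = []
    for k in range(n_folds):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        splits.append((train, test))
    return splits


# Hypothetical parent-cell labels for seven samples.
block_ids = ["A", "A", "B", "C", "C", "C", "D"]
splits = blocked_folds(block_ids, n_folds=2)
```

Choosing a coarser parent resolution yields larger blocks and a stricter separation between training and test data, at the cost of fewer distinct blocks to distribute across folds.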
Uncertainty assessment combines two complementary approaches:
- Conformal prediction, offering distribution‑free uncertainty intervals calibrated to the empirical error structure;
- AOA‑based spatial validity masks, identifying regions where model predictions are reliable based on weighted distances in predictor space.
In a future research paper, we aim to explore and discuss in more detail the similarities and differences between the uncertainty results of the two methods. Quantile Random Forest would be a valuable addition, but it is not yet available in Julia.
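For readers unfamiliar with conformal prediction, the simplest variant (split conformal regression with absolute residuals) can be sketched in a few lines. The sketch assumes a held-out calibration set with observed values and model predictions. It is written in plain Python for illustration only, the function names are hypothetical, and the abstract does not state which conformal variant the Julia pipeline uses.

```python
import math


def conformal_halfwidth(cal_true, cal_pred, alpha=0.1):
    """Split conformal: return the interval half-width q such that
    [pred - q, pred + q] targets (1 - alpha) marginal coverage."""
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the calibration residuals.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def predict_interval(pred, q):
    """Symmetric prediction interval around a point prediction."""
    return (pred - q, pred + q)


# Toy calibration data (hypothetical values, not project results).
cal_true = [11, 13, 12, 15, 14, 10, 12, 11, 16]
cal_pred = [10, 10, 10, 10, 10, 10, 10, 10, 10]
q = conformal_halfwidth(cal_true, cal_pred, alpha=0.2)
interval = predict_interval(20, q)
```

Because the half-width comes from ranked calibration residuals rather than a parametric error model, the coverage guarantee is distribution-free, but it is marginal: it holds on average over the data, not necessarily within each location or region.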
By building this workflow almost entirely in Julia, the project benefits from Julia’s performance for large‑scale raster/grid operations, MLJ’s flexible model composition, and the ease of integrating custom spatial data structures such as IGEO7. The talk will present the modelling pipeline, computational aspects, evaluation results, and lessons learned about implementing advanced spatial ML methods in the Julia ecosystem.
This work demonstrates how Julia can serve as a powerful platform for modern spatial machine learning, offering performance, composability, and extensibility compared with what is typically feasible in Python/R‑based workflows.
Alex is an Associate Professor in Geoinformatics and a Distributed Spatial Systems Researcher with many years of experience in geospatial data management and web- and cloud-based geoprocessing with a particular focus on land use, soils, hydrology, hydrogeology and water quality data. His interests include Discrete Global Grid Systems (DGGS), OGC standards and web-services for environmental and geo-scientific data sharing, modelling workflows and interactive geo-scientific visualisation. He is also the European co-chair of the OGC DGGS working group.