27.03.2025, Poster (Zelt)
Machine learning for spatial problems faces unique challenges, notably spatial dependence. Effective modeling requires integrating spatial information into the model and using validation methods that preserve spatial structure. This poster gives an overview of spatial machine learning packages in R, focusing on tools for feature engineering, validation, and interpretation. It also serves as a guide for comparing these tools and critically assessing their strengths and limitations.
Machine learning techniques are widely used for both spatial and non-spatial problems. As the use of machine learning models for creating maps continues to grow, several challenges have emerged that are unique to spatial problems. Most importantly, spatial dependence, the tendency of nearby observations to be more similar than distant ones, must be accounted for to avoid overfitting and biased outcomes.
The effectiveness of machine learning models depends heavily on the quality of the input predictor data, which makes feature engineering a key part of the modeling process. Several approaches have been proposed to integrate spatial information into machine learning, including using the spatial coordinates of the observations or Euclidean distance fields (EDF) as predictors. Moreover, several modifications of traditional machine learning algorithms, such as RFGLS and hybrid models, have been suggested to improve predictive performance on spatial data. Regardless of the selected modeling algorithm, a proper validation approach is crucial both for choosing suitable variables and hyperparameters during tuning and for quantifying the quality of the final predictions. Standard cross-validation (CV) methods tend to overestimate model transferability in spatial data; spatial cross-validation methods, such as spatial block k-fold CV and nearest neighbor distance matching leave-one-out (LOO) CV, have been proposed to address this issue by preserving the spatial structure of data subsets. Finally, spatial data models require specific tools for model interpretation and visualization, such as the area of applicability (AoA), to provide insights into the model’s quality and decision-making process.
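To make the idea of spatial block k-fold CV more concrete, here is a minimal, language-agnostic sketch written in Python with the standard library only. It is not the implementation used by CAST, mlr3spatiotempcv, spatialsample, or any other package named in this abstract. The function name `spatial_block_folds`, the square-block layout, the `block_size` parameter, and the toy coordinates are all assumptions made for illustration. The sketch only shows the core principle: points that fall into the same spatial block always end up in the same fold, so spatially close observations are not split between training and test data.

```python
import math
import random


def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign each point to one of k folds by square spatial block.

    coords: list of (x, y) tuples, assumed to be in a projected CRS.
    block_size: side length of the square blocks, in the same units.
    Returns a list of fold indices (0..k-1), one per point.
    """
    # Identify the square block each point falls into.
    block_ids = [
        (math.floor(x / block_size), math.floor(y / block_size))
        for x, y in coords
    ]
    # Shuffle the distinct blocks, then deal them out to folds round-robin,
    # so that all points within a block share one fold.
    blocks = sorted(set(block_ids))
    random.Random(seed).shuffle(blocks)
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block] for block in block_ids]


# Hypothetical toy data: 200 random points on a 100 x 100 extent.
rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
folds = spatial_block_folds(points, block_size=25, k=4)
```

In each CV iteration, the points of one fold would be held out for testing and the remaining points used for training. Choosing the block size is a modeling decision in its own right, often related to the range of spatial autocorrelation in the data, and this sketch does not address it.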
This poster aims to provide a comprehensive overview of spatial machine learning packages currently available in R, categorizing them based on their functionalities and applicability in spatial data analysis. We will cover the spatial extensions of the three main machine learning frameworks in R (CAST for caret, mlr3spatiotempcv for mlr3, and spatialsample for tidymodels) and also introduce specialized spatial machine learning packages for feature engineering, model validation, and interpretation, including SpatialML, RandomForestsGLS, and sits. The poster will serve as a navigational guide for comparing spatial machine learning tools and will critically assess their strengths, limitations, and potential integration strategies.
I am a computational geographer working at the intersection of geocomputation and the environmental sciences. My research is focused on developing and applying spatial methods to broaden our understanding of processes and patterns in the environment. A vital part of my work is to create, collaborate on, and improve geocomputational software. I am an active member of the R-spatial community and a co-author of the Geocomputation with R book.
PhD Student in Landscape Ecology