This study presents a reproducible methodology for detecting informal settlements using OpenStreetMap building footprint data. By integrating urban morphology and topology within a Python-based workflow and leveraging OSMnx for data extraction, the approach enables fine-scale spatial analysis in data-scarce environments. The method is tested across three case studies in the Global South, demonstrating how OpenStreetMap can support urban research, comparative analysis, and data-driven insights into informal settlements relevant to SDG 11.
Rapid urbanisation is transforming cities worldwide, particularly in the Global South, where urban growth often outpaces formal planning processes. As a result, informal settlements have expanded significantly and now shelter a substantial portion of the urban population, who often face insecure tenure, inadequate access to safe water and acceptable sanitation, and housing of poor durability [1]. Although these settlements lack formal recognition, they are home to countless individuals and families and form integral components of the urban landscape. Even so, they frequently remain underrepresented in official datasets and planning documents. This lack of reliable spatial information poses a major obstacle for urban planners, researchers, and policymakers seeking to address the challenges associated with informal urbanisation and to monitor progress toward Sustainable Development Goal 11 (SDG 11), which aims to make cities inclusive, safe, resilient, and sustainable. For all these reasons, collecting and analysing qualitative and quantitative geographic data is crucial to a better understanding of such urban contexts.
Acquiring such spatial data is essential for understanding the morphology and dynamics of informal settlements. However, traditional approaches such as census surveys and remote sensing face limitations. On the one hand, remote sensing approaches that detect large slum areas struggle to capture the fine-grained spatial structures that characterise informal neighbourhoods. On the other hand, population census and household survey data may be difficult or even impossible to collect or access in some less-developed countries [2]. These limitations highlight the need for alternative data sources and methodologies capable of supporting detailed spatial analysis of informal settlements.
Informal settlements often exhibit distinctive spatial characteristics, including high building density, irregular plot structures, and complex patterns of spatial connectivity. Two fields have explored settlements with such dynamic and organic characteristics: urban morphology and topology. Urban morphology is the study of urban form and of the processes of its formation and transformation [3]. Although topology has not been explored in depth in urban and geospatial analysis, some research has shown that it is well suited to the study of complex urban structures such as slums, because it offers analytical tools for identifying incipient urban development in informal neighbourhoods [4]. Therefore, this research proposed a methodology to detect informal settlements by integrating principles of urban morphology and topology using building footprint data derived from OpenStreetMap (OSM). Urban morphology metrics (UMMs) were calculated to describe characteristics such as building area, perimeter, compactness, tessellation area, and inter-building distances. Morphological tessellations generated from building footprints were used as a proxy for parcel structures in areas where cadastral information is scarce or unavailable. In addition, spatial topology was analysed within the case studies through a graph-based representation of building relationships generated using Delaunay triangulation. This network structure enabled the calculation of connectivity-based topology metrics (TMs), including the average weighted distance between buildings.
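The per-building metrics and the Delaunay-based graph described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the study: footprints are simple polygons given as vertex lists in a projected (metric) coordinate system, compactness is measured with the Polsby-Popper ratio, and the "average weighted distance" is read as the mean length of the Delaunay edges incident to each building centroid. The function names `polygon_metrics` and `mean_delaunay_distance` are hypothetical, and tessellation-based metrics are not covered.

```python
import math

import numpy as np
from scipy.spatial import Delaunay


def polygon_metrics(coords):
    """Return (area, perimeter, compactness) of a simple polygon.

    coords: list of (x, y) vertices in a projected CRS, without repeating
    the first vertex at the end. Compactness is the Polsby-Popper ratio
    4*pi*area / perimeter**2 (an assumed definition, 1.0 for a circle).
    """
    n = len(coords)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1           # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    area = abs(twice_area) / 2.0
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return area, perimeter, compactness


def mean_delaunay_distance(centroids):
    """Mean length of the Delaunay edges incident to each centroid."""
    pts = np.asarray(centroids, dtype=float)
    tri = Delaunay(pts)

    # Collect each undirected edge of the triangulation exactly once.
    edges = set()
    for a, b, c in tri.simplices:
        for i, j in ((a, b), (b, c), (a, c)):
            edges.add((int(min(i, j)), int(max(i, j))))

    total = np.zeros(len(pts))
    count = np.zeros(len(pts))
    for i, j in edges:
        d = float(np.linalg.norm(pts[i] - pts[j]))
        total[i] += d
        total[j] += d
        count[i] += 1
        count[j] += 1

    # Points left out of the triangulation (e.g. duplicates) get NaN.
    result = np.full(len(pts), np.nan)
    np.divide(total, count, out=result, where=count > 0)
    return result
```

Under these assumptions, small and tightly packed buildings would show low areas and short mean edge lengths, which is the kind of signal the metrics above are meant to capture.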
In recent years, Volunteered Geographic Information (VGI) has emerged as a powerful alternative for documenting urban environments that are poorly represented in official datasets. OSM is one of the most successful examples of VGI [5] and has become one of the most important global repositories of openly accessible geospatial data. In many cities of the Global South, it provides one of the few spatially detailed representations of informal settlements. Despite concerns about completeness and contributor bias, OSM data have been validated in several studies [6], which show positional accuracy comparable to that of authoritative datasets. In addition, its participatory approach helps keep the data current and reflective of actual conditions on the ground. Its global coverage, continuous updates, and participatory nature make OSM especially valuable in under-mapped areas such as informal settlements. A key component of the workflow for this research was the use of the Python library OSMnx, which enabled the automated extraction of building footprints and the structuring of OSM data within a Python environment. Integrating data acquisition through this library supported a fully reproducible and accessible pipeline for the subsequent spatial analysis of the slums.
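As an illustration of the extraction step, the following sketch shows one way building footprints might be pulled from OSM with OSMnx. It is not the study's actual code: the function name `fetch_building_footprints`, the `BUILDING_TAGS` constant, and the choice to keep only polygonal geometries are assumptions. It relies on `features_from_place`, available in OSMnx 1.3 and later (older releases exposed `geometries_from_place` instead), and needs network access when called.

```python
# Tag filter: any OSM element carrying a "building" key (assumed filter).
BUILDING_TAGS = {"building": True}


def fetch_building_footprints(place_name, project=True):
    """Download building footprints for a geocodable place name.

    Returns a GeoDataFrame of polygonal building geometries, optionally
    projected to a local UTM CRS so that areas and distances are in metres.
    """
    # Imported lazily so that defining the function needs no network or
    # installed dependency; calling it requires OSMnx.
    import osmnx as ox

    gdf = ox.features_from_place(place_name, tags=BUILDING_TAGS)

    # Buildings mapped as single nodes have no footprint; drop them.
    gdf = gdf[gdf.geom_type.isin(["Polygon", "MultiPolygon"])]

    if project:
        gdf = ox.projection.project_gdf(gdf)
    return gdf
```

The place name must be a query that the geocoder can resolve to a polygon boundary; for neighbourhoods that are not mapped as such, a bounding polygon would have to be supplied another way.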
The methodology was applied to three informal settlements located in different regions of the Global South (Asia, Africa, and Latin America) where detailed OSM building footprint data was available. These regions were also selected because they are major hosts of informal settlements: Asia accounts for over half of the world’s slum population, and some cities have reached worrying levels of inequality, as is the case in Latin America and Africa [7]. The training dataset was derived from the Korail neighbourhood in Dhaka, Bangladesh, one of the largest and most densely populated slums in South Asia. To evaluate the transferability of the methodology across different regions, the model was tested in two additional neighbourhoods: Katanga in Kampala, Uganda, and Ricardo Brugada in Asuncion, Paraguay.
Two classification strategies were implemented to distinguish slum buildings from non-slum buildings: a stepwise threshold search and a Random Forest classifier. The first approach used a rule-based threshold search to identify the optimal metric threshold within the separated clusters associated with informal settlements. To evaluate how well the threshold separates the clusters into slums and non-slums, a silhouette score was computed. The second approach applied a Random Forest machine learning classifier trained on the morphological and topological metrics derived from the OSM building footprint data. For both methods, prior to classification, building footprints were spatially intersected with neighbourhood polygons to produce the labelled ground-truth data. This highlights the importance of VGI as both an input dataset and a reference layer for creating training labels.
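The threshold search scored by a silhouette coefficient can be sketched as follows. This is a simplified illustration rather than the study's procedure: it assumes a single metric (for example building area), candidate thresholds on a regular grid, and a hand-written one-dimensional silhouette so the example has no dependencies. The names `silhouette_1d` and `stepwise_threshold_search` are hypothetical; in practice `sklearn.metrics.silhouette_score` would be the usual choice.

```python
def silhouette_1d(values, labels):
    """Mean silhouette coefficient of 1-D values split into two groups."""
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, other_lab) in enumerate(zip(values, labels))
                if other_lab == lab and j != i]
        other = [abs(v - w) for w, other_lab in zip(values, labels)
                 if other_lab != lab]
        if not same or not other:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)      # mean distance within own cluster
        b = sum(other) / len(other)    # mean distance to the other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def stepwise_threshold_search(values, start, stop, step):
    """Try thresholds start, start+step, ..., stop; keep the best-scoring one.

    Values at or below the threshold form one group (assumed "slum-like"
    for a metric such as building area), the rest form the other group.
    Returns (best_threshold, best_silhouette).
    """
    best_t, best_s = None, -1.0
    n_steps = int(round((stop - start) / step))
    for k in range(n_steps + 1):
        t = start + k * step
        labels = [v <= t for v in values]
        if all(labels) or not any(labels):
            continue  # the threshold must split the data into two groups
        s = silhouette_1d(values, labels)
        if s > best_s:  # strict comparison keeps the first best threshold
            best_t, best_s = t, s
    return best_t, best_s
```

Because every candidate threshold is scored against all pairwise distances, the search is slower than fitting a classifier, but the resulting rule is a single interpretable cut-off.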
The results suggested that the Random Forest (RF) classifier learned the specific characteristics of the training data, including noise and outliers, which implies overfitting. By comparison, although the threshold search (TS) strategy is not an automatic method and its computation time was longer, it was less sensitive to the finer details of the training data, which led to better generalisability. Regarding the use of urban morphology and topology, the results demonstrate that the morphological metrics could capture the spatial characteristics typical of informal settlements, such as smaller building areas, higher building densities, and shorter inter-building distances. However, while the topology-based metrics provided additional insights into connectivity within the slums, their contribution to building classification accuracy was very modest and, in some cases, even led to a slight decrease in performance. These results, together with the comparative analysis across case studies from different continents, suggest that informal settlements do not exhibit a single universal morphological and topological signature. While Korail is characterised by tightly packed and highly connected building clusters, Ricardo Brugada and Katanga display a different spatial logic of informality, in which small housing units are more dispersed. This diversity helps to explain the limited transferability of the slum mapping model.
A cartographer with roots in Paraguay and academic experience across Europe through the Erasmus Mundus Cartography MSc. She is currently starting her professional journey as a cartographer in Germany, with a strong interest in open data, social and urban research. She is passionate about maps, infographics, cities, and the stories they tell.