Leveraging OpenStreetMap for hyperlocal geocoding of Twitter data: A spatiotemporal analysis of the 2016 Haifa (Israel) wildfire
2025-10-03, Pulag

This study presents a geospatial framework that combines NLP, machine learning, and GIScience to extract and georeference tweets related to the November 2016 Haifa wildfire, enabling near real-time insights into urban fire dynamics. Using OpenStreetMap and GeoNames to geocode over 16,000 tweets, the researchers demonstrated strong spatial and temporal alignment with official fire incident reports, highlighting social media’s potential as a supplementary data source for disaster response. The approach offers a scalable model for leveraging crowdsourced and user-generated data in emergency informatics, especially in data-scarce regions.


The increasing frequency and severity of urban wildfires demand new sources of near real-time information to support emergency response and disaster risk reduction [1-4]. In this study, we present an application for extracting and georeferencing the spatiotemporal distribution of tweets associated with the November 2016 wildfire in Haifa, Israel. The unprecedented urban fire affected densely populated neighbourhoods, caused extensive infrastructural damage, and led to the evacuation of thousands of residents [5]. However, because of the nature and extent of the fire, which lasted nearly three days, reliable information on the emergence and development of new fire locations at the sub-city scale was only partially available. This complicated management and decision-making, and some of the cascading events that unfolded during the disaster were difficult to detect and address.
The purpose of the study was to analyse tweets as a potential source of near real-time information and to examine the degree to which Twitter can assist decision-making during urban catastrophes. The research combined Natural Language Processing (NLP), Machine Learning (ML), and Geographic Information Science (GIScience) to filter, classify, and precisely geolocate tweets at city, neighbourhood, or street-level resolution. A central component of the geospatial framework was OpenStreetMap (OSM, https://www.openstreetmap.org, accessed July 2019), used in conjunction with the GeoNames gazetteer (http://www.geonames.org/, accessed July 2019) to construct a comprehensive spatial reference corpus. This enabled the geocoding of both explicitly and implicitly localized tweets that lack GPS metadata. Addressing this gap is essential, given that only 1–3% of tweets are geotagged with reliable geographic coordinates [6, 7].
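To make the corpus-construction idea concrete, the following is a minimal Python sketch of how OSM features and GeoNames records might be merged into one point-based gazetteer. It is not the study's implementation. It assumes the shapefiles have already been read into plain dicts with hypothetical `name` and `coords` fields (reading shapefiles is omitted), collapses each feature to the mean of its vertices (a crude approximation of a centroid), and drops a GeoNames record when an entry with the same name already lies within a hypothetical 500 m radius. The names `build_gazetteer` and `GazetteerEntry`, the field names, and the deduplication rule are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class GazetteerEntry:
    name: str
    lat: float
    lon: float
    source: str  # "osm" or "geonames"


def representative_point(coords):
    """Mean of the vertices: a crude stand-in for a true polygon centroid."""
    lat = sum(c[0] for c in coords) / len(coords)
    lon = sum(c[1] for c in coords) / len(coords)
    return lat, lon


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_gazetteer(osm_features, geonames_rows, dedup_radius_m=500.0):
    """Collapse OSM features to points, then add GeoNames rows that are not
    near-duplicates (same name, case-insensitive, within dedup_radius_m).
    The 500 m default is a hypothetical choice, not taken from the study."""
    entries = [
        GazetteerEntry(f["name"], *representative_point(f["coords"]), "osm")
        for f in osm_features
    ]
    for row in geonames_rows:
        duplicate = any(
            e.name.casefold() == row["name"].casefold()
            and haversine_m(e.lat, e.lon, row["lat"], row["lon"]) <= dedup_radius_m
            for e in entries
        )
        if not duplicate:
            entries.append(GazetteerEntry(row["name"], row["lat"], row["lon"], "geonames"))
    return entries
```

In practice a GIS library would compute proper centroids or interior points; the mean-of-vertices shortcut only keeps the sketch dependency-free.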
We collected approximately 2.4 million tweets between 24 and 27 November 2016 using wildfire-related keywords in Hebrew, Arabic, and English. After classification using topic modelling and RCNN (Recurrent Convolutional Neural Networks) [8], around 114,000 tweets were labelled as relevant to the event. Of these, only 31 were geotagged with geographic coordinates, far too few observations for spatial analysis. To overcome this shortcoming, we implemented a text-based georeferencing approach leveraging gazetteer data extracted from the OpenStreetMap and GeoNames databases. We converted 18 OSM shapefiles into a unified point dataset containing a wide variety of geographic features, ranging from neighbourhoods and roads to public buildings and natural landmarks. This dataset was merged with a version of the GeoNames corpus to create a point-based localized gazetteer covering the Haifa metropolitan area and its environs. For geocoding, each point was associated with its place name in Hebrew, Arabic, or English. The geocoding pipeline then consisted of three key steps: (1) NLP techniques, including tokenization, stemming, and stop-word removal, to extract named entities and spatial references from the tweet's metadata [9]; (2) a FuzzyWuzzy-based string-matching algorithm that computes the Levenshtein distances between strings extracted from the tweet tokens and the place names in our gazetteer [10]; and (3) assignment of a confidence score to each match, which allowed us to filter or weight matches by their credibility and accuracy. For example, georeferenced tweets with scores above 90% were deemed highly reliable for spatial trend detection.
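The three matching steps can be sketched in a few dozen lines of Python. This is a hypothetical illustration rather than the authors' code: it substitutes a hand-written normalized Levenshtein similarity for FuzzyWuzzy (so scores will not match FuzzyWuzzy's exactly), uses a toy stop-word list, omits stemming and named-entity recognition, and treats the similarity score itself as the confidence value, with a 90-point cutoff mirroring the threshold mentioned above. Candidate place names are generated as contiguous 1- to 3-token phrases so that multi-word names can match.

```python
import re

# Hypothetical toy stop-word list; a real pipeline would use per-language
# lists for Hebrew, Arabic, and English.
STOP_WORDS = {"the", "in", "at", "near", "on", "is", "a"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Score in [0, 100]: 100 * (1 - distance / length of the longer string)."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


def tokenize(text: str) -> list:
    """Case-fold, split on Unicode word characters, drop stop words (no stemming)."""
    return [t for t in re.findall(r"\w+", text.casefold()) if t not in STOP_WORDS]


def candidate_phrases(tokens, max_n=3):
    """All contiguous 1- to max_n-token phrases, to catch multi-word place names."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def geocode(text, gazetteer, threshold=90.0):
    """Return (place_name, point, score) for the best match, or None below threshold."""
    best = None
    for phrase in candidate_phrases(tokenize(text)):
        for name, point in gazetteer.items():
            score = similarity(phrase, name.casefold())
            if best is None or score > best[2]:
                best = (name, point, score)
    if best is None or best[2] < threshold:
        return None
    return best
```

With a one-entry gazetteer such as `{"Hadar HaCarmel": (32.805, 35.005)}`, the tweet text "Smoke near Hadar HaCarmel right now" matches with a score of 100.0, while a one-letter typo still clears the 90-point cutoff. Omitting stemming matters most for Hebrew and Arabic, where prepositions and the definite article attach to the word as prefixes (for example, Hebrew בחיפה, "in Haifa", carries the prefix ב), so an unstemmed token would not match the bare place name.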
The process yielded 16,672 georeferenced tweets distributed across 130 unique localities within Haifa and its close vicinity. We then conducted a spatiotemporal inspection by aggregating tweets into 5×5 km grid cells and 8-hour intervals, bin sizes informed by the density of localities extracted from the OSM/GeoNames hybrid gazetteer. We used Esri ArcGIS Pro to generate Kernel Density Estimation (KDE) maps [11] and 3D visualizations of the tweets' distribution. The results showed strong temporal (Figure 1) and spatial (Figure 2) correspondence between georeferenced tweets and the fire incidents officially reported by the Israel Fire and Rescue Services (IFRS). In most of the bins where actual fires were documented, relevant georeferenced tweets were also present. Additionally, tweets often captured the cascading nature of the fire as it spread across Haifa and adjacent towns. Importantly, the spatial granularity enabled us to detect not only urban hotspots but also peripheral areas, such as the cities of Daliyat el-Carmel and Akko, which were underrepresented in media reporting.
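The binning and the bin-level comparison with IFRS incidents can be expressed compactly. The sketch below assumes that tweet and incident locations have already been projected to a metric coordinate system (for example, Israeli TM Grid, EPSG:2039), that timestamps are in seconds, and that the grid is anchored at the coordinate origin and the time bins at a chosen start time `t0_s`; all three are assumptions. The `incident_hit_rate` helper is a hypothetical way of quantifying the observation that most bins with documented fires also contained tweets, not the study's own measure, and the KDE maps and 3D views produced in ArcGIS Pro are not reproduced.

```python
from collections import Counter

CELL_M = 5_000        # 5 x 5 km grid cells, as reported in the study
WINDOW_S = 8 * 3600   # 8-hour time bins, as reported in the study


def bin_key(x_m, y_m, t_s, t0_s):
    """Map a projected point (metres) and a timestamp (seconds) to (col, row, window)."""
    return (int(x_m // CELL_M), int(y_m // CELL_M), int((t_s - t0_s) // WINDOW_S))


def bin_counts(records, t0_s):
    """Count (x_m, y_m, t_s) records per space-time bin."""
    return Counter(bin_key(x, y, t, t0_s) for x, y, t in records)


def incident_hit_rate(tweet_records, incident_records, t0_s):
    """Share of bins holding at least one official incident that also hold a tweet.
    Hypothetical metric for illustration only."""
    tweet_bins = bin_counts(tweet_records, t0_s)
    incident_bins = set(bin_counts(incident_records, t0_s))
    if not incident_bins:
        return 0.0
    # Counter returns 0 for missing keys without inserting them.
    hits = sum(1 for b in incident_bins if tweet_bins[b] > 0)
    return hits / len(incident_bins)
```

Because bins are keyed by integer cell and window indices, the same function works unchanged for other cell sizes or intervals if `CELL_M` and `WINDOW_S` are adjusted.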
OpenStreetMap played a pivotal role in our ability to extract high-resolution geospatial insights from unstructured social media content. Its rich content and multilingual tagging schema allowed effective token matching across Hebrew, Arabic, and English tweets. While GeoNames provided broad administrative and populated place names, OSM uniquely offered sub-city detail, such as parks, fire stations, neighbourhoods, and points of interest, which substantially enhanced geocoding precision. The findings demonstrate that OSM and GeoNames could serve as an open, extensible backbone for disaster informatics, particularly in regions where official geospatial datasets are sparse or restricted. Additionally, this research offers a replicable model for fusing crowdsourced geographic data (OSM and GeoNames) with user-generated content (Twitter) to inform emergency response at the hyperlocal scale.
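Matching across languages relies on OSM's convention of storing a default `name` tag alongside language-specific keys such as `name:he`, `name:ar`, and `name:en`. A minimal sketch, assuming elements have already been reduced to a dict of tags plus a representative point, expands each element into one lookup entry per distinct name so that a Hebrew, Arabic, or English token can resolve to the same location. The helper names and the restricted key list are illustrative assumptions; real OSM data also carries keys such as `alt_name` and `old_name`, which a fuller version would include.

```python
# OSM keys carrying a feature's default and language-specific names
# (restricted list chosen for illustration).
NAME_KEYS = ("name", "name:he", "name:ar", "name:en")


def expand_names(element):
    """Yield (normalized_name, tag_key, point) once per distinct name on an element."""
    tags = element.get("tags", {})
    seen = set()
    for key in NAME_KEYS:
        value = tags.get(key)
        if not value:
            continue
        norm = value.strip().casefold()
        if norm not in seen:  # skip e.g. name == name:he duplicates
            seen.add(norm)
            yield norm, key, element["point"]


def build_lookup(elements):
    """Map every normalized name variant to the (tag_key, point) pairs that carry it."""
    lookup = {}
    for element in elements:
        for norm, key, point in expand_names(element):
            lookup.setdefault(norm, []).append((key, point))
    return lookup
```

For an element tagged `name=חיפה`, `name:ar=حيفا`, and `name:en=Haifa`, the lookup gains three keys that all point to the same coordinates, so the fuzzy matcher can compare a tweet token against whichever script the tweet uses.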