2025-10-03, Pulag
This talk explores the extrinsic quality of OpenStreetMap data in Brno by comparing the city’s most frequently mapped amenities against a custom, field-collected reference dataset. The findings show relatively high attribute accuracy in OSM but reveal gaps in feature completeness: only 34.94% of reference features could be matched with features in OSM.
OpenStreetMap is a notable example of a database created by volunteers. Because data collection is open to anyone, establishing trust in the data is essential. Three key factors must be considered when evaluating this trust: completeness, correctness, and positional accuracy. In recent years, the most common method for assessing large datasets like OpenStreetMap has been to examine intrinsic data quality [1-3], which relies on metadata. However, this approach does not allow a thorough analysis of the mapped features and yields only a rough estimate of the data's trustworthiness. For a more detailed understanding, extrinsic data quality is evaluated by comparing OpenStreetMap with a reference dataset. This method is effective for evaluating feature completeness. For attribute accuracy, however, a similarly detailed dataset for comparison is often unavailable. In these instances, evaluators must gather their own reference dataset, which can be expensive and time-consuming. For that reason, previous studies mainly assessed attribute accuracy through intrinsic data quality.
In our study, we assessed the extrinsic quality of OpenStreetMap data for the city of Brno in the Czech Republic. We wanted to know how much we can trust OpenStreetMap in our city and whether attribute accuracy correlates with the metadata of the features. A secondary objective was to determine how well ISO 19157 can be used to assess the attribute quality of OpenStreetMap. We chose the city’s ten most mapped amenity features and gathered a reference dataset for them. Because we knew in advance what we would be evaluating, we could collect a dataset tailored to evaluating OpenStreetMap, so no major compromises were needed in the evaluation.
Over the course of several months, we traveled over 1,000 km on foot and gathered a few thousand reference features using the Locus GIS app. Each feature contained a list of evaluated attributes together with a photo of the object for further evaluation. Evaluated amenities were bench, waste_basket, recycling, restaurant, bicycle_parking, cafe, vending_machine, post_box, pub and fast_food – the most mapped node features in Brno.
Our assessment of OpenStreetMap's attribute accuracy revealed generally positive results. Several attributes in our sample achieved 100% accuracy, particularly those with boolean values. However, the most significant issues arose with string attributes that lack defined value lists, such as opening_hours. Ultimately, the primary concern identified was the completeness of the data.
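As an illustration of what per-attribute accuracy means here, the following is a minimal sketch and not the procedure used in the study. It assumes that matched features are available as pairs of tag dictionaries and that a value counts as correct only when it equals the reference value exactly. The function name, the sample tags, and the sample values are all hypothetical.

```python
from collections import defaultdict


def attribute_accuracy(pairs):
    """Share of correct values per key over matched (osm_tags, ref_tags) pairs.

    Assumption: a key missing in OSM is treated as a completeness issue
    and is skipped rather than counted as an incorrect value.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for osm_tags, ref_tags in pairs:
        for key, ref_value in ref_tags.items():
            if key not in osm_tags:
                continue
            total[key] += 1
            if osm_tags[key] == ref_value:
                correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Hypothetical matched bench features: (OSM tags, reference tags).
pairs = [
    ({"backrest": "yes", "material": "wood"}, {"backrest": "yes", "material": "wood"}),
    ({"backrest": "no"}, {"backrest": "no", "material": "metal"}),
    ({"backrest": "yes", "material": "wood"}, {"backrest": "yes", "material": "metal"}),
]
acc = attribute_accuracy(pairs)
print(acc)
```

Exact string equality suits attributes with a closed value list, such as boolean ones. It is too strict for free-form strings, where differently written values can be equally correct.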
We found that feature completeness is inadequate: we were able to match only 34.94% of all reference features with features in OpenStreetMap. Completeness varied significantly across the evaluated amenities, with waste_basket and bench features notably underrepresented.
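A completeness figure of this kind depends on how reference features are paired with OSM features. The sketch below shows one possible approach, a greedy one-to-one nearest-neighbour match within a distance threshold. The matching rule, the 10 m threshold, the function names, and the coordinates are assumptions made for illustration, not the study's documented method.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a sphere of radius 6,371 km."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_features(reference, osm, max_dist_m=10.0):
    """Greedy one-to-one matching of same-amenity points, closest pairs first."""
    candidates = []
    for i, ref in enumerate(reference):
        for j, obj in enumerate(osm):
            if ref["amenity"] != obj["amenity"]:
                continue
            d = haversine_m(ref["lat"], ref["lon"], obj["lat"], obj["lon"])
            if d <= max_dist_m:
                candidates.append((d, i, j))
    candidates.sort()
    used_ref, used_osm, matches = set(), set(), []
    for d, i, j in candidates:
        if i in used_ref or j in used_osm:
            continue
        used_ref.add(i)
        used_osm.add(j)
        matches.append((i, j, d))
    return matches


# Hypothetical points near the centre of Brno.
reference = [
    {"amenity": "bench", "lat": 49.19500, "lon": 16.60800},
    {"amenity": "bench", "lat": 49.19600, "lon": 16.60800},
    {"amenity": "waste_basket", "lat": 49.19500, "lon": 16.60900},
]
osm = [
    {"amenity": "bench", "lat": 49.19502, "lon": 16.60800},
    {"amenity": "waste_basket", "lat": 49.20000, "lon": 16.60900},
]
matches = match_features(reference, osm)
completeness = len(matches) / len(reference)
print(f"{completeness:.2%} of reference features matched")
```

The threshold matters: too small and correctly mapped features with ordinary GPS error go unmatched, too large and neighbouring benches or bins are paired with the wrong object.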
We also assessed the positional accuracy of the data. Each amenity was evaluated separately, revealing average positional errors ranging from 2.63 meters to 4.02 meters. The median error was found to be between 1.83 meters and 3.13 meters. Overall, OpenStreetMap appears to be a relatively accurate positional database for the city of Brno, despite some isolated deviations (outliers).
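The gap between the average and the median error is what a few large outliers would produce. A minimal sketch with invented per-amenity error distances (the values below are hypothetical and are not data from the study) shows the effect:

```python
import statistics

# Hypothetical distances in metres between matched OSM and reference points.
errors_by_amenity = {
    "bench": [1.2, 1.8, 2.0, 2.4, 3.1, 12.5],  # one outlier at 12.5 m
    "post_box": [0.9, 1.5, 2.1, 2.6, 3.4],     # no outlier
}

summary = {
    amenity: {
        "mean": statistics.fmean(errors),
        "median": statistics.median(errors),
    }
    for amenity, errors in errors_by_amenity.items()
}

for amenity, stats in summary.items():
    print(f"{amenity}: mean {stats['mean']:.2f} m, median {stats['median']:.2f} m")
```

In the sample with an outlier, the mean is pulled well above the median, while the outlier-free sample has nearly identical values. This is why reporting both statistics per amenity is informative.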
When examining the relationship between attribute accuracy and feature metadata, we assumed that more users editing a feature would lead to more accurate data. This concept is known as the “many eyes principle” [4, 5]. However, the correlations between metadata (such as the number of contributors, versions, and days since the last edit) and attribute correctness were mostly not statistically significant. As a result, no explicit dependency could be determined, and the values that were statistically significant showed no clear pattern.
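The abstract does not name the correlation coefficient or the significance test that was used. As one possible way to run such a check, the sketch below computes a Spearman rank correlation between a metadata value and a 0/1 correctness flag, with a permutation test for the p-value. The choice of method, the function names, and the sample data are all assumptions.

```python
import random
import statistics


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / (sx * sy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical sample: version count per feature and whether a tag was correct.
versions = [1, 2, 1, 3, 5, 2, 1, 4]
correct = [1, 1, 0, 1, 0, 1, 1, 1]
rho = spearman(versions, correct)
p_value = permutation_p_value(versions, correct)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")
```

A rank-based coefficient is used here because metadata such as version counts are heavily skewed. With a binary correctness flag, many ties occur, which is why the ranks are averaged over tied values.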
Our work also shows that evaluating OpenStreetMap using ISO 19157 can be problematic because this standard does not consider multiple correct values or the varying degrees of attribute correctness (“level of detail” of attributes). Additionally, automating the evaluation of specific attributes, such as opening_hours, is challenging since these attributes can contain different yet correct values. Furthermore, missing or incomplete documentation significantly impacts evaluation, as there should be clear rules indicating which values are correct or not. However, achieving this is difficult for projects that rely on the folksonomy principle, which encourages users to create new values that often lack documentation.
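To make the opening_hours problem concrete: two differently written values can describe the same schedule, so a plain string comparison reports an error where none exists. The sketch below normalizes a deliberately small subset of the syntax (weekday lists and ranges followed by time spans, with rules separated by semicolons) before comparing. It is an illustrative assumption, not a full parser. It ignores rule overriding, `24/7`, `off`, public holidays, and everything else in the real specification.

```python
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def expand_days(selector):
    """Expand 'Mo-We,Fr' into the set {'Mo', 'Tu', 'We', 'Fr'}."""
    days = set()
    for part in selector.split(","):
        if "-" in part:
            start, end = part.split("-")
            i, j = DAYS.index(start), DAYS.index(end)
            span = (j - i) % 7  # allows ranges that wrap around, e.g. Sa-Mo
            days.update(DAYS[(i + k) % 7] for k in range(span + 1))
        else:
            days.add(part)
    return days


def normalize(value):
    """Map a simple opening_hours string to a set of (day, time_span) pairs."""
    result = set()
    for rule in value.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        selector, times = rule.split(" ", 1)
        for day in expand_days(selector):
            for span in times.split(","):
                result.add((day, span.strip()))
    return result


a = "Mo-Fr 08:00-17:00; Sa 09:00-12:00"
b = "Sa 09:00-12:00; Mo-We 08:00-17:00; Th,Fr 08:00-17:00"
print(a == b)                        # the raw strings differ
print(normalize(a) == normalize(b))  # the schedules are the same
```

Even with normalization, the evaluator still has to decide whether a less detailed value (for example, one that omits a lunch break) counts as correct, which is a question of level of detail and cannot be settled by parsing alone.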
OpenStreetMap offers a comprehensive database, but its data quality varies significantly depending on the main tag and geometry type. In our sample data, both attribute and positional accuracy are reasonably good. However, a more significant issue is incompleteness, which includes missing and poorly mapped features. It is important to note that only a small subset of the database has been evaluated. The overall accuracy and completeness can vary significantly around the world, as demonstrated by numerous studies [6, 2, 7, 8]. This research aimed to assess the area of Brno while also testing how ISO 19157 can be applied to evaluate OpenStreetMap. Comparing Brno with a rural area or a city of a similar size in another country would provide valuable insights. However, this evaluation is very time-consuming due to the demands of actual data collection and subsequent processing. The uniqueness of this study is emphasized by the fact that a similar research effort, which uses collected data specifically to evaluate OpenStreetMap, has not yet been conducted.
PhD student of Cartography, Geoinformatics and Remote Sensing at Masaryk University in Brno, Czechia
Part of the Missing Maps CZ & SK community
OpenStreetMap enthusiast
Remote Sensing, Machine Learning and GIS analyst