A novel application of models of species abundance to better understand OpenStreetMap Community structure and interactions
2019-09-22 , Hörsaal West

The OpenStreetMap (OSM) community is a global community crossing cultures, languages, and geographical boundaries. Researchers have been working to develop automated approaches to understanding the composition of this community through its members' contributions to the OSM database. In this talk we propose a novel application of theories and models of species abundance from ecological science to understand contributor community structure and distributions in OSM.


**Motivation:**
Community is a word that evokes different images for different people. As humans we need interaction with other people, and society is built around people coming together into social groups we call ‘community’. Communities distinguish different groups, and very often the bond within a community is a set of shared goals and the division and sharing of labour, skills, and other resources. Indeed, some scholars believe that the feeling of contributing positively to our own communities is one of the most fundamental sources of satisfaction in life (Proctor, 2013). In all of these ways the now millions of contributors to OSM form the OSM community. Attempts to understand how the OSM community works have appeared in the academic literature. Among researchers there is curiosity and fascination about the OSM community given: the global extent of OSM, crossing cultures, geographical boundaries and languages; the altruistic nature of its members; and its obvious success as a primarily Internet-based community, different from almost every non-crowdsourced community we know from our everyday lives.

**State of the Art:** In this talk we shall argue that the model of community required for OSM is more nuanced than many of the current quantitative approaches. Neis et al (2013), amongst others, have used the concepts of junior, senior, local, and external mappers, which capture the distribution of contributors to OSM well. OSM has been shown to loosely adhere to the 90-9-1 rule of Nielsen (2012), which states that about 90% of the members of community-based projects usually only consume the collaboratively collected information, while 9% contribute occasionally and only 1% show a very active pattern of contribution. As Begin (2018; PhD Thesis) argues, 'characterizing Volunteered Geographic Information (VGI) data requires understanding contributors’ behaviour and many typologies of contributors are proposed in an attempt to link VGI contributors with the nature of the data they provide'. In Begin et al (2018) the authors identify the different phases of the contributor life cycle from a temporal perspective, as a contributor's lifespan is a 'universal metric'. In a more computationally complex approach, Truong et al (2018) develop a multigraph approach with data mining to characterise individuals and identify behavioural groups. Implementing a multiplex network on an OSM data sample, followed by an initial analysis, makes it possible to identify useful behaviours.

**Methodology:** We consider a novel approach to community identification and understanding by applying concepts and methodologies from theories and models of species abundance to the individual contributors of the OSM community. This approach is new to VGI but draws on a decades-old and mature branch of ecological science. As Hughes (1986) points out, "It is a common observation that in samples from animal and plant communities most of the individuals belong to a small number of abundant species, whereas most of the species are represented by a small number of individuals". In OSM we see that most individual contributors make a small number of edits. However, within the global OSM community, a small number of species (groupings) are represented by a small number of contributors, for example contributors who have contributed thousands of GPS traces or thousands of building objects.
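To make the species abundance idea concrete, the following is a minimal sketch of a rank-abundance table for contributor "species". It assumes each contributor has already been assigned a species label; the function name `rank_abundance`, the labels, and the toy data are hypothetical illustrations, not part of the work described in this talk.

```python
from collections import Counter

def rank_abundance(species_of_contributor):
    """Rank species (contributor groupings) by how many contributors they hold.

    species_of_contributor: dict mapping a contributor id to a species label.
    Returns a list of (rank, species, n_contributors, relative_abundance),
    with rank 1 being the most abundant species.
    """
    counts = Counter(species_of_contributor.values())
    total = sum(counts.values())
    ranked = counts.most_common()  # most abundant species first
    return [
        (rank, species, n, n / total)
        for rank, (species, n) in enumerate(ranked, start=1)
    ]

# Toy data (invented for illustration): most contributors fall into one
# abundant species, and the rarer species hold only a few contributors.
toy = {
    "u1": "casual", "u2": "casual", "u3": "casual", "u4": "casual",
    "u5": "building_mapper", "u6": "building_mapper",
    "u7": "gps_tracer",
}

for row in rank_abundance(toy):
    print(row)
```

Plotting relative abundance against rank gives the rank-abundance curve commonly used in ecology to compare community structure between samples, which here would be regions.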

We use the OSM Planet History data for a number of selected regions to consider the contribution history of those OSM community members who have contributed in those regions. All software is developed in Python. We then develop and apply the Community Level Models (CLMs) from Maguire et al (2016) and others. We define different types of OSM community member species. Species characteristics are based on contribution history and patterns and can be easily changed. For example, we may create species that are differentiated by the number of OSM Relations their members have created or edited. We could decide on three species groups: 0 - 10 relations, 10 - 100 relations, or greater than 100. More sophisticated species can be developed. CLMs allow a species co-occurrence matrix to be related to environmental variables (such as quantity of edits, types of tagging used, etc), which allows prediction of the community structure and the distributions of individual species. Maguire et al (2016) argue that in ecological communities CLMs have the potential to predict species distributions and changes in community composition more accurately than other models such as species distribution models (SDMs).
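The relation-count example can be sketched as follows. This is only an illustration of how contributors might be binned into species and tallied per region into a site-by-species table, the kind of input a community-level model typically takes; it is not a CLM itself. The bin labels, the function names, the record layout, and the choice to place the boundary values 10 and 100 in the lower bin are all assumptions.

```python
from collections import defaultdict

def relation_species(n_relations):
    """Assign a species label from a contributor's relation count.

    Assumption: the boundary values 10 and 100 go to the lower bin.
    """
    if n_relations <= 10:
        return "rel_0_10"
    if n_relations <= 100:
        return "rel_10_100"
    return "rel_gt_100"

def site_by_species(records):
    """Count contributors per species in each region.

    records: iterable of (region, contributor_id, n_relations) tuples,
    a hypothetical layout with one record per contributor per region.
    Returns {region: {species_label: n_contributors}}.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for region, _contributor, n_relations in records:
        matrix[region][relation_species(n_relations)] += 1
    return {region: dict(row) for region, row in matrix.items()}

# Toy records, invented for illustration.
records = [
    ("region_a", "u1", 3),
    ("region_a", "u2", 10),
    ("region_a", "u3", 250),
    ("region_b", "u4", 50),
]

print(site_by_species(records))
```

Because the species definition lives in one small function, swapping in a different characteristic (for example, tagging patterns instead of relation counts) only requires replacing `relation_species`.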

Assuming that contributors to OSM exist in isolation and do not influence one another's distributions potentially limits our ability to understand the patterns of contribution in the community. Applying this approach from ecological science means we can potentially better understand the interactions between contributors. In future, this could support improved interactions between experienced contributors and new entrants, and the organisation of local events such as mapping parties.
