Assessing the quality of sources in Wikidata across languages
2021-10-31, 14:10–14:20 (UTC), Room 2

Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata’s ability to systematically assess and assure the quality of its references remains limited. To address this gap, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. The findings help us ascertain the quality of references in Wikidata and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.


What will the participants take away from this session?

Participants will learn the current state of reference quality in Wikidata. They will also learn how content taken from specific languages can drift towards specific domains, and how that affects the way a non-English speaker might experience reference-checking in Wikidata. We will open the floor for discussion at the end, in the hope that participants will debate solutions or remedial steps.

Language

English

Recording

Yes

Other links

Assessing the quality of sources in Wikidata across languages: a hybrid approach: https://arxiv.org/abs/2109.09405

Link to notes

https://etherpad.wikimedia.org/p/WikidataCon2021-Assessingthequalityofsourcesin

Gabriel Amaral is a computer scientist who graduated summa cum laude from the Federal University of Ceará. He is a PhD candidate at King's College London and a Marie Curie fellow. He is part of the Marie Curie European training network Cleopatra, which delivers approaches and technologies to build and use large-scale, multilingual knowledge graphs. His PhD thesis tackles the quality of references and the verification of claims found in Wikidata.
