WikidataCon 2021

Gabriel Maia

Gabriel Amaral is a computer scientist who graduated summa cum laude from the Federal University of Ceará. He is a PhD candidate at King's College London and a Marie Curie fellow, and is part of the Marie Curie European training network Cleopatra, which delivers approaches and technologies to build and use large-scale, multilingual knowledge graphs. His PhD thesis tackles the quality of references and the verification of claims found in Wikidata.


Sessions

10-31
10:00
55min
Scientific greetings
Tiago Lubiana, David Abián, Gabriel Maia

What aspects of Wikidata do you research? Which ones do you find challenging? In this condensed session, each researcher will have the opportunity to introduce themselves and their work to colleagues in about 5 minutes. They will also be able to ask for help, offer collaboration, and find out what other colleagues are working on.

Due to time constraints, please sign up for the session on Wikidata as soon as possible if you want to speak.

Education & science
Room 2
10-31
14:10
10min
Assessing the quality of sources in Wikidata across languages
Gabriel Maia

Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata’s ability to systematically assess and assure the quality of its references remains limited. To address this gap, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. The findings help us ascertain the quality of references in Wikidata and identify common challenges in defining and capturing the quality of user-generated multilingual structured data on the web.

Education & science
Room 2