WikidataCon 2025
Opening Session
Over the past year, the Wikibase Reuse team has been learning how people access Wikidata's data and what challenges they run into, and has rolled out some small solutions! In the coming year, we're planning more improvements to make access easier, and we'd love to share some of our ideas and hear your thoughts and questions.
Lexica is a web-based, mobile-friendly tool that makes it easy for anyone to contribute to Lexemes in Wikidata, including in underserved languages. With a simple interface, Lexica helps users link Lexemes to Items, add script variants, and apply hyphenation. In this session, we will demonstrate how Lexica can be used to add lexicographical data and to facilitate more inclusive contributions across language communities.
Editing lexicographical data on Wikidata shows that every language stretches the model in its own way. Across 30+ languages, we have seen patterns and divergences in modeling Lexemes.
This session shares those observations plainly, with a focus on underserved and less-documented languages where contributors often work without much guidance. We will also highlight languages that already have established practices (such as Turkish, German, and French) as examples others can follow.
These reflections build directly on conversations happening in the Lexicographical Data community (from Telegram, Talk pages, and evolving documentation pages), and are shared here as learning experiences. We will also briefly show how we tried to fold these lessons into our approach when we built a tool to edit Lexemes, as an example of how contributors' pain points shape tool design.
Participants will leave with grounded examples of what works, where challenges remain, and ideas for how to approach contributing or building tools that handle linguistic diversity realistically.
We love Wikidata. We love biodiversity. And we connect both!
This presentation will build on the Wikimedia and Biodiversity Data session at Living Data 2025 (https://meta.wikimedia.org/wiki/Event:Living_Data_2025).
It will be a whirlwind through some biodiversity+Wikidata activities, including connections with iNaturalist, GBIF, the Biodiversity Heritage Library, and the WikiProject Biodiversity. Participants will see fun tools, beautiful images, and a thriving community.
Paulina is a Wikidata-based tool for the GLAM community that facilitates searching for authors and works, helps identify their copyright status in different countries, and provides access to works when available. At Wikimania 2025, the Paulina tool won the Coolest Tool Award in the category Most Innovative.
In this session, we want to showcase the application's latest features, share what new features we're considering, and gather feedback from the GLAM community on what new features they'd like to see implemented in Paulina.
The Wikidata ontology is large, multi-domain, and community-created. This results in a considerable number of issues that undermine reliability and limit use of the ontology: ambiguous classes, questionable subclass relationships, disjointness violations, confusion between subclass and instance, and divergent modeling decisions across domains. Less attention has been paid to finding and fixing these issues than to adding new classes and domains. Improving the Wikidata ontology will take a combination of better tools to help find and fix existing issues, better tools to help editors avoid creating new ones, and a change in the community to promote better ontology design. This need not be done solely in a rigid, top-down fashion but should instead include the creation and adoption of coherent, well-described ontological principles that gain acceptance through use. The goal is not perfection, but an ontology sufficiently cohesive and consistent to enable robust inferencing and use in applications.
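By way of illustration (a hypothetical example, not one of the tools the session will present), a query along these lines surfaces one such issue, entities declared both an instance of and a subclass of the same class:

```sparql
# Entities stated to be both an instance of (P31) and a subclass of (P279)
# the same class -- a frequent symptom of instance/subclass confusion.
SELECT ?item ?class WHERE {
  ?item wdt:P31 ?class ;
        wdt:P279 ?class .
}
LIMIT 100
```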
The Authority Control template links content in Wikipedia to libraries and databases as a pathway for disambiguation and for ensuring consistency. However, it was observed that this template links Wikipedia content about Africa and Africans to libraries and databases outside of the continent, owing to the lack of authority control systems within the African library sector. Through the Knowledge Equity Fund, the African Library and Information Associations and Institutions (AfLIA), with membership in 34 African countries, addressed that gap by creating the National Library of Nigeria Semantic Name Authority Repository (NLN SNAR). AfLIA is also using NLN SNAR as a model for developing semantic authority files on Wikibase for other national libraries within the continent. This is considered a major step towards instituting robust authority control for Africa's library sector, since the same relationships, entities, and data models would be reused, while Wikibase addresses the question of centralizing or decentralizing the data.
This presentation gives a quick overview of the relationships and collaborations between Wikidata and library authority files, covering reconciliation methods and the ways in which the Wikidata community and the cataloguers who edit the authority files can cooperate to raise the quality of each other's data. Examples of existing collaborations will also be shown.
Rewriting scholarly SPARQL queries for the graph split: Tiago Lubiana
Building on a previous presentation at WikiCite 2025, we will give an overview of the process that led to the graph split on Wikidata and walk participants through rewriting SPARQL queries. The session will present tricks for adapting queries to the split, including internal federation and Blazegraph hints. It will build capacity towards the rewrite of scholarly queries, with a particular focus on supporting the Scholia platform, and briefly discuss how queries can be prepared for a future transition (hint: stay as close as possible to the core syntax of the SPARQL standard).
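For orientation, here is a minimal sketch of what such a rewrite can look like, assuming the post-split scholarly endpoint at https://query-scholarly.wikidata.org/sparql (the exact endpoints and hints covered in the session may differ):

```sparql
# Before the split this ran as a single pattern set against one graph.
# After the split, the scholarly triples are fetched via internal
# federation from the scholarly endpoint.
SELECT ?article WHERE {
  SERVICE <https://query-scholarly.wikidata.org/sparql> {
    ?article wdt:P31 wd:Q13442814 ;   # instance of: scholarly article
             wdt:P50 wd:Q80 .         # author: Tim Berners-Lee (example)
  }
}
LIMIT 10
```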
Federating SPARQL queries involving Wikibase instances: Daniel Mietchen
Federated queries make it possible to connect knowledge across different SPARQL endpoints, enabling richer insights than any single dataset can provide. For the Wikibase ecosystem, this is especially powerful, as researchers, institutions, and community projects often maintain their own Wikibase instances, and being able to query across several of them (including Wikidata) opens new opportunities for discovery, reuse, and collaboration.
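As a flavor of what this looks like in practice, the sketch below runs against a local Wikibase's query service and enriches local items with data from Wikidata; the local endpoint and the local property P2 (assumed here to hold the matching Wikidata entity URI) are placeholders:

```sparql
# Run against a local Wikibase's query service; the SERVICE block
# fetches additional statements from Wikidata for each linked item.
SELECT ?localItem ?wikidataItem ?birth WHERE {
  # Hypothetical local property P2 linking to the Wikidata entity.
  ?localItem <https://my-wikibase.example.org/prop/direct/P2> ?wikidataItem .
  SERVICE <https://query.wikidata.org/sparql> {
    # date of birth (P569) fetched from Wikidata
    ?wikidataItem <http://www.wikidata.org/prop/direct/P569> ?birth .
  }
}
LIMIT 10
```

Note that federating in the other direction, from the public Wikidata Query Service out to another endpoint, only works for endpoints on its federation allowlist.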
Scholia has become a central tool for exploring scholarly information in Wikidata, generating profiles for authors, topics, institutions, journals, and more. In this talk, we will provide an overview of Scholia’s current state: how it is used, what has changed in its infrastructure, and where it is headed. A major theme will be the 2025 Wikidata Graph Split, which directly impacts how Scholia retrieves and processes data. We will illustrate how Scholia has adapted to the new split between the main and scholarly graphs, including adjustments to queries and the use of SPARQL federation. Beyond this technical shift, we will also look at ongoing development, community contributions, and future challenges for sustaining and extending Scholia in the evolving Wikidata ecosystem.
Abstract Wikipedia aims to provide global, multilingual access to knowledge by separating content from language, allowing it to be expressed in any natural language. Wikidata lies at the very core of this vision. In this talk, we will show how Wikidata already works with Wikifunctions, powering functions that generate plain text and HTML outputs, and will look ahead to how abstract content may be represented and the role Wikidata will play in this.
How the [project](https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Stolpersteine_goes_Wikidata) tries to move the vast knowledge kept in Wikipedia lists into Wikidata, and then back to Wikipedia.
Wikidata just turned 13. In this session we will take a look at what's been happening over the past year and what's ahead.
Wikidata makes amazing applications possible. In this panel we will hear from different projects using Wikidata's data for social good, ranging from citizen participation to fact-checking to better understanding the non-profit landscape.
This presentation introduces the Model Context Protocol (MCP), an open-source standard for integrating AI models with external tools and data sources. We present a Wikidata MCP server that provides LLMs with core functionalities including semantic and keyword search for entity discovery, property exploration, relationship retrieval, and SPARQL query execution. This approach addresses key AI limitations by minimizing identifier hallucinations and incorrect assumptions about Wikidata's structure in tasks such as SPARQL query generation.
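As one illustration of the kind of entity discovery such a server can expose (a hypothetical example, not the presenters' implementation), a keyword lookup via the Wikidata Query Service's mwapi service resolves free text to Q-identifiers so an LLM never has to guess them:

```sparql
# Keyword-based entity discovery via the mwapi service.
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "Douglas Adams" ;
                    mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
```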
We also present a Wikidata vector database that enables semantic search across Wikidata's data, allowing LLMs to discover conceptually similar items even when exact terminology is unknown.
Discover how SmartGuide, an AI-powered digital guide platform for tourist attractions and destinations, leverages Wikidata to create immersive and personalized tourism experiences. This presentation will showcase how SmartGuide builds self-guided experiences on top of Wikidata's rich datasets, alongside other open data sources such as the German National Tourism Board's Knowledge Graph, to enhance the visitor experience and contribute to a broader understanding of cultural heritage. We will share insights on how we combine Wikidata with proprietary user-generated content and AI copywriting, how we built our recommendation engine, and how we use analytics data to make tourism more rewarding and sustainable.
Explore not just how lexemes power abstract content, but also how different languages can help each other with their lexemes! In a version of the Mad Libs party game, some abstract content (like the kind that might exist for the Abstract Wikipedia) will be presented with blanks to be filled in. Participants will select Wikidata items which in different languages have compound nouns (and sometimes verbs) to fill in those blanks. The resulting abstract content is then rendered to yield a (hopefully hilarious!) story about a situation. Participants will also have the opportunity during the game to fix lexemes in different ways, should rendering issues arise, corresponding to the various ways that items may be transformed into words and phrases in various languages.
People with cochlear implants often require extensive training to adjust to the different, more mechanical soundscapes. Some professional applications offer up to a few dozen examples, but they do not always provide the option of training at home. An application called "Dacit" was developed as part of a master's thesis to give patients the opportunity to listen and train with a larger corpus, sourced from Wikidata, using pronunciation audio files from Wikimedia Commons. We demonstrate the self-reported improvements of patients and discuss the future of the app.
Closing Session
A selection of Lightning Talks from [GLAM Wiki 2025](https://meta.wikimedia.org/wiki/GLAM_Wiki_2025) has been recorded and will be shown back-to-back in this extended session.