WikidataCon 2025
Opening Session
Over the past year, the Wikibase Reuse team has been learning how people access Wikidata's data and what challenges they run into, and has rolled out some small solutions! In the coming year, we're planning more improvements to make access easier, and we'd love to share some of our ideas and hear your thoughts and questions.
Lexica is a web-based, mobile-friendly tool that makes it easy for anyone to contribute to Lexemes in Wikidata, including in underserved languages. With a simple interface, Lexica helps users link Lexemes to Items, add script variants, and apply hyphenation. In this session, we will demonstrate how Lexica can be used to add lexicographical data and to facilitate more inclusive contributions across language communities.
Editing lexicographical data on Wikidata shows that every language stretches the model in its own way. Across 30+ languages, we have seen patterns and divergences in modeling Lexemes.
This session shares those observations plainly, with a focus on underserved and less-documented languages where contributors often work without much guidance. We will also highlight languages that already have established practices (such as Turkish, German, and French) as examples others can follow.
These reflections build directly on conversations happening in the Lexicographical Data community (from Telegram, Talk pages, and evolving documentation pages), and are shared here as learning experiences. We will also briefly show how we tried to fold these lessons into our approach when we built a tool to edit Lexemes, as an example of how contributors' pain points shape tool design.
Participants will leave with grounded examples of what works, where challenges remain, and ideas for how to approach contributing or building tools that handle linguistic diversity realistically.
We love Wikidata. We love biodiversity. And we connect both!
This presentation will build on the Wikimedia and Biodiversity Data session at Living Data 2025 (https://meta.wikimedia.org/wiki/Event:Living_Data_2025).
It will be a whirlwind through some biodiversity+Wikidata activities, including connections with iNaturalist, GBIF, the Biodiversity Heritage Library, and the WikiProject Biodiversity. Participants will see fun tools, beautiful images, and a thriving community.
Paulina is a Wikidata-based tool for the GLAM community that facilitates searching for authors and works, helps identify their copyright status in different countries, and provides access to works when available. At Wikimania 2025, the Paulina tool won the Coolest Tool Award in the category Most Innovative.
In this session, we want to showcase the application's latest features, share what new features we're considering, and gather feedback from the GLAM community on what new features they'd like to see implemented in Paulina.
The Wikidata ontology is large, multi-domain, and community-created. This results in a considerable number of issues that undermine reliability and limit use of the ontology: ambiguous classes, questionable subclass relationships, disjointness violations, confusion between subclass and instance, and divergent modeling decisions across domains. Less attention has been paid to finding and fixing these issues than to adding new classes and domains. Improving the Wikidata ontology will take a combination of better tools to help find and fix existing issues, better tools to help editors avoid creating new ones, and a change in the community to promote better ontology design. This need not be done solely in a rigid, top-down fashion but should instead include the creation and adoption of coherent, well-described ontological principles that gain acceptance through use. The goal is not perfection, but an ontology sufficiently cohesive and consistent to enable robust inferencing and use in applications.
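By way of illustration (a hypothetical example, not one of the tools the session will present), a query along these lines surfaces one such issue, entities declared both an instance of and a subclass of the same class:

```sparql
# Entities stated to be both an instance of (P31) and a subclass of (P279)
# the same class -- a frequent symptom of instance/subclass confusion.
SELECT ?item ?class WHERE {
  ?item wdt:P31 ?class ;
        wdt:P279 ?class .
}
LIMIT 100
```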
The Authority Control template links content in Wikipedia to libraries and databases as a pathway for disambiguation and for ensuring consistency. However, it was observed that this template links Wikipedia content about Africa and Africans to libraries and databases outside of the continent, owing to the lack of authority control systems within the African library sector. Through the Knowledge Equity Fund, the African Library and Information Associations and Institutions (AfLIA), with membership in 34 African countries, addressed that gap by creating the National Library of Nigeria Semantic Name Authority Repository (NLN SNAR). AfLIA is also using NLN SNAR as a model for developing semantic authority files on Wikibase for other national libraries within the continent. This is considered a major step towards instituting robust authority control for Africa's library sector, since the same relationships, entities, and data models would be reused, while Wikibase addresses the question of centralizing or decentralizing the data.
This presentation gives a quick overview of the relationships and collaborations between Wikidata and library authority files, covering reconciliation methods and the ways in which the Wikidata community and the cataloguers who edit the authority files can cooperate to raise the quality of each other's data. Examples of existing collaborations will also be shown.
Rewriting scholarly SPARQL queries for the graph split: Tiago Lubiana
Building on a previous presentation at WikiCite 2025, we will give an overview of the process that led to the graph split on Wikidata and walk participants through rewriting SPARQL queries. The session will present tricks for adapting queries to the split, including internal federation and Blazegraph hints. It will build capacity towards the rewrite of scholarly queries, with a particular focus on supporting the Scholia platform, and briefly discuss how queries can be prepared for a future transition (hint: stay as close as possible to the core syntax of the SPARQL standard).
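For orientation, here is a minimal sketch of what such a rewrite can look like, assuming the post-split scholarly endpoint at https://query-scholarly.wikidata.org/sparql (the exact endpoints and hints covered in the session may differ):

```sparql
# Before the split this ran as a single pattern set against one graph.
# After the split, the scholarly triples are fetched via internal
# federation from the scholarly endpoint.
SELECT ?article WHERE {
  SERVICE <https://query-scholarly.wikidata.org/sparql> {
    ?article wdt:P31 wd:Q13442814 ;   # instance of: scholarly article
             wdt:P50 wd:Q80 .         # author: Tim Berners-Lee (example)
  }
}
LIMIT 10
```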
Federating SPARQL queries involving Wikibase instances: Daniel Mietchen
Federated queries make it possible to connect knowledge across different SPARQL endpoints, enabling richer insights than any single dataset can provide. For the Wikibase ecosystem, this is especially powerful, as researchers, institutions, and community projects often maintain their own Wikibase instances, and being able to query across several of them (including Wikidata) opens new opportunities for discovery, reuse, and collaboration.
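As a flavor of what this looks like in practice, the sketch below runs against a local Wikibase's query service and enriches local items with data from Wikidata; the local endpoint and the local property P2 (assumed here to hold the matching Wikidata entity URI) are placeholders:

```sparql
# Run against a local Wikibase's query service; the SERVICE block
# fetches additional statements from Wikidata for each linked item.
SELECT ?localItem ?wikidataItem ?birth WHERE {
  # Hypothetical local property P2 linking to the Wikidata entity.
  ?localItem <https://my-wikibase.example.org/prop/direct/P2> ?wikidataItem .
  SERVICE <https://query.wikidata.org/sparql> {
    # date of birth (P569) fetched from Wikidata
    ?wikidataItem <http://www.wikidata.org/prop/direct/P569> ?birth .
  }
}
LIMIT 10
```

Note that federating in the other direction, from the public Wikidata Query Service out to another endpoint, only works for endpoints on its federation allowlist.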
Scholia has become a central tool for exploring scholarly information in Wikidata, generating profiles for authors, topics, institutions, journals, and more. In this talk, we will provide an overview of Scholia’s current state: how it is used, what has changed in its infrastructure, and where it is headed. A major theme will be the 2025 Wikidata Graph Split, which directly impacts how Scholia retrieves and processes data. We will illustrate how Scholia has adapted to the new split between the main and scholarly graphs, including adjustments to queries and the use of SPARQL federation. Beyond this technical shift, we will also look at ongoing development, community contributions, and future challenges for sustaining and extending Scholia in the evolving Wikidata ecosystem.
Abstract Wikipedia aims to provide global, multilingual access to knowledge by separating content from language, allowing it to be expressed in any natural language. Wikidata lies at the very core of this vision. In this talk, we will show how Wikidata already works with Wikifunctions, powering functions that generate plain text and HTML outputs, and will look ahead to how abstract content may be represented and the role Wikidata will play in this.
How the [project](https://de.wikipedia.org/wiki/Wikipedia:WikiProjekt_Stolpersteine_goes_Wikidata) tries to move the vast knowledge kept in Wikipedia lists into Wikidata, and then back to Wikipedia.
Wikidata just turned 13. In this session we will take a look at what's been happening over the past year and what's ahead.
Wikidata makes amazing applications possible. In this panel we will hear from different projects using Wikidata's data for social good, ranging from citizen participation to fact-checking to better understanding the non-profit landscape.
This presentation introduces the Model Context Protocol (MCP), an open-source standard for integrating AI models with external tools and data sources. We present a Wikidata MCP server that provides LLMs with core functionalities including semantic and keyword search for entity discovery, property exploration, relationship retrieval, and SPARQL query execution. This approach addresses key AI limitations by minimizing identifier hallucinations and incorrect assumptions about Wikidata's structure in tasks such as SPARQL query generation.
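As one illustration of the kind of entity discovery such a server can expose (a hypothetical example, not the presenters' implementation), a keyword lookup via the Wikidata Query Service's mwapi service resolves free text to Q-identifiers so an LLM never has to guess them:

```sparql
# Keyword-based entity discovery via the mwapi service.
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "Douglas Adams" ;
                    mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
```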
We also present a Wikidata vector database that enables semantic search across Wikidata's data, allowing LLMs to discover conceptually similar items even when exact terminology is unknown.
Discover how SmartGuide, an AI-powered digital guide platform for tourist attractions and destinations, leverages Wikidata to create immersive and personalized tourism experiences. This presentation will showcase how SmartGuide builds self-guided experiences on top of Wikidata's rich datasets, alongside other open data sources such as the German National Tourism Board's Knowledge Graph, to enhance the visitor experience and contribute to a broader understanding of cultural heritage. We will share insights on how we combine Wikidata with proprietary user-generated content and AI copywriting, how we built our recommendation engine, and how we use analytics data to make tourism more rewarding and sustainable.
Explore not just how lexemes power abstract content, but also how different languages can help each other with their lexemes! In a version of the Mad Libs party game, some abstract content (like the kind that might exist for the Abstract Wikipedia) will be presented with blanks to be filled in. Participants will select Wikidata items which in different languages have compound nouns (and sometimes verbs) to fill in those blanks. The resulting abstract content is then rendered to yield a (hopefully hilarious!) story about a situation. Participants will also have the opportunity during the game to fix lexemes in different ways, should rendering issues arise, corresponding to the various ways that items may be transformed into words and phrases in various languages.
People with cochlear implants often require extensive training to adjust to the different, more mechanical soundscapes. Some professional applications offer up to a few dozen examples, but they do not always provide the option of training at home. An application called "Dacit" was developed as part of a master's thesis to give patients the opportunity to listen and train with a larger corpus, sourced from Wikidata, using pronunciation audio files from Wikimedia Commons. We demonstrate the self-reported improvements of patients and discuss the future of the app.
Closing Session
A selection of Lightning Talks from [GLAM Wiki 2025](https://meta.wikimedia.org/wiki/GLAM_Wiki_2025) has been recorded and will be shown back-to-back in this extended session.