WikidataCon 2023

Research Data Repositories and Wikidata | 維基數據與研究資料儲存庫
2023-10-28, Main program

The depositar (研究資料寄存所) is an open repository developed at Academia Sinica, Taiwan. The service is free: anyone can deposit and access research datasets on the depositar. It features a Wikidata-based keyword system in which datasets in its collection can be annotated and filtered by Wikidata items. The feature was introduced in 2019. In this presentation, we discuss some ideas for further extending the Wikidata keyword feature in the depositar.

Research data repositories are websites where datasets are deposited, aggregated, distributed, and discovered by researchers and the public. The repositories are online places entrusted by their communities to preserve and share data for the common good. To help search and discovery, data repositories use various vocabularies to describe and annotate the datasets in their collections. The depositar, an open repository developed at Academia Sinica, Taiwan, uses Wikidata items as one such vocabulary: since 2019, datasets in its collection can be annotated and filtered by Wikidata items. This presentation discusses some ideas for extending that keyword feature.

One of the ideas is to present and explore the relationships among the datasets in the collection, based on their annotated Wikidata keywords as well as the Wikidata properties that connect those keywords. Such semantic exploration need not be a built-in feature of the depositar, because the repository already publishes the metadata of all datasets in its collections (including their Wikidata keywords) via data APIs. The exploration can therefore be a standalone application, and it can rely further on Wikidata services for its functionality. Another idea is to suggest Wikidata keywords based on the natural-language descriptions that users give to their datasets as part of the metadata.
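As a rough illustration of the first idea, the sketch below links datasets in two ways: through Wikidata keywords they share, and through a Wikidata statement that connects a keyword of one dataset to a keyword of another. It is a minimal sketch on in-memory sample data, not the depositar's implementation. The dataset identifiers, the function names, and the shape of the input are assumptions made for this example; in practice the keywords would come from the repository's data APIs and the statements from Wikidata services.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample: dataset id -> Wikidata items used as keywords.
# Q865 is Taiwan and Q1867 is Taipei; the dataset ids are invented.
DATASET_KEYWORDS = {
    "dataset-a": {"Q865", "Q7075"},
    "dataset-b": {"Q865"},
    "dataset-c": {"Q1867"},
}

# Hypothetical sample of statements between keyword items,
# as (subject, property, object). P17 is the "country" property.
KEYWORD_STATEMENTS = [
    ("Q1867", "P17", "Q865"),
]


def shared_keyword_links(dataset_keywords):
    """Map each dataset pair to the keywords the two datasets share."""
    links = {}
    for a, b in combinations(sorted(dataset_keywords), 2):
        shared = dataset_keywords[a] & dataset_keywords[b]
        if shared:
            links[(a, b)] = shared
    return links


def property_links(dataset_keywords, statements):
    """Find dataset pairs whose keywords are connected by a statement."""
    # Invert the annotation: keyword -> datasets that use it.
    by_keyword = defaultdict(set)
    for dataset, keywords in dataset_keywords.items():
        for qid in keywords:
            by_keyword[qid].add(dataset)

    links = set()
    for subject, prop, obj in statements:
        for src in by_keyword.get(subject, ()):
            for dst in by_keyword.get(obj, ()):
                if src != dst:
                    links.add((src, prop, dst))
    return links


if __name__ == "__main__":
    print(shared_keyword_links(DATASET_KEYWORDS))
    print(sorted(property_links(DATASET_KEYWORDS, KEYWORD_STATEMENTS)))
```

Keeping the two kinds of link separate lets an exploration tool show direct overlap (same keyword) differently from semantic proximity (keywords related by a property).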

We will also explore scenarios in which multiple data repositories each publish, via data APIs, the Wikidata keywords used to annotate their collections. This would provide a basis for federated semantic search across data repositories, driven by the Wikidata infrastructure and its worldwide communities.
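The federated scenario can be sketched in the same spirit. Assuming each repository exposes a mapping from its dataset identifiers to Wikidata items, a client could merge those mappings into one index keyed by Wikidata item and query it across repositories. The repository names, dataset identifiers, payload shape, and function names below are all hypothetical; no real repository API is being described.

```python
from collections import defaultdict

# Hypothetical payloads, one per repository:
# repository name -> {dataset id -> list of Wikidata items}.
REPOSITORY_FEEDS = {
    "repo-one": {"ds-1": ["Q865", "Q7075"], "ds-2": ["Q1867"]},
    "repo-two": {"rec-9": ["Q865"]},
}


def build_federated_index(feeds):
    """Merge per-repository keyword listings into one inverted index.

    The result maps each Wikidata item to the set of
    (repository, dataset) pairs annotated with it.
    """
    index = defaultdict(set)
    for repository, datasets in feeds.items():
        for dataset, qids in datasets.items():
            for qid in qids:
                index[qid].add((repository, dataset))
    return dict(index)


def search(index, qids):
    """Return datasets, from any repository, annotated with every given item."""
    hits = [index.get(qid, set()) for qid in qids]
    if not hits:
        return []
    return sorted(set.intersection(*hits))


if __name__ == "__main__":
    index = build_federated_index(REPOSITORY_FEEDS)
    print(search(index, ["Q865"]))
```

Because the index is keyed by Wikidata items rather than by free-text tags, datasets from different repositories match on the same identifier regardless of the language in which each repository displays its keywords.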

Target Audience:

researchers, data stewards, students, data managers, developers, Wikidata users

Tyng-Ruey Chuang (莊庭瑞) is an Associate Research Fellow at the Institute of Information Science, Academia Sinica, Taiwan, with joint appointments at the Research Center for Humanities and Social Sciences (Center for GIS) and the Research Center for Information Technology Innovation.

He leads the depositar lab.

Cheng-Jen (李承錱) works at the Institute of Information Science, Academia Sinica, Taiwan, where he leads the technical development of the depositar. His recent research interests are data exchange standards, deployment automation, and long-term maintenance of information systems. By applying these technologies, he aims to keep the depositar sustainable. He is also a long-time Python user.