WikidataCon 2023

Making Wikidata Smaller Without Reducing Information | 維基數據簡而不減
2023-10-30, Main program

(EN)
A walkthrough of ways to drastically reduce the size of Wikidata, both as MediaWiki pages and as Query Service data.


(ZH)
An explanation of the methods available for drastically reducing the size of Wikidata.


(EN)
Wikidata's growth in recent years has raised concerns that its Query Service may collapse and that many of its larger items are becoming increasingly impossible to edit. Much of this stems from the considerable amount of data stored unnecessarily: data that is not actively used elsewhere on Wikidata, and data whose information can be readily and reliably computed by other means.

This session will highlight ways to keep the amount of information on Wikidata constant while reducing the overall size of its data, both in terms of the lengths of item pages on wikidata.org and the number of RDF triples in the Query Service's store. It will distinguish between several types of action that may be taken, including 1) what can in principle be done right now without affecting existing workflows, 2) what is also possible now but may require acceptable changes to queries to accommodate it, and 3) what is not currently feasible because it necessitates software changes and possibly entirely new storage units. Some of the proposed actions are expected to be controversial, but this session seeks to defend them from at least three standpoints: Wikidata's site health, its community health, and its usability.
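As one illustration of "same information, fewer triples", the sketch below stores each pair of mutually inverse statements only once and recomputes the other direction when it is read. It assumes a hypothetical rule that `partOf` is always the exact inverse of `hasPart`; the predicate names, the item IDs `Q1` to `Q3`, and the `compact`/`expand` helpers are all made up for illustration and are not taken from Wikibase, the Query Service, or necessarily from the approaches this session proposes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical rule: "partOf" is defined as the exact inverse of "hasPart".
# Keys are the canonical (stored) direction; values are derived on demand.
INVERSE_OF = {"hasPart": "partOf"}
DERIVED = {v: k for k, v in INVERSE_OF.items()}


def compact(triples: set[Triple]) -> set[Triple]:
    """Store each inverse pair once, in its canonical direction."""
    out = set()
    for t in triples:
        if t.predicate in DERIVED:
            # Rewrite (o, partOf, s) as (s, hasPart, o); duplicates collapse in the set.
            out.add(Triple(t.obj, DERIVED[t.predicate], t.subject))
        else:
            out.add(t)
    return out


def expand(triples: set[Triple]) -> set[Triple]:
    """Recompute the derived direction at read time."""
    out = set(triples)
    for t in triples:
        if t.predicate in INVERSE_OF:
            out.add(Triple(t.obj, INVERSE_OF[t.predicate], t.subject))
    return out


# Usage with hypothetical data: 5 stored triples shrink to 3,
# and expanding them recovers exactly the original 5.
full = {
    Triple("Q1", "hasPart", "Q2"),
    Triple("Q2", "partOf", "Q1"),
    Triple("Q1", "hasPart", "Q3"),
    Triple("Q3", "partOf", "Q1"),
    Triple("Q1", "label", "example"),
}
small = compact(full)
print(len(full), len(small))  # 5 3
assert expand(small) == full
```

The trade-off is that every consumer must apply the same derivation rule, which is why such changes can affect queries and may need software support before they are safe to make.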

It is hoped that, depending on the types of action described, viewers will be inspired to either take these actions directly or encourage those who develop Wikibase, its Lua interface, and the Query Service to make appropriate changes and improvements so that those actions can later be taken.


(ZH)
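Wikidata has grown so much in recent years that there are real concerns about the possibility of its Query Service collapsing, and about the difficulties of editing many of its large items. These problems all stem from the considerable amount of somewhat redundant data on Wikidata, much of which sits idle and is rarely used, or can easily be computed from other data.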

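This session will focus on how to keep the total amount of information on Wikidata unchanged while reducing the overall size of its data, both in terms of the length of item pages on Wikidata and the number of RDF triples stored by the Query Service. Depending on how they can be carried out, the actions fall into the following categories: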
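1. Actions that can be taken without affecting current workflows
2. Actions that are feasible in practice but may require some changes and adaptation
3. Actions that cannot be taken directly at present, since they may require software changes or an entirely new storage model

Some of the proposed actions are expected to be controversial; the purpose of this session, however, is to discuss the issue from the standpoints of Wikidata's overall health, the health of its community, and the usability of the platform.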

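We hope that the discussion in this session will draw community members' attention to this issue and lead them to act on the different approaches described above, or to encourage software developers to make the necessary changes and improvements so that such approaches can be carried out in the future.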


Publication Language

English (en)

Level of Difficulty

Easy

Target Audience

Anyone who uses Wikidata (especially those who contribute to its massive size)

Choose Venue for Publication

International, online only (friendly to European and American time zones)

Session Form

Online, pre-recorded