Gerrit Gragert
M.A. LIS and Computer Science. Team lead "IT Services for the Digital Library" at the Berlin State Library. My field of work covers all kinds of electronic resources of the State Library (on- and off-premise) as well as the technical support of the Digital Library. I am also a lecturer at the Institute for Library and Information Science of HU Berlin.
Session
For several years, the Specialised Information Service Asia (CrossAsia) of the Berlin State Library has collected a large number of digital texts. These texts were acquired under licence agreements that also include text and data mining (TDM) rights. To resolve the dilemma of making the texts available for TDM without disclosing them, we identified the new decentralised Gaia-X infrastructure, based on the Ocean Protocol, as one possible solution.
The Ocean Protocol implements the Compute-to-Data (CtD) approach, in which the algorithm is sent to the data rather than the other way around. Users never gain direct access to the licensed data; the data stays on premise.
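To illustrate the principle, here is a minimal Python sketch of compute-to-data. It does not use any Ocean Protocol library; the class `ComputeToDataProvider`, the `approved` allowlist, and the two example algorithms are hypothetical. It models only the core idea: the licensed texts stay inside the provider, algorithms approved by the data publisher run next to them, and the consumer receives only the result. In a real deployment, isolation comes from the infrastructure, not from Python attribute naming.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical type: an "algorithm" takes the private corpus and returns a result.
Algorithm = Callable[[List[str]], object]


class ComputeToDataProvider:
    """Toy provider (hypothetical): consumers never receive the corpus itself.

    The leading underscore is only a convention, not a security boundary;
    it stands in for the isolation a real CtD infrastructure provides.
    """

    def __init__(self, corpus: List[str], approved: Dict[str, Algorithm]):
        self._corpus = corpus      # licensed texts that stay "on premise"
        self._approved = approved  # algorithms the data publisher has vetted

    def run_job(self, algorithm_name: str) -> object:
        # Refuse any algorithm the publisher has not approved.
        if algorithm_name not in self._approved:
            raise PermissionError(f"algorithm {algorithm_name!r} is not approved")
        # The algorithm runs next to the data; only its output is returned.
        return self._approved[algorithm_name](self._corpus)


def token_frequencies(texts: List[str]) -> Dict[str, int]:
    """Example TDM job: aggregated token counts, not the texts themselves."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return dict(counts)


def document_count(texts: List[str]) -> int:
    """Example TDM job: number of documents in the corpus."""
    return len(texts)


if __name__ == "__main__":
    provider = ComputeToDataProvider(
        corpus=["the cat sat", "the dog ran"],
        approved={"token_frequencies": token_frequencies,
                  "document_count": document_count},
    )
    print(provider.run_job("document_count"))            # 2
    print(provider.run_job("token_frequencies")["the"])  # 2
```

The allowlist reflects the design choice that the data publisher, not the consumer, decides which algorithms may touch the data; an unapproved job raises `PermissionError` instead of running.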
To explore the possibilities of this approach, we carried out a small proof of concept. As part of this, we set up a dedicated portal for the Gaia-X test network (https://sbb.portal.minimal-gaia-x.eu/) and published data sets for CtD. Selected data scientists were then able to run their algorithms on the data.
This presentation will report on the details of the proof of concept and give an overview of the advantages and disadvantages of the dynamic Gaia-X infrastructure that we discovered. It will also identify important open questions about workflows and implementation that we still need to answer. These will be linked to reflections on what still needs to be accomplished for this approach to develop into an infrastructure for the digital humanities and machine learning.