2023-10-11, Conference room
DACE is an open-source Data Aggregation and proCessing Engine that PSNC has been developing since 2019. It serves FBC (the Polish national aggregator), SSH Open Marketplace, Leopoldina (an institutional knowledge platform) and Ariadna (a Silesian aggregation platform). DACE consists of a customisable, event-driven data aggregation and processing pipeline, a harvesting manager and an optional discovery platform.
The aggregation and processing pipeline can be adapted to specific scenarios through microservices that each perform one small, specific action, such as batch or single-record retrieval, data transformation, text recognition or data ingestion. DACE supports data harvesting via OAI-PMH, the MediaWiki API, the WordPress API, Z39.50 and CSV/XML import, as well as several dedicated APIs (e.g. CLARIN resource families). The data transformation, extraction and normalisation components use XSLT or JOLT configurable at the level of the individual data source, text recognition engines for full-text search (e.g. Tesseract), and routines for extracting and normalising dates, keywords and named entities (NER).
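To make the harvesting step more concrete, here is a minimal sketch of what a single-purpose OAI-PMH retrieval helper could look like. It is not taken from DACE: the function names, the `https://example.org/oai` endpoint and the sample response are assumptions made for illustration. Only the protocol details (the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the OAI-PMH 2.0 XML namespace) come from the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper, not DACE code)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # follow-up requests must not repeat metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page of results.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hand-written sample response, used instead of a real network call.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

identifiers, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

In a batch-retrieval service, a loop would keep requesting `next_url` until the token comes back as `None`, handing each page of records on to the next processing step.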
Technically, the main idea behind DACE is to leverage the Apache Kafka event streaming framework to build a loosely coupled ecosystem of microservices that receive messages and act on them, e.g. by sending new messages or ingesting data into the discovery platform. Through this approach we aim to build a reusable framework that is flexible, scalable, reliable and highly available.
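The message-driven pattern described above can be illustrated with a small, self-contained sketch. It deliberately uses an in-memory stand-in instead of Apache Kafka so that it runs without a broker. The `InMemoryBus` class, the topic names and the record fields are all hypothetical and do not reflect DACE's actual services or message schema.

```python
from collections import defaultdict, deque


class InMemoryBus:
    """Toy stand-in for an event streaming platform: topics and subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, message):
        self.queue.append((topic, message))

    def run(self):
        """Deliver queued messages until no service produces new ones."""
        while self.queue:
            topic, message = self.queue.popleft()
            for handler in self.subscribers[topic]:
                handler(message)


def build_pipeline(bus, index):
    """Wire two independent services together purely through topics."""

    def transform(record):
        # Transformation service: normalise the title, then emit a new message.
        normalised = {"id": record["id"], "title": record["title"].strip().title()}
        bus.publish("records.transformed", normalised)

    def ingest(record):
        # Ingestion service: store the record in a dict standing in for
        # the discovery platform's index.
        index[record["id"]] = record

    bus.subscribe("records.harvested", transform)
    bus.subscribe("records.transformed", ingest)


bus = InMemoryBus()
index = {}
build_pipeline(bus, index)
bus.publish("records.harvested", {"id": "rec-1", "title": "  old city map "})
bus.run()
```

Neither service calls the other directly; each only knows the topics it reads from and writes to. That is the property that allows a processing step to be added, replaced or scaled without touching the rest of the pipeline.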
After several years of development and production-level deployments (with more to come), this session presents what DACE has achieved and invites the community to use and further develop the engine.
Tomasz Parkoła is the Head of the Digital Libraries and Knowledge Platforms Department at Poznań Supercomputing and Networking Center, where he manages research & development teams responsible for digital humanities infrastructure (http://ehum.psnc.pl/en/main-page/), products and services for digital libraries and cultural heritage (https://dingo.psnc.pl/), and the Europeana-accredited Polish metadata aggregator FBC (https://fbc.pionier.net.pl/). He has been involved in national and international research and development projects focused on data access & processing, long-term preservation, digitization workflows, and data aggregation & interoperability (e.g. IMPACT, SCAPE, SSHOC, Europeana Cloud).