2024-11-12, Aula Magna
Publishing data seems easy: put it on a web page, obtain a DOI, and
you are done. In practice, this kind of “dead” data is generally hard
to find and access, and hence to reuse, not to mention interoperability.
The Virtual Observatory therefore defines “active” interfaces to the data:
standard protocols enable uniform querying and access, while rich metadata in
standard formats on standard interfaces ensures discoverability. This
means that data publishers need to run non-trivial software. Software,
however, has a fairly short half-life, in particular because of changing
platforms, but also because the standards occasionally evolve.
In this talk, I discuss how the DaCHS data publication package tries to
mitigate this specific sort of bitrot, first and foremost by introducing a
declarative layer (“state the problem, not the solution”) in data
publishing from ingestion to service operation to registration. I will
show some examples of how this has enabled us and others to run
data centres over many years with low to moderate effort, while staying
up to date with the evolving VO. I will also delineate where no
suitable declarative approaches have been found and what that meant
during major platform changes like the move from Python 2 to Python 3.