2022-11-04, ADASS Conference Room 1
The ESAC Science Data Centre (ESDC) handles the archive data for several astronomy, solar and planetary missions. We started with a few gigabytes of information, currently hold hundreds of terabytes, and in the not-so-distant future will be handling petabytes. A significant fraction of this data resides in database systems, which allows users to analyse it directly in the ESDC systems using VO protocols. The data comes in all kinds: structured, semi-structured and unstructured. How should this data be stored in a database so that users can easily query the contents of a space mission? How do we move from a solution that handles small data to one that scales better for big data? Could this become a nightmare?
One size does not fit all, although in the future it may well do.
We will review the evolution of database solutions for big data space projects, with a special focus on those we have already tested (PostgreSQL, CitusDB, PostgresXL, Greenplum) and on the specific implementations for the Gaia DR3 release, the European JWST archive, the Euclid Science Archive and the future PLATO Data Archive.
Pilar de Teodoro studied Applied Physics at Universidad Autonoma de Madrid and at the Niels Bohr Institute in Copenhagen. After returning to Spain from Denmark, she joined Oracle Iberica, where she worked as a consultant on several projects in the technology area as DBA, Application Server administrator and Portal Developer. More than 8 years later, in 2006, she joined the Gaia team at ESAC in the role of SOC Database Administrator and database testing manager. After working more than 8 years in the Gaia SOC, she joined the ESAC Science Data Centre (ESDC), where she works as a database expert and is the mission technical interface for the Gaia archive.