Vicente Navarro
Vicente is a Senior System Engineer at ESA's European Space Astronomy Centre in Madrid (Spain), where he is responsible for the development and operations of Science Ground Segment Systems.
Vicente leads System Engineering activities for the definition, implementation and operations of ESA Datalabs, which aims to consolidate a reference platform for the scientific analysis of multi-mission information.
Previously at ESA, Vicente was in charge of development and operations activities for Europe's Space Situational Awareness (SSA) Precursor Services, as well as for Ground Segment Systems at ESA's European Space Operations Centre in Darmstadt (Germany) for missions such as Integral, Rosetta, Mars Express, XMM-Newton and CryoSat.
Before joining ESA, Vicente was chief technologist for large-scale software systems in the government and automotive sectors.
Vicente holds an MS in Computer Engineering, complemented by ongoing PhD studies in the area of Intelligent Agents and Global Navigation Satellite Systems.
Session
At the European Space Astronomy Centre (ESAC) near Madrid, the ESAC Science Data Centre (ESDC) hosts the ESA archives for Astronomy, Planetary and Heliophysics Space Science. Furthermore, the GNSS Science Support Centre (GSSC), with special attention to Galileo and EGNOS, consolidates an ESA archive for the scientific exploitation of Global Navigation Satellite Systems (GNSS). The deluge of data generated by ESA missions in Space Science, Navigation, Earth Observation and other domains, from both a scientific and an operational viewpoint, calls for a brand-new palette of capabilities able to extract insights from these multi-mission, federated data sources.
Built around science return as its central pillar, ESA Datalabs aims to present itself to the end user as an intuitive system for swift access to a large catalogue of data volumes and processing tools, effectively integrated into a single platform. Behind this glossy curtain, hidden from the user, lies an IT infrastructure with a sophisticated architectural blueprint: Kubernetes clusters, Rancher, Elasticsearch engines, Docker containers and many other usual suspects of the High-Performance Computing world team up to deliver an innovative experience.
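As a rough illustration of how such an infrastructure could spin up a containerised Datalab, the sketch below uses the official Kubernetes Python client to submit a pod built from a Docker image. The image name, namespace, labels and resource figures are invented for the example and do not describe the actual ESA Datalabs deployment.

```python
# Minimal sketch: launching a containerised "Datalab" as a Kubernetes pod.
# Image, namespace, labels and resource limits are illustrative only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
api = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="jupyterlab-datalab-demo",
        namespace="datalabs",                          # hypothetical namespace
        labels={"app": "datalab", "owner": "demo-user"},
    ),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="jupyterlab",
                image="jupyter/scipy-notebook:latest",  # stand-in for a Datalab image
                ports=[client.V1ContainerPort(container_port=8888)],
                resources=client.V1ResourceRequirements(
                    requests={"cpu": "1", "memory": "2Gi"},
                    limits={"cpu": "2", "memory": "4Gi"},
                ),
            )
        ],
        restart_policy="Never",
    ),
)

api.create_namespaced_pod(namespace="datalabs", body=pod)
print("Datalab pod submitted")
```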
This focus demo will guide the audience through the catalogues that shape the application-store and software-as-a-service concepts present in ESA Datalabs. First, the Datalabs catalogue will be introduced. This catalogue provides access to data analysis tools ranging from domain-specific desktop tools like Topcat, a well-known astronomical tool, to general-purpose web applications like JupyterLab, widely adopted for data science in many fields. Through this catalogue, users can search, comment, bookmark and run any Datalab. Once a user finds a Datalab of interest, a simple click on a play icon works the magic. At this point, users can modify the default configuration of the Datalab set by its creator, select a previous version of the Datalab, increase its computing resources, or connect additional data volumes from the ESAC Archives. Following Datalab start-up, users can go back to the catalogue and launch up to five Datalabs in parallel (the default profile configuration). Furthermore, a wizard-like editor guides ESA and non-ESA developers in contributing Datalabs through a build and moderation process that includes automatic security scans.
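To make the search-then-launch flow concrete, the sketch below imagines a small REST interaction with the catalogue: find a Datalab, override its default configuration and start it. The base URL, endpoints, token and payload fields are entirely hypothetical and are not the real ESA Datalabs API.

```python
# Hypothetical sketch of catalogue interactions: search for a Datalab, then launch it
# with a customised configuration. URL, endpoints and fields are invented for illustration.
import requests

BASE = "https://datalabs.example.esa.int/api"        # placeholder URL
HEADERS = {"Authorization": "Bearer <user-token>"}   # placeholder credentials

# 1. Search the Datalabs catalogue for JupyterLab-based entries.
found = requests.get(f"{BASE}/datalabs", params={"q": "jupyterlab"}, headers=HEADERS).json()
datalab_id = found[0]["id"]

# 2. Launch it, overriding the creator's defaults: an earlier version, more resources
#    and an extra archive volume mounted read-only.
launch_request = {
    "version": "1.2.0",
    "resources": {"cpu": 4, "memory_gb": 16},
    "volumes": [{"archive": "esdc-astronomy", "mode": "ro"}],
}
session = requests.post(f"{BASE}/datalabs/{datalab_id}/run",
                        json=launch_request, headers=HEADERS).json()
print("Datalab running at:", session["url"])
```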
Leveraging the powerful infrastructure developed for the execution of Datalabs, this demo will introduce a second catalogue for the execution of complex, batch data processing Pipelines. Pipelines extend the Datalab entity and are defined as an input area, a sequence of processing stages (steps) in between, and an output area. The capabilities to perform data integration, pre-processing, transformation and analytics represent the entry point for Machine Learning Pipelines into ESA Datalabs. Moreover, a visual editor provides an integrated development environment to assemble these processing workflows graphically. The editor drives the user through the development cycle, simplifying the creation process and transforming the graphical representation of Pipelines into Common Workflow Language (CWL), the underlying standard supported by the orchestration engine.
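As a taste of the kind of artefact such an editor ultimately emits, the sketch below assembles a minimal two-step CWL workflow (input area, steps, output area) in Python and writes it to disk. The step names, tool files and input file are made up; the Pipelines generated by ESA Datalabs will of course look different.

```python
# Illustrative only: a minimal two-step CWL Workflow emitted as YAML from Python.
# Step names, tool descriptions and file names are hypothetical.
import yaml  # pip install pyyaml

workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"raw_data": "File"},                    # the Pipeline's input area
    "steps": {
        "preprocess": {
            "run": "preprocess.cwl",                   # hypothetical CommandLineTool
            "in": {"input_file": "raw_data"},
            "out": ["cleaned"],
        },
        "analyse": {
            "run": "analyse.cwl",                      # hypothetical CommandLineTool
            "in": {"input_file": "preprocess/cleaned"},
            "out": ["report"],
        },
    },
    "outputs": {                                       # the Pipeline's output area
        "final_report": {"type": "File", "outputSource": "analyse/report"},
    },
}

with open("pipeline.cwl", "w") as f:
    yaml.safe_dump(workflow, f, sort_keys=False)

# A CWL runner such as cwltool could then execute the workflow, e.g.:
#   cwltool pipeline.cwl --raw_data observations.fits
```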
Throughout this focus demo, the audience will see how the ESA Datalabs catalogues permeate the Data Archives at ESAC, bringing new collaboration features derived from the possibility of sharing co-located computing elements and storage areas. Along these lines, several JupyterLab notebooks will illustrate how to explore and analyse data from the ESAC archives.
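The flavour of archive exploration such a notebook might perform can be sketched with astroquery, which exposes several ESAC-hosted archives programmatically. The example below runs a small ADQL query against the ESA Gaia archive; it is only indicative of this kind of analysis, not taken from the demo itself.

```python
# Indicative example of querying an ESAC-hosted archive from a notebook,
# here the ESA Gaia archive via astroquery (pip install astroquery).
from astroquery.gaia import Gaia

# Small ADQL query: a handful of bright sources from Gaia DR3.
query = """
SELECT TOP 10 source_id, ra, dec, phot_g_mean_mag
FROM gaiadr3.gaia_source
WHERE phot_g_mean_mag < 6
"""

job = Gaia.launch_job(query)   # synchronous job, fine for small result sets
table = job.get_results()      # returns an astropy Table
print(table)
```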
Currently available as a beta release, ESA Datalabs joins the growing number of science exploitation platforms implementing advanced network and computing capabilities to leave behind the discover-and-download era of science, showcasing a new era characterised by archives tightly coupled with exploitation tools offered as a service.