Valentina Cesare

Valentina Cesare is an astrophysicist and a fixed-term researcher at the Osservatorio Astrofisico di Catania. She currently works on porting scientific applications related to the Gaia space mission (specifically, the Gaia AVU-GSR Parallel Solver) to HPC, HTC, and GPU environments, on developing workflows for scientific visualization, and on other activities within the PNRR-funded Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing, in the area of Future Computing. She obtained her Ph.D. in Physics and Astrophysics at the University of Torino with a thesis on modified gravity.


Sessions

11-08
08:45
15min
The Gaia AVU–GSR Parallel Solver: CUDA solutions for linear systems solving and covariances calculation toward Exascale infrastructures
Valentina Cesare

We ported to the GPU with CUDA the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) Parallel Solver, developed for the ESA Gaia mission, by optimizing a previous OpenACC porting of the code. The code finds, with a precision of 10-100 μas, the astrometric parameters of ∼10^8 sources, the attitude and instrument settings of the Gaia satellite, and the parameter γ of the PPN formalism, by solving a system of linear equations, A×x=b, with the LSQR iterative algorithm. The coefficient matrix A of the final Gaia dataset is large, with ∼10^11 × (5×10^8) elements, and sparse, reaching a size of ∼10–100 TB, typical of Big Data analyses, which requires an efficient parallelization to obtain scientific results on reasonable timescales. In the matrix size, 10^11 is the number of equations, i.e., of stellar observations, and 5×10^8 is the number of unknowns, Nunk. The speedup of the CUDA code over the original AVU–GSR solver, parallelized on the CPU with MPI+OpenMP, increases with the system size and the number of resources, reaching a maximum of ∼14x over the MPI+OpenMP code and of >9x over the OpenACC code. This result was obtained by comparing the two codes on the CINECA cluster Marconi100, with four 16 GB V100 GPUs per node. We verified the agreement between the CUDA and OpenMP solutions for a set of production systems. The CUDA code was then put into production on Marconi100, a step essential for an optimal AVU–GSR pipeline and for the successive Gaia Data Releases. We aim to move the production of this code to the CINECA Leonardo infrastructure, where we expect even higher performance, since this platform has 4x the GPU memory per node of Marconi100.
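As a purely illustrative sketch of the numerical approach (not the production AVU–GSR code), the following Python snippet solves a small, tall, sparse system A×x=b with SciPy's LSQR implementation; the matrix dimensions, density, and noise level are placeholder values chosen only to mimic, at toy scale, the structure described above.

import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import lsqr

# Toy stand-ins for the real system (~1e11 observations x ~5e8 unknowns).
n_obs, n_unk = 20_000, 500
rng = np.random.default_rng(42)
A = sparse_random(n_obs, n_unk, density=1e-3, format="csr", random_state=42)
x_true = rng.standard_normal(n_unk)
b = A @ x_true + 1e-6 * rng.standard_normal(n_obs)  # noisy "observations"

# LSQR iteratively minimizes ||A x - b||_2; with calc_var=True it also
# returns estimates of the variances of the unknowns (last element).
result = lsqr(A, b, atol=1e-10, btol=1e-10, calc_var=True)
x, istop, itn, r1norm = result[:4]
var = result[-1]
print(f"LSQR stopped (code {istop}) after {itn} iterations; residual {r1norm:.2e}")

The production solver applies the same iterative scheme, but distributed over many nodes and offloaded to the GPUs with CUDA rather than through a high-level library.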
Solving a system of linear equations yields not only the solution but also the errors on the unknowns (variances) and the covariances between them. Whereas the solution and variance arrays have size Nunk ~ 5×10^8, the variance-covariance matrix has ~Nunk^2/2 ≈ 1.25×10^17 elements, which in double precision can occupy ~1 EB. This represents a Big Data problem that cannot be solved with standard methods. To cope with this difficulty, we define a novel I/O-based strategy organized as a two-job pipeline: one job is dedicated to writing the files, while a second, concurrent job reads the files as they are created, iteratively computes the covariances, and deletes the files to avoid storage issues. In this way, the covariance calculation does not significantly slow down the AVU-GSR code for up to ~10^6 covariances.
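Below is a minimal, thread-based sketch of this two-job scheme, in which one worker writes chunk files and a concurrent worker consumes and deletes them as they appear; the directory name, file naming, chunk contents, and the 4×4 accumulated block are all hypothetical placeholders, with threads standing in for the two concurrent HPC jobs.

import os, glob, time, threading
import numpy as np

WORKDIR = "cov_chunks"   # hypothetical shared directory
N_CHUNKS = 8             # placeholder chunk count

def writer_job():
    # Stand-in for the first job: dump per-iteration chunk files.
    os.makedirs(WORKDIR, exist_ok=True)
    for i in range(N_CHUNKS):
        chunk = np.random.standard_normal((1000, 4))  # placeholder data
        tmp = os.path.join(WORKDIR, f"chunk_{i:04d}.npy.tmp")
        with open(tmp, "wb") as f:
            np.save(f, chunk)
        os.rename(tmp, tmp[:-4])  # atomic rename: reader never sees partial files

def reader_job():
    # Stand-in for the second job: consume each file as it appears, then delete it.
    acc = np.zeros((4, 4))
    seen = 0
    while seen < N_CHUNKS:
        for path in sorted(glob.glob(os.path.join(WORKDIR, "chunk_*.npy"))):
            data = np.load(path)
            acc += data.T @ data   # accumulate the covariance-like contribution
            os.remove(path)        # free storage immediately
            seen += 1
        time.sleep(0.05)           # poll until the writer has produced everything
    print("accumulated block:\n", acc)

w = threading.Thread(target=writer_job)
r = threading.Thread(target=reader_job)
w.start(); r.start(); w.join(); r.join()

The atomic rename guarantees the reader never observes a half-written file, and deleting each chunk right after use bounds the storage footprint, which is the point of the strategy.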
These analyses represent a first step toward understanding the (pre-)Exascale behavior of a class of codes sharing the same structure as this one.
Acknowledgments: This work is supported by the Spoke 1 “FutureHPC & BigData” of the ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing – and its hosting entity, funded by the European Union – NextGenerationEU. This work was also supported by ASI grant No. 2018-24-HH.0, in support of the Italian participation in the Gaia mission, and by CINI, under the EUPEX project, EC H2020 RIA, EuroHPC-02-2020, grant No. 101033975.

GPU implementations for core astronomical libraries
Talks
11-09
08:30
15min
High performance visualization for Astronomy & Cosmology: VisIVO's pathway toward Exascale systems
Eva Sciacca, Valentina Cesare, Nicola Tuccari, Fabio Vitello

Observations and simulations in modern astronomy and astrophysics generate petabyte-scale data volumes. Such volumes significantly hamper data storage, access, and analysis, and they are driving the development of a new generation of software tools. The Visualization Interface for the Virtual Observatory (VisIVO) has been designed, developed, and maintained by INAF since 2005 to perform multi-dimensional data analysis and knowledge discovery in multivariate astrophysical datasets. Utilizing containerization and virtualization technologies, VisIVO has already been used to exploit distributed computing infrastructures, including the European Open Science Cloud (EOSC).

We intend to adapt the VisIVO solutions for high performance visualization of the data generated on (pre-)Exascale systems by HPC applications in Astrophysics and Cosmology (A&C), including GADGET (GAlaxies with Dark matter and Gas) and PLUTO simulations, thanks to the collaboration within the SPACE Center of Excellence, the H2020 EUPEX Project, and the ICSC National Research Centre. In this work, we outline the course of this evolution, as well as the execution strategies designed to achieve the following goals: enhance the portability of the VisIVO modular applications and the handling of their resource requirements; foster reproducibility and maintainability; exploit resources more flexibly over heterogeneous HPC facilities; and, finally, minimize data-movement overheads and improve I/O performance.

Acknowledgements: This work is funded by the European High Performance Computing Joint Undertaking (JU) and Belgium, Czech Republic, France, Germany, Greece, Italy, Norway, and Spain under grant agreement No. 101093441, and it is supported by the Spoke “FutureHPC & BigData” of the ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing – and its hosting entity, funded by the European Union – NextGenerationEU.

Other creative topics in astronomical software
Talks