2024-11-14 –, Aula Magna
Space missions have always recorded electromagnetic signals, from infrared light to gamma rays. Expected to launch in 2037, ESA's large-class mission LISA (Laser Interferometer Space Antenna) will survey gravitational wave signals from space. As the world's first in-orbit instrument to probe space-time itself, it is one of the most ambitious science missions ever undertaken. LISA promises a wealth of new science, allowing us to test our understanding of general relativity and to open a new window for astrophysics and cosmology. The data analysis for this mission will have to disentangle superposed signals from a variety of astrophysical sources and to model the instrumental noise. At the heart of a distributed data analysis system lies a gigantic Bayesian inference pipeline: the Global Fit. The computational challenge will be massive: even in optimistic scenarios, it is expected to be about an order of magnitude heavier than the data processing of the recent ESA mission Euclid.
Inferring the parameters of each source will require source separation, which complicates the estimation of their posterior distributions, a task that is already challenging for isolated gravitational events. When separation is not possible, the number of superimposed sources becomes an unknown and the signals themselves form a confusion background comparable to noise; trans-dimensional analysis is then required, which adds further complexity. To tackle the challenge of the Global Fit, the currently envisioned approach relies on a Markov chain Monte Carlo (MCMC) strategy, with block Gibbs sampling across the classes of sources (and the noise level) to reduce the complexity. Even with this trick, existing pipeline prototypes are computationally expensive and scale poorly. As a consequence, the scientific community is looking for technological and algorithmic breakthroughs, e.g. relying on GPUs, sparsity-based modeling or artificial intelligence.
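To make the block Gibbs idea concrete, here is a minimal toy sketch, not the Global Fit itself. It assumes two hypothetical "source classes", each reduced to one sinusoidal template with an unknown amplitude, plus white Gaussian noise of unknown level, with flat priors on the amplitudes and a 1/σ² prior on the noise variance. All function names, templates and priors are illustrative assumptions. Each block is resampled from its conditional distribution while the other blocks are held fixed.

```python
import math
import random


def simulate(n, a_true, b_true, sigma_true, rng):
    """Toy data: two sinusoidal templates plus white Gaussian noise."""
    t = [i / n for i in range(n)]
    s1 = [math.sin(2 * math.pi * 3 * x) for x in t]  # template of class 1
    s2 = [math.sin(2 * math.pi * 7 * x) for x in t]  # template of class 2
    d = [a_true * u + b_true * v + rng.gauss(0.0, sigma_true)
         for u, v in zip(s1, s2)]
    return s1, s2, d


def sample_amplitude(template, residual, sigma2, rng):
    """Draw an amplitude from its Gaussian conditional (flat prior)."""
    norm = sum(u * u for u in template)
    mean = sum(u * r for u, r in zip(template, residual)) / norm
    return rng.gauss(mean, math.sqrt(sigma2 / norm))


def block_gibbs(s1, s2, d, n_iter, rng):
    """Cycle over the blocks: class 1, class 2, then the noise level."""
    a, b, sigma2 = 0.0, 0.0, 1.0
    chain = []
    for _ in range(n_iter):
        # Block 1: amplitude of class 1, given class 2 and the noise.
        a = sample_amplitude(s1, [x - b * v for x, v in zip(d, s2)], sigma2, rng)
        # Block 2: amplitude of class 2, given class 1 and the noise.
        b = sample_amplitude(s2, [x - a * u for x, u in zip(d, s1)], sigma2, rng)
        # Block 3: noise variance, inverse-gamma conditional (1/sigma^2 prior).
        ssr = sum((x - a * u - b * v) ** 2 for x, u, v in zip(d, s1, s2))
        sigma2 = 1.0 / rng.gammavariate(len(d) / 2.0, 2.0 / ssr)
        chain.append((a, b, math.sqrt(sigma2)))
    return chain


if __name__ == "__main__":
    rng = random.Random(1)
    s1, s2, d = simulate(200, 2.0, -1.0, 0.5, rng)
    kept = block_gibbs(s1, s2, d, 1000, rng)[200:]  # drop burn-in samples
    means = [sum(c[i] for c in kept) / len(kept) for i in range(3)]
    print("posterior means (a, b, sigma):", means)
```

In this toy case every conditional is available in closed form, so each block update is an exact draw. In a realistic setting each block would instead run its own sampler, possibly trans-dimensional, which is where the cost discussed above comes from.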
The so-called GlobalFit Framework will provide an abstraction layer between the distributed system components in charge of pipeline orchestration and execution on the one hand, and the scientific modules on the other. This layer will offer the scientific development team a convenient way to interact with the underlying components, so that they can focus on algorithm development. Among other things, the Framework will guarantee that the scientific modules can be integrated with low coupling, thus allowing dozens of labs to contribute with various languages and technologies. It will also handle module scheduling, including the iteration logic, scalability and adaptation to available resources. It will feature checkpointing and resuming capabilities, and communicate in real time with concurrent pipelines running in different computing centers across the world. All of these technical requirements already make the Framework development an engineering challenge. Yet the most complex features to be implemented relate to the user experience: given such an algorithmic and computational beast, how can its scientific potential be unleashed? To this end, our Framework prototype is equipped with two dashboards, which will be presented in more detail: an Operation Dashboard for operators and a Monitoring Dashboard for science experts.
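As a rough illustration of what such a layer could look like, the sketch below is purely hypothetical and does not reflect the actual Framework API. It treats each scientific module as a plain function from state to state (low coupling), runs the modules in sequence at each iteration, and writes a JSON checkpoint after every iteration so that an interrupted run can resume. The names `run_pipeline`, `update_sources` and `update_noise`, and the state layout, are assumptions made for this example.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

State = Dict[str, float]
Module = Callable[[State], State]  # hypothetical module interface


def update_sources(state: State) -> State:
    """Stand-in for a source-class module: counts its own updates."""
    return {**state, "sources": state["sources"] + 1}


def update_noise(state: State) -> State:
    """Stand-in for the noise module: refines a noise-level estimate."""
    return {**state, "noise": state["noise"] / 2}


MODULES: List[Module] = [update_sources, update_noise]


def run_pipeline(modules: List[Module], state: State, n_iter: int,
                 checkpoint_path: str) -> State:
    """Run n_iter iterations, resuming from the checkpoint if one exists."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        start, state = saved["iteration"], saved["state"]
    for iteration in range(start, n_iter):
        for module in modules:  # iteration logic lives in the framework
            state = module(state)
        # Write to a temporary file, then rename, so that a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"iteration": iteration + 1, "state": state}, f)
        os.replace(tmp_path, checkpoint_path)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        initial = {"sources": 0, "noise": 1.0}
        run_pipeline(MODULES, initial, 3, path)          # first run stops early
        print(run_pipeline(MODULES, initial, 5, path))   # resumes at iteration 3
```

Because the modules only see and return a state dictionary, they know nothing about scheduling or storage; in a multi-language setting the same contract could be expressed through files or messages rather than Python callables.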
The proposed talk will address the following questions: How can we support debugging and investigation at runtime, when thousands of sources are being processed in parallel? How can we actively monitor the progress of the estimators and display a relevant, live synthesis of the source catalog? How can we provide interactivity to the operators without wasting resources?