No sessions on Sunday, Nov. 5, 2023.
08:30
0min
A Dynamic GUI to Supercharge your Scripts
Peter Teuben

We describe a simple method to annotate scripts (bash, csh, python,
and perhaps more), extract a set of variables that are presented in a
GUI (currently using Qt5), and run the script. An older version using
tk (wish) is still available.

We show two examples. The first example is a very generic GUI builder,
agui, which can be used in a number of scripting languages. The second
is called pythena, and uses the principles of the first and provides a
GUI interface to run the AthenaK MHD modeling code.

In the first example, exploring a large parameter space can be
combined with the script producing multi-variate data on the fly,
which can then be explored with third-party tools such as glueviz
or topcat.
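
As a rough illustration of the idea, the short Python sketch below scans a script for annotated parameter lines, builds a small form, and runs the script with the chosen values. The "#> name=default" annotation syntax and the use of environment variables to pass values are hypothetical illustrations, not the actual agui or pythena conventions.

    import os
    import re
    import subprocess
    import sys
    import tkinter as tk

    # hypothetical annotation: lines of the form "#> name=default"
    ANNOTATION = re.compile(r"^#>\s*(\w+)\s*=\s*(\S+)")

    def build_gui(script):
        # collect annotated parameters from the script text
        params = {}
        with open(script) as fh:
            for line in fh:
                m = ANNOTATION.match(line)
                if m:
                    params[m.group(1)] = m.group(2)

        root = tk.Tk()
        root.title(script)
        entries = {}
        for row, (name, default) in enumerate(params.items()):
            tk.Label(root, text=name).grid(row=row, column=0, sticky="w")
            entry = tk.Entry(root)
            entry.insert(0, default)
            entry.grid(row=row, column=1)
            entries[name] = entry

        def run():
            # hand the edited values to the (executable) script as environment variables
            env = dict(os.environ, **{k: e.get() for k, e in entries.items()})
            subprocess.run([os.path.abspath(script)], env=env, check=False)

        tk.Button(root, text="Run", command=run).grid(row=len(params), column=0, columnspan=2)
        root.mainloop()

    if __name__ == "__main__":
        build_gui(sys.argv[1])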

User Experience for astronomical software
Posters
08:30
0min
A Generic Interface for Serving Time Series Data: HAPI Explained
Jon Vandegriff

HAPI is a standard for serving time series data. It was developed with Heliophysics in mind, but has been kept as generic as possible and is likely useful in other areas of solar system exploration and astronomy. We present the basics of the standard (a RESTful request and response interface with a streaming data format) and show how it has been adopted by various data centers in Heliophysics and Planetary Science, both in the US and in Europe. We will also describe the various clients that are available for reading HAPI data. We aim to open a dialogue with the astronomy community about the relevance of HAPI for serving time series data at various astronomical archives, as well as the use of HAPI by astronomy software as a way to gain more uniform access to Heliophysics and other data sources.
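
A HAPI time-series request is just an HTTP GET against the standard endpoints; the Python sketch below uses placeholder server and dataset names, with request parameters following the HAPI 2.x specification.

    import requests

    server = "https://example.org/hapi"   # placeholder HAPI server URL
    dataset = "SOME_DATASET"              # placeholder dataset identifier

    # describe the dataset: parameters, cadence, available time range
    info = requests.get(f"{server}/info", params={"id": dataset}).json()
    print([p["name"] for p in info["parameters"]])

    # stream one day of data as CSV
    resp = requests.get(f"{server}/data", params={
        "id": dataset,
        "time.min": "2023-01-01T00:00:00Z",
        "time.max": "2023-01-02T00:00:00Z",
        "format": "csv",
    })
    for line in resp.text.splitlines()[:5]:
        print(line)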

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
A data processing pipeline for the Aspera SmallSat Mission
Miriam Keppler

One of the most intriguing components of galaxies is the warm-hot (~10^5-10^6 K) gas extending into the circumgalactic medium (CGM). This gas reservoir likely contains more mass than the stars in the parent galaxy, and its characterization is thus crucial to understanding galaxy formation and evolution. However, despite its importance, the properties of the warm-hot gas are so far poorly constrained. Aspera is a NASA-funded UV SmallSat mission led by the University of Arizona and expected to launch in 2025. The goal of Aspera is, for the first time, to detect and map the warm-hot gas around nearby galaxies through the detection and characterization of OVI line emission at 1032 Angstrom.

The focus of this poster is the development of a data processing pipeline for Aspera. The payload consists of two parallel channels, each fed by a long-slit spectrograph illuminating a common microchannel plate detector that provides a time-resolved event list of incident photons for each observation. The mission operations will apply a step-and-stare concept in which individual galaxies are targeted at one or several spatial fields to spatially sample the galaxies’ environments. The goal of the Aspera data pipeline is to reconstruct, for each target, calibrated 3-dimensional (2D spatial x 1D spectral) data cubes from the individual slit pointings around the galaxy.

The Aspera data pipeline will apply the calibration metrics (derived from on-ground and in-orbit calibration activities) to the recorded photon event lists, extract a flux- and wavelength-calibrated spectrum for each observation, and combine the spectra from individual slit pointings to reconstruct the final 3D data product. The raw data, as well as the pipeline-produced data products, will be archived through MAST. The pipeline will be written in Python and will be made publicly available to users.

This poster will lay out the planned structure of the pipeline, discuss the individual steps applied to the science data during processing, and present the format and content of our planned data products. We will put emphasis on the image cube reconstruction process, which will use an interpolation method to resample the spatially irregularly sampled pointing observations onto a regular gridded image cube. We will further present the generation of mock observations through an instrument simulator that will allow us to test and verify the pipeline.
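
As a minimal sketch of this resampling step (illustrative only; the actual Aspera pipeline may use a different interpolation scheme), scipy can place irregularly sampled slit positions onto a regular spatial grid one wavelength plane at a time:

    import numpy as np
    from scipy.interpolate import griddata

    # x, y: irregular sky offsets of the extracted spectra (arcsec, hypothetical values)
    # flux: corresponding fluxes in a single wavelength bin
    rng = np.random.default_rng(0)
    x = rng.uniform(-30, 30, 200)
    y = rng.uniform(-30, 30, 200)
    flux = np.exp(-(x**2 + y**2) / (2 * 10.0**2))

    # regular output grid for one plane of the (2D spatial x 1D spectral) cube
    gx, gy = np.mgrid[-30:30:64j, -30:30:64j]
    plane = griddata((x, y), flux, (gx, gy), method="linear")
    print(plane.shape)  # (64, 64); repeating per wavelength bin builds the cube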

Ground and space mission operations software
Posters
08:30
0min
A new Deep Learning Model for Gamma-ray bursts’ light curves simulation
Riccardo Falco, Luca Castaldini

AGILE is a space mission launched in 2007 to study X-ray and gamma-ray astronomy. The AGILE Team is developing new detection algorithms for Gamma-Ray Bursts (GRBs) both with classical and machine learning technologies. To train or test these algorithms, it is necessary to have a large GRB dataset, but usually, there are not enough real data available. This problem is also common for the new generation of high-energy astrophysics projects (such as COSI and CTA). It therefore becomes essential to have a system to simulate GRB data.
This work aims to develop a Deep Learning-based model for generating synthetic GRBs that closely replicate the properties and distribution of real GRBs. The dataset obtained with the trained model can then be used to develop detection algorithms, both with classic techniques and with machine learning. Developing this generative model involves several complexities, chiefly the widely different temporal behaviours of GRBs and the scarcity of data, which makes training harder.
We propose a new method for generating GRBs. The model combines a Generative Adversarial Network (GAN) and a Variational Autoencoder (VAE) to produce high-quality, structured generative results while preserving the latent-space structure, and we developed a loss function tailored to our dataset for the reconstruction task. To create the training dataset we extracted the light curves (LCs) of the long GRBs presented in the Fourth Fermi-GBM Catalog. This catalogue contains more than a decade of observations with a total of 3608 GRBs, captured through 12 NaI and 2 BGO detectors; in this work we considered only the NaI detectors. We treated LCs of the same GRB detected by multiple NaI detectors as independent, to augment the number of samples. We filtered the LCs by removing outliers and those with missing values, ending with a total of 5964 LCs. Evaluating the distribution of GRB durations (through the t90 parameter), we set the length of the time series at 220.
We assessed the model's performance by quantifying the dissimilarity between the histograms of count rates in synthetic and real GRBs. Additionally, we conducted a quantitative analysis employing statistics of the LC distributions in both datasets. The results show that the synthetic LCs are generated with properties very similar to the real ones. We are working on a conditional version of our model that uses physical parameters of the GRBs, such as the t90 or the fluence, for more precise generation. This method can be used to generate synthetic LCs for other high-energy astronomy projects such as AGILE, CTA and COSI.
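
One generic way to quantify such a histogram dissimilarity (a sketch, not necessarily the exact metric used here) is an earth-mover's distance between the pooled count-rate distributions of the real and synthetic samples:

    import numpy as np
    from scipy.stats import wasserstein_distance

    def lightcurve_set_distance(real_lcs, synth_lcs, bins=50):
        """Compare the pooled count-rate distributions of two sets of light curves."""
        real = np.concatenate([lc.ravel() for lc in real_lcs])
        synth = np.concatenate([lc.ravel() for lc in synth_lcs])
        # earth-mover's (Wasserstein) distance between the empirical distributions
        emd = wasserstein_distance(real, synth)
        # histogram overlap coefficient on a common, equal-width binning
        edges = np.histogram_bin_edges(np.concatenate([real, synth]), bins=bins)
        h_real, _ = np.histogram(real, bins=edges, density=True)
        h_synth, _ = np.histogram(synth, bins=edges, density=True)
        overlap = np.sum(np.minimum(h_real, h_synth)) * np.diff(edges)[0]
        return emd, overlap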

AI in Astronomy
Posters
08:30
0min
ALBUS: Modelling the Ionosphere with GNSS Interchange Data from the South African TrigNET
Benjamin Hugo

The Earth's ionosphere and plasmasphere are solar-ionized regions in the upper atmosphere, extending from the D-region at about 60 km into the lower magnetosphere. The region, driven by solar activity and storms, acts as a dispersive medium which introduces astrometric errors for radio light at long (meter-length) wavelengths, as well as Faraday rotation of linearly polarized radio light when coupled with the Earth's magnetic field. This rotation can amount to tens of degrees in the linear polarization angle below S-band (~2.0 GHz). We explored the use of RINEX interchange data from the GNSS receivers of the South African TrigNET network (which forms part of the International GNSS Service) and parametric profiles of the ionosphere (PIM, developed originally by the United States Air Force) to measure the amount of Faraday rotation induced by the ionosphere. We verify our measurements with long-term monitoring of the (stable) linearly polarized quasar 3C286, as well as the limb of the Moon, as part of a larger joint-calibrator study between the Very Large Array (Socorro, NM, USA) and MeerKAT (Carnarvon, Northern Cape, South Africa) interferometers. We find a residual scatter in the predicted RM values of about 1 radian per meter squared at the Karoo site, which is consistent with the scatter obtained using the most accurate distributed IONEX models from the international geo and space sciences communities (the NASA CDDIS-distributed JPL, CODE and UQRG models). This poster highlights the need for more research on accurate ionospheric modeling, especially at southern geomagnetic latitudes, and its implications for low-frequency full-polarization science in the Square Kilometre Array and ngVLA era.
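
For reference, the quantity being modelled is the rotation measure (RM) relating the observed polarization angle to wavelength; in the usual astrophysical units,

    \chi_{\rm obs}(\lambda) = \chi_0 + {\rm RM}\,\lambda^{2},
    \qquad
    {\rm RM} \simeq 0.81 \int n_e\, B_{\parallel}\, {\rm d}l \;\; {\rm rad\,m^{-2}},

with the electron density n_e in cm^-3, the line-of-sight field B_parallel in microgauss, and the path length dl in parsec; the ionospheric contribution is this integral evaluated along the line of sight through the ionosphere and plasmasphere.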

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Accelerated Briggs Weighting Function on NVIDIA GPUs with CUDA
Gurjeet Jagwani

Radio interferometric imaging requires weighting of the measured visibilities to produce images suitable for scientific analysis. The Natural and Uniform weighting schemes either decrease the system noise in the data (Natural) or minimise the effects of sidelobes (Uniform). The Briggs weighting function provides a compromise between these two schemes through a robustness parameter. In this work we present an implementation of the Briggs weighting function with CUDA on NVIDIA GPUs, achieving at least a 3x speedup on a GeForce RTX 3090 Ti compared to DDFacet running with 16 threads on an Intel i9 processor, on synthetic data.
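
For context, the commonly quoted form of the Briggs (robust) weighting applied to a visibility i with natural weight w_i falling in uv-grid cell k is

    f^{2} = \frac{\left(5 \times 10^{-R}\right)^{2}}{\sum_k W_k^{2} \, / \, \sum_i w_i},
    \qquad
    w_i' = \frac{w_i}{1 + W_{k(i)}\, f^{2}},

where W_k is the summed natural weight in cell k and R is the robustness parameter; large R approaches natural weighting and very negative R approaches uniform weighting. The per-cell gridding sums are the part of the computation that a GPU implementation can parallelize across the uv grid.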

GPU implementations for core astronomical libraries
Posters
08:30
0min
All your shapes in ADQL: MOCs in the TAP ecosystem
Markus Demleitner

MOCs, the HEALPix-based Multi-Order Coverage maps, are a powerful tool
for representing arbitrary shapes on the sphere with user-selectable
fidelity and remarkable compactness. Their most salient feature is that
operations such as union and intersection, nightmarish with conventional
geometries, are just a few lines of readable code with MOCs. This makes
MOC support a natural complement for the Virtual Observatory's Astronomy
Data Query Language ADQL. Indeed, the latest version of the ADQL-based
Registry discovery protocol RegTAP already makes use of MOCs. This
poster describes some proposed ADQL extensions to make MOCs even more
useful in TAP and ADQL and discusses these extensions' implementation
status.
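
As an illustration of the flavour of such queries, the pyVO sketch below runs a RegTAP 1.2 spatial query against GAVO's TAP service, where the coverage column holds a MOC; further MOC constructors and set operations are among the proposed extensions discussed in this poster, so the exact syntax may still change.

    import pyvo

    # GAVO data centre TAP service, which serves the RegTAP tables
    tap = pyvo.dal.TAPService("http://dc.g-vo.org/tap")

    # resources whose MOC-valued coverage contains a given sky position
    query = """
    SELECT ivoid
    FROM rr.stc_spatial
    WHERE 1 = CONTAINS(POINT(83.63, 22.01), coverage)
    """
    print(tap.search(query).to_table()[:5])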

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
An original way to manage huge astronomical tables
Gilles Landais

We explored a way to wrap, in a database, large astronomical tables stored in a remote repository in a suitable format. The architecture is based on ATSS (Astronomical Table Serialisation System), a binary format allowing powerful positional indexing well suited to intensive use. This binary format is used in the VizieR catalogue service, which duplicates large tables in ATSS and in a PostgreSQL database to provide an efficient cone search service and the SQL flexibility needed by the IVOA Table Access Protocol (TAP). The database architecture requires substantial capacity to deal with datasets already exceeding a billion records (Gaia, SDSS, GSC, etc.). New surveys such as LSST will drive a significant increase in volume, which may require a redesign of the architecture. We explored a cost-effective solution, in terms of both funding and maintenance, that consists in wrapping ATSS binary data as foreign tables. We developed a PostgreSQL extension using the Foreign Data Wrapper mechanism that maps SQL onto the API of the binary file, including its cone search and indexation capabilities. The solution has been tested with a selection of large tables in a TAP service; we will discuss the technology, its potential, but also the limitations of the current development.

Other creative topics in astronomical software
Posters
08:30
0min
Annotated Coadds: Concise Metrics for Characterizing Survey Cadence and for Discovering Variable and Transient Sources
David Shupe

In order to study transient phenomena in the Universe, existing and forthcoming imaging surveys are covering wide areas of sky repeatedly over time, with a range of cadences, point spread functions, and depths. We describe here a framework that allows an efficient search for different types of time-varying astrophysical phenomena in current and future large data repositories. We first present a methodology to generate and store key survey parameters that enable researchers to determine if a survey, or a combination of surveys, allows specific time-variable astrophysical phenomena to be discovered. To facilitate further exploration of sources in regions of interest, we then generate a few sample metrics that capture the essential brightness characteristics of a sky pixel at a specific wavelength. Together, we refer to these as “annotated coadds.” The techniques presented here for WISE/NEOWISE-R data are sensitive to 10% brightness variations at around 12th Vega magnitude at 4.5μm wavelength. Application of the technique to Zwicky Transient Facility data also enabled the detection of 0.5 mag variability at 20 AB mag in the r-band. We demonstrate the capabilities of these metrics for different classes of sources: high proper-motion stars, periodic variable stars, and supernovae, and find that each metric has its advantages depending on the nature of variability. We also present a data structure which will ease the search for temporally varying phenomena in future surveys.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Anomaly Detection in ASKAP’s Monitoring Data through Collaborative Intelligence
Zhuowei Wang

We explore the invigorating intersection of astronomy and human-machine collaboration, with a particular focus on the transformative role these elements play in our exploration of the universe. The focus of our study is ASKAP and the pivotal role of its monitoring data in enhancing astronomical discoveries. ASKAP is an innovative radio telescope array that is redefining our cosmic mapping capabilities. However, ASKAP's success in charting unprecedented numbers of galaxies brings forth the challenge of managing and interpreting the resulting 'data explosion'. In response, we explore the potential of anomaly detection, leading to a proposed collaborative human-machine approach that maximizes the strengths of both components in effectively managing and interpreting the vast datasets. By recognizing what is 'normal', anomaly detection identifies the unusual occurrences that could lead to new discoveries. In the proposed approach, machines process the data and flag anomalies, while humans interpret the results and guide the exploration.

AI in Astronomy
Posters
08:30
0min
Application of a Simulation-Based Inference method to a galaxy cluster cosmology analysis
Yuanyuan Zhang

We use Simulation-Based Inference (also known as likelihood-free inference) to constrain cosmological parameters with optical galaxy clusters. Using galaxy cluster observables (e.g., ) derived from the Quijote simulation suite, we train a machine learning algorithm to learn the joint probability distribution of the parameters that generated the simulations and the resulting galaxy cluster observables. This trained algorithm is then applied to a test set of galaxy cluster observables to derive the corresponding cosmological and astrophysical parameters and their uncertainties. Our preliminary analysis shows that the posterior values of the parameters and their uncertainties are accurate when compared to the truth. These results demonstrate the potential of applying Simulation-Based Inference to galaxy cluster cosmology studies.

AI in Astronomy
Posters
08:30
0min
Astrophysics and Cosmos Observation. The Italian National Centre on HPC, Big Data and Quantum Computing
Ugo Becciani

High Performance Computing (HPC)-based and Big Data technologies are outstanding instruments to model the complex dynamic systems studied in Astrophysics and Cosmology today. Their use is needed by the majority of today's activities related to astrophysics: from the reduction and analysis of astronomical data up to their interpretation and comparison to theoretical predictions, including simulations and theoretical modeling.
INAF plays a fundamental role in the Italian National Centre on HPC, Big Data and Quantum Computing (funded through the EU-financed Italian National Recovery Plan). We are adopting a user-driven approach in order to couple tightly to the community, together with a co-design methodology for developing the applications of very large experiments (e.g. SKA, CTA, Euclid, Gaia, LOFAR), combining the requirements and expertise of the scientists and community code developers with the innovative software and hardware solutions and services envisioned by HPC and Cloud stakeholders, including green computing approaches, and addressing the synergic and coordinated development of applications and technology. Among the main targets, we are addressing Big Data processing and visualization, adopting innovative approaches (e.g. Artificial Intelligence, inference via Bayesian statistics) for the analysis of large and complex data volumes and for their exploration (e.g. in-situ visualization), capable of efficiently exploiting HPC solutions.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Automated anomaly detection at scale with the cloud-based Roman Data Monitoring Tool
O. Justin Otor, Tyler Desjardins, Jonathan Hargis

The Nancy Grace Roman Space Telescope (Roman) is an upcoming NASA flagship mission that is planned for launch no later than 2027. Roman is primarily a survey mission with the Wide Field Instrument (WFI) as its main instrument. The WFI consists of eighteen 16-megapixel detectors and is anticipated to produce 20 PB of science data during its five-year primary mission, an order of magnitude more than the current and planned yields of all active NASA flagship missions in astrophysics. If the effort to inspect data by eye scales with the number of pixels, WFI-sized data may dictate a full-time job's worth of attention. Thus, we present the Roman Data Monitoring Tool, a scalable, cloud-based platform built to automate the anomaly detection process for WFI science observations. Hosted on Amazon Web Services (AWS), our proof of concept is an event-driven pipeline using AWS Lambda that judges the astrometric alignment of newly uploaded simulations against Gaia's Data Release 3 and logs results in a database. Future plans include building parallel pipelines to track other phenomena of interest and a dashboard of all monitoring results to aid in triage when follow-up support is necessary.
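
A highly simplified sketch of an event-driven alignment check like the proof of concept described above is shown below; the bucket layout, the source and reference catalogue loaders, and the alignment metric are all placeholders rather than the actual Roman implementation.

    import json
    import boto3
    import numpy as np
    import astropy.units as u
    from astropy.coordinates import SkyCoord

    s3 = boto3.client("s3")

    def load_detected_sources(bucket, key):
        """Placeholder: fetch the new product and return its detected positions (deg)."""
        s3.download_file(bucket, key, "/tmp/product.json")
        with open("/tmp/product.json") as fh:
            src = json.load(fh)
        return SkyCoord(ra=src["ra"] * u.deg, dec=src["dec"] * u.deg)

    def load_gaia_reference():
        """Placeholder: in practice this would return pre-staged Gaia DR3 positions for the field."""
        return SkyCoord(ra=[150.1, 150.2] * u.deg, dec=[2.1, 2.2] * u.deg)

    def handler(event, context):
        # standard S3 "object created" event structure
        record = event["Records"][0]["s3"]
        bucket, key = record["bucket"]["name"], record["object"]["key"]

        detected = load_detected_sources(bucket, key)
        gaia = load_gaia_reference()

        # nearest-neighbour cross-match and a simple alignment statistic
        _, sep2d, _ = detected.match_to_catalog_sky(gaia)
        median_offset_mas = float(np.median(sep2d.to(u.mas).value))
        return {"product": key, "median_offset_mas": median_offset_mas}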

Ground and space mission operations software
Posters
08:30
0min
Automatic classification of evolved objects from Gaia’s DR2 and DR3 databases using Deep Learning Tools
Silvana G. Navarro

Planetary nebulae (PN) and symbiotic systems (SS), products of the evolution of low- and medium-mass stars, are not easy to distinguish with photometric data alone. However, using some diagnostic diagrams it is possible to separate them.
We present the results of automatic classification based on Gaia photometry from data releases DR2 and DR3.
The automatic classification was made using different algorithms, and the results were compared on the basis of their accuracy. The training catalogue was constructed using the Gaia parameters (Gmag, BPmag and RPmag), complemented with J, H and/or K magnitudes from the 2MASS catalogue and some b-v colors when available in the SIMBAD database.
We present the accuracy obtained and the combination of parameters that achieves the best effectiveness.
We found that the b-v color, frequently used to separate PN from SS, can be replaced by Gaia colors (Gmag-BPmag or BPmag-RPmag), with an advantage over b-v in some diagnostic diagrams.

AI in Astronomy
Posters
08:30
0min
Automatic classification of evolved objects from Gaia’s DR2 and DR3 databases using Machine Learning Tools
Silvana G. Navarro Jiménez

Planetary nebulae (PN) and symbiotic systems (SS), products of the evolution of low- and medium-mass stars, are not easy to distinguish with photometric data alone. However, using some diagnostic diagrams it is possible to separate them.
We present the results of automatic classification based on Gaia photometry from data releases DR2 and DR3.
The automatic classification was made using different algorithms, and the results were compared on the basis of their accuracy. The training catalogue was constructed using the Gaia parameters (Gmag, BPmag and RPmag), complemented with J, H and/or K magnitudes from the 2MASS catalogue and some b-v colors when available in the SIMBAD database.
We present the accuracy obtained and the combination of parameters that achieves the best effectiveness.
We found that the b-v color, frequently used to separate PN from SS, can be replaced by Gaia colors (Gmag-BPmag or BPmag-RPmag), with an advantage over b-v in some diagnostic diagrams.

AI in Astronomy
Posters
08:30
0min
Bayesian non-LTE gas temperature estimation of cores in the CMZ using Julia
Brian Svoboda

We present a new single-zone, non-LTE radiative transfer code, Jadex (https://github.com/autocorr/Jadex.jl), for estimating the gas kinetic temperature and volume densities of molecular clouds. The software is written in the Julia programming language and numerically optimized for the repeated likelihood evaluations needed for Bayesian parameter estimation using Markov-Chain Monte Carlo methods. As a demonstration, we estimate gas kinetic temperatures of dense cores detected in CMZoom, the SMA survey of the Milky Way's Central Molecular Zone (CMZ), using the three K-doublet transitions of formaldehyde near 218 GHz. We report initial results from a comparison of the measured temperatures to core evolutionary states. Finally, we assess Julia's suitability for moderate-size applications in numerical computing for astronomy.

Other creative topics in astronomical software
Posters
08:30
0min
Best practices for developing high quality scientific pipelines in the framework of the ESA PLATO mission
David Keiderling

ESA's PLATO mission, short for PLAnetary Transits and Oscillations of stars, is a multi-decade, multi-nation space mission searching for Earth-like planets around Sun-like stars and advancing the field of stellar physics. Software development teams working on the science ground segment for PLATO face the well-known challenge of writing scientific code of high quality that can be reused by later missions.

The development efforts are further complicated by the decade-long development time, as well as shifting institutional responsibilities and changes in staff composition. Specifications of requirements as well as algorithms evolve continuously, adding to the complexity of the development tasks.

To deal with these challenges we have introduced industry best practices that enable us to write maintainable and robust software in an ever-changing environment. Among others, we have selected Scrum, Clean Code and Continuous Integration / Continuous Delivery (CI/CD) as the foundation of our process.

In this talk we will give a high-level overview of the most valuable lessons-learned to date. We will showcase our approach to implementing Scrum as well as the benefits of starting simple, iterating on complexity and shortening our iteration cycles. Finally, we discuss how a unified development and technology stack has improved our project progress.

Ground and space mission operations software
Posters
08:30
0min
Binaural stellar spectra sonification based on variational autoencoders
Adrian Garcia Riber

The current development of Virtual Observatory (VO) technology allows easy access to astronomical data from ground-based and space-based observatories, not only for astronomers on remote computer networks but also for anyone interested in this field. This infrastructure represents one of the biggest examples of global collaboration and Open Science development and allows the use of real case studies in Citizen Science projects, Science Education, and Outreach.

Within this global inclusive paradigm, Sonification can play a key role in generating comprehensive multimodal representations of datasets, complementing graphical representations of scientific information, expanding the possibilities of virtual science exploration, and improving inclusion and accessibility for visually impaired and blind users (BVI).

Focused on unsupervised Deep Learning analysis of stellar spectra catalogs, this work describes a method to generate binaural multimodal representations of spectra using variational autoencoders. It includes the implementation and evaluation of an experimental prototype that explores the case study of the STELIB stellar library from the Spanish Virtual Observatory (SVO), to show the potential of the proposed pipeline to incorporate AI techniques and sound spatialization into astronomical data sonifications.

AI in Astronomy
Posters
08:30
0min
Bright Star Subtraction Pipeline for LSST: A Review of Progress
Amir Ebadati Bazkiaei

The Legacy Survey of Space and Time (LSST) will reach unprecedented surface brightness depths by imaging the southern sky, which provides a spectacular opportunity for studying the Low Surface Brightness (LSB) structures around galaxies. The presence of bright stars in deep images can result in over-estimation of the sky background if the bright stars are not robustly modeled to large enough radii and subtracted prior to the background measurement. This over-estimation results in over-subtraction of the sky background, which destroys the LSB structures in deep LSST images. To prevent destroying LSB structures when subtracting the sky background, this work aims to develop a Bright Star Subtraction pipeline for LSST deep images. We are currently developing a pipeline that robustly models the profiles of bright stars out to radii of several hundred arcseconds and uses the modeled profiles to subtract bright stars from deep images prior to the sky background subtraction step in the LSST data reduction pipeline. Our work includes testing the pipeline on deep images from currently available surveys and on simulated data to evaluate its power in subtracting bright stars while preserving LSB structures. Once completed, the Bright Star Subtraction pipeline will be a valuable tool for LSB studies with LSST as well as with upcoming deep imaging surveys.

Other creative topics in astronomical software
Posters
08:30
0min
Building a Science-Driven Mission Planning Tool with JMARS
Christian Schaller

HiRISE [1][2], aboard NASA's Mars Reconnaissance Orbiter (MRO), and CaSSIS [3], aboard ESA's ExoMars Trace Gas Orbiter (TGO), are high-resolution, narrow-angle, color imaging systems. Science returns from these instruments are maximized by precision targeting and an understanding of the geography and geology of a candidate observation site; mission planning for such instruments requires science-driven planning tools.

JMARS [4], the Java Mission-planning and Analysis for Remote Sensing tool, is a geographic information system (GIS) developed at Arizona State University. It combines numerous datasets from multiple spacecraft missions for several solar system bodies; despite the name, JMARS is not limited to Mars operations. The tool supports global image layers, local image layers, individual images, geographic data layers, and feature annotations. It includes an extensive Java programming interface, and it is available as an open-source software project (OSS) [5].

We show here how we have used the OSS version of JMARS as a foundation for CaSSIS science operations with HiRISE operations as a guide. We've customized the JMARS built-in GIS data layer for direct access to the CaSSIS target database. This custom layer displays candidate targets that match a variety of planning criteria; the particular imaging requirements of user-selected targets can be viewed directly in the tool. Using a version of the JPL NAIF SPICE toolkit [6] adapted for Java, we display spacecraft navigation and instrument pointing in a separate custom layer that includes mission-level constraints and time-dependent details such as local time and photometric angles.

Instrument parameters are set directly in the planning tool, with integrated photometric models and resource management tools to assist in the process. We export the operations plan to file, from which we build the CaSSIS instrument command file; the same plan is integrated into the spacecraft operations plan at the main mission operations center.

JMARS runs on a wide range of computing systems, including personal desktop computers and laptops, enabling easy remote operations. As the foundation for an in-house tool, it allows us to adapt to unexpected operational situations, such as the partial loss of an instrument's capability.

[1] McEwen, A. S., et al. (2007), J. Geophys. Res., 112, 10.1029/2005JE002605; https://hirise.lpl.arizona.edu.
[2] McEwen, A. S., et al. (2023), Icarus, in press.
[3] Thomas, N., et al. (2017), Space Sci. Rev., 212, 10.1007/s11214-017-0421-1; http://cassis.unibe.ch.
[4] Christensen, P. R., et al. (2009), AGU Fall Meeting 2009, Abstract IN22A-06; http://jmars.asu.edu.
[5] https://jmars.mars.asu.edu/open_source.
[6] Acton, C. H. (1996), Planetary and Space Sci., 44, 10.1016/0032-0633(95)00107-7; https://naif.jpl.nasa.gov.

Ground and space mission operations software
Posters
08:30
0min
BurstCube Ground Software and Data Analysis Pipeline
Joseph Asercion

BurstCube is a 6U CubeSat (10 x 20 x 30 cm) designed to rapidly detect and disseminate information about gamma-ray bursts (GRBs) to both expand sky coverage and supplement existing observatories. To do this, the mission requires a lightweight yet robust data processing pipeline system which can be easily deployed, requires minimal maintenance, can perform standard gamma-ray burst analysis to convert raw instrument data to FITS standard data files, and can handle transfers to science archiving and alert systems. This is accomplished via a three-stage processing pipeline consisting of Level 0, Level 1, and Level 2 (L0, L1, and L2) processing routines. These stages convert data from the instrument transmitted in CCSDS Space packet format to FITS file format, perform event localization, generate light curves and spectra, perform automated event classification, and disseminate rapid detection alerts and related event information to the community via the GCN to allow multi-wavelength and multi-messenger follow-ups by other observatories. Processed event data and ancillary data products are then transferred to and served to the public via the HEASARC at Goddard Space Flight Center. In this poster the software architecture of the BurstCube Data Analysis Pipeline is detailed, including individual data levels, the processing tasks involved, and interactions with data archiving and science alert systems.
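
As a small illustration of one piece of such a pipeline (generic numpy/astropy code, not the BurstCube software itself), binning a photon event list into a light curve and writing it to a FITS table looks like:

    import numpy as np
    from astropy.io import fits

    def write_lightcurve(event_times, bin_width=1.0, outfile="lightcurve.fits"):
        """Bin an event list (seconds of mission elapsed time) into a FITS light curve."""
        edges = np.arange(event_times.min(), event_times.max() + bin_width, bin_width)
        counts, _ = np.histogram(event_times, bins=edges)
        hdu = fits.BinTableHDU.from_columns([
            fits.Column(name="TIME", format="D", unit="s", array=edges[:-1]),
            fits.Column(name="COUNTS", format="J", array=counts),
        ])
        hdu.header["EXTNAME"] = "RATE"
        hdu.writeto(outfile, overwrite=True)

    # example with synthetic, uniformly distributed event times
    rng = np.random.default_rng(1)
    write_lightcurve(np.sort(rng.uniform(0.0, 100.0, 5000)))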

Ground and space mission operations software
Posters
08:30
0min
CLEAN algorithm implementation comparisons between popular software packages
Daniel Wright

The CLEAN algorithm, first published by Högbom (1974), and its later variants such as Multiscale CLEAN (msCLEAN) by Cornwell (2008), has been the most popular tool for deconvolution in interferometric imaging with aperture synthesis radio telescopes. CLEAN effectively removes the telescope's point spread function from the observed images. We have compared source fluxes produced by different implementations of msCLEAN (WSClean, CASA) and by a prototype implementation of msCLEAN for the SKA on two datasets. The first is a simulation of multiple point sources of known intensity, where none of the software packages recovered all the simulated point sources to within 1.0% of the simulated values. The second is an observation of the supernova remnant 3C 391 taken with the Very Large Array (VLA), where none of the software packages produced images that agreed to within 1.0% of each other.
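
For reference, the core of the Högbom (1974) algorithm that all of these implementations build on fits in a few lines; the sketch below is minimal and ignores msCLEAN's multiple scales and the final restoring-beam convolution.

    import numpy as np

    def hogbom_clean(dirty, psf, gain=0.1, niter=500, threshold=0.0):
        """Repeatedly subtract the scaled, shifted PSF at the residual peak,
        accumulating the removed flux in a model image."""
        res = dirty.copy()
        model = np.zeros_like(dirty)
        pc = np.array(psf.shape) // 2          # assume the PSF peak is at its centre
        for _ in range(niter):
            y, x = np.unravel_index(np.argmax(np.abs(res)), res.shape)
            peak = res[y, x]
            if np.abs(peak) < threshold:
                break
            model[y, x] += gain * peak
            # overlap of the PSF, shifted to (y, x), with the residual image
            y0, x0 = y - pc[0], x - pc[1]
            ry0, ry1 = max(y0, 0), min(y0 + psf.shape[0], res.shape[0])
            rx0, rx1 = max(x0, 0), min(x0 + psf.shape[1], res.shape[1])
            res[ry0:ry1, rx0:rx1] -= gain * peak * psf[ry0 - y0:ry1 - y0, rx0 - x0:rx1 - x0]
        return model, res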

User Experience for astronomical software
Posters
08:30
0min
Collaborative and Guided Visual Analytics Methods for Space and Planetary Science Applications
Manuela Rauch

This abstract presents our work on collaborative and guided visual analytics methods in four scientific applications within the EXPLORE EU project (https://explore-platform.eu). To offer this, we use Visualizer, a web-based open-source research framework for rapid prototyping of visual analytics applications developed at Know-Center. Simple APIs ensure extensibility of the visual and algorithmic methods as well as integration into third-party web applications.
In the G-Tomo application we visualize extinction data in a 3D volume visualization, enabling users to explore dust clumps and areas of low density. Depending on the selected cube size, different resolutions of the data are loaded into the visualization (see Figure 1). Custom interaction methods enable users to create and modify slices in the 3D volume visualization and display the selected slice in a contour plot, where they can annotate areas of interest (see Figure 2). The two visualizations have been seamlessly integrated into the G-Tomo Dash application.

The S-Phot application aims to estimate stellar parameters, e.g., surface temperature, from which other parameters can be derived. To support the data exploration, a “Default UI” has been configured showing a Hertzsprung-Russell diagram, the S-Phot science visualization, the AladinLite visualization, as well as two tables (see Figure 3). Annotation capabilities have been included in the first two visualizations, enabling users to mark interesting findings. Besides this, a user guidance component has been implemented, supporting users in selecting interesting data fields, algorithmic methods, and visualizations (see Figure 4). Thus, when users select data fields and specify their analytical goal, recommended workflows are retrieved from the server and displayed in the UI. This especially supports less experienced users in applying useful analytical workflows and retrieving meaningful insights.

AI methods enable us to cluster stars depending on their parameters; however, they do not provide any details on why stars have been clustered together and how they differ from other clusters. To overcome these limitations, we proposed an “Information Landscape” visualization enabling users to select clusters of interest and explore the distribution of parameters within a histogram (see Figure 6). Users can annotate clusters and thus easily share insights on the investigated data.

Planetary scientists are interested in exploring the spectral composition of different locations on different planets. Searching for and identifying similar spectral profiles cannot be done manually, so we implemented a spectral profile search using several DTW algorithms to support them in identifying similar spectral profiles in the Lunar application (see Figure 5). It enables users to either load their own spectral profiles or select spectral profiles from a sample cube and search for similar profiles, either in the same cube or in the USGS library data (https://crustal.usgs.gov).
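
A minimal dynamic-time-warping distance of the kind underlying such a profile search is sketched below; the EXPLORE tools may well use optimized or constrained DTW variants, so this is illustrative only.

    import numpy as np

    def dtw_distance(a, b):
        """Classic O(n*m) dynamic-programming DTW between two 1-D spectral profiles."""
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        return cost[n, m]

    # toy usage: the same absorption feature shifted by a few samples
    x = np.linspace(0, 1, 100)
    profile_a = 1.0 - 0.5 * np.exp(-((x - 0.45) ** 2) / 0.001)
    profile_b = 1.0 - 0.5 * np.exp(-((x - 0.50) ** 2) / 0.001)
    print(dtw_distance(profile_a, profile_b))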

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Creating Spectral Cubes from NIRSpec MSA Slitlet-Stepped Observations
Jane Morrison

Using the Near Infrared Spectrograph (NIRSpec) on the James Webb Space Telescope (JWST) we have
spatially resolved 56 galaxies at redshifts 1<z<6 in the Hubble Ultra Deep Field. What is unique about
this program (PID 2123) is that the micro-shutter assembly (MSA) was used in "slitlet stepping mode", which essentially turned the MSA into a multi-object IFU. Slitlet stepping exploits the multiplex and sensitivity advantages of the MSA to carry out a survey in a vastly shorter time than a large IFU sample would require. We present the software which was developed to create the spectral cubes for these galaxies.

The NIRSpec MSA consists of 4 quadrants of 365 × 171 shutters that can be individually opened and closed to create the spectral slit configurations for this multi-object spectroscopy mode. An MSA "slitlet" is formed by two or more open shutters adjacent in the cross-dispersion (spatial) direction.
Combining the multiple open shutters with slitlet-stepping across the galaxies can efficiently obtain spatially resolved spectroscopy of several galaxies at the same time. Depending on the galaxy size, 1 to 8 unique slitlets are designed to cover the entire galaxy, and the slitlets vary in length from 1 to 6 shutters.

MSA "slitlet-stepping" is not presently an offical JWST supported mode. Regardless, in Cycle 1, this mode has been used, albeit in different manners, by GO2123 and GO2132 and have demonstracted its viability. Here we present the MSA cube building software developed for GO2123, also known as GARDEN: Galaxies at All Redshifts Deciphered and Explained with the NIRSpec MSA.
GARDEN has observed 56 galaxies at redshifts between 1 and 5 with specialized MSA stepping configurations depending on their morphologies and instrumental artifacts. We present a specialized set of software routines to create IFU-like data cubes for these galaxies. The software uses the JWST pipeline products produced by the calwebb_spec2 pipeline. The calwebb_spec2 pipeline processes the data and produces fully calibrated output. The slitlet-stepping cube building software is general and can be used on other NIRSpec MSA "slitlet-stepping" programs to take calibrated 2-D images and produces 3-D spectral cubes. The 2-D disjointed MSA spectra are corrected for distortion and assembled into a rectangular cube with three orthogonal axes: two spatial and one spectral. The MSA spectral cube software uses the same 3d drizzle algorithm used for ordinary JWST IFU data to produce regularized 3-D spectral cubes.

Ground and space mission operations software
Posters
08:30
0min
Deep Learning and IACT: bridging the gap between Monte-Carlo simulations and LST-1 data using domain adaptation
Thomas Vuillaume, Michaël Dell'aiera

The Cherenkov Telescope Array Observatory (CTAO) is the next generation of observatory employing the imaging air Cherenkov technique for the study of very high energy gamma rays in the range from 20 GeV to 300 TeV. The first full CTAO telescope, the LST-1, is operational in La Palma, and is acquiring data that has achieved the identification of established sources, exploration of unknown ones, and validation of anticipated performance benchmarks. The deployment of deep learning methods, thanks to the GammaLearn project, for the reconstruction of physical attributes of incident particles, encompassing parameters like energy, arrival direction, and particle classification, has evinced promising outcomes when conducted on simulations. However, the transition of this approach to observational data is accompanied by challenges, as deep learning-based models are susceptible to domain shifts. In the present contribution, we address this issue through the integration of domain adaptation into state-of-the-art deep learning models, and we shed light on the performance that they bring using LST-1 data.

AI in Astronomy
Posters
08:30
0min
Discover your astronomical data from python – simply!
Renaud Savalle, Markus Demleitner, Hendrik Heinl

The VO Registry is a set of about 30,000 metadata records of astronomical resources. It is queryable using the powerful RegTAP protocol, which requires users to write ADQL. A friendlier interface using that protocol has recently been written as part of the pyVO astropy-affiliated package. In this poster, we briefly introduce the standards this is implemented against. The new API can be used by astronomers to discover data based on various constraints, ranging from free text to physical concepts and areas in space, time, and spectrum.
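
For example, assuming pyVO 1.2 or later, a discovery query combining a free-text term, a waveband, and a service type looks like this:

    from pyvo import registry

    # combine constraints: free text, a physical concept (waveband), and a service type
    resources = registry.search(
        registry.Freetext("supernova remnant"),
        registry.Waveband("radio"),
        registry.Servicetype("tap"),
    )
    for res in list(resources)[:5]:
        print(res.ivoid, "-", res.res_title)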

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Distributing Public Data with the Gravitational Wave Open Science Center
Martin Beroiz

The Gravitational Wave Open Science Center (GWOSC) provides public access to gravitational-wave data products. GWOSC serves as the primary access point for data from GW observatories around the world, and provides a uniform interface for our data. Our data products include strain data files, catalogs of detections, and quality or injection segment information. In this poster, we highlight our data products, resources to learn how to analyze data, and the tools to work with them. Among GWOSC's efforts to improve the user experience are a collection of tutorials, a public discussion forum, and a world-wide annual workshop.
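
For instance, the public strain data can be pulled directly from GWOSC with community client libraries; the short sketch below uses the gwosc and GWpy packages, one of several access routes covered in the tutorials.

    from gwosc.datasets import event_gps
    from gwpy.timeseries import TimeSeries

    # look up the GPS time of a catalogued event and fetch 32 s of open strain data
    gps = event_gps("GW150914")
    strain = TimeSeries.fetch_open_data("H1", gps - 16, gps + 16)
    print(strain.sample_rate, strain.duration)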

User Experience for astronomical software
Posters
08:30
0min
Dysh: Spectral-Line Calibration of SDFITS Files
Victoria Catlett

The Laboratory for Millimeter-Wave Astronomy (LMA) at The University of Maryland (UMD) is working with The Green Bank Observatory (GBO) to create Dysh, a replacement for GBTIDL, GBO’s current data-reduction package. Dysh will be an open-source, Python-based spectral line reduction and analysis software for Single-Dish FITS (SDFITS) files that takes advantage of well-known packages such as Astropy and Specutils. Features will include equal or better data calibration quality than GBTIDL, enhanced data reduction options, faster I/O speeds, and easily-implemented custom scripts.

Dysh is scheduled for completion in 2025 with several intermediate releases, and it has the flexibility to be implemented at other single-dish radio telescopes, such as the Large Millimeter Telescope (LMT).

User Experience for astronomical software
Posters
08:30
0min
Efficient Image Visualization and Analysis with CARTA - Cube Analysis and Rendering Tool for Astronomy
Yu-Hsuan Hwang

CARTA (Cube Analysis and Rendering Tool for Astronomy) is next-generation software designed for image visualization and analysis in astronomy, developed by a collaborative team from the Academia Sinica Institute of Astronomy and Astrophysics (ASIAA), the Inter-University Institute for Data Intensive Astronomy (IDIA), the National Radio Astronomy Observatory (NRAO), and the Department of Physics, University of Alberta.

There are three key features that make CARTA a powerful tool for astronomical data. Firstly, CARTA utilizes a client-server architecture combined with GPU graphic rendering and latency-hiding techniques, ensuring high performance in visualizing large images. This capability is particularly crucial for modern telescopes. Secondly, CARTA offers flexible deployment options, available as both stand-alone applications and for site deployment. It integrates with the ALMA archive website, offering convenient online previews. Lastly, in addition to its high performance and deployment flexibility, CARTA is equipped with a wide range of efficient tools, including spectral line related analysis and multi-wavelength catalog visualization. This supports various fields in astronomy research, from individual source structure studies to large survey statistical analysis.

Other creative topics in astronomical software
Posters
08:30
0min
Evaluating IVOA ExecutionPlanner for CIRASA tools
Dave Morris

The IVOA ExecutionPlanner is, in simple terms, an abstract service interface for a computing platform able to answer the question: “Can I run this ‘executable thing’ on this platform?”. It provides a standard interface for submitting and monitoring jobs that can be implemented in front of a range of different compute platforms.

CIRASA is a visual analytic platform for advanced source finding and classification, integrating state-of-the-art tools, such as the CAESAR source finder, the ViaLactea Visual Analytic (VLVA) and Knowledge Base (VLKB).

A key component of the CIRASA infrastructure, caesar-rest [https://github.com/SKA-INAF/caesar-rest], is a REST-ful web service for astronomical source extraction and classification using the caesar source extractor [https://github.com/SKA-INAF/caesar]. The current implementation of caesar-rest can be integrated with a number of different job management services, such as Kubernetes, Slurm, or a local Celery service. However these are all implemented as direct connections to local services, co-located with, and managed by, the same community as the caesar-rest service.

This project will explore designs for a simple ExecutionPlanner prototype that is capable of executing caesar source extractor tasks and using this to enable the caesar-rest system to make use of remote compute platforms managed by other projects.

Internally, the prototype would use the same job management services, but it would hide this behind the abstract ExecutionPlanner interface, decoupling the caesar-rest system from the details of the job management service. This would enable the caesar-rest system to act as a distributed application, using a range of different compute platforms from a community of federated services.

This project will evaluate the costs, benefits and suitability of using the ExecutionPlanner interface to provide the kind of community of federated compute platforms that will be needed to implement the wider goals of the SKA architecture.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Exploring the journey towards scientific knowledge through one year of public data at the European JWST Archive
Maria Arevalo Sanchez

After a successful year of discoveries since the first public JWST images were revealed, the James Webb Space Telescope (JWST) has solidified its position as the premier space science observatory. Over the past year, the ESAC Science Data Centre (ESDC) has taken the lead in establishing the European JWST Science Archive (eJWST), aimed at enhancing the scientific outcomes of this remarkable observatory. As a part of the collaborative efforts involving NASA, ESA, and CSA, all JWST metadata and public data are now being seamlessly synchronised in real-time, mirroring the Mikulski Archive for Space Telescopes (MAST) JWST archive at the Space Telescope Science Institute (STScI).

The eJWST Science Archive ensures swift, intuitive, and user-friendly access to Webb's invaluable data, for both private observations (data stored at STScI) and public observations (data housed at ESDC). It provides a guide to the exploration of the celestial landscape viewed by JWST, contextualising observation data and metadata within the realm of multi-wavelength science. Integration with the ESASky tool (with access to all ESA science archives) enriches the experience, while archive data viewers offer on-the-fly previews. Moreover, we will illustrate how the eJWST Archive propels scientific research by offering various helper tools, such as searches based on the ADQL query language and dedicated Python modules including the ESA JWST Astroquery module. This package is also included within the JWST section of ESA Datalabs, an innovative science exploitation platform developed by the Data Science and Archives Division at ESAC.

In addition, the James Webb Space Telescope workspace within ESA Datalabs will be presented. Here we highlight the software packages that are at users' disposal for processing JWST products, along with sample notebooks that empower exploration and analysis of data from the European JWST archive. The JWST area in ESA Datalabs comes equipped with the latest version of the processing pipeline and associated calibration files (contexts), which are updated daily. Furthermore, this workspace serves as a hub for JWST workshops and summer schools. By relieving lecturers and participants of software configuration and setup concerns, it allows users of ESA Datalabs to focus their energies squarely on the analysis of JWST data.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Fundamental Parameter Estimations of O-type Stars Binary Systems Using Recurrent Neural Networks
Miguel Flores

Recent studies have established O-type stars as predominantly born in binary star systems. In this work, we explore the application of a recurrent neural network system for estimating the effective temperature and surface gravity of binary star systems. Additionally, we assess the neural network's sensitivity in processing synthetic binary spectra derived from two stellar spectra models of O-type stars and the implications of the contribution from the secondary star in the system. Finally, we compare the estimations produced by our proposed system with those from prior research.

AI in Astronomy
Posters
08:30
0min
Galaxy Classification using Topological Data Analysis
Solai Jeyakumar

Galaxy catalogues with large numbers of images of galaxies observed at
several wavelengths have become available. Examples of such catalogues
are produced by surveys such as SDSS, DES, and CANDELS.
Using these catalogues, galaxies have been classified by eye, as in
the Galaxy Zoo project. Such an exercise is very time consuming
and is not apt for big datasets. In addition, recent catalogues include
galaxies at higher redshift. These galaxies are observed with poorer
resolution, and galaxy evolution with redshift may mean that the
traditional galaxy classification scheme is no longer sufficient.
Other automated methods that are applied to large datasets
either extract morphological parameters such as concentration,
clumpiness, and asymmetry, or estimate photometric parameters.

In recent times, machine learning techniques such as Convolutional
Neural Networks (CNNs) and Support Vector Machines (SVMs) have been
applied to the galaxy classification problem with varying degrees of
success. Many authors have applied CNN-based schemes along with
techniques to improve the success rate, such as dropout regularization
and transfer learning. However, these techniques require large
computational power and human-created training catalogues, and the
resulting output classes are restricted to the classes in the
training dataset.

Recently, the technique of Topological Data Analysis (TDA) has been
used for image segmentation, classification and object detection.
This method uses techniques like persistent homology or t-SNE to
identify topologically connected components in the images. Although
this technique is used in other sciences, it has not been used much
in astronomy. TDA has been applied to the study of large-scale
structure, such as finding voids and filaments, and to Cosmic
Microwave Background (CMB) data. However, TDA has not been applied
to the galaxy classification problem. In this work the TDA method is
applied to known galaxy classification catalogues in order to
evaluate whether it can be used for large multi-wavelength datasets.
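
To make the connection concrete, the simplest, 0-dimensional ingredient of such an analysis, counting connected components of image superlevel sets as the brightness threshold is lowered, can be sketched in a few lines; a full persistent-homology computation would instead use a dedicated TDA package such as GUDHI or ripser.

    import numpy as np
    from scipy import ndimage

    def superlevel_component_counts(image, thresholds):
        """Number of connected components of {image >= t} for each threshold t:
        a crude proxy for the 0-dimensional persistence of a galaxy image."""
        counts = []
        for t in thresholds:
            _, n = ndimage.label(image >= t)
            counts.append(n)
        return np.array(counts)

    # toy example: two Gaussian "blobs" that merge as the threshold is lowered
    y, x = np.mgrid[0:64, 0:64]
    img = (np.exp(-((x - 20) ** 2 + (y - 32) ** 2) / 50.0)
           + np.exp(-((x - 44) ** 2 + (y - 32) ** 2) / 50.0))
    print(superlevel_component_counts(img, np.linspace(0.9, 0.05, 5)))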

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Galaxy cluster detection on SDSS images using deep machine learning
Kirill Grishin

Galaxy clusters are a powerful probe of cosmological models. Next generation large-scale optical and infrared surveys will reach unprecedented depths over large areas and require highly complete and pure cluster catalogs, with a well defined selection function. We have developed a new cluster detection algorithm YOLO-CL, which is a modified version of the state-of-the-art object detection deep convolutional network YOLO, optimized for the detection of galaxy clusters (Grishin, Mei, Ilic 2023). We trained YOLO-CL on color images of the redMaPPer cluster detections in the SDSS. We find that YOLO-CL detects 95−98% of the redMaPPer clusters, with a purity of 95−98% calculated by applying the network to SDSS blank fields. When compared to the MCXC2021 X-ray catalog in the SDSS footprint, YOLO-CL is more complete than redMaPPer, which means that the neural network improved the cluster detection efficiency of its training sample: it detects 98% of clusters with mean X-ray surface brightness of 20×10^-15 erg/s/cm2/arcmin2 while redMaPPer is 98% complete above 55×10^-15 erg/s/cm2/arcmin2. The YOLO-CL selection function is approximately constant with redshift, with respect to the MCXC2021 cluster mean X-ray surface brightness. YOLO-CL shows high performance when compared to traditional detection algorithms applied to SDSS. Deep learning networks benefit from a strong advantage over traditional galaxy cluster detection techniques because they do not need galaxy photometric and photometric redshift catalogs. This eliminates systematic uncertainties that can be introduced during source detection, and photometry and photometric redshift measurements. Our results show that YOLO-CL is an efficient alternative to traditional cluster detection methods. In general, this work shows that it is worth exploring the performance of deep convolutional networks for future cosmological cluster surveys, such as the Rubin/LSST, Euclid or the Roman Space Telescope surveys.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
General Coordinates Network
Courey Elliott

The Gamma-ray Coordinates Network (GCN) is a public collaboration platform run by NASA for the astronomy research community to share alerts and rapid communications about high-energy, multimessenger, and transient phenomena. Over the past 30 years, GCN has helped enable many seminal advances by disseminating observations, quantitative near-term predictions, requests for follow-up observations, and observing plans. GCN distributes alerts between space- and ground-based observatories, physics experiments, and thousands of astronomers around the world. With new transient instruments from across the electromagnetic spectrum and multimessenger facilities, this coordination effort is more important and complex than ever. We introduce the General Coordinates Network, the modern evolution of GCN built on modern, open-source, reliable, and secure alert distribution technologies, and deployed in the cloud. The new GCN is based on Apache Kafka, the same alert streaming technology that has been selected by the Vera C. Rubin observatory. In this poster we will describe a brief history of GCN, the purpose of GCN, and an outline of the GCN software and systems.
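
A generic consumer for such a Kafka alert stream might look like the sketch below; the broker address, credentials handling, and topic name are placeholders, and GCN also publishes its own gcn-kafka client package that wraps this boilerplate.

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "kafka.example.org:9092",  # placeholder broker
        "group.id": "my-follow-up-pipeline",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["example.alert.topic"])          # placeholder topic name

    try:
        while True:
            msg = consumer.poll(timeout=1.0)
            if msg is None or msg.error():
                continue
            # hand the alert payload to downstream follow-up logic
            print(msg.topic(), msg.value().decode(errors="replace"))
    finally:
        consumer.close()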

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
HelioCloud: A cloud-native platform for accelerating heliophysics research
Christopher Jeschke, Jon Vandegriff

We present HelioCloud, a platform designed to offer an easy on-ramp for Heliophysics science users in the Amazon Web Services (AWS) compute environment. With the need to analyze big data, and with collaboration and Open Science becoming more of the currency of the community, it is important to find ways to facilitate the open analysis of larger data volumes in shared spaces. HelioCloud builds on a compute stack based on Pangeo, and greatly simplifies setting up a scientist-friendly AWS environment conducive to Heliophysics research. We are creating an open-source software-as-infrastructure mechanism that will allow institutions or groups to easily deploy a robust Heliophysics workbench complete with familiar code authoring tools, Jupyter Notebooks, simple and scalable parallel computing support, and data storage sharing capabilities that will foster collaboration within the community. We will present the internal architecture for HelioCloud instances, show how easy deployment can be, and how scientists can store and share data using the HelioCloud API.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
IAU CPS SatHub and Tools to Mitigate Satellite Constellation Interference
Michelle Dadighat

The IAU Centre for the Protection of the Dark and Quiet Sky from Satellite Constellation Interference (CPS), established in early 2022 and co-hosted by NSF’s NOIRLab and the SKA Observatory, was created to unify efforts to mitigate the effects of satellite constellations on astronomy. SatHub, one of the four sub-groups of CPS, focuses on software and related tools to aid observers and industry partners in addressing some of the issues caused by commercial satellite constellations.

Currently, SatHub software efforts are concentrated in two main areas: SatChecker, a satellite ephemeris prediction service which is nearing a beta phase, and a satellite brightness observation database, still in early development.

One of the primary tools needed to mitigate the effects of commercial satellite constellations is a robust ephemeris service that can provide accurate satellite positions as well as estimated brightness. This can reduce or remove the need to recreate satellite interference predictions across multiple observatory locations. The SatChecker service will be available via API and web interfaces with the ability to provide satellite passage information for specific pointing/FOV/exposure times, as well as general ephemeris information for specific satellites. SatChecker will also utilize multiple data sources providing orbital data in both two-line element (TLE) and orbital ephemeris message (OEM) formats to predict future passes and provide archival information. SatChecker could also be used to conduct simulations of how satellite constellations impact specific science cases, such as Rubin Observatory’s LSST and associated follow-up observations.

Another focus of SatHub’s development efforts is a data repository to collect satellite brightness observations - visual, optical, and the images themselves. Currently work has begun on an image upload website (Trailblazer), but we will be working on developing a repository to collect all brightness observation measurements and make them easily accessible for review or download. One of the main focus points is creating a standardized data format for all relevant brightness information, and ensuring that there is a consistent way to process images from different sources. This information can also be used by satellite operators to quantify the effects of any mitigations they do to reduce satellite brightness.

Other creative topics in astronomical software
Posters
08:30
0min
Improving CI/CD workflow efficiency with Kubernetes and an automated bot system
Ari

Oftentimes, software developers are forced to allocate time to processes other than development, such as waiting for builds, retrying merge requests, and updating packages. We are working to improve the workflow efficiency of CI/CD pipelines in GitLab. We will use Kubernetes to deploy jobs in parallel, removing the need to wait for each job to finish before initiating the next. This will significantly reduce testing time. By running developer tools on our Kubernetes cluster – such as Marge-bot, a merge request bot, and RenovateBot, an automated dependency-update bot – we will decrease time-consuming tasks unrelated to coding and allow developers to focus on research and the development process.
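
A minimal sketch of the kind of chore a merge-request bot automates, written with the python-gitlab package; the GitLab URL, token, and project path are placeholders, and this illustrates the general idea rather than the actual Marge-bot implementation.

    import gitlab

    # Placeholder server, token and project path.
    gl = gitlab.Gitlab("https://gitlab.example.org", private_token="TOKEN")
    project = gl.projects.get("astro/pipeline")

    # Queue every mergeable open merge request to merge automatically once
    # its pipeline succeeds, instead of a developer polling and clicking.
    for mr in project.mergerequests.list(state="opened", iterator=True):
        if mr.merge_status == "can_be_merged":
            mr.merge(merge_when_pipeline_succeeds=True)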

Other creative topics in astronomical software
Posters
08:30
0min
Improving Legacy Systems at the Minor Planet Center
David Bell

The Minor Planet Center curates and disseminates a catalog of around 400 million astrometric observations of over 1.3 million Solar System bodies. In addition, we compute and distribute orbits, ephemerides, identifications and associated metadata in both modern and legacy formats. This vast and heterogeneous set of data and data products is utilized by a similarly vast and heterogeneous user base. This, combined with the need for prompt and continuous operations, has led to a highly complex collection of legacy software. In this poster, we describe two examples from recent years of our attempts to modernize our systems:

Where Are My Observations (WAMO), an application/API that allows users to track the processing status of their submissions.

Near-Earth Object Confirmation Page (NEOCP), used by the community to identify new nearby asteroids and comets.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Improving Stellar Dynamic Modeling of Single-Fiber Kinematics
Michael Talbot

Strong galaxy-scale gravitational lenses are invaluable in the study of the evolution of galactic structure, cosmography, and the Hubble tension challenge. The necessary precision of these fields requires comprehensive modeling of the density profile of the lens, in which jointly fitting stellar dynamics and lensing information helps isolate and characterize lens and external mass properties. However, researchers often resort to dynamic fitting of single-aperture kinematics, since it is often challenging and observationally expensive to obtain spatially resolved kinematics of lenses that are typically on the order of an arcsecond in angular size. Unfortunately, any dynamic fit of single-aperture kinematic measurements requires assumptions about how the unknown stellar orbit patterns impact the line-of-sight measured velocities, which inflates measurement uncertainties and has led to reports in the literature of bias between isothermal power-law fits of the source images and of the kinematics. These issues motivated our current work to compare single-aperture dynamic approximations to improved dynamic fits of spatially resolved kinematic measurements for over 10,000 nearby galaxies from the MaNGA survey. We report on the investigation and on what modeling improvements are illuminated by this study.

Other creative topics in astronomical software
Posters
08:30
0min
Joint Likelihood Deconvolution of Astronomical Images in the Presence of Poisson Noise
Axel Donath

We present new software for Joint Likelihood Deconvolution (Jolideco) of a set of astronomical observations of the same sky region in the presence of Poisson noise. The method reconstructs a single flux image from a set of observations by optimizing the a posteriori joint Poisson likelihood of all observations under a patch-based image prior. The patch prior is parameterised by a Gaussian mixture model (GMM) which we trained on astronomical images with high signal-to-noise ratio, including data from the James Webb Space Telescope as well as the GLEAM radio survey. During the reconstruction process the patch prior adapts to the patch structures in the data by finding the most likely GMM component for each patch in the image. By applying the method to simulated data we show that both the combination of multiple observations and the patch-based prior lead to much improved reconstruction quality in many different source scenarios and signal-to-noise regimes. We also show that the method yields superior reconstruction quality compared to alternative standard methods such as the Richardson-Lucy method. We also present results of the method applied to example data from the Chandra observatory as well as the Fermi-LAT.
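
For context, the Richardson-Lucy baseline mentioned above can be reproduced in a few lines with scikit-image; this sketch only runs the classical single-observation deconvolution on toy Poisson data and is not part of Jolideco itself.

    import numpy as np
    from scipy.signal import convolve2d
    from skimage.restoration import richardson_lucy

    rng = np.random.default_rng(42)

    # Toy ground truth: two point sources on a faint flat background.
    truth = np.full((64, 64), 0.5)
    truth[20, 20] = truth[40, 45] = 200.0

    # Simple Gaussian PSF and a Poisson-noised observation.
    g = np.exp(-0.5 * ((np.arange(9) - 4) / 1.5) ** 2)
    psf = np.outer(g, g)
    psf /= psf.sum()
    observed = rng.poisson(convolve2d(truth, psf, mode="same")).astype(float)

    # Classical single-image Richardson-Lucy deconvolution (the baseline
    # against which joint, prior-regularised reconstructions are compared).
    deconvolved = richardson_lucy(observed, psf, 50, clip=False)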

Other creative topics in astronomical software
Posters
08:30
0min
MADYS: determining and comparing stellar and substellar parameters across isochronal models
Vito Squicciarini

Thanks to the unrivalled astrometric and photometric performance of Gaia, new impetus has been given to the study of young stars: both from an environmental perspective, as members of comoving star-forming regions, and from an individual perspective, as targets amenable to planet-hunting direct-imaging observations. In addition, direct imaging of giant planets and brown dwarfs with ground-based (e.g., VLT) and space-borne (JWST) instruments is providing the community with larger and larger samples of young objects that are already paving the way to a tremendous boost in the understanding of the physics underlying their formation and atmospheric properties.

In view of the large availability of theoretical evolutionary models, both the stellar and the planetary field would benefit from a unified framework that allows a straightforward comparison of physical parameters obtained by different stellar and substellar models.

To this aim, I developed the Manifold Age Determination for Young Stars (MADYS), a flexible Python tool for parameter determination (age, mass, Teff, radius, luminosity, etc.) of young stellar and substellar objects based on isochronal fitting of observed photometry. MADYS is equipped with automatic modules handling SQL-type queries of input lists of stars, the computation of interstellar extinction, the estimation of photometric quality and, finally, the computation of astrophysical parameters and the graphical representation of posterior distributions within the parameter space. More than 250 photometric filters and 20 models are currently available in MADYS, allowing a self-consistent and straightforward comparison of the results obtained by different evolutionary models.

I will introduce here the main features of the tool, which is well documented and has already been employed in ~10 publications.

Other creative topics in astronomical software
Posters
08:30
0min
ML and next steps in the DRAO data handling pipelines
Dustin Lagoy

The Dominion Radio Astrophysical Observatory (DRAO) has several telescopes whose radio frequency and digital subsystems are currently undergoing (or have recently completed) major upgrades, enabling new science via expanded capabilities in bandwidth and frequency resolution. Driven by science observation requirements and the radio frequency interference (RFI) environment, each telescope will produce upwards of 400 MB/s of spectral data during long-running observations (on the order of petabytes every year). This high volume of data requires new infrastructure and techniques in the pre-processing, archiving and distribution pipeline, as well as re-thinking some existing paradigms. Before sending data off-site for archiving and distribution, we aim to perform real-time data reduction by automatically removing RFI using a machine-learning (ML) based spectral kurtosis estimator. This new approach will both significantly reduce the volume of archived data and reduce the effort of individual scientists in removing the RFI by hand. Here we discuss the current status of the data pipeline and its ongoing development.
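
As a sketch of the statistic underlying such RFI flagging, the generalized spectral-kurtosis estimator of Nita & Gary (2010) can be computed per frequency channel from accumulated power spectra; the example below assumes the standard d = 1 (exponentially distributed power) case and an illustrative threshold, and says nothing about the ML wrapper built around the estimator.

    import numpy as np

    def spectral_kurtosis(power, axis=0):
        """Generalized spectral-kurtosis estimator (d = 1 case).

        `power` holds M accumulated power spectra along `axis`; values near 1
        indicate clean Gaussian noise, large deviations flag likely RFI.
        """
        m = power.shape[axis]
        s1 = power.sum(axis=axis)
        s2 = (power ** 2).sum(axis=axis)
        return (m + 1) / (m - 1) * (m * s2 / s1 ** 2 - 1)

    # Toy data: Gaussian noise in all channels, a persistent tone in one.
    rng = np.random.default_rng(0)
    voltage = rng.normal(size=(1024, 256)) + 1j * rng.normal(size=(1024, 256))
    voltage[:, 100] += 5.0
    sk = spectral_kurtosis(np.abs(voltage) ** 2)
    flags = np.abs(sk - 1) > 0.3   # illustrative threshold only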

AI in Astronomy
Posters
08:30
0min
Mapping VOTable Data on Data Models: Implementation Status and Progress
Laurent Michel, Somia Floret, F.-X. Pineau, Gilles Landais, Grégory Mantelet, François Bonnarel, mireille louys

Model Instances in VOTables (MIVOT) defines a syntax to map VOTable data to any data model serialised in VO-DML (Virtual Observatory Data Modeling Language).
This annotation schema operates as a bridge between data and models. It associates both VOTable metadata and data with data model elements (classes, attributes, types, etc.). It can also supply data or metadata that are possibly missing from the table, e.g., a fine-tuned coordinate system description or curation tracing. MIVOT became an IVOA recommendation in May 2023.
Having this standard was necessary to exercise data models against real data and to make the data interpretation easier by using code working with common data models.
This paper presents our ongoing developments: reading and writing MIVOT annotations with the CDS Rust library, reading and interpreting annotations with Astropy/PyVO, and creating an add-on for the VOLLT TAP library enabling it to annotate query responses on the fly.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Metrics are a kind of telemetry too: Time-series capture and curation with Sasquatch
Angelo Fausti

At Rubin Observatory we have recently consolidated our high-frequency telemetry harness that captures engineering facility data with our metrics analysis system in a single system called Sasquatch. Based on Kafka and InfluxDB, Sasquatch provides the observatory with a consolidated solution for capture, query, alerting and replication of time-series data. In this talk we'll cover our experience with this toolchain, already in production at the summit during Rubin commissioning.

Ground and space mission operations software
Posters
08:30
0min
Modernizing IRAF to Support Gemini Data Reduction
Michael Fitzpatrick

The Community Science and Data Center (CSDC) and US National Gemini
Office (US NGO) at NSF's NOIRLab are nearing completion of a project intended
to upgrade the IRAF-based Gemini reduction software to provide a fully
supported system capable of running natively on modern hardware. This work
includes 64-bit ports of the GEMINI package and dependency tasks (e.g. from
STSDAS), upgrades to the core IRAF system and all other external packages to
fix any platform and licensing problems, and the establishment of fully
supported Help Desk and distribution systems for the user community. The
project provides a bridge solution until the new DRAGONS software is
available for all facility instruments and modes as well as additional
benefits to the wider IRAF community.

Early results show a 10-20X speedup of reductions using the native 64-bit
software compared to the virtualized 32-bit solutions now in use. Results
are even better on new Apple M1/M2 platforms where the additional overhead of
Intel CPU emulation can be eliminated. Timing comparisons, science
verification testing and release plans will be discussed.

Other creative topics in astronomical software
Posters
08:30
0min
New Generation of Proposal Handling System (PHS) for ESA Missions: Evolution of XMM-Newton Software
Jose Antonio Quero Reina

The latest evolution of the XMM-Newton PHS subsystem has been extended into a new development for the XRISM mission. XRISM is a JAXA-led X-ray observatory, developed through an international collaboration with NASA and ESA. Newly developed or updated software tools make up the entirety of the PHS subsystem, which is of vital importance for observatory-type missions that include Announcements of Opportunity (AO) in their calendar. Future missions such as the ESA-led PLATO, ARIEL or the new ATHENA are candidates for using these tools. The tools are listed below.
XIPS (XMM-Newton Interface for Proposal Submission) became operational just two years ago in XMM-Newton AO-21, replacing the HRPS (NASA HEASARC) software used since very early in the mission.
Designed for massive use, with a high activity in a few hours, it consists of a frontend developed with Angular 13, and a Backend developed with Java 11, using the SpringBoot-2 framework.
In its development, scalability has been taken into account so that it can be used in the future as an extended internal Proposal Handling system.
Visibility & Search Tool, with ad-hoc software utilities for each mission, can share a frontend and access data in a standardized way (e.g., VOTable). The XMM-Newton project is working on a new generation of this software, based on the Vaadin 24 web framework (latest version), using SpringBoot 3 and Java Servlet 3.
XPET (XRISM Proposal Evaluation Tool) is the tool that will be used by the ESA-XRISM Time Allocation Committee (TAC) members to evaluate, rank, and accept or reject proposals submitted from XRISM ARK/RPS. Based on the background and functionality of the XMM-Newton TAC tools, it has been developed with Angular 15 for the frontend, and with Java 11 for the backend, using the SpringBoot-2 framework. The purpose of the tool is to analyze all the observation proposals received for the AO of a mission, cataloguing and ranking them. The tool allows proposals to be assigned to specific committee members for evaluation. For this purpose, panels and panel chairpersons can be established to have greater control over the process. Finally, results can be handled by the administrators and chairpersons and exported from the database as a sheet or directly as a set of values.
Helpdesk. A new ESA Helpdesk, based on Jira Service Management, is being designed to provide support to the community on the above software.

Ground and space mission operations software
Posters
08:30
0min
New software tools provided by the Minor Planet Center
Paresh

In recent years the Minor Planet Center (MPC) has started to transform its processing and publication systems to better handle the increasing volume of submitted data. This has included both the update of older processes and systems, as well as the development of completely new functionality and services.

In this poster I will present some of the new and in-development software tools, such as:

  • Digital Object Identifiers (DOIs) for Minor Planet Electronic Circulars (MPECs) that are again searchable via The Astrophysics Data System (ADS);
  • Web-interface and APIs to allow the reporting of cometary activity;
  • An API to provide query access to the exposures (sky coverage) database.

In addition to that we will also show how to access common information from the MPC homepage, as well as provide links to relevant web pages where new software tools or existing upgrades can be found.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
O-type Stars' Stellar Parameter Estimation with ANN
Luis J. Corral

We present the results of the implementation of a deep learning system that estimates the effective temperature and surface gravity of O-type stars. The proposed system was trained with a database of 5,557 synthetic spectra calculated with the CMFGEN stellar atmosphere code and covers stars with Teff from ∼20000 K to ∼58000 K, log(L/L⊙) from 4.3 to 6.3 dex, log g from 2.4 to 4.2 dex, and mass from 9 to 120 M⊙.
The validation of the system was performed by processing a sample of twenty O-type stars taken from the IACOB database, and a subgroup of eleven of those twenty stars taken from The Galactic O-Star Spectroscopic Catalog (GOSC) at lower resolution.

AI in Astronomy
Posters
08:30
0min
Observers' Data Access Portal at KOA
Toba Oluyide

For all active instruments, the Keck Observatory Archive (KOA) (https://koa.ipac.caltech.edu) now ingests raw data from the Keck Telescopes within 1 minute of acquisition, quick look reduced data within 5 minutes of creation, and science-ready reduced data for four instruments as they are created by their automated pipelines. On August 1, 2023, KOA released the Observers’ Data Access Portal (ODAP), which enables observers at the telescope and their collaborators anywhere in the world to securely monitor and download science, calibration and quicklook data as they are ingested into the archive. The portal is built using Python socketio websockets, which ensure that metadata appear in the portal as the data themselves are ingested. The portal itself is a dynamic web interface built with React. It enables users to view and customize metadata fields, filter metadata according to data type, and download data as they are ingested or in bulk through wget scripts. Observers have used the ODAP since its release and have provided feedback that will guide future releases.
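
The streaming pattern described above can be sketched with the python-socketio client; the portal URL, event name and metadata fields below are placeholders rather than the actual ODAP interface.

    import socketio

    sio = socketio.Client()

    # Hypothetical event name and metadata keys, for illustration only.
    @sio.on("new_file")
    def on_new_file(metadata):
        # Metadata for a newly ingested frame arrives as soon as it is archived.
        print(metadata.get("instrument"), metadata.get("filename"))

    sio.connect("https://koa.example.org")  # placeholder URL
    sio.wait()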

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Open developments towards VHE multi-messenger astrophysics: the open library Gammapy with the Very-high-energy Open Data Format
Claudio Galelli

The last decade has seen the development of multi-messenger astrophysics, with detections of gravitational waves, catalogues of very-high-energy (VHE) gamma-ray sources, and the detection of astrophysical VHE neutrinos. The need for interoperability between instruments and for joint analyses has led the VHE community to improve its data formats as well as its analysis software.

Gammapy is a community-driven open-source Python package based on astropy, numpy and scipy. Initially developed since 2014 for VHE gamma-ray astronomy, the library is now the base of the science analysis tools of the Cherenkov Telescope Array (CTA) observatory. In recent years, it has been expanded to provide support for analysis methods in the multi-wavelength (MWL) and multi-messenger (MM) astrophysics domain. It is being utilized by a wide collection of pointing and all-sky instruments, such as H.E.S.S., VERITAS, MAGIC, Fermi-LAT, and HAWC, and tested for X-ray (XMM-Newton) and neutrino (KM3NeT) data. Gammapy operates on open FITS-based formats for science-ready high-level data (event lists with their associated instrument response files).

In parallel, the VHE community is developing open data formats permitting real interoperability between instruments. Eleven major astroparticle experiments have joined their efforts in an open initiative to build a Very-high-energy Open Data Format (VODF). This format aims to provide standards in the VHE domain, from the science-ready high-level data to the astrophysical products (spectra, light curves, sky maps) and catalogues.

In the general context of the open science commitments of research institutions, this contribution describes the VHE gamma-ray and neutrino community effort to improve the interoperability of data between instruments. We will present the VODF initiative and its work to build an open data format respecting the FAIR principles and following as much as possible the IVOA (International Virtual Observatory Alliance) recommendations. The work within the IVOA, e.g. on the IVOA ObsCore (Observation Core Component) Data Model, will introduce VHE specificities and needs for data discovery. We will also present the parallel efforts of the Gammapy project, which targets for its next v2.0 LTS release to follow the FAIR4RS principles and the in-development high-energy IVOA recommendations, by using the VODF format.
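
As a minimal illustration of the open DL3-style formats Gammapy consumes, the sketch below loads event lists and instrument response files through its standard DataStore interface; the path points at the public H.E.S.S. DL3-DR1 test dataset used in the Gammapy tutorials, which must be downloaded separately.

    from gammapy.data import DataStore

    # $GAMMAPY_DATA must point at the downloaded tutorial datasets.
    data_store = DataStore.from_dir("$GAMMAPY_DATA/hess-dl3-dr1")
    observations = data_store.get_observations([23523, 23526])

    for obs in observations:
        # Each observation bundles the event list with its IRFs.
        print(obs.obs_id, len(obs.events.table), obs.aeff)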

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Outlier Identification in the Chandra Source Catalog
Dustin Swarm

Outlier identification algorithms (OIAs) can help astronomers looking for worthwhile targets of study in a sea of data by focusing investigations on smaller sets of objects that do not follow the trends of the larger population. We applied a Principal Component Analysis (PCA) and an unsupervised Random Forest (uRF) to high-significance sources in the Chandra Source Catalog v.2 (CSC2). We found 119 sources that appeared in every application of the uRF OIA. We compare these 119 outliers with the rest of the analyzed CSC2 sources and crossmatch them with the SIMBAD astronomical database. We investigated 5 outliers located within the Chandra ACIS field of view of the Galactic center as accreting-white-dwarf candidates, using spectral analysis to characterize the systems and estimate white dwarf mass.
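
A schematic of the two-stage approach with scikit-learn, assuming a generic feature table of source properties; the unsupervised random forest is emulated here in the usual way by contrasting the real data with a column-shuffled synthetic copy, a simplified proxy for the proximity-based outlier scoring used in practice.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier

    # X: (n_sources, n_features) table of source properties (placeholder data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 10))

    # Stage 1: PCA to decorrelate and reduce the feature space.
    X_pca = PCA(n_components=5).fit_transform(X)

    # Stage 2: unsupervised random forest -- train a classifier to separate the
    # real sources from a synthetic sample with each feature shuffled
    # independently (which destroys the correlations).
    X_synth = np.column_stack([rng.permutation(col) for col in X_pca.T])
    X_all = np.vstack([X_pca, X_synth])
    y_all = np.concatenate([np.ones(len(X_pca)), np.zeros(len(X_synth))])
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_all, y_all)

    # Sources that look more like the decorrelated background score low
    # and are candidate outliers.
    real_score = forest.predict_proba(X_pca)[:, 1]
    outliers = np.argsort(real_score)[:119]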

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
PEG-ify ADQL
Grégory Mantelet, Markus Demleitner

ADQL is a language defined by the IVOA for querying astronomical data. It stands for Astronomical Data Query Language. It is a fork of SQL-92 in which only the query features are used. Astronomical functions and operators have been added, in particular to query data by position. This language is mainly used in the IVOA protocol called TAP for querying relational astronomical data.

Until now, the ADQL grammar has been described in a BNF (Backus-Naur-Form)-inspired formalism, largely following SQL-92 itself. However, in its current form it is not actually used by any implementation for several reasons, including some minor mistakes and lacunae, which were in practice filled by borrowing from other SQL implementations. Also, the lack of stringent tokenisation rules resulted in differing interpretations of corner cases (e.g. the string literal '49a').

In the next version of ADQL, it is therefore proposed to change the ADQL language notation from BNF to PEG (Parsing Expression Grammar). In PEG, parsing and tokenisation are specified in a uniform way, and PEG's own grammar is standardised sufficiently well that multiple interoperating implementations can be used off the shelf to obtain parse trees of ADQL clauses (at least conceptually).

This poster aims to show the differences between the two notations and what improvements this change implies for ADQL.
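
To make the tokenisation point concrete, here is a toy PEG fragment written with the parsimonious Python package; it is purely an illustration of how a PEG settles a corner case such as '49a' and is not the grammar proposed for ADQL.

    from parsimonious.grammar import Grammar
    from parsimonious.exceptions import ParseError

    # Ordered choice plus an explicit end-of-token lookahead makes the
    # handling of '49a' unambiguous: it is neither an integer nor an identifier.
    grammar = Grammar(
        """
        token            = unsigned_integer / identifier
        unsigned_integer = ~"[0-9]+" !~"[A-Za-z_]"
        identifier       = ~"[A-Za-z_][A-Za-z0-9_]*"
        """
    )

    for text in ["49", "a49", "49a"]:
        try:
            grammar.parse(text)
            print(text, "-> valid token")
        except ParseError:
            print(text, "-> rejected by the grammar")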

Other creative topics in astronomical software
Posters
08:30
0min
PSFMachine: a Python library for rapid PSF photometry on Kepler/TESS data
Jorge Martinez-Palomera

NASA’s Kepler, K2, and TESS missions employ simple aperture photometry to derive time-series photometry, where an aperture is estimated for each star, and pixels containing each star are summed to create a single light curve. This method is simple but could result in highly contaminated photometry in crowded fields. The alternate method of fitting a point-spread function (PSF) to the data can account for crowding but is computationally expensive. The Linearized Field Deblending (LFD, Hedges et al. 2021) method introduced a new approach to PSF photometry with simplified assumptions that improve computational performance. The LFD method uses precise astrometry from Gaia catalogs to fix sources in the field and fit the PSF shape with a linear model. The method also includes a perturbed PSF model that fits PSF changes due to velocity aberration and instrument systematics. The LFD method is implemented in the open-source Python library PSFMachine. This library enables users to rapidly extract light curves of sources from Kepler/K2/TESS Target Pixel Files (TPFs). The API provides PSF photometry extraction via pre-computed models and extraction metrics. The API can be used to create custom PSF shape models using full-frame images (FFIs) and perturbed PSF models with multiple basis vectors (e.g. position correctors) which can be saved for later use on TPF data. With PSFMachine users can extract robust light curves using single Kepler quarter TPFs with ~400 sources in ~5 min, or using TPFs from single TESS sector/camera/ccd with ~1000 sources in ~8 min. This method has been recently applied to the Kepler archive to extract more than 600,000 PSF light curves and currently supports Kepler/K2 TPFs, FFIs, SuperStamps, and TESS TPFs and FFIs.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Perl Metaprogrammed TikZ Spectra for the AAOmega Spectrograph
Travis Stenborg

AAOmega is a dual-beam, bench-mounted spectrograph of the 3.9 m Anglo-Australian Telescope. No shortage of tools exists for plotting astronomical spectra like those generated by AAOmega. Perl code is presented here, however, for metaprogramming TikZ spectral plots styled to be visually harmonious with LaTeX documents. From a functional standpoint, the Perl software co-plots FITS data from both arms (red and blue), and autogenerates axis tick marks and labels customised to the data bounds. Automatic plot adaptation is not only to flux, but also to wavelength, automating visualization across AAOmega's configurable wavelength range. Example plots are presented, such as for 2dfdr-reduced spectra from the Two Degree Field multi-object fibre feed to AAOmega.

Other creative topics in astronomical software
Posters
08:30
0min
Planetary World Coordinate System in Astropy
Chiara Marmo, Erard

In the framework of the Europlanet 2024 RI European grant [1], some work has been done to define planetary standards for the FITS World Coordinate System [2].
Different research communities are involved in planetary coordinate standardization.
Geologists and Remote Sensing specialists work on extending Earth standards to Planets using Geographical Information Systems (GIS) and coordinate descriptions endorsed by the Open Geospatial Consortium (OGC).
Astronomers work to define FITS World Coordinate System [3] for planetary bodies.
To improve interoperability between those two worlds, we implemented the planetary WCS description in Astropy.
This poster describes the related new features available in Astropy 6.0:
- the possibility to define a Geodetic Coordinate Representation on a custom spheroid using its equatorial radius and flattening;
- the definition of a new Bodycentric coordinate representation for custom spheroids;
- the possibility to read and write planetary WCS keywords to define a body-fixed planetary reference frame.
Some examples of applications, also available as Jupyter notebook tutorials, will be discussed.
The work has been funded by the Europlanet 2024 Research Infrastructure (RI) European project [1], which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149.

[1] http://www.europlanet-2024-ri.eu/
[2] https://doi.org/10.1029/2018EA000388
[3] https://fits.gsfc.nasa.gov/fits_wcs.html
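
As an illustration of the first feature listed above, here is a minimal sketch of a geodetic representation on a custom spheroid, following the subclassing pattern documented for recent Astropy versions; the body and its radius/flattening values are purely illustrative, not a reference ellipsoid.

    from astropy import units as u
    from astropy.coordinates import BaseGeodeticRepresentation

    # Illustrative custom spheroid defined by equatorial radius and flattening.
    class ToyPlanetGeodeticRepresentation(BaseGeodeticRepresentation):
        _equatorial_radius = 3396.2 * u.km
        _flattening = 0.589 * u.percent

    point = ToyPlanetGeodeticRepresentation(lon=137.4 * u.deg, lat=-4.6 * u.deg,
                                            height=0 * u.km)
    print(point.to_cartesian())  # body-fixed Cartesian coordinates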

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Planetary science discovery portals with spatial selection
le sidaner

The Vespa service uses the access protocols and infrastructure of IVOA, the International Virtual Observatory Alliance, to provide access to Planetary science data. Building on its success, the project provides a unified search system for data from more than forty data centres, including the PSA and several NASA nodes.
The search web portal http://vespa.obspm.fr offers advanced search functionalities and complex cross-searches. In addition, we are developing a discovery portal based on the elasticsearch search engine and also using spatial search on mapped objects. We propose to present spatial search tools based on multi-scale representation standards (HIPS and MOC) and the Aladin Lite visualisation client.
This work was carried out as part of the Europlanet-2024 Research Infrastructure project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Point-spread-function interpolation as a key to precise scientific outcome
Mariia Demianenko

Nowadays, astronomers perform point spread function (PSF) fitting for most types of observational data. Interpolation of the PSF is often an intermediate step in such algorithms. The scope of the PSF interpolation challenge is illustrated by fundamental astrophysical tasks. HST WFC3 PSF interpolation (including interpolation over the focal plane) has implications for cosmological constants measurements. In the case of the Multi-AO Imaging Camera for Deep Observations (MICADO) at the Extremely Large Telescope (ELT), PSF interpolation will play a crucial role in high-precision astrometry of globular clusters and confirmation of the presence of intermediate-mass black holes. The Enhanced Resolution Imager and Spectrograph (ERIS) on the Very Large Telescope (VLT) is a recently commissioned analogue of the upcoming MICADO@ELT. Significant PSF variations across the field of view invalidate the approach of deconvolving with the mean PSF or on-axis PSF. This can be unsatisfactory when performing Single Conjugate Adaptive Optics (SCAO) observations, as these sophisticated and expensive systems are designed to achieve high resolution with ground-based telescopes by correcting for atmospheric turbulence in the direction of one reference star.
Our study aims to demonstrate how interpolation techniques affect scientific outcomes. To test this, we applied interpolation algorithms to the simulated SCAO-assisted MICADO@ELT, ERIS@VLT, and empirical HST WFC3 PSF grids. We use cross-validation and calculate physically motivated metrics as well as probabilistic metrics. Through our investigation, we shed light on the nuanced challenges posed by PSF interpolation and emphasize its critical implications for advancing our understanding of the Universe.
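
As a minimal sketch of one common baseline, the snippet below bilinearly interpolates a tabulated grid of PSF stamps across the focal plane with NumPy; the grid is synthetic, and real pipelines use considerably more sophisticated schemes.

    import numpy as np

    # Synthetic 5x5 grid of 25x25-pixel PSF stamps tabulated across a 4k detector.
    grid = np.linspace(0.0, 4096.0, 5)
    psf_grid = np.random.default_rng(1).random((5, 5, 25, 25))
    psf_grid /= psf_grid.sum(axis=(2, 3), keepdims=True)   # normalise each stamp

    def psf_at(x, y):
        """Bilinear interpolation of the PSF stamp at focal-plane position (x, y)."""
        ix = int(np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2))
        iy = int(np.clip(np.searchsorted(grid, y) - 1, 0, len(grid) - 2))
        tx = (x - grid[ix]) / (grid[ix + 1] - grid[ix])
        ty = (y - grid[iy]) / (grid[iy + 1] - grid[iy])
        psf = ((1 - tx) * (1 - ty) * psf_grid[ix, iy]
               + tx * (1 - ty) * psf_grid[ix + 1, iy]
               + (1 - tx) * ty * psf_grid[ix, iy + 1]
               + tx * ty * psf_grid[ix + 1, iy + 1])
        return psf / psf.sum()

    stamp = psf_at(1234.5, 3071.0)   # interpolated 25x25 PSF at an arbitrary position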

Other creative topics in astronomical software
Posters
08:30
0min
Predicting the Radiation Field of Molecular Clouds using Denoising Diffusion Probabilistic Models
Duo Xu

Accurately quantifying the impact of radiation feedback in star formation is challenging. To address this complex problem, we employ deep learning techniques, denoising diffusion probabilistic models (DDPMs), to predict the interstellar radiation field (ISRF) strength based on three-band dust emission at 4.5 µm, 24 µm, and 250 µm. We adopt magnetohydrodynamic simulations from the STARFORGE (STAR FORmation in Gaseous Environments) project that model star formation and giant molecular cloud (GMC) evolution. We generate synthetic dust emission maps matching observed spectral energy distributions in the Monoceros R2 (MonR2) GMC. We train DDPMs to estimate the ISRF using synthetic three-band dust emission. The dispersion between the predictions and true values is within a factor of 0.1 for the test set. We further evaluate the diffusion model's performance on new simulations with ISRF intensities 10 and 100 times higher than that of the fiducial simulations. Despite systematic underestimation factors of 1.8 and 2.7 for the higher ISRF simulations, the relative intensity remains well constrained. Meanwhile, our analysis reveals a weak correlation between the ISRF solely derived from dust temperature and the actual ISRF. We apply our trained model to predict the ISRF in MonR2, revealing a correspondence between intense ISRF, bright sources, and high dust emission, confirming the model's ability to capture ISRF variations. Our model provides a robust means to predict the distribution of radiation feedback even where the ISRF is complex and not well constrained, such as in regions influenced by nearby star clusters.

AI in Astronomy
Posters
08:30
0min
Preliminary results of a new Deep Learning model to predict from the orbital parameters the background count rates of the AGILE Anticoincidence System.
Nicolò Parmiggiani

AGILE is a space mission launched in 2007 to study X-ray and gamma-ray astronomy. Since 2009, the AGILE satellite continuously spins around its sun-pointing axis, with an angular speed of about 0.8 degrees/sec, thus completing a rotation every ~7 minutes. This work uses data acquired during the so-called "spinning mode" observing period. AGILE has an anti-coincidence system (ACS) comprising five independent panels surrounding all AGILE detectors to reject background charged particles efficiently. The ACS detects hard X-ray photons in the 50-200 keV energy range and continuously stores each panel count rate in the telemetry as ratemeters data, with 1.024 sec resolution.

We developed a new Deep Learning (DL) model to predict the background value of the AGILE ACS top panel (perpendicular to the pointing direction of the payload detectors) using the satellite's orbital parameters. This model aims to learn how the orbital and spinning modulations of the satellite impact the background level of the ACS top panel, with the final scientific goal of developing a reliable method for detecting Gamma-ray bursts (GRB) and other transient events.

The DL model solves a regression problem and is implemented with a dense neural network with three hidden layers. The first layer has 1024 neurons, while the last two layers have 512 neurons. The inputs of the model are the AGILE orbital parameters (e.g., detector pointing, altitude, etc.), and the output is the predicted count rate of the ACS panel. The model is trained with a supervised learning technique. We created a dataset of more than twenty million orbital parameter configurations extracted from the 2020 data archive with the associated ACS top panel ratemeters (the labels). We split the dataset into training and test datasets with respective percentages of 90% and 10%. To improve the training process, the orbital parameters and the ACS ratemeters are normalized between 0 and 1. We excluded from the dataset the passages through the South Atlantic Anomaly and the time windows where already-known GRBs were detected by AGILE or other instruments (GRB list taken from the public GRBweb catalog), in order to analyze only background time windows. The training uses the Nadam optimizer and the Mean Absolute Error loss function. We trained the model for 127 epochs with a batch size of over 130 thousand orbital parameter configurations.

We evaluated the trained model using the test dataset containing more than two hundred thousand orbital parameter configurations and compared the predicted ACS top panel ratemeters with the real ones. The results show that the model can reconstruct the background level of the ACS top panel with an accuracy of 96.7%, considering the orbital modulation and spinning of the satellite. Starting from these promising results, we are developing an anomaly detection method to detect GRBs when the differences between predicted and real ratemeters exceed a predefined threshold.
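
A minimal Keras sketch of the dense architecture and training configuration described above (three hidden layers of 1024, 512 and 512 neurons, Nadam optimiser, mean-absolute-error loss); the number of input features, the placeholder arrays and the batch size below are illustrative stand-ins for the real normalised dataset.

    import numpy as np
    import tensorflow as tf

    n_features = 8  # placeholder for the number of orbital parameters used

    # Dense regression network: orbital parameters in, predicted (normalised)
    # ACS top-panel count rate out.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(1024, activation="relu"),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Nadam(), loss="mean_absolute_error")

    # Placeholder arrays standing in for the normalised orbital parameters/labels.
    x_train = np.random.rand(100_000, n_features).astype("float32")
    y_train = np.random.rand(100_000, 1).astype("float32")
    model.fit(x_train, y_train, epochs=5, batch_size=8192, validation_split=0.1)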

AI in Astronomy
Posters
08:30
0min
Presentation of the Astronomical Table Serialisation System Query tool (QATSS) and its ecosystem.
F.-X. Pineau

The Astronomical Table Serialisation System (ATSS) comprises a data model that represents tabular data along with the abstractions needed for serialization and deserialization across diverse data formats such as CSV, VOTable or FITS.

Drawing inspiration from the Serde.rs framework, ATSS shares conceptual similarities with the "format-neutral" core of STIL within TOPCAT. One of ATSS's distinctive features is its capability to discern three distinct data representations: a conventional in-memory representation; a likewise conventional ASCII representation; and a representation at the storage byte level. The byte-level storage representation offers compression opportunities while preserving the FITS BINTABLE property of a consistent byte count per row.

The QATSS tool has been developed to perform queries on files in any format supported by, and implemented for, ATSS. Its capabilities encompass row filtering, column selection, position-based queries (including but not limited to cone, multi-cone, and MOCs queries), index-based queries, and multi-threading, among others. Notably, QATSS is currently in active production within VizieR to query large catalogs stored internally in a specialized CDS format.

Other creative topics in astronomical software
Posters
08:30
0min
Processing large Radio Astronomy data cubes within an Objectstore
Gordon WH German

The future Square Kilometre Array (SKA) telescope and its current precursors such as the Australian SKA Pathfinder and the Murchison Widefield Array are changing the way in which we handle large data. Typical ASKAP data cubes can be on the scale of a terabyte or so; SKA data cubes may be larger by two orders of magnitude or more.

Reduction of these data can only efficiently occur in High Performance Compute (HPC) facilities. Modern HPC centres are moving to object storage for long-term storage of data, as opposed to the traditional POSIX-based file systems. They offer virtually limitless scalability, greater searchability (via metadata attributes), resiliency and cost efficiency. However, virtually all algorithms used by radio astronomers assume an underlying POSIX file system, with its familiar file methods of open(), write(), seek() etc. To work with objectstores, data must first be staged out to short-term POSIX file-system storage, prior to processing the data. This is not a trivial exercise; staging multi-terabyte data sets may take several hours to days.

I present an alternative methodology to avoid this double-handling of data. A Python wrapper requests cutouts from the datacube in the objectstore and converts the received stream into arrays to be fed directly into the process (in this case the source-finder SoFiA-2). This is shown to be considerably faster than staging out data to a scratch file system and then processing.
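
A schematic of the byte-range approach using boto3 against an S3-compatible object store; the bucket, key and offset arithmetic are placeholders, and a real cutout service must translate cube voxel ranges into byte ranges from the data layout.

    import boto3
    import numpy as np

    s3 = boto3.client("s3")  # endpoint/credentials come from the environment

    def read_cutout(bucket, key, offset_bytes, n_values, dtype=np.float32):
        """Stream one contiguous slab of a cube directly from the object store."""
        nbytes = n_values * np.dtype(dtype).itemsize
        byte_range = f"bytes={offset_bytes}-{offset_bytes + nbytes - 1}"
        response = s3.get_object(Bucket=bucket, Key=key, Range=byte_range)
        return np.frombuffer(response["Body"].read(), dtype=dtype)

    # Placeholder bucket/key; the returned array can be fed straight to the
    # source finder without staging the full cube to POSIX scratch space.
    cutout = read_cutout("askap-cubes", "image.restored.i.cube.fits", 2880, 1024 * 1024)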

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Profiling and Optimizing the High Performance Gridder
Preshanth Jagannathan

Radio interferometric imaging with current and future instruments requires the handling of large datasets. GPUs provide an effective means of dealing with this data deluge. However, to fully leverage the computational power available on these GPUs, a tightly coupled iterative approach involving performance engineering and software engineering is required.

In the case of the high-performance gridder, we have utilized the Kokkos software framework to develop high-level C++ code that can be deployed across a wide range of software environments, benefiting from compile-time optimizations. In this presentation, we will explore the role of performance engineering in the context of the development and testing of the high-performance gridder. We will demonstrate the performance improvements achieved through optimizing already highly performant code, which has been tested on a variety of GPUs spanning different architectures and hardware generations.

GPU implementations for core astronomical libraries
Posters
08:30
0min
Prototyping access from visualisation tools to SKA science images and cubes stored in a rucio DataLake through IVOA discovery and access services
François Bonnarel, Susana, Marco Molinaro, Pierre Fernique, Vincenzo Galluzzi, Thomas Boch, Manuel Parra-Royón, caroline bot, Mark Allen, Jesus Salgado, Matthieu Baumann, Alessandra Zanichelli

M.Allen, R.Barnsley, M.Baumann, F.Bonnarel, T.Boch, C.Bot, R.Butora, J.Collinson, P.Fernique, V.Galluzzi., R Joshi, M.Molinaro, M. Parra-Royon, J. Sanchez-Castaneda , S. Sanchez-Exposito, G.Tudisco, F .Vitello A.Zanichelli.

SKA is the major low-frequency radio astronomy project of the future, with several major scientific applications. It will increase the amount of available science data by several orders of magnitude, eventually reaching more than 700 petabytes of storage per year. The SKA observatory will perform the initial data processing to deliver observatory data products, while the SKA Regional Centre network (SRC) will provide storage for these products and processing capabilities to deliver and store advanced data products for the user community.
Within the scope of the SRC network, the Orange (visualisation), Magenta (data management) and Coral (node implementation) teams have prototyped the discovery, access and visualisation of science data. Our visualisation tools VisiVO and Aladin discover, access and visualize test science data produced by SKA pathfinders and stored in the rucio DataLake. Science metadata functionality has been added by the Magenta team to the Rucio data lake prototype to demonstrate a means of enabling IVOA-compliant data discovery and server-side processing.
VisiVO, Aladin Desktop and Aladin Lite are able to query the discovery service built on the ObsCore and SCS IVOA protocols.
This allows them to load DataLink responses providing links towards a SODA cutout service developed by the Orange team able to extract subcubes or images directly from the datasets stored in the rucio DataLake.
The Rucio Storage Element and SODA developments have been deployed and configured on the Spanish SRC node, providing computing and storage resources, managed by the Coral Team members. This prototype paves the way to collaborative development in the SKA regional center network and shows the possible integration of VO services and visualisation tools in DataLakes and science platforms.
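
The discovery-to-cutout chain described above can be sketched with PyVO against any ObsCore-capable TAP service; the service URL is a placeholder, and the DataLink rows returned by a real deployment would advertise the SODA cutout endpoint.

    import pyvo

    # Placeholder TAP endpoint exposing an ivoa.obscore table.
    tap = pyvo.dal.TAPService("https://srcnet.example.org/tap")
    results = tap.search(
        "SELECT TOP 5 * FROM ivoa.obscore WHERE dataproduct_type = 'cube'"
    )

    # Follow the DataLink responses attached to each discovered dataset; a SODA
    # service listed there accepts server-side cutout parameters (e.g. CIRCLE, BAND).
    for datalink in results.iter_datalinks():
        for row in datalink:
            print(row.semantics, row.access_url)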

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
PyCPL: The ESO Common Pipeline Library in Python v1.0
Mrunmayi Sudhakar Deshpande

PyCPL provides a Pythonic way to access ESO's Common Pipeline Library (CPL) for astronomical data reduction. It provides a Python interface to the powerful CPL library, while also allowing users and developers to take full advantage of the rest of the scientific Python ecosystem. CPL was developed in C in 2003 for its efficiency and speed of execution, and because of its maturity and widespread use, it is well tested and understood. With the community, however, moving away from C/C++ programming and embracing Python for data processing tasks, there is a need to provide access to the CPL utilities within the Python environment.
With the latest version being released, users can now install PyCPL to run existing CPL recipes (written in C) and access the results from Python.
It also provides the ability to create new recipes in Python using functionality provided by CPL.

Other creative topics in astronomical software
Posters
08:30
0min
Quantum Convolutional Neural Networks for the detection of Gamma-Ray Bursts in the AGILE space mission data.
Alessandro Rizzo

Quantum computing represents a cutting-edge frontier in artificial intelligence, proposing to enhance machine learning and deep learning techniques by leveraging quantum mechanical principles such as superposition and entanglement. The work presented in this project falls within the context of the AGILE space mission, launched in 2007 by the Italian Space Agency to study X-ray and gamma-ray phenomena. AGILE is equipped with detectors capable of collecting information about gamma rays and X-rays.

This work aims to investigate the feasibility and potential advantage of quantum machine learning methods in the context of the AGILE space mission. We implemented different quantum deep learning algorithms that analyze the data acquired by the instrument onboard AGILE to detect the Gamma-Ray Bursts from sky maps or light curves. Moreover, we want to compare the results obtained from Quantum Neural Networks (QNNs) with those obtained from classical networks in order to see if the former can provide an improvement.

We evaluated and adapted to our problems several QNNs implemented through various frameworks that have been employed to simulate the behaviour of a quantum computer: TensorFlow-Quantum, Qiskit, and PennyLane.
In order to measure the performances of different networks and to use both sky maps and light curves’ datasets, we developed different quantum architectures with an embedding layer used to represent input data as quantum states, taking a classical data point and translating it into a set of gate parameters in a quantum circuit. The methods presented in this work use amplitude encoding and re-uploading techniques.

The QNN and the classical architectures are evaluated using the same dataset of sky maps and time series. The QNN implemented with PennyLane achieved the best performance among the analysed architectures, reaching an accuracy of 0.951 with the dataset of sky maps and 0.875 with the dataset of time series. On the other hand, the classical version of the network achieved an accuracy of 0.984 for the former and 0.875 for the latter. Nevertheless, the quantum approach uses only 51 trainable parameters, with each convolution and pooling layer consisting of 15 and 2 parameters respectively, while the classical network uses more than 100k for sky maps. The results that we obtained using the QNN are similar to those obtained with classical machine learning, but with significantly fewer trainable parameters. The adoption of quantum models for GRB detection in the data acquired by the AGILE instruments allows us to simplify the optimisation process while achieving close to state-of-the-art performance. Therefore, this work can provide some insights into the importance of quantum deep learning, showing how it can ease the complexity of deep learning problems while achieving results similar to the classical approach.
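
A minimal PennyLane sketch of the amplitude-encoding circuit pattern mentioned above; the qubit count, layer template and layer count are illustrative and do not reproduce the architecture actually used on the AGILE data.

    import pennylane as qml
    from pennylane import numpy as np

    n_qubits = 4
    dev = qml.device("default.qubit", wires=n_qubits)

    @qml.qnode(dev)
    def circuit(features, weights):
        # Amplitude encoding: 2**n_qubits classical values become one quantum state.
        qml.AmplitudeEmbedding(features, wires=range(n_qubits), normalize=True)
        # Trainable entangling layers standing in for the convolution/pooling blocks.
        qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
        return qml.expval(qml.PauliZ(0))

    shape = qml.StronglyEntanglingLayers.shape(n_layers=2, n_wires=n_qubits)
    weights = np.random.random(shape)
    features = np.random.random(2 ** n_qubits)
    print(circuit(features, weights))  # expectation value used as the classifier output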

AI in Astronomy
Posters
08:30
0min
RFInder: Radio Frequency Interference data evaluation and reduction
Athanaseus Ramaila

Radio Frequency Interference (RFI) is a significant problem for radio astronomy observations, becoming increasingly difficult to mitigate as the number of radio transmitters increases. RFInder is a Python library that uses Casacore to read MS files (the standard format of radio astronomy data) to identify and visualize RFI in observations. RFInder products include interactive HTML and GIF files that visualize the presence of RFI, the percentage of flagged visibilities due to RFI, and the estimated visibility noise. These visualizations can be used by astronomers to identify and characterize RFI in their observations and to assess the impact of RFI on their data quality. The impact of RFI on an interferometer varies depending on the lengths of its baselines and the observation frequency. RFInder shows the amount of flagged RFI depending on frequency and baseline length, as well as the predicted noise given the flags. RFInder is a command-line tool, available on the Python Packaging Index and GitHub, that integrates seamlessly into pipeline frameworks. CARACal, a radio interferometry data reduction pipeline, leverages RFInder as a component of its toolkit to manage RFI in various MeerKAT Large Survey Programs, thereby providing a layer of quality control for the data processing. RFInder has proven to be a valuable tool for radio astronomers as it can identify trends in RFI levels, such as whether RFI is increasing or decreasing, and new RFI sources over time. This is important because RFI can contaminate radio astronomical signals, reducing the quality of the scientific outcomes.

Other creative topics in astronomical software
Posters
08:30
0min
Radio celestial source fringe signals detection based on Transformer self-attention mechanism
RuiqingYan

Radio observations serve as a powerful means of exploring the universe. The Tianlai telescope studies the dark energy of the universe through radio observations. The Tianlai telescope is a radio array composed of multiple radio telescopes. It obtains the distribution of matter in the universe by observing the 21 cm spectral line of neutral hydrogen, and thereby indirectly observes dark energy. However, the 21 cm spectral line is very weak and easily affected by interference from other weak celestial sources, which is a problem that must be solved for normal operation. In order to detect these weak radio signals, we propose a celestial source fringe detection method based on the Transformer self-attention mechanism. To prove the effectiveness of our proposed method, we performed experiments and compared it with other deep learning methods in terms of mean Average Precision. We also verified the impact of data quality on the model, and fringe signals can be detected at low signal-to-noise ratios. Experimental results show that the method achieves comparable detection performance for radio celestial source fringes in terms of accuracy and location regression. This study is important for the detection of dark energy in the universe through radio observation of the neutral hydrogen 21 cm signal.

AI in Astronomy
Posters
08:30
0min
Ready-to-Use Astronomy Containers from CADC
Brian Major

The Canadian Astronomy Data Centre (CADC) is now publishing software images that are ready for use by the astronomy community. The services and tools available are in the form of docker images, ready to be deployed on your container orchestration system, such as kubernetes. Configuration has been standardized and simplified so that no customized changes to the images are required.

These services and tools cover four main areas: 1) Storage Inventory, for the management of archival file storage for a science data archive; 2) CAOM metadata services and tools, for observation harvesting and data engineering; 3) CANFAR Science Platform, for data analysis sessions and user storage; and 4) Science Containers, common astronomy software to be run on the Science Platform.

The full complement of services and tools results in a feature-rich ecosystem that meets many of the requirements of an astronomy data center. This has in fact been demonstrated in the SKA context at a number of Science Regional Centre nodes.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Revealing Predictive Maintenance Strategies from Comprehensive Data Analysis of ASTRI Horn Historical Data
Federico Incardona

Telescope facilities in modern astronomical research generate a substantial volume of data, including scientific observations and a large amount of housekeeping and auxiliary information from diverse sources, such as weather stations, sensors, log messages, LiDARs (light detection and ranging), and FRAMs (photometric robotic atmospheric monitor). The multitude of sensors spread throughout these facilities makes them, in effect, Internet of Things (IoT) environments. Handling and processing this vast amount of data necessitates sophisticated software architectures, which exploit cutting-edge technologies in the field of IoT and big data. While sensor data traditionally address systematic errors in scientific measurements, this paper explores their potential for novel maintenance techniques, akin to those in Industry 4.0.

Predictive maintenance has emerged as a proactive strategy to optimize the performance and operational efficiency of complex systems. In the context of telescope arrays, where downtime can significantly impact scientific research, the application of predictive maintenance assumes critical importance. Traditional maintenance practices often rely on scheduled routines or reactive approaches, leading to potential equipment failures and costly repairs. However, by exploiting the vast amounts of data generated by telescope facilities, a new era of maintenance for telescope facilities is unfolding.

This study provides significant outcomes resulting from an in-depth analysis of historical data gathered from ASTRI Horn, a Cherenkov telescope positioned at the Astrophysical Observatory of Catania (Serra La Nave, Mount Etna). The research focuses on a comprehensive exploration of data patterns spanning seven years of telescope activities. Within this analysis, we delved into the distribution of variables and how they correlate with each other. Additionally, we varied the analysis interval's granularity to assess the correlation time scale. The findings of this analysis provide valuable insights into potential progressions in strategies for predictive maintenance within telescope facilities.
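
A schematic of the kind of granularity-dependent correlation scan described above, using pandas; the file name and column names are placeholders for the actual housekeeping streams.

    import pandas as pd

    # Placeholder housekeeping table: one row per timestamped sensor reading.
    hk = pd.read_csv("astri_horn_housekeeping.csv", parse_dates=["timestamp"])
    hk = hk.set_index("timestamp")[["motor_current", "bearing_temp", "wind_speed"]]

    # Re-sample at several granularities and inspect how the correlation
    # structure between variables changes with the averaging time scale.
    for window in ["1min", "1h", "1D"]:
        corr = hk.resample(window).mean().corr()
        print(f"--- correlations at {window} granularity ---")
        print(corr.round(2))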

AI in Astronomy
Posters
08:30
0min
Running Roman pipelines in NASA cloud with Airflow and Kubernetes
Emmanuel Joliet

The Nancy Grace Roman Space Telescope is a NASA observatory designed to unravel dark energy and dark matter, search for and image exoplanets, and explore infrared astrophysics. This talk highlights the solution adopted by the Roman SSC team at Caltech/IPAC to run data pipelines using Airflow and Kubernetes in the NASA cloud.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
SAMP ImageJ Plugin
Regis Haigron

New plugins were developed to integrate ImageJ in workflows centered on the Virtual Observatory (VO) environment.
In the scope of the Virtual European Solar and Planetary Access (VESPA) project, a specific plug-in has been developed to provide a SAMP connection and receive images from other VO tools.
This development also improves the support of FITS files and compressed formats in ImageJ.
The plugins were tested with recent versions of ImageJ (v1.53t) and AstroImageJ (v5.1.3), which both required significant modifications with respect to previous versions. Since ImageJ analysis results can be of various types (image, tables, vectors, spectra…), no SAMP feedback connection has been installed at this point.
The new SAMP connection to ImageJ will provide extended visualization (e.g., TIFF format, especially useful from search clients), conversion between various image formats, and high-level image processing functions to the VO.
Europlanet 2024 RI has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149
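
For reference, the receiving side can be exercised from Python with Astropy's SAMP client by broadcasting an image.load.fits message of the kind the new plugin listens for; a SAMP hub must already be running (e.g. started from TOPCAT or Aladin), and the file URL is a placeholder.

    from astropy.samp import SAMPIntegratedClient

    client = SAMPIntegratedClient(name="vespa-demo")
    client.connect()  # requires a running SAMP hub

    # Broadcast a FITS image; a SAMP-enabled ImageJ plugin subscribed to
    # image.load.fits would open it directly.
    client.notify_all({
        "samp.mtype": "image.load.fits",
        "samp.params": {
            "url": "file:///tmp/example.fits",   # placeholder path
            "name": "example image",
        },
    })
    client.disconnect()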

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
SKA Regional Centres Architecture: One data lake, multiples nodes
Jesus Salgado

The SKA Observatory is a next-generation radio astronomy facility that will help to revolutionise our understanding of the Universe and the laws of fundamental physics. The observatory has three locations: in South Africa's Karoo region (SKA_MID), Western Australia's Murchison Shire (SKA_LOW) and the Global Headquarters in the United Kingdom. The SKA_MID and SKA_LOW locations will be capable of producing a stream of science data products on the order of 700 PB/year. This large data volume is unprecedented for the astronomical community and thus poses unique challenges for curating and providing access to the datasets and resources required to analyse them in order to derive the final scientific insights. The approach chosen is the development and adoption of the SKA regional centre concept in the form of a loose SRCNet association consisting of regionally funded contributions.

The SRCNet data lake will be centrally managed but distributed and federated at the storage element level. Known challenges of data lakes must be addressed, such as exploitation of the data lake through the integration of data and computing, and data latency due to distributed repositories. We present the architecture design that is being developed for the SRCNet to allow scientific analysis of the SKA data from the SRCNet data lake while minimising as much as possible the drawbacks of federated data lakes.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
SOFIA/GREAT: inflight operations of a heterodyne receiver
Ronan Higgins

GREAT was a heterodyne receiver that flew on board the SOFIA observatory from the early science flights in 2011 until the close of the mission in 2022. GREAT was a PI-led instrument that observed at frequencies ranging from 480 to 4700 GHz. In this talk I will discuss the in-flight instrument operation: how observations were undertaken, how the receiver was tuned, and how the data quality was monitored and processed in flight. Unique to the airborne observatory was that the system was offline, with no internet connection allowed for IT security reasons; I will discuss some of the peculiarities of this setup and how we maintained an up-to-date system on the aircraft.

Ground and space mission operations software
Posters
08:30
0min
SOFIA/GREAT: post-ops and quality measures for heterodyne data
Juan Luis Verbena

GREAT was a heterodyne array receiver that flew on board the SOFIA observatory from the early science flights in 2011 until the close of the mission in 2022. In this work we discuss the SOFIA/GREAT post-operations activities, such as: i) support for the astronomical community; ii) maintenance of the GREAT data archive; iii) preparation of special datasets (OI data and atmospheric spectra); iv) the GREAT data reduction pipeline; and v) quality flags, including the comparison between the theoretical radiometer RMS noise and the real noise in the spectra, radio frequency interference (RFI), the presence of standing waves, assessment of the baseline quality, and assessment of the quality of the atmospheric fit.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
SPARCL: SPectra Analysis and Retrievable Catalog Lab
Stephanie Juneau

SPectra Analysis & Retrievable Catalog Lab (SPARCL) at NOIRLab's Astro Data Lab was created to efficiently serve large optical and infrared spectroscopic datasets. It consists of services, tools, example workflows and currently contains spectra for over 7.5 million stars, galaxies and quasars from the Sloan Digital Sky Survey (SDSS) and the Dark Energy Spectroscopic Instrument (DESI) survey. We aim to eventually support the broad range of spectroscopic datasets that will be hosted at NOIRLab and beyond. Major elements of SPARCL include capabilities to discover and query for spectra based on parameters of interest, a fast web service that delivers desired spectra either individually or in bulk as well as documentation and example Jupyter Notebooks to empower users in their research. Learn more at https://astrosparcl.datalab.noirlab.edu

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Simulating stellar color-magnitude diagrams on the GPU
Laszlo Dobos

L. Dobos, C. Filion, T. Budavári, R. Wyse, A.S. Szalay
Dept. of Physics & Astronomy, The Johns Hopkins University
Generating synthetic photometric catalogs of fields containing a large number of stars belonging to multiple stellar populations, such as the composite stellar populations of nearby satellite galaxies with resolved stars and the various sub-populations of the Galactic foreground, is computationally intensive. Stellar populations are usually characterized by their metallicity, age and distance distributions, which are often correlated, whereas the initial mass of each star is thought to come from a universal Initial Mass Function (IMF). Precomputed isochrone grids for a large number of magnitude systems are available to convert these fundamental stellar parameters to magnitudes. Interpolation between isochrones, however, must be done in the parameter called the Equivalent Evolutionary Phase (EEP), instead of initial mass, since stars with the same initial mass but slightly different age (or metallicity) can be in entirely different evolutionary phases. Hence, isochrone grids are tabulated in metallicity, age and EEP, and interpolating the magnitudes from metallicity, age and initial mass requires an implicit interpolation scheme in which the EEP must also be found as part of the solution. On the other hand, synthetic catalogs are often generated with given magnitude and color cuts which must be respected during random sampling. Due to the steep IMF, the large number of low-mass, faint stars is likely to cause high rejection rates during random sampling. While intricate heuristics can help mitigate the latter problem, our GPU-based solution to synthetic catalog generation is fast enough to generate hundreds of thousands of stars on the time scale of tens of seconds with broadly available hardware. Our software library, built with eager-mode TensorFlow, solves the implicit isochrone interpolation problem and provides a relatively simple yet very flexible way of describing mixtures of composite stellar populations for detailed modelling.
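
As an illustration of the rejection problem described above, a minimal NumPy sketch of sampling an IMF with a survey magnitude cut is given below; the single-slope IMF and the crude mass-to-magnitude relation are illustrative stand-ins for the real isochrone interpolation, not the library's implementation.

    import numpy as np

    rng = np.random.default_rng(42)

    def sample_imf(n, m_min=0.1, m_max=8.0, alpha=2.35):
        # single-slope power-law IMF sampled by inverse transform (an approximation)
        u = rng.uniform(size=n)
        a, b = m_min ** (1 - alpha), m_max ** (1 - alpha)
        return (a + u * (b - a)) ** (1.0 / (1 - alpha))

    def mass_to_mag(mass, distance_modulus=18.5):
        # placeholder for isochrone interpolation: crude mass-luminosity scaling
        return 4.8 - 2.5 * np.log10(mass ** 3.5) + distance_modulus

    kept = []
    drawn = 0
    while len(kept) < 10_000:
        m = sample_imf(100_000)
        drawn += m.size
        g = mass_to_mag(m)
        kept.extend(g[g < 24.0])          # survey magnitude cut; most faint draws are rejected
    print(f"kept {len(kept)} of {drawn} drawn stars")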

GPU implementations for core astronomical libraries
Posters
08:30
0min
Study of a spectral analysis system for planetary surfaces
Erard

The composition of planetary surfaces is mainly studied through optical and near-IR observations from imaging spectrometers on space missions. These data are notoriously difficult to interpret because diagnostic features are subdued and hidden in highly correlated datasets.
A complete VO workflow to interpret such observations is being assessed, relying on:
- an absorption extraction algorithm based on multiscale wavelet analysis (Erard et al 2011, https://doi.org/10.1016/j.pss.2011.07.004)
- the same type of extraction performed on laboratory spectra of controlled samples in public databases
- documented bandlists collected from publications or recent measurements (e.g., in SSHADE - Schmitt et al 2022, https://doi.org/10.5194/epsc2022-778)

The workflow is used to analyze the observations and to compare retrieved absorption characteristics with those from bandlists or samples. This activity relies on existing VO standards and tools — an extension of EPN-TAP (https://ivoa.net/documents/EPNTAP/) to access planetary science data in the VO may be designed to support this.
The evaluation of the quality of match between observations and references will be discussed on the basis of simple test cases.

Europlanet 2024 RI has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
TOPCAT Corner Plot
Mark Taylor

TOPCAT is a desktop GUI tool for working with tabular data such as source catalogues. Among other capabilities it provides a rich set of visualisation options suitable for interactive exploration of large (many rows) and high-dimensional (many columns) datasets.

The latest release introduces a Corner Plot window which displays a grid of linked scatter-plot-like and histogram-like plots for all pair and single combinations from a supplied list of coordinates. This type of visualisation is not novel; it has been used since the 1980s under the names "Scatter Plot Matrix", "SPLOM", "Pairs Plot" and "Corner Plot" and can be generated for instance in Python, R and IDL. But its inclusion in TOPCAT benefits from the interactivity, performance, scalability, flexibility, and ease of use provided by the existing GUI, and gives TOPCAT users a new tool for exploration of high-dimensional data.
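
As a point of comparison with the scripted approaches mentioned above, a static (non-interactive) corner plot can be produced in Python with the widely used corner package, as in this small sketch on random data:

    import numpy as np
    import corner

    # 5000 samples in four arbitrary dimensions, purely for illustration
    data = np.random.default_rng(1).normal(size=(5000, 4))
    fig = corner.corner(data, labels=["x1", "x2", "x3", "x4"])
    fig.savefig("corner.png")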

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
TSRS-Heritage Archive: Digitisation, Standardisation, and Future Perspectives
Giovanna Jerse, Marco Molinaro

The Trieste Solar Radio System (TSRS) was a set of two multi-channel solar radio polarimeters, performing continuous surveillance of the decimetric and metric coronal radio emissions with high time resolution. It was operational in Trieste (Italy) from 1969 to 2010, collecting data in digital form since 1999. However this archive encountered challenges in terms of maintenance and support due to resource constraints, including funding and personnel shortages.
More recently, the heliospheric physics community has highlighted the importance of exploiting this resource, pushing the evaluation of new data preservation strategies with the primary goal of enhancing accessibility and making the TSRS Heritage Archive (TSRS-HA) more adherent to the FAIR principles.
During the TSRS-HA setup process, the entire repository of original digitised raw data will undergo a complete re-ingestion, moving from bespoke data formats and proprietary software to standardised and open-source solutions. Consequently, this upgrade will lead to significant enhancements in data exploitation capabilities, addressing the limitations that were inherent in the former system and allowing users to conduct fast searches across the entire time series stored within the relational database.
The new architecture of TSRS-HA is based on a containerized microservice solution that breaks down the large application into smaller components, each enclosed within its own container. This isolation allows areas of the application to be changed without affecting the whole setup, improves security, and provides fault isolation. Moreover, it supports portability and scalability, facilitating migration to new servers or cloud-based solutions.
The designed architecture can accommodate multiple FAIR-enabling standards from different communities, such as TAP, EPN-TAP and HAPI. The choice of multiple standard interfaces is driven both by their ability to reach a larger audience and by the features they enable. The IVOA Table Access Protocol is chosen for its flexibility in deploying tabular data and rich metadata, as well as its support for metadata models; one of these models is EPN-TAP, used to serve predefined atomic datasets alongside the full historical time series. At the same time, the Heliophysics Application Programming Interface can offer a solution dedicated to time-series discovery and access.
Another benefit of employing containerized microservices is the availability of applications, such as Jupyter notebooks, without the hurdles of conflicting frameworks on the host server. Jupyter notebooks can be used to provide practical examples and documentation for the services, offering users a cookbook-like introduction to leveraging the resources of TSRS-HA.
This contribution provides technical details of the system and discusses future perspectives and potential refinements.
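
To give a flavour of the HAPI access mode mentioned above, a time series can be retrieved with plain HTTP; in the sketch below the server URL and dataset identifier are hypothetical placeholders, while the /data endpoint and its id, time.min, time.max and format parameters follow the HAPI specification.

    import requests

    base = "https://tsrs-ha.example.org/hapi"        # hypothetical endpoint
    params = {
        "id": "tsrs_237MHz",                         # hypothetical dataset identifier
        "time.min": "2005-01-01T00:00:00Z",
        "time.max": "2005-01-02T00:00:00Z",
        "format": "csv",
    }
    resp = requests.get(f"{base}/data", params=params)
    resp.raise_for_status()
    for line in resp.text.splitlines()[:5]:          # print the first few records
        print(line)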

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
The CCAT Data Center
Christof Buchbender

The CCAT consortium is in the process of constructing the Fred Young Submillimeter Telescope (FYST) that is to be put on Cerro Chajnantor in the Chilean Atacama Desert at 5600m. First light is planned for 2025. CCAT is a collaborative project involving Cornell University, the University of Cologne, the University of Bonn, Max-Planck-Institut für Astrophysik, and the Canadian Atacama Telescope Consortium. In the first years of operation, FYST will be equipped with two instruments: Prime-Cam, a modular receiver with up to seven science modules, and the CCAT Heterodyne Array Instrument (CHAI), which will operate in frequency ranges of 430-510 GHz and 800-835 GHz. The CCAT collaboration gathers the expertise of its consortium to address exciting astrophysical topics ranging from the formation of stars and the evolution of the interstellar medium (ISM) in the Milky Way and nearby galaxies to galaxies in the early Universe and cosmology.

The first-light instruments CHAI and Prime-Cam will generate around 5 petabytes of raw data over the first five years. To manage this substantial volume, the University of Cologne is developing the "CCAT Data Center" hosted at the Regional Computing Center (RRZK). The data center is currently focused on implementing pipelines for data transfer, storage, reduction, and quality assessment. For data processing, we plan to leverage the new RAMSES high-performance computer of the University of Cologne. We intend to monitor the entire data life cycle, from initial project definition through to observations, data reduction, and final science-ready data products and scientific publications, all within a single database. This will enable us to measure success and make informed adjustments at various stages. A web frontend will facilitate efficient data access and insight into the state of the telescope and projects.

In this talk, I will briefly present the CCAT project and its planned scientific endeavors. The main focus, however, will be on introducing the “CCAT Data Center”. I will outline architecture plans for data transfer, data management, processing, and accessibility via a web frontend. Finally, I will highlight the challenges that we face in handling the data volume, such as the data transfer from Chile to Cologne and the substantial processing needs of some of the projects.

Other creative topics in astronomical software
Posters
08:30
0min
The DRAO Solar Flux Monitoring Programme: Nearly 80 Years of Staring Directly at the Sun
Dave Del Rizzo

The National Research Council of Canada (NRC) has observed, recorded, and distributed the 10.7-cm Solar radio flux (F10.7) since 1947. During this time, the daily F10.7 measurements have become one of the most widely used indices of Solar activity, and a proxy for Solar properties that are not easily measured directly. Since 1990, the Solar Flux Monitoring programme has been located at the Dominion Radio Astrophysical Observatory (DRAO) near Penticton, BC, Canada, where the legacy infrastructure, including control, data pipeline, and data distribution systems are poised for a refresh. Plans are in place to introduce daily flux measurements at 5 additional frequencies, observed with a newly acquired 4-metre antenna and receiver system. We provide a brief history, description and status of the DRAO Solar Flux Monitoring Programme, and discuss exciting future updates to the overall system in the coming year.

Ground and space mission operations software
Posters
08:30
0min
The Future of LOFAR Data Services
Sangeeth Kochanthara

LOFAR, the LOw Frequency Array, is a continent-scale radio telescope, based in the Netherlands and with stations across much of Europe. In operation now for more than a decade, LOFAR has an enviable track record of delivering fundamental scientific results across a multitude of areas including lightning, exoplanets, solar physics, extragalactic astronomy, cosmic rays, the ionosphere, pulsars and transients, and galaxies. ASTRON and its partners in the International LOFAR Telescope collaboration are now in the midst of upgrading the system to produce LOFAR2.0, an even more capable and flexible instrument that promises to continue this record for many years to come.

Modern radio interferometers like LOFAR are extremely data intensive: the LOFAR Long Term Archive currently has around 60 PB under management, increasing at about 10 PB per year. The new LOFAR2.0 system is projected to be more than twice as data-intensive as the current one. Furthermore, we increasingly recognize that simply archiving instrumental data is no longer adequate: to fully realize the potential of this remarkable instrument, we need to provide a range of science-ready data products together with a compelling portfolio of data discovery tools as well as systems for batch processing and interactive data analysis.

In this talk, we will describe the system that has been developed over the last decade in support of LOFAR operations, highlighting recent major progress in the capabilities of our pipelines for bulk calibration and imaging of interferometric data. We will then describe our plans for modernizing and upgrading the system to ensure that we are ready to meet the needs of the community in the LOFAR2.0 era and beyond.

Other creative topics in astronomical software
Posters
08:30
0min
The INAF radio data archive: from data publication to interoperability of time-domain data
Vincenzo Galluzzi, Marco Molinaro, Alessandra Zanichelli, Marta Burgay

The Italian National Institute for Astrophysics (INAF) manages three single dish radio telescopes (Medicina, Noto and Sardinia Radio Telescope, SRT). The three dishes are also part of the European VLBI Network and the International VLBI Service for Geodesy & Astrometry. Also, SRT is involved in international collaborations dedicated to pulsar observation, namely the European Pulsar Timing Array and the Large European Array for Pulsars project.
The increasing importance of Science Archives and archive mining in defining the ultimate productivity of an observing facility motivated the Italian Centre for Astronomical Archives (IA2) service to develop and maintain the INAF radio data archive. Such a geographically-distributed archival facility flexibly handles different data models and formats, also supporting data discovery/access through Virtual Observatory (VO). In this contribution I will give an overview of the archival system, focusing on dealing with the increasing data rates/volumes produced by time-domain observations with state-of-the-art digital backends. I will address issues posed by the standardisation of time-domain-related data formats under the perspective of metadata completeness, necessary for archival publication. Also, I will present the INAF effort in modeling such data to enable their discoverability through VO tools and services.
Besides publishing radio data from the Italian radio telescopes, IA2 is also committed to providing access to data from international facilities and projects (such as ALMA data from ESO CalMS and Additional Representative Images for Legacy, ARI-L). I will finally mention the IA2 roadmap towards a modern Science Gateway, allowing users to produce advanced data products starting from telescope raw observations.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
The International Virtual Observatory Alliance in 2023
Christophe Arviset, Simon O'Toole

The International Virtual Observatory Alliance (IVOA) develops the technical standards needed for seamless discovery of and access to astronomy data worldwide, with the goal of realizing the Virtual Observatory (VO). Founded in 2002, the IVOA was an early advocate of what are now known as Findable, Accessible, Interoperable and Reusable (FAIR) principles. There are 23 member organizations, and astronomical communities from other nations have shown interest in joining the IVOA. In this paper we describe the activities of the IVOA in 2023, including the activities at the November 2022 virtual and May 2023 hybrid "interoperability meetings", engagement with the IAU, the impact of IVOA activities on the astronomy community, and its prospects for 2024 and beyond.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
The Online Observation Quality System Implementation for the ASTRI Mini-Array Project
Luca Castaldini

The ASTRI Mini-Array is an international project, led by the Italian National Institute for Astrophysics, whose purpose is to construct and operate an array of nine Imaging Atmospheric Cherenkov Telescopes to study gamma-ray sources at very high energy (TeV) and perform stellar intensity interferometry observations. The ASTRI Mini-Array, developed in all its hardware and software aspects, is under construction at the Teide Astronomical Observatory on Mount Teide in Tenerife (Canary Islands, Spain). The ASTRI Mini-Array will be remotely operated; this capability is reflected in a critical work package of the overall software system, which must satisfy the performance and security requirements. In this context, a system that checks the outcomes of the observations and provides prompt feedback is required.

This contribution describes the first implementation of the Online Observation Quality System (OOQS). The OOQS is part of the Supervisory Control And Data Acquisition (SCADA) software work package, which controls all the operations carried out at the observing site, such as data acquisition, telescope control and monitoring, and alarm handling. The OOQS executes data quality checks on the housekeeping, scientific and variance data acquired in real time by the Cherenkov camera and intensity interferometry instruments. It provides feedback to both SCADA and the operator about the status of the observation, and highlights abnormal conditions found by comparing the analysis results with threshold values. This feedback is crucial to take corrective actions in the shortest time possible and to maximize the outcome of the observations. The results are stored in the Quality Archive, where they can be visualized by the operator during the observations using a Human Machine Interface and used for further investigations.

OOQS is developed in the context of a distributed application that exploits the ALMA Common Software as a framework, which provides a distributed Component-Container model and services for components.
The prototype of the OOQS implements three main components. The first component is a Kafka consumer to manage the data stream received from the Array Data Acquisition System through Apache Kafka, which is a distributed event streaming platform. The data are encoded and decoded during the transmission using Apache Avro, a serialization data format.
The Kafka consumer effectively handles the high data flow from the Cherenkov cameras, which can reach speeds of up to 1.15 Gb/s. The data stream is divided into batches of data written to files. The second component of the pipeline is a daemon that waits for new files and then executes a list of analyses using the Slurm workload scheduler, exploiting its key features such as parallel analyses and scalability. Finally, the results obtained by the processes are collected by the last component and stored in the Quality Archive.
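
The consume-and-batch pattern described above might look roughly like the following standalone sketch (using the kafka-python and fastavro packages); the broker address, topic name and Avro schema are placeholders, and the actual OOQS component is implemented within the ALMA Common Software framework rather than as a bare script.

    import io
    import fastavro
    from kafka import KafkaConsumer

    # placeholder Avro schema; the real packet layout is defined by the Array Data Acquisition System
    schema = fastavro.parse_schema({
        "type": "record", "name": "CameraPacket",
        "fields": [{"name": "event_id", "type": "long"},
                   {"name": "payload", "type": "bytes"}],
    })

    consumer = KafkaConsumer("astri-camera-stream",            # hypothetical topic
                             bootstrap_servers="broker:9092")  # hypothetical broker

    batch = []
    for msg in consumer:
        record = fastavro.schemaless_reader(io.BytesIO(msg.value), schema)
        batch.append(record)
        if len(batch) >= 1000:
            # here the batch would be written to a file picked up by the analysis
            # daemon, which submits quality-check jobs through Slurm
            batch.clear()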

Ground and space mission operations software
Posters
08:30
0min
The Road to Science Verification for the SKA Regional Centre Network
James Collinson

The volume and velocity of data produced by the SKA telescopes present significant challenges in making the SKA Observatory’s data products available to our scientific community, and enabling them to make new discoveries. Here we will present a high-level roadmap for the global network of SKA Regional Centres (SRCs); a global data centre network that will enable astronomers to turn SKA data into science.

As the SKA construction progresses, a phased roll out of antennas and dishes will allow for early testing of the data processing and observatory monitoring software and hardware systems. We plan to align the SRC Network (SRCNet) software development roadmap to these phases, since early science verification will provide the best way to test that the components of the SRCNet are up to the task.

The selection and development of the SRCNet software modules themselves are driven by the capabilities this global entity will require. In particular, the SRCs must be able to manage an exascale data archive in a fault-tolerant way. We wish to maintain the integrity of this archive by distributing it across the SRCs globally, whilst also being efficient with the total data footprint and transfer volume. Additionally, to present a uniform, high-quality user experience for astronomers viewing and processing their data, regardless of their geographic location, we seek to build on cloud-native technologies used by industry and other big-science projects.

By the start of science verification (Q1 2026) we are set to deliver our SRCNet v0.2 milestone, capable of supporting the needs of initial science verification. To achieve this, we will require a global ‘team of teams’ of software engineers and data/operations scientists to deliver on our collective vision.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
The Small Bodies Node's MPC Annex and MPC Database Distribution System
Andrei A. Mamoutkine

Andrei A. Mamoutkine (1), James Bauer (1), Matthew Payne (2), Quanzhi Ye (1), Taegon Hibbitts (1), Patricia J. Lawton (1), Peter S. Smith (1), Elizabeth Warner (1), Laura Tjiputra (1), John W. Dailey (1)
(1) University of Maryland Department of Astronomy, College Park, MD; (2) CfA | Harvard & Smithsonian, Cambridge, MA

The Minor Planet Center (MPC) is the International Astronomical Union (IAU) designated clearing-house for all astrometric and radar observations of the Solar System's small bodies. The MPC is a functional subnode of the NASA Planetary Data System's Small Bodies Node (SBN). In anticipation of the imminent increase in reported observations from the next generation of surveys like NEO Surveyor (Mainzer et al. 2022) and LSST (Schwamb et al. 2018), the MPC has redesigned its processing pipeline and implemented many of the anticipated improvements. The new MPC processing pipeline leverages a relational SQL database as its core data reference system. Live (read-only) copies of the database are also distributed to interested institutions as a means of communicating MPC products. Taking advantage of the SBN-MPC relationship and the SBN's data distribution infrastructure, the SBN has taken on the task of distributing live public database copies to the community. The SBN has also created a number of web-based products based on information from the MPC database. These products, located in the SBN's MPC Annex (https://sbnmpc.astro.umd.edu/), serve to communicate the status of the database distribution. The Annex also presents unique summaries of the information provided by the database in an easily accessible manner to the community, including through tools like MPEC Watch and the Yearly Count summaries. Future products regarding comet discoveries and observations, as well as a database query tool, are also being developed. We will present the current and planned products within the Annex, as well as update our plans for the database distribution.

Mainzer, A., Masiero, J., Wright, E., et al. 2022, DPS Meeting, London, Ontario, Canada.

Schwamb, M. E., Jones, R. L., Chesley, S. R., et al. 2018, arXiv:1802.01783 [astro-ph.EP]

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
The Starchive
Angelle Tanner

For the past seven years I have been meticulously assembling a catalog of stellar properties beyond what is available from any single resource. This database includes stellar properties, photometry, high-contrast imaging, spectra, and time series. I also include planetary properties from the NASA Exoplanet Archive and disk properties from circumstellardisks.org. The goal of assembling all this data into a single database is to reduce the need to query multiple catalogs when creating a stellar sample, whether for writing a proposal or preparing to use a telescope. The Starchive currently contains over 35,000 stars, white dwarfs, and brown dwarfs. This is accompanied by 122k references, 192k photometry and flux values and 1.2 million physical properties including mass, radius, Teff, Prot, vsini and dust/planet/multiplicity properties. To facilitate accessing this database, I have written a web application with multiple functions. Users can do a filtered search, an object or list search, a radius search, and a reference search. If searching on one star, the user is sent to a page with the stellar parameters, finder charts, an Aladin image, an airmass chart, an SED, and a multiplicity tree if applicable. If the search results in multiple stars, users are given a dynamic, downloadable table and a suite of plotting tools. While at an R1 public research university, I have had to assemble the data and write the web app primarily on my own, and hence it has taken quite a while to put the data together for just these stars. During my talk, I'll review the content and functionality of the Starchive and address the need for better funding support for these types of projects which may not have a single science case in mind. I will also address the need to make astronomical data truly more accessible when claiming that it is "open access".

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
The Supervisory Control and Data Acquisition Software System for the ASTRI Mini-Array Project
Andrea Bulgarelli

The ASTRI Mini-Array is an international collaboration led by the Italian National Institute for Astrophysics (INAF) and devoted to imaging atmospheric Cherenkov light for very-high-energy γ-ray astrophysics and stellar Hanbury Brown intensity interferometry. The project is deploying an array of nine 4-m class Imaging Atmospheric Cherenkov Telescopes at the Teide Observatory in Tenerife, in the Canary Islands. These telescopes are sensitive to γ-ray radiation with energies in the range from a few hundreds of GeV to 100 TeV and beyond. The Array will be managed remotely by the Operator from control rooms at different Array Operation Centers (AOCs): at the Instituto de Astrofisica de Canarias facilities in La Laguna (Tenerife), at different remote INAF locations in Italy, and one at the Teide site to be used during the installation and commissioning phases. To allow this remote control of the Array, a Data Center is being built on-site, and the ASTRI software team, an international collaboration that includes INAF members, the AC3E of the Universidad Técnica Federico Santa María and the University of Geneva, is developing the Supervisory Control and Data Acquisition (SCADA) software systems to perform all the actions required to operate the telescopes remotely.
This contribution describes the overall software architecture and engineering approach to develop the SCADA system.
SCADA controls and monitors the array of telescopes, the observing site systems, the site infrastructure, and the safety and security systems installed at the Array Observing Site; it also acquires the data from the telescopes and performs a quick-look analysis of them.
Thanks to the high-speed networking connection to the Teide site, an Operator supervises SCADA through an Operator Human Machine Interface (Operator HMI) web interface. However, the system performs the operations automatically. SCADA allows remote access, monitoring, and control of the on-site systems, including automated reactions to critical conditions. The Archive System is synchronised between the on-site archive at Teide and the permanent off-site archive at the Rome Data Center. It provides a central repository for all persistent information of the ASTRI Mini-Array, such as observing projects, observation plans, raw and reduced scientific data, monitoring data, system configuration data, and logs of all operations and schedules. It provides the main inputs and stores the main outputs of the SCADA system.
We deployed the first releases of the SCADA system in the on-site data centre, which is updated with incremental releases. Frequent releases ensure that each increment of the software can be tested and integrated with the available on-site hardware, allowing for early identification of issues. Current planning foresees a minor software release every two months and a major one every eight months, including an Operational Readiness Review for major releases.

Ground and space mission operations software
Posters
08:30
0min
The continuing evolution of the Data Central web service
James Tocknell

Since 2017, Data Central has evolved from a single monolithic application to a science platform. A significant part of this evolution has been due to large structural changes in the core web service, known internally as "dcapi", written in Django, with Django REST framework and React as key supporting components. These changes include the introduction of private data releases (implying a significant change in access control and how it is tracked internally, including the merging of databases), the move from Hadoop to PostgreSQL, and the splitting out of non-survey-based components (including the telescope archive, user registration and management, and user support). In this talk, we will discuss the technical reasons behind these changes, and reflect on how choices made at the start of the project still inform the choices we make now about future directions of this service.

Other creative topics in astronomical software
Posters
08:30
0min
The current state of Julia software libraries for astronomy
Paul Barrett

Julia is a programming language designed for high performance scientific computing. This poster presents the current state of Julia software libraries for astronomy, i.e., AstroJulia libraries. It lists those basic functional libraries that are available today and those that are expected soon, and compares and contrasts those libraries with similar AstroPy libraries. It shows that AstroJulia functionality is nearly on par with that of AstroPy, while providing much higher performance.

Other creative topics in astronomical software
Posters
08:30
0min
To monorepo or not to monorepo: a multi-lingual, telescope-agnostic steering control system
Nicholas Bruce

There has been a shift in the software industry from microservices to monorepos as the complexity of maintaining distributed code has become apparent. While the primary defense of microservices is scalability, many large software companies like Facebook, StackOverflow and GitHub depend on monolithic architectures and evidently scale "just fine". Moreover, in the context of telescope control systems, scalability is simply not a concern.

At the Dominion Radio Astrophysical Observatory (DRAO), we support 5 single-dish radio telescopes and one 7-dish radio interferometer. Each has its own hardware interface for low-level motor control and previously depended on a series of network calls to communicate monitor-and-control tasks to progressively higher-level software layers such as pointing corrections, coordinate transforms, and complex steering paths.

In this poster we discuss the collapse of this networked stack of microservices into a strictly-typed monorepo which supports multiple programming languages (Python/C/C++), multiple hardware interfaces, and unified interfaces for all layers above the hardware. We also discuss how this improved the developer experience and simplified the testing and deployment of a telescope control system.

Other creative topics in astronomical software
Posters
08:30
0min
Towards Machine-Interpretable Coordinate Transform Metadata in Heliophysics
Robert S Weigel

We report on the preliminary developments and recommendations from an ad-hoc working group for coordinate transforms in Heliophysics. The working group identified three needs: (1) the development of a comprehensive standard for acronyms and definitions; (2) the implementation of comprehensive software, services, and unit tests for coordinate transforms; and (3) an understanding of the uncertainty of transforms due to implementation choices. The current focus of the working group is to develop either a recommendation for additions to an existing standard (such as SPASE or IVOA standards) or an independent standard if needed. Completing this will enable the developments related to the latter two items. We also give examples of existing transform software and services that justify the need for items (2) and (3), and discuss the possibility of using SPICE (Spacecraft Planet Instrument C-matrix and Events) as part of the software and services.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
Towards using GANs in astrophysical Monte-Carlo simulations
Karel Adamek

Accurate modelling of the spectra produced by X-ray sources requires the use of Monte Carlo simulations. These simulations need to evaluate physical processes, such as those occurring in accretion around compact objects, by sampling a number of different probability distributions. This is computationally time-consuming and could be sped up if replaced by neural networks. We demonstrate, using the example of the Maxwell-Jüttner distribution that describes the speeds of relativistic electrons, that a generative adversarial network (GAN) is capable of statistically replicating the distribution. The average value of the Kolmogorov-Smirnov test is 0.5 for samples generated by the neural network, showing that the generated distribution cannot be distinguished from the true distribution.
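
A minimal, self-contained sketch of the approach is given below, assuming PyTorch and substituting a Gamma distribution for the Maxwell-Jüttner distribution purely for illustration (neither the sampler nor the network sizes reflect the actual implementation); the generated and reference samples are then compared with a two-sample Kolmogorov-Smirnov test.

    import torch
    import torch.nn as nn
    from scipy.stats import ks_2samp

    torch.manual_seed(0)

    # 1-D generator (noise -> sample) and discriminator (sample -> real/fake logit)
    gen = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    disc = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    def sample_target(n):
        # stand-in target distribution; the real application would draw its
        # training samples from a Maxwell-Juttner sampler instead
        return torch.distributions.Gamma(3.0, 1.0).sample((n, 1))

    for step in range(5000):
        real = sample_target(256)
        fake = gen(torch.randn(256, 8))
        # discriminator update: real samples labelled 1, generated samples labelled 0
        opt_d.zero_grad()
        loss_d = bce(disc(real), torch.ones(256, 1)) + bce(disc(fake.detach()), torch.zeros(256, 1))
        loss_d.backward()
        opt_d.step()
        # generator update: try to make the discriminator label generated samples as real
        opt_g.zero_grad()
        loss_g = bce(disc(gen(torch.randn(256, 8))), torch.ones(256, 1))
        loss_g.backward()
        opt_g.step()

    # compare generated and reference samples with a two-sample KS test
    with torch.no_grad():
        generated = gen(torch.randn(20000, 8)).squeeze().numpy()
    reference = sample_target(20000).squeeze().numpy()
    print(ks_2samp(generated, reference))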

AI in Astronomy
Posters
08:30
0min
Using Open Science Studio platform to study structural relationships of remote galaxies from the CANDELS catalogs
Guillermo Valé Arteaga, Ramon E. Ramirez-Linan

Using an open-source platform that Navteca is deploying for NASA Astrophysics, I was able to complete my final degree thesis at the Universidad Complutense de Madrid. This research is an example of why NASA's Open Science initiative is so important and can have a global impact on science students and researchers.

This project aims to be an exploratory investigation of the structural relationships of remote galaxies at redshift values ranging between 1.5 and 2.5 using the CANDELS and 3D-HST catalogs. To do so, data analysis tools with Python will be employed. Various physical properties of the galaxies will be studied, such as stellar mass, star formation rate and luminosity. In addition, visualization and statistical analysis techniques will be used to identify the most important relationships between these properties and thus gain a better understanding of the formation and evolution of remote galaxies. Data analysis will allow establishing relationships between the studied properties, which will contribute to the understanding of the formation and evolution of galaxies.

Furthermore, machine learning techniques will be used to predict a particular physical property as a function of the other parameters. We demonstrate the feasibility of this approach by training models on a large dataset of galactic physical properties and validating the accuracy of the predicted property against observed data. Our results suggest that machine learning models may provide a promising avenue for predicting desired galactic properties with high accuracy and efficiency, potentially enabling new insights and applications in extragalactic astrophysics and beyond.
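
A minimal sketch of the property-prediction step, assuming scikit-learn and purely illustrative column names (not the actual CANDELS/3D-HST feature set), might look like this:

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    catalog = pd.read_csv("candels_sample.csv")              # placeholder catalogue file
    features = catalog[["stellar_mass", "sfr", "redshift"]]  # assumed column names
    target = catalog["luminosity"]                           # assumed target column

    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    print("R^2 on held-out galaxies:", r2_score(y_test, model.predict(X_test)))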

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Using a convolutional neural network with all sky infrared images to classify sky regions as clear or cloudy
Brock Taylor

We used a convolutional neural network to detect clouds on Mauna Kea using the Canada-France-Hawaii Telescope's (CFHT) All Sky Infrared and Visible Analyzer (ASIVA). Two models were constructed: a full-sky image classifier and a heatmap generator based on pixel kernels of different sizes. The full-sky classifier was able to determine clear skies with 100% accuracy (0% false positive rate) and cloudy skies with 96% accuracy (4% false negative rate). The heatmap generator model used a machine learning network on a small kernel which it passed over an input image to determine the likelihood of cloud coverage at each location. Data cleaning was required to yield significant results due to dynamic range limitations of the sensor causing significant differences between clear and cloudy images. Different batch sizes were compared to test model convergence, ROC performance, and overall effectiveness. Smaller batch sizes were found to be more effective, with a batch size of 32 yielding an AUC of 0.987. Cloud coverage percentage was determined by comparing each kernel's prediction value against a chosen threshold constant and dividing the number of kernels classified as cloudy by the total number of kernels. Overall, the heatmap model was found to provide significantly more information on cloud coverage over Mauna Kea than the current system. Cloud coverage percentage is a metric not currently available for continuous image acquisitions over Mauna Kea. Additionally, the heatmap approach provides data on cloud coverage in specific sky regions, allowing for much more accurate monitoring of cloud activity in regions of interest.
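
The cloud-coverage computation described above amounts to a simple thresholding of the per-kernel predictions; a generic sketch (with an assumed threshold of 0.5, not the tuned value) is shown below.

    import numpy as np

    def cloud_coverage(heatmap, threshold=0.5):
        """Fraction of kernels classified as cloudy.

        heatmap: 2-D array of per-kernel cloud probabilities from the
        sliding-kernel classifier (values in [0, 1]).
        threshold: decision constant; the value used in practice is tuned
        on validation data.
        """
        cloudy = heatmap > threshold
        return cloudy.sum() / cloudy.size

    # example on a random heatmap; roughly half the kernels exceed the 0.5 threshold
    example = np.random.default_rng(0).uniform(size=(32, 32))
    print(cloud_coverage(example))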

AI in Astronomy
Posters
08:30
0min
Using unsupervised learning for explorative discovery in astrophysical simulations
Kai Polsterer

Simulations are the best, and often the only, approximation to experimental laboratories in Astrophysics. However, the complexity and richness of their outputs severely limit the interpretability of their predictions. We describe a new conceptual approach to obtaining useful scientific insights from a broad range of astrophysical simulations. These methods can be applied to state-of-the-art simulations and will be essential to automate the data exploration and analysis of the next-generation exascale simulations and the extreme data challenges they will present. Our concept is based on applying the latest advances in unsupervised deep learning algorithms to efficiently represent the multidimensional datasets produced by Astrophysics simulations and to learn compact but accurate representations of the data in a low-dimensional manifold that naturally describes the data in an optimal feature space. The data can seamlessly be projected onto this latent space for interactive inspection, visual interpretation, and quantitative analysis, including the option of deriving symbolic expressions to build interpretable models. We present a working prototype of the pipeline using an autoencoder trained on galaxy images from SDSS (or equivalently simulated galaxies) as well as the Illustris simulations, to produce a natural ‘Hubble tuning fork’ similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in AladinLite.
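
To make the idea concrete, a heavily simplified autoencoder sketch is shown below (Keras, 64x64 single-band cutouts, a 2-D latent space); the actual pipeline uses a more sophisticated architecture and projects the latent space onto a sphere for the HiPS/AladinLite visualisation.

    import tensorflow as tf
    from tensorflow.keras import layers

    # encoder: image -> 2-D latent vector
    encoder = tf.keras.Sequential([
        layers.Input(shape=(64, 64, 1)),
        layers.Conv2D(16, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(2),
    ])
    # decoder: latent vector -> reconstructed image
    decoder = tf.keras.Sequential([
        layers.Input(shape=(2,)),
        layers.Dense(16 * 16 * 32, activation="relu"),
        layers.Reshape((16, 16, 32)),
        layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),
    ])
    autoencoder = tf.keras.Sequential([encoder, decoder])
    autoencoder.compile(optimizer="adam", loss="mse")
    # autoencoder.fit(images, images, epochs=50, batch_size=128)
    # latent = encoder.predict(images)   # 2-D coordinates for visual inspection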

AI in Astronomy
Posters
08:30
0min
VESPA Portal
Cyril Chauvin

VESPA is an integrated system connecting many data services related to Planetary Science and solar Physics. The VESPA portal allows the user to query these services simultaneously, to identify data of interest from science-oriented parameters, and to plot and analyze data on-line using standard techniques.

The VESPA portal (https://vespa.obspm.fr/) is a dedicated client conceived as a discovery tool: by default, it builds a query from text fields and sends it to all available EPN-TAP services. It uses the quaero name resolver for disambiguation and completion of Solar System target names. A local registry is used to maintain a selection of services which have been reviewed by the VESPA/EPN-TAP team, but it can also access any on-line service, given its URL.

The VESPA portal can display the answer of individual services (the rows of the metadata table responding to the query), or send it to TOPCAT which has the ability to cross-correlate tables from various services. When global queries are sent to all services, the portal also gathers together results from all responding services in a single metadata table, again for further use in TOPCAT or VO tools.
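
Outside the portal, a single EPN-TAP service can also be queried directly from Python with pyvo; in the sketch below the service URL and schema name are hypothetical, while epn_core and its columns (target_name, dataproduct_type, access_url) follow the EPN-TAP convention.

    import pyvo

    service = pyvo.dal.TAPService("https://vo.example.org/tap")   # hypothetical service URL
    result = service.search("""
        SELECT TOP 20 granule_uid, target_name, dataproduct_type, access_url
        FROM example_schema.epn_core
        WHERE target_name = 'Mars' AND dataproduct_type = 'sp'
    """)
    print(result.to_table())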

An ElasticSearch interface is also being developed in the portal to enlarge cross-service search capabilities. It is currently used internally to check the content of all services, public and in development, but is expected to become a standard access mode when several hundred services are published (Le Sidaner et al., 2023).

Finally, data products of interest selected in EPNCore tables can be forwarded to adequate VO tools for analysis and visualization, as identified by the dataproduct_type parameter. Functions dedicated to Solar System data have been included in TOPCAT, Aladin, CASSIS, AMDA, and 3DView in the past years, e.g., to support radiance or reflectance spectra, planetary surfaces, atmospheres and magnetospheres, etc. TOPCAT now has the ability to include full spectra directly in the metadata table together with their footprints, and 60+ planetary HiPS (multiresolution maps) are available from Aladin. More specific or higher-level processing can be installed on workflows platforms or Jupyter notebooks. Examples and tutorials are available here: https://github.com/epn-vespa/tutorials .

The work has been funded by the Europlanet 2024 Research Infrastructure (RI) European project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 871149.

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
0min
VLBI data processing in CASA
Mark Kettenis

Over the past years Very Long Baseline Interferometry (VLBI) data processing in CASA has become a reality. It is now a supported alternative to AIPS for continuum observations with the European VLBI Network (EVN) and has established itself as one of the two official calibration pipelines for the Event Horizon Telescope (EHT). It can also be used for (EVN) spectral line observations, many Very Long Baseline Array (VLBA) observations and observations by other (long baseline) radio arrays such as LOFAR-VLBI, e-MERLIN and GMVA. A comprehensive CASA-based VLBI pipeline that supports many of these arrays is available in the form of rPICARD.

This presentation will give an overview of VLBI functionality in CASA, as well as current and future developments that are needed to make sure the data processing software keeps pace with other technical and scientific developments in VLBI and with future facilities like the ngVLA. It will provide some interesting examples of how basing these developments on packages written in "modern" languages like C++ and Python eases improvements of existing algorithms and the implementation of new algorithms.

Other creative topics in astronomical software
Posters
08:30
0min
ViaLactea Knowledge Base: status and perspectives
Marco Molinaro

The ViaLactea Knowledge Base (VLKB) was set up as the main database for the ViaLactea project, dealing with the astrophysics of the Milky Way Galaxy. The ViaLactea project started at the end of 2013 and the VLKB, as a set of data resources and services customised to the benefit of the project, was ready and in use by the end of 2015. The custom interfaces were defined keeping in mind the discovery and access scenario that is continuously developed in the Virtual Observatory (VO) ecosystem.

Interoperability was slowly brought inside the VLKB afterwards, depending on the limited resources available after the end of the Vialactea project. Nonetheless, the VLKB resources continued to be used in galactic astrophysics projects, and as a comprehensive resource of data and services in demonstrator projects. This helped the full system to be kept alive and updated (even if occasionally rather than continuously).

Currently, among the standards that are in use within the VLKB, an ObsCore table keeps the metadata for the observational datasets catalogue, a TAP service exposes the general underlying metadata content for all its data resources (catalogues, images, radial velocity cubes and morphologically complex objects, …), a custom implementation of the SODA standard is set up to enable dataset cutouts, and UWS is used to manage asynchronous cutout and merge requests. Furthermore, OAuth/OIDC AAI solutions have been tested on top of the cutout access service, and a multi-cutout solution has been presented at an IVOA meeting as a feedback to DataLink evolution. Other features, like management of complex morphology (tessellation, cross match, …), of simulated data, proper registration of the VLKB resources in a VO Registry, and more, are still missing or incomplete.
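
For context, the cutout access pattern mentioned above (SODA) boils down to an HTTP request with standard parameters; a hedged sketch with a hypothetical endpoint and dataset identifier follows, using the SODA ID and CIRCLE parameters.

    import requests

    params = {
        "ID": "ivo://example/vlkb?cube-123",   # hypothetical dataset identifier
        "CIRCLE": "290.5 14.1 0.05",           # RA, Dec and radius in degrees
    }
    resp = requests.get("https://vlkb.example.org/soda/sync", params=params)  # hypothetical URL
    resp.raise_for_status()
    with open("cutout.fits", "wb") as f:
        f.write(resp.content)                  # save the returned sub-cube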

A dedicated client application to consume the VLKB, the ViaLactea Visual Analytics (VLVA), exists; the usage of standards is meant both to let the client be more general and easier to maintain and to enable generic client applications to connect to the VLKB resources.

For a smoother user experience and maintenance of the full VLKB system (e.g. to the benefit of the ECOGAL project and the contribution to the SKA-RC activities), the resources and services will however need to continue to mature and improve.

This contribution reports the status of the technologies and standards currently in use in the VLKB, and the future perspectives for the VLKB resources and services.

Other creative topics in astronomical software
Posters
08:30
0min
VisIVO Visual Analytics: An Interactive Visualization Tool for Astrophysical Data Analysis
Giuseppe Tudisco

Scientific visualization serves as a critical bridge between complex data sets and meaningful insights, enabling researchers to gain a deeper understanding of intricate astrophysical phenomena. The primary goal of scientific visualization techniques is to provide visual images, 3D reconstructions and animations that assist scientists in identifying properties and correlations within large and complex data. Advanced visualization tools can also be used to explore in great depth aspects that are otherwise difficult to study, such as the process of star and galaxy formation and the evolution of large-scale structures in the Universe. We present the latest developments and results of the VisIVO Framework and showcase its Visual Analytics environment. This environment, whose development is driven by the 2020 ERC-Synergy-funded project ECOGAL and by the SKA Regional Centres communities, enables the integration of different types of visualization to analyze correlations between heterogeneous data in order to study the star formation process of our Galaxy. The software provides the ability to visualize and compare multi-wavelength 2D images with 3D data cubes, visualize compact and extended sources obtained through dedicated Table Access Protocol (TAP) and source extractor services, and generate moment maps. An overview of the software's main functionalities is presented, with some technical details outlining its visualization pipeline and its developments towards a client-server based implementation to enable on-site visualization.
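
As a reference for the moment-map functionality mentioned above, the zeroth and first spectral moments of a data cube reduce to simple weighted sums over the spectral axis; the generic NumPy sketch below illustrates the computation and is not the VisIVO implementation.

    import numpy as np

    def moment0(cube, channel_width):
        # integrated-intensity map: sum over the spectral axis times the channel width
        # cube is a 3-D array ordered (channel, y, x)
        return np.nansum(cube, axis=0) * channel_width

    def moment1(cube, velocities):
        # intensity-weighted mean velocity map; velocities has one entry per channel
        weights = np.nansum(cube, axis=0)
        return np.nansum(cube * velocities[:, None, None], axis=0) / weights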

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
15min
Welcome to ADASS
Rob Seaman

Welcome information about ADASS 2023.

ADASS provides a forum for scientists and programmers concerned with algorithms, software, and software systems employed in the acquisition, reduction, analysis, and dissemination of astronomical and planetary science data. An important element of the program is to foster communication between developers and users with a range of expertise in the production and use of software and systems. The program consists of invited talks, contributed oral and poster papers, tutorials, user group meetings, and special interest group meetings (collectively “Birds of a Feather” meetings). ADASS is known for its many fruitful community discussions during coffee breaks and after hours.

Other creative topics in astronomical software
Talks
08:30
0min
What’s new on ESA Datalabs?
Jan Reerink

In this talk we would like to update you on the latest developments of ESA Datalabs, the science platform supporting the astronomy, planetary and heliophysics missions of the European Space Agency. The idea of bringing the code to the data is catching on with more and more of ESA's science missions, while the functional extension of the platform is progressing: Datalabs allows users to easily access data from the ESA Science Data Centre mission archives, run analysis software, prepare custom pipelines for batch processing, and create custom applications. More functionality is planned based on the feedback of scientists and other mission stakeholders.
Aside from news related to updated functionality, increased reliability and more hardware, we aim to discuss challenges and options related to scaling a platform, relate the early feedback we received from users, and discuss the role of platforms in the digital strategy for ESA's space science missions.

Cloud infrastructures for astronomical data analysis
Posters
08:30
0min
Why does every observatory, survey, project, and PI have to build their own incompatible archive from scratch? And can something be done about it?
Benjamin Weiner

Astronomers generally pride themselves on a tradition of open data access, with many large surveys and many observatories providing archives of data and catalogs, each having a web presentation often with relatively sophisticated search functions. However, archives are expensive to build and maintain. Archives frequently have interfaces customized to their datasets, are rarely co-located, and don't have consistent APIs. As a result, doing cross-archive searches is very limited. Your options are typically to use a data aggregator like the NED or HyperLeda databases, to do a cone search of multiple archives with an interface like MAST or astroquery and match up the results yourself, or to download some catalogs and do the matching at home. As datasets get larger, this is unsustainable. Meanwhile, smaller observatories and individual projects often don't have public-facing archives or searchable datasets. This is frequently not because people want to keep the data private, but because they don't have time, funding, or skills to make and maintain an archive interface. The average astronomer (including me) releases data by putting a catalog file and a tarball of FITS files on a webpage. This is well intentioned but un-searchable. Although the field has invested substantial resources in community software to process data tables and query archives, there are few examples and no infrastructure to help projects create archives. Can we develop frameworks to allow archive interoperability? Can we build simple archives to make data release easier? I will discuss baby steps and possible progress and obstacles to these goals.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Posters
08:30
0min
Writing Software Which Will Continue to Work
Jessica Mink

After keeping two widely-used data access and analysis software packages in use for several decades each, I have found several ways to make packages portable, user-installable, and easily repairable. These are not the only ways to do this, but with software involving special knowledge of particular astronomical data types, more detailed expertise is required than most astronomers and astrophysicists are likely to have. In the case of files of images, spectra, and object catalogs, there are lots of formatting, mapping, and translation problems which can be solved with reliable software that relatively few people can write. RVSAO in IRAF SPP and WCSTools in C have been doing more and more of that since 1989 and 1994, respectively. The time has come to translate the RVSAO spectral redshift package out of IRAF, so the programming and user interface questions involved in that translation to RVTools will be discussed.

Other creative topics in astronomical software
Posters
08:30
0min
XMM-Newton Science Analysis System building evolution over the years.
Jose Marcos

Authors: Jose Marcos (Telespazio UK for ESA), Aitor Ibarra (Telespazio UK for ESA), Richard Saxton (Telespazio UK for ESA), and Anthony Marston (ESA)
The XMM-Newton Science Analysis System (SAS) is the application used for processing the data obtained with the scientific instruments on board XMM-Newton, an indispensable tool that has supported scientists in the publication of nearly all refereed scientific papers based on the mission to date. The XMM-Newton Science Operations Centre has been working to bring in modern technologies and modernize the SAS infrastructure in order to keep SAS running after more than 20 years.

We would like to present the evolution from the ad-hoc monolithic building system to a new SAS building system based on Docker and Kubernetes technologies. Containerization through Docker encapsulates SAS and its intricate dependencies (CFITSIO, Qt, Grace and WCSTools), providing documentation of the configurations and ensuring consistency across diverse operating systems. Kubernetes orchestrates this constellation of containers, automating scaling, resource management, and fault tolerance. The integration streamlines deployment, reduces configuration overhead and optimizes resource utilization.

This presentation navigates the SAS integration within Docker containers and Kubernetes. These technologies streamline the establishment of diverse environments for software development, test automation, validation, and data analysis. Altogether, this unified ecosystem harmonizes research and development, enabling a rapid, reliable, and collaborative approach to advancing X-ray astronomy.

Cloud infrastructures for astronomical data analysis
Posters
08:45
08:45
15min
Europe's revolutionary sky surveyors: Gaia and Euclid
Jos de Bruijne

The European Space Agency (ESA) is currently operating two revolutionary sky-mapping missions: Gaia, launched in 2013, to map more than a billion stars in our Milky Way in three dimensions to study its structure, dynamics, and evolution, and Euclid, launched mid-2023, to map billions of galaxies across the Universe in three dimensions to study the growth of structure under the influence of dark energy and dark matter. The ESAC Science Data Centre (ESDC) hosts the science archives of both missions. Upcoming major milestones are Euclid’s first “quick data release” (Q1), towards the end of 2024, and Gaia’s fourth data release (DR4), not before the end of 2025. This presentation introduces both sky surveyors, their science cases, their archive systems, and their data products, which include epoch astrometry, epoch photometry, and epoch spectra in the case of Gaia, and pixel images, spectra, and catalogues in the case of Euclid. ESA is developing novel and interoperable, IVOA-compliant user interfaces to access these massive, petabyte-level data sets, including web GUIs, Python Astroquery modules, bulk-download repositories, and ESA Datalabs modules allowing users to bring their code to the data.
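As a hedged illustration of the programmatic access mentioned above, a minimal sketch using the Gaia Astroquery module might look like the following (the ADQL query is illustrative only, not a recommended science query):

    from astroquery.gaia import Gaia

    # Nearby, bright sources from Gaia DR3 (synchronous TAP query).
    query = """
    SELECT TOP 10 source_id, ra, dec, parallax, phot_g_mean_mag
    FROM gaiadr3.gaia_source
    WHERE parallax > 50
    """
    job = Gaia.launch_job(query)
    table = job.get_results()      # returned as an astropy Table
    print(table)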

Science with data archives: challenges in multi-wavelength and time domain data analysis
Talks
09:00
09:00
30min
ESASky: Unveiling the Universe through Multi-Wavelength and Time-Domain Exploration
Deborah Baines

In the evolving landscape of astronomical research, a comprehensive understanding requires more than isolated observations across the electromagnetic spectrum. Today, the study of many astronomical phenomena demands an integrated approach, where multi-wavelength analysis converges with time-domain investigation. To facilitate such a holistic exploration, astronomical archives play an important role in providing access to multi-wavelength and time-domain data, enabling astronomers to study their objects of interest effectively. However, navigating multiple archives can be time-consuming and cumbersome.

ESASky (https://sky.esa.int) is a science-driven discovery portal with the primary goal of facilitating data discovery and archival science of multi-mission, multi-wavelength and multi-messenger astronomical data. ESASky provides full access to the entire sky as observed by ESA space astronomy missions, missions from international partners such as Chandra (NASA), Suzaku (JAXA) and AKARI (JAXA), and ground-based and space-based observatories from the major astronomical data centres of the European Southern Observatory (ESO), the Canadian Astronomy Data Center (CADC), the Mikulski Archive for Space Telescopes (MAST), the High Energy Astrophysics Science Archive Research Center (HEASARC) and the Netherlands Institute for Radio Astronomy (ASTRON). Users can search, visualise and download all public high-quality data from these observatories, including science-ready images, spectra, catalogues, data cubes and time series data, as well as search for publications associated with sources and plan JWST observations. Additionally, the multi-messenger feature of ESASky provides access to gravitational wave events and probability maps on the sky from the LIGO-Virgo-KAGRA collaboration and Neutrino events from the IceCube Neutrino Observatory.

Exciting new features have been added to ESASky this year to provide users with access to even more astronomical data. These include the ability to access all tables in the VizieR Catalogue Service from the Strasbourg Astronomical Data Centre (CDS) and access to all data centres registered in the Virtual Observatory (VO) Table Access Protocol (TAP) Registry. Notably, these new data centres encompass VizieR, NASA/IPAC Infrared Science Archive (IRSA), the German Astrophysical Virtual Observatory (GAVO) Data Centre, the Javalambre-Photometric Local Universe Survey (J-PLUS) and all tables within the ESA Archives (Gaia, XMM-Newton, JWST, HST, Herschel, ISO, INTEGRAL, and Legacy archives such as Hipparcos, Cos-B and CoRoT).

In this presentation, I’ll highlight the numerous multi-wavelength features of ESASky and discuss the current and new developments aimed at incorporating time-domain data, ultimately evolving ESASky into a fully-featured multi-wavelength and time-domain exploration tool.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Invited
09:30
09:30
15min
Multi-wavelength archival research: where are the obstacles and how to tackle them?
Igor Chilingarian

Multi-wavelength analysis of archival data can lead to groundbreaking discoveries. Following our ADASS tutorial in 2015, we discovered a population of intermediate-mass black holes using archival data from SDSS, Chandra, and XMM-Newton, which became a major contribution to the field. Despite the final success, our team faced numerous serious issues with data formats, access, and reduction and analysis tools, which slowed down the project by about 1.5 years. In this presentation, I will outline the most important challenges we faced and discuss the workarounds and a path forward. Presently, in order to succeed in multi-wavelength archival research, one really needs to be an expert in observations and data reduction in every spectral domain that is used. The main obstacle is that some major space missions and most ground-based observatories do not provide science-ready data in the archives, or provide them only for a small fraction of all observations. Data reduction efforts require very high time and manpower investments. Even when reduced data are available, they often do not conform to any standards (from FITS WCS representation to IVOA standards for metadata). A potential path forward is to organize a push from the funding agencies towards data providers to deliver science-ready data which would comply with the FAIR principles.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Talks
09:45
09:45
15min
Lessons Learned from a Multi-wavelength Time Domain Use Case on a Science Platform
Jessica Krick

We present a list of obstacles as well as possible solutions for doing multi-wavelength time domain science at large scales inside of science platforms. Relevant to this talk, science platforms are computing environments provided by archives near the data which allow fast, convenient data access and computing, thereby increasing inclusion and reproducibility in science. Our specific use case is to generate light curves from many available archives at many wavelengths for a sample of 500,000 quasars. In writing this use case, we have hit stumbling blocks in 1) determining the best data structures to store and work with time-domain data, 2) finding the best way for archives to serve large time-domain catalogs so that scientists can 3) access those catalogs, and 4) understanding the calibration of large multi-wavelength surveys. For each of these obstacles, we discuss our requirements as well as solutions we have developed to address those obstacles at scale.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Talks
10:15
10:15
30min
EXPLORING THE DARK SIDE OF THE UNIVERSE: THE EUCLID SCIENTIFIC ARCHIVE SYSTEM
Sara Nieto, Marcos López-Caniego

Euclid is the ESA mission to explore the dark universe in the next decade. Launched on the 1st of July this year, Euclid is orbiting around the Lagrange L2 point and will map the 3D distribution of billions of galaxies and dark matter associated with them. It will hence measure the large-scale structures of the Universe across 10 billion light years, revealing the history of its expansion and the growth of structures during the last three-quarters of its history. The Euclid Consortium (EC) is in charge of processing all the Euclid data, of which only the most scientifically valuable data will be released through the Euclid Science Archive System (ESAS) during 6 years of mission lifetime: images, various types of catalogues and spectra.

Regarding data release contents, it is planned to combine Euclid observations with ground-based images obtained from several telescopes, resulting in a huge collection of pixel data, catalogues and spectra. At the end of 2023, the first science-ready data products of the Early Release Observations (EROs) will be published in ESAS. At the same time, the first data from the EC pipeline will also be made available in ESAS, but only to EC members. The first public release, Q1, is planned for the end of 2024.
In the meantime, the science archive already hosts simulated images, catalogues and spectra that were used to exercise the scientific exploitation. Thus, in order to demonstrate how to explore, visualize and analyze the first public data, in this Focus Demo we will show the latest functionalities of the archive and the tools available to users, such as the ESA Euclid Astroquery module and the ESA Datalabs Science Platform, among others.

Science with data archives: challenges in multi-wavelength and time domain data analysis
Focus Demos
11:00
11:00
15min
Non-negative matrix factorization approach to sky subtraction for optical spectroscopy
Fedor Kolganov

Numerous sky background subtraction techniques have been developed since the first implementations of computer-based reduction of spectra. Kurtz and Mink (2000) were the first to discuss a PCA-based method that allowed them to subtract the night-sky background from multi-fiber spectroscopic observations without any additional sky observations. We take this approach one step further by using NNMF instead of PCA and generalize it to long-slit and IFU spectra. This allows us to generate approximately 10 times as many valid eigenspectra because of non-negativity. We combine this approach with the algorithm proposed in Kelson (2003) to generate an oversampled sky model. We apply our method to “short”-slit spectra of low-mass galaxies originating from intermediate-resolution Echelle spectrographs (ESI at Keck, MagE at Magellan, X-Shooter at the VLT) when galaxies fill the entire slit and demonstrate its efficiency even when no offset sky observations were obtained.
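A minimal sketch of the non-negative factorization step described above, using scikit-learn's NMF on a hypothetical stack of sky spectra (the file name, shapes, and component count are assumptions, not the authors' implementation):

    import numpy as np
    from sklearn.decomposition import NMF

    sky = np.load("sky_spectra.npy")           # hypothetical (n_spectra, n_pixels) array
    sky = np.clip(sky, 0, None)                # NMF requires non-negative input

    model = NMF(n_components=20, max_iter=500)
    weights = model.fit_transform(sky)         # per-spectrum coefficients
    eigenspectra = model.components_           # non-negative sky eigenspectra

    # A science spectrum could then be modelled as a non-negative combination
    # of these eigenspectra plus an object model, and the sky part subtracted.
    print(eigenspectra.shape)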

Science with data archives: challenges in multi-wavelength and time domain data analysis
Talks
11:15
11:15
15min
3D visualisation of radio data in scientific archive
Ixaka Labadie García

New advances in techniques for the visualisation of multi-dimensional data contribute to more efficient scientific analysis. The volume and size of the data that the Square Kilometre Array Observatory (SKAO) will produce will bring new challenges for current visualisation techniques in astronomy. Remote visualisation is required in order to avoid transferring large data sets, which is especially critical for next-generation telescopes.

In this contribution, we will present an exploratory study on how existing software and technologies for 3D visualisation can be used in scientific archives, in a way that allows the user to customise the visualisation and interact with the data, and on the viability of using them for larger data sets, such as those coming from SKA data products. The study includes general-purpose software and is not limited to astronomy software. We will also present a tool to create interactive visualisations of spectral line data, making use of the technologies selected in the aforementioned study. The aim of the tool is to be complementary to other visualisation and analysis tools, while being useful for multi-dimensional Big Data from SKAO and its precursor telescopes. The 3D models are written with the X3D standard and represent chosen iso-surfaces, which are extracted as a triangular mesh using the marching cubes algorithm. Other information, like markers and images, can be added to the model in order to improve the visualisations. We have prioritised the use of web technologies to favour interactivity in scientific archive platforms. As a result, the interactive tool allows showing and hiding objects, changing the colormap, changing the scale, adding multiple cubes, and other features. Furthermore, it can be integrated in an observatory archive. The tool was created targeting radio data cubes, but we are studying and expanding its functionalities to allow visualisation of cubes containing data from other wavelengths. This implementation is being integrated in a Virtual Observatory platform through SODA services.
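A minimal sketch of the iso-surface extraction step described above, using scikit-image's marching cubes on a hypothetical spectral cube (the file name and threshold are assumptions, and the X3D serialisation is only indicated in a comment):

    import numpy as np
    from skimage import measure

    cube = np.nan_to_num(np.load("line_cube.npy"))   # hypothetical (nz, ny, nx) data cube
    level = 5.0 * np.std(cube)                        # hypothetical iso-surface threshold

    verts, faces, normals, values = measure.marching_cubes(cube, level=level)
    # verts and faces define a triangular mesh that can be serialised as an
    # X3D IndexedFaceSet node for interactive display in the browser.
    print(verts.shape, faces.shape)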

Science with data archives: challenges in multi-wavelength and time domain data analysis
Talks
11:30
11:30
15min
Remote observations with DISCOS and the Sardinia Radio Telescope
Giuseppe Carboni

DISCOS is the control software of the three INAF (Italian National Institute for Astrophysics) single-dish radio telescopes. It has been developed starting from the ALMA Common Software framework, and its core is shared between the Sardinia Radio Telescope (SRT), the Grueff Radio Telescope in Medicina and the Noto radio telescope. DISCOS controls all the telescope subsystems and offers the user multiple observing strategies through a combination of textual and graphical user interfaces. In order to ease operations for the aforementioned telescopes and carry out observations more efficiently, users are provided with a means to control the telescopes remotely. We describe the solutions we implemented, with particular emphasis on remote access to the Sardinia Radio Telescope for both observation and maintenance purposes, and present some ideas for future improvements.

Ground and space mission operations software
Talks
11:45
11:45
15min
Automation of VLASS Quick Look Image Quality Assurance
Trent Seelig

In September 2017 the Very Large Array (VLA) began the first of three epochs of observations for the Very Large Array Sky Survey (VLASS). Each epoch of the survey is split into two observing cycles, with 6 cycles total to be completed over 7 years. During each epoch the VLA will survey ~80% of the sky with declination > -40° in full polarization between 2-4 GHz and generate 35,500 sets of products, each covering ~1 square degree of sky. To ensure the survey meets its science goals, a Quality Assurance (QA) workflow was developed whereby each product was manually inspected before being released to the community. However, this manual workflow has been found to be prone to random human error, and its pace depends on the efficiency of those performing the QA. We have sought to decrease the time between observation and the delivery of the image product to the community, and to standardize the QA of each product, by developing an automated QA workflow. In doing so we have transcribed the manual QA ruleset for Quick Look (QL) image products into Python code that employs heuristic methods and a neural network to identify image products that contain unwanted artifacts. We present the results of applying this automated QA workflow to QL images produced during the first half of the third epoch of VLASS. We show that, compared to previous observing cycles, we have significantly increased the efficiency of QL image QA through automation, decreasing delivery time to the community as well as other overhead costs of manually performing QA.

Ground and space mission operations software
Talks
13:30
13:30
15min
SPOT: A collaborative framework for Planetary Science Operations Planning
Sara De La Fuente, Javier Espinosa Aranda

Sara de la Fuente, Iñaki Ortiz de Landaluce, Javier Espinosa,
Fernando Félix-Redondo, Pablo Turrión, Sergio Ibarmia, Juan de Pablos, Ángeles Cuevas
RHEA Group
This abstract presents the Science Planning Operational Tool (SPOT), designed and developed by the RHEA Group team located at the European Space Astronomy Centre (ESA/ESAC), which is responsible for the Payload Operations and Software Development Service for ESA Planetary Missions.
SPOT is a collaborative framework to support the planning of planetary scientific payload operations throughout the entire mission lifetime, including the long cruise phases. SPOT is already being used by the BepiColombo and JUICE missions and will soon be adapted to other missions, such as the planetary defence mission HERA.
BepiColombo is an interdisciplinary ESA mission launched in October 2018 to explore Mercury in cooperation with the Japan Aerospace Exploration Agency (JAXA). A long cruise phase of 7.2 years toward the inner part of the Solar System will bring BepiColombo to Mercury, after nine planetary flybys, one to the Earth, two to Venus and six to Mercury.
JUICE - JUpiter ICy moons Explorer - is the first large-class mission, launched in April 2023 and arriving at Jupiter in 2031. It will spend at least three years making detailed observations of Jupiter and three of its largest moons, Ganymede, Callisto and Europa. A long cruise phase will bring JUICE to Jupiter and its moons, after four planetary flybys, three to the Earth and one to Venus.
SPOT provides operational capabilities to generate science observations based on geometry conditions, define scenarios for the different phases of the mission, schedule observations according to opportunity windows, and simulate the scientific payload timelines and the main spacecraft resources used, such as power consumption or generated data volume. SPOT can also generate payload operational products, using the Command Request File format (CRF), in compliance with the European Space Operations Centre (ESOC) tools: Mission Planning System (MPS), Mission Control System (MCS) and Mission Information Database (MIB).
In addition, SPOT incorporates advanced 2D/3D visualisation tools that can display, among other things, geospatial data sets or the spacecraft attitude, supporting operators and scientific users in the planning process by providing, for example, surface coverage results. All the information and generated products are centralised and version-controlled, and privacy and confidentiality of the data are ensured through user authorisation and authentication processes.
The SPOT framework was originally designed and developed for BepiColombo and has already supported the scientific payload teams during the Near Earth Commissioning Phase (NECP), the periodic payload check-outs and the science operations carried out during

Ground and space mission operations software
Talks
13:45
13:45
15min
Updating science operations planning software for a sudden loss of capacity in the HiRISE CCD array
Nicole Baugh

The High Resolution Imaging Science Experiment (HiRISE, [1][2]) on the Mars Reconnaissance Orbiter (MRO) has been acquiring high resolution images of the Martian surface since 2006. HiRISE acquires its images with an array of 14 CCDs - ten covered by broadband red filters that cover the full swath width (1.14° field of view), as well as two sets of two CCDs covered by blue-green and near-infrared filters, respectively, arranged to provide three-color coverage in the center of the swath.

In July 2023, HiRISE experienced a sudden failure of one of the central red CCDs. This loss introduced a gap in the center of the processed image products, including the loss of half of the central color swath. The remaining color field of view is off-center, which is not optimal for imaging many of the very small surface features that HiRISE typically targets.

HiRISE uses HiPlan, an in-house built extension of the MRO project's customized version of the Java Mission-planning and Analysis for Remote Sensing (JMARS, [3][4]), as its primary image planning software. Due to the rapid cadence of MRO science operations planning, quick updates to HiPlan after the loss of the CCD were essential for resuming operations. In response to the need to rapidly and accurately provide offset image centers for ongoing planning cycles, we have created a new application for the HiPlan suite to ingest files of partially-planned images, identify the images that require a shift in center coordinates, and apply the coordinate shift in updated files, using the JMARS implementation of the JPL NAIF SPICE toolkit [5]. We have also updated additional applications and procedures for the HiRISE science team and operations personnel to manage the offset targets, with minimal by-hand manipulation to avoid introducing errors.

Our response to the changing instrument conditions is focused on streamlining the incorporation of additional complications for the science operations engineers while maintaining scientific capability, and building in flexibility for future changes in operations procedure.

[1] McEwen, A. S., et al. (2007), J. Geophys. Res., 112, 10.1029/2005JE002605
[2] McEwen, A. S., et al. (2023), Icarus, in press.
[3] Zurich, R. W., et al. (2007), J. Geophys. Res., 112, 10.1029/2006JE002701
[4] Christensen, P. R., et al. (2009), AGU Fall Meeting 2009, Abstract IN22A-06
[5] Acton, C. H. (1996), Planetary and Space Sci., 44, 10.1016/0032-0633(95)00107-7

Ground and space mission operations software
Talks
14:00
14:00
30min
SKA Observatory software
Marco Bartolini

In this talk we will examine the main characteristics of the software supporting observations with the SKA telescopes. Starting from the main requirements and use cases, an outline will be provided of the main subsystems composing the SKA software and their main design choices. This will include considerations on the technologies being adopted and on the organisation of the development activity.

Ground and space mission operations software
Invited
14:30
14:30
15min
Cloud Data Processing for the Event Horizon Telescope
Chi-kwan Chan

The Event Horizon Telescope (EHT), a heterogeneous Very Long Baseline Interferometry (VLBI) array that captures horizon-scale resolution images of black holes, utilizes the Google Cloud Platform (GCP) to process its data and construct its images. Using cloud-native technologies such as Docker and Kubernetes, the collaboration was able to scale its analyses and process its data with bit-to-bit reproducibility. In this talk, I will briefly overview EHT's data pathway, describe how GCP was used by the collaboration, and address some technical challenges and their resolutions. I will also outline the future plans for the EHT computing infrastructure and data processing pipelines.

Cloud infrastructures for astronomical data analysis
Talks
14:45
14:45
15min
Open Source Software for Processing and Using Dark Energy Spectroscopic Instrument Data
Anthony Kremin

I will briefly introduce the Dark Energy Spectroscopic Instrument (DESI) and the DESI survey before focusing on the work of a small, dedicated team to develop open-source software for the reduction, analysis, and dissemination of the DESI data. The MPI- and GPU-enabled Python code, with wrapped C functions in a few locations, is capable of parallelizing across multiple compute nodes. We further leverage the embarrassingly parallel nature of the data to process each set of individual targets independently, giving us the ability to scale our processing to tens of thousands of CPU cores. For each set of 5000 targets, the code simultaneously extracts the 5000 spectra from the raw data, wavelength-calibrates, sky-subtracts, and flux-calibrates them. It also estimates their inverse variances while propagating a resolution matrix encoding the wavelength-dependent non-Gaussian line spread function per fiber, for proper modeling of the spectra in an analysis. The raw data, intermediate data products, and final data products are stored in Flexible Image Transport System (FITS) files, along with metadata and derived quantities stored in the headers of the Header-Data Units.

I will give details about key improvements in the software to reduce wall-clock time, discuss the performance of the software, and conclude with a discussion of the recent release of early data with accompanying code tags in DESI’s Early Data Release.
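A minimal sketch of the embarrassingly parallel pattern described in this abstract, with a hypothetical reduce_targets placeholder standing in for the per-target DESI reduction steps (this is not the actual DESI pipeline code):

    from concurrent.futures import ProcessPoolExecutor

    def reduce_targets(target_ids):
        # Placeholder for extracting, wavelength-calibrating, sky-subtracting
        # and flux-calibrating the spectra of one group of targets.
        return {"n_targets": len(target_ids)}

    # Hypothetical grouping of targets into independent sets of 5000.
    groups = [list(range(i, i + 5000)) for i in range(0, 50000, 5000)]

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            results = list(pool.map(reduce_targets, groups))
        print(sum(r["n_targets"] for r in results), "targets processed")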

Ground and space mission operations software
Talks
15:15
15:15
30min
hypothesis - property-based testing for Python
James Tocknell

In this focus demo, I will give a short introduction to the hypothesis Python library (https://hypothesis.readthedocs.io/), which provides a property-based testing framework that integrates into the existing Python testing frameworks of pytest and unittest. I'll provide some examples of how I've used hypothesis in the past, and show how effective it is at finding edge and corner cases in your code.
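For readers unfamiliar with the library, a minimal property-based test with hypothesis might look like the following (the tested properties are generic examples, not the ones from the demo):

    from hypothesis import given, strategies as st

    @given(st.lists(st.floats(allow_nan=False)))
    def test_sorting_is_idempotent(xs):
        # Property: sorting an already-sorted list changes nothing.
        once = sorted(xs)
        assert sorted(once) == once

    @given(st.integers(min_value=0), st.integers(min_value=1))
    def test_divmod_reconstructs_input(n, d):
        # Property: quotient and remainder reconstruct the dividend.
        q, r = divmod(n, d)
        assert n == q * d + r and 0 <= r < d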

Other creative topics in astronomical software
Focus Demos
16:00
16:00
80min
FITS Data Displays for Observatory Operations
Max Brodheim

For the last several decades, observatories have enjoyed a convenient alignment between the data visualization needs of scientific researchers and observatory staff, largely through the development and widespread adoption of SAOImage/DS9 and IRAF. However, as the astronomy community at large orients itself towards a future of larger and larger datasets fed by ELTs and massive TDA surveys, the tools astronomers use to visualize data are increasingly oriented towards browser- and notebook-based UIs (e.g. jdaviz from the Space Telescope Science Institute or LSST’s Rubin Science Platform), and away from the stand-alone applications that frequently populate the screens found in our observatory control rooms. While these tools may be of great utility to researchers, they often fail to fulfill the needs of observatory operators in terms of legacy support, customizability, and their ability to be run within VNC sessions. The tools used by observatory staff (such as the Ginga FITS viewing toolkit or DS9) generally do not receive nearly as much attention or investment as the new browser-based programs.

In this BOF session, we will discuss and record what software tools and/or programs observatory developers are using for their operational needs (such as data readouts or quick-look analysis), what concerns developers have about those tools, and finally whether any coordinated action is needed to ensure that observatories have access to modern and reliable FITS data viewers in the future.

Other creative topics in astronomical software
BoFs
16:00
80min
Quantifying and Mitigating Satellite Constellation Interference with SatHub
Meredith Rawls

It’s a star, it’s a galaxy, it’s a... satellite streak?! Join colleagues and leaders of the IAU Centre for the Protection of the Dark and Quiet Sky from Satellite Constellation Interference (CPS) SatHub for an engaging session about new software challenges in an era of thousands of commercial low-Earth orbit satellites. Come prepared to share and learn how the community is approaching this issue across the electromagnetic spectrum, what software tools already exist, and what the missing pieces are.

Other creative topics in astronomical software
BoFs
16:00
80min
Science Platform in the multi messenger and exascale era.
Giuliano Taffoni

Science platforms (SPs) have emerged as powerful tools that integrate vast astronomical datasets, advanced computational capabilities, and collaborative tools into a unified framework. They provide astronomers with a streamlined and efficient means to access, analyse, and visualise complex data sets, enabling breakthrough discoveries and fostering interdisciplinary collaborations.
In Astronomy and Astrophysics there is growing interest in SPs, driven in particular by large experiments (such as LSST or SKA), and there already exist a number of different implementations of SPs, offering differing features and capabilities.

This BoF session will examine the state of the art and future perspectives involving data centers, computing centers or astrophysical projects. The BoF will address some of the key questions about SPs in astronomy.
What platforms are already available for astronomers? Are platforms interoperable? Do we need interoperable platforms and which standards should be developed? How can platforms simplify the use of complex cloud or HPC facilities?

In addition, the BoF will address a specific functionality to be supported and integrated in SPs: high-performance data visualization. This represents one of the most relevant - and challenging - services supported by an SP, since it requires the use of remote computing resources, access to distributed data, and support for complex workflows and real-time user interaction. In the BoF we aim to discuss the requirements of astronomers and the challenges of supporting such needs within an SP.

In conclusion, this BoF will provide an overview of SP technology for astronomy, focusing in particular on big data visualization, highlighting the efforts already made to develop science platforms able to offer a data analysis framework, and the role of the IVOA in providing standards and services to design and develop interoperable platforms.

Cloud infrastructures for astronomical data analysis
BoFs
17:40
17:40
80min
Best practices in data presentation
Xiuqin Wu

Last year, NED (NASA/IPAC Extragalactic Database) team members led the effort, working with many archives, to publish the paper "Best Practices for Data Publication in the Astronomical Literature" (Chen et al. 2022, https://ui.adsabs.harvard.edu/abs/2022ApJS..260....5C/abstract). The paper provides many guidelines for researchers on how to publish their data accurately and make them more open and FAIR (Findable, Accessible, Interoperable and Reusable). Although the paper was originally geared towards researchers publishing their data, many guidelines also apply to archives in how they present data to make them more scientifically accurate, open and FAIR. The BoF will discuss those guidelines, with examples to explain the justification behind these best-practice recommendations. We will also showcase how following these guidelines will not only improve the scientific record but also help facilitate new modes of open-science discovery. Co-authors of this paper include Mark Allen from CDS, Gus Muench (AAS Data Editor), Luisa Rebull from IRSA, Raffaele D'Abrusco from Chandra, and Alberto Accomazzi from ADS, among others.

Science with data archives: challenges in multi-wavelength and time domain data analysis
BoFs
17:40
80min
Software and Shared Workflows for the Planetary Defense Community
Larry Denneau

Planetary Defense is practical astronomy: the applied science of discovering, tracking, and characterizing near-Earth asteroids and comets (NEOs). The goal is to retire the risk of Potentially Hazardous Asteroids that may impact the Earth in the coming decades.

The first NEO was discovered photographically in 1898, and about a hundred more were discovered by manual photographic techniques before the first automated digital NEO discovery by LPL’s Spacewatch in 1990, just one year before the first ADASS meeting. Since then, more than 30,000 Near-Earth Objects have been discovered by dozens of surveys, with more than 3,000 NEOs per year currently.

Perhaps needless to say, this involves a lot of software.

This BoF will bring together representatives from major NEO surveys, follow-up and characterization projects, both space and ground-based, to discuss the systems engineering and some of the planetary science of the small and sometimes cantankerous bodies in the Solar System. The NEO community provides a microcosm of the larger astronomical time domain, with professional and amateur observers engaging processing and archive centers in near real-time to cooperatively discover and predict the future behavior of often uncomfortably close astronomical bodies. The software and workflows discussed in this BoF will be of interest to many other parts of the ADASS community.

Software, tools and standards for Solar System, heliophysics, and planetary research
BoFs
17:40
80min
The Future of FITS and Other Standardized Astronomical Data Formats.
Rob Seaman, Jessica Mink

The FITS data standard has served astronomers well for four decades. The original integer image format has been revised to support additional pixel data types, to support world coordinates and other scientific metadata, to include an integrated data compression framework, and to support generalized binary tables, among other features.

Over the years, a variety of alternative scientific data standards have been proposed. These usually reach only a limited audience specific to a particular project or community. No other format has ever garnered the widespread support of FITS.

We'll hear from several groups who are generating data and how they have been using FITS standards and extending or creating standards for newer projects. Are people talking across projects about new standards, even partial ones? Have people published details of their standard formats? Where?

User Experience for astronomical software
BoFs
08:30
08:30
15min
UX for Docs: Documentation Engineering at Rubin
Jonathan Sick

At Rubin Data Management we set out early in Construction to create a healthy documentation culture. In order to provide a good UX for documentation contributors, we developed a documentation infrastructure (now also used by the NASA SPHEREx project) that values low-friction documentation creation and guaranteed-accurate documentation techniques. By using development tools such as GitHub Actions, Slack bots and Jupyter Notebooks, we are successfully battling the traditional view of documentation as the chore of last resort for both writers and readers.

User Experience for astronomical software
Talks
08:45
08:45
15min
NASA SMD Information Policy: Let's All Be FAIR
Demitri Muna

The NASA Science Mission Directorate (SMD) Information Policy (whimsically named "SPD-41a") prescribes that NASA data be findable, accessible, interoperable, and reusable (FAIR). Despite the catchy acronym, these aims are not new: the scientific community has in principle always strived for data to meet these criteria. Of course, the devil is in the details. There is no "FAIR" specification or unit test to pass, so what "findable" or "accessible" might mean will vary between user groups, communities from different backgrounds, different archives, and even different datasets within an archive. (Certainly no one wants to think the data they make available are not easy to find.) This talk will explore these ideas in more detail, discuss NASA’s efforts in implementing FAIR for SMD data, and give an overview of how FAIR astronomical data actually are.

User Experience for astronomical software
Talks
09:00
09:00
30min
User Experience and its role in astronomy
Joe Masters

In this talk I will explore the fundamental principles of user experience (UX), emphasize its importance in astronomy, and share some techniques we can use to incorporate UX-centered design into our workflows.

Drawing a connection to previous trends of increased awareness and discipline surrounding version control, testing and documentation, I will explore how the rise of UX practices align with the broader goals of improving scientific workflows.

Additionally, I will present practical techniques and strategies that can be implemented immediately to integrate UX considerations into our software development processes. By embracing UX principles, we can create more user-friendly, inclusive and intuitive tools that empower users and raise the scientific output of the community.

User Experience for astronomical software
Invited
09:30
09:30
15min
Firefly: Using VO Protocols to Build Dynamic UIs
Trey Roby

The open-source Firefly toolkit for astronomical data exploration, display, and analysis has extensive web-based visualization capabilities. Firefly provides highly interactive, linked visualizations for image and catalog data, supporting FITS and HiPS images, tables, and scientific charts. These components include many UI features that assist archive users to better understand their data.

Firefly has been used to construct archive interfaces for many missions, each with its own unique display requirements to help scientists get the most out of their data. Supporting these has been difficult and often requires specialized code for each mission despite superficial similarities in the requirements. This takes development effort away from creating new common, reusable capabilities, and it has also made it difficult to deliver new features back to the interfaces for legacy missions.

We have embarked on a long-term effort toward building UIs in which mission-specific features are produced by common code reacting to mission-specific metadata provided by the data services. This has been facilitated by a simultaneous increase in the use of IVOA standards and services internally to connect the Firefly applications to the back-end mission data. These efforts have begun to bear substantial fruit.

Using IVOA standards to describe the data, we are able to build query interfaces and display search results in ways that make the most sense for a particular data set. Firefly uses the TAP, ObsCore, UWS, VOTable, and DataLink standards extensively to support this. We’ve found DataLink’s “service descriptors” particularly useful in enabling the creation of metadata-driven UIs. Unfortunately, these standards don’t always go far enough to describe a UI to the level of detail that we need. Consequently, we have worked together with our back-end team partners, at IPAC and in the Rubin Observatory, to find ways to extend the standards in a backward-compatible way.

This talk demonstrates these new capabilities and how we are using VO protocols and Firefly’s extensive data visualization capabilities to create dynamic UIs and search results displays. It will exemplify the powerful experiences that will be available to users of the IPAC and Rubin archives.

IPAC Firefly was created in IRSA, the NASA/IPAC Infrared Science Archive (http://irsa.ipac.caltech.edu) and its development is extensively supported by NASA and by the NSF, through the Vera C. Rubin Observatory. Firefly is the core of applications serving many project archives including Spitzer, WISE, ZTF, SOFIA and others. It is also used in IRSA’s general Finder Chart and IRSA Viewer applications. Firefly underpins the Portal Aspect of the Rubin Science Platform as well as being used, via its Python API, for visualizations in notebooks. The NED and NASA Exoplanet Archive use the Firefly JavaScript API inside their web applications.

GitHub: https://github.com/Caltech-IPAC/firefly

User Experience for astronomical software
Talks
09:45
09:45
15min
Revealing the Unknown Unknowns: Citizen Science as a Tool for Exploring Large Data Sets
Carson Fuls

We report results and key lessons from the development and implementation of “The Daily Minor Planet”, a citizen science project developed by the Catalina Sky Survey using nightly data from our G96 survey telescope and hosted by Zooniverse. The project asks volunteers to review candidate detections of asteroids and distinguish between real and false detections. Key lessons to be covered include what the results have revealed about our normal moving-object detection pipeline, managing large user bases for optimal results, and statistical methods for analyzing users’ responses and managing discovery bias among users. “The Daily Minor Planet” is live and can be found at: https://www.zooniverse.org/projects/fulsdavid/the-daily-minor-planet

Other creative topics in astronomical software
Talks
10:15
10:15
30min
EXPLORE science platform and scientific data applications for space sciences
Nick Cox

This focus demo session presents the science platform and the scientific data applications for space sciences delivered by the EXPLORE project (Horizon 2020 EU project - Nov 2020 - Dec 2023).

EXPLORE platform

The EXPLORE platform (https://explore-platform.eu) is a cloud-based science platform allowing users to remotely run different science applications in their web browser. App developers can easily deploy their own (dockerized) apps using a simple on-boarding procedure. EXPLORE occupies a niche in the broader landscape of open science platforms and the European Open Science Cloud initiative.

Personal workspaces allow users to persist and share data between applications. Users can also give other users access to specific files/folders. The platform provides developers a streamlined process to deploy, test, and share their (dockerized) applications. Developers can link their application to shared datasets uploaded to the platform, define meta-data, set environment variables, and assign minimum resource requirements.

The EXPLORE platform also has a planetary space science browser which allows users to easily find data for a selected number of planetary missions. These data can then be previewed and saved to the user’s workspace for further analysis (for example, with one of the available apps on the platform).

At this time, the platform hosts the applications from the EXPLORE project as well as third-party apps provided by the Europlanet RI-2024.

In the first part of this demo session we will show how to create (deploy) a new app on the platform.

EXPLORE scientific data applications (apps)

The EXPLORE science applications are demonstrators and blueprints (open-source licence) for containerised scientific web applications that can be deployed locally or remotely, on different science platforms. Each SDA targets specific science use cases for different astronomy or planetary science communities (e.g., stellar photometry, interstellar medium, stellar spectroscopy, galactic archaeology, lunar exploration) and is also deployed on external platforms such as ESA Datalabs and the Rosetta science platform. The goal is to supply methodologies, tools, and inspiration for others to create their own web apps and services!

In the second part of this demo session we will highlight one or two of the EXPLORE apps (the ones that get most votes from the participants). We will also give insights into their basic structure and the different frameworks used for their realisation.

Acknowledgement: This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004214.

Cloud infrastructures for astronomical data analysis
Focus Demos
11:00
11:00
15min
"You might also like these images": unsupervised affine-transformation-independent representation learning for the ALMA Science Archive
Felix Stoehr

With the exponential growth of the amount of astronomical data with time, finding the needles in the haystack is getting increasingly more difficult. Traditionally, archives have described their observations with metadata and made those searchable through web interfaces as well as programmatically. The next frontier for science archives is to also allow searches on the content of the observations themselves. As a step in this direction, we have implemented a prototype of a recommender system for the ALMA Science Archive. We use self-supervised affine-transformation-independent representation learning of source morphologies for the similarity estimation, through contrastive learning with a deep neural network. Once the neural network is trained, the feature vectors for all images - both for continuum images and for peak-flux images of datacubes - are evaluated. In a next step, we compute the similarity matrix holding, for each image, the corresponding 1000 most similar images, ordered by their pairwise similarity. A kd-tree is used to speed up that computation from O(N^2) to O(N log(N)). Our prototype interface then shows the most similar images, from which the archival researcher can select the most interesting ones. When they do select an image on the interface, we use a scoring algorithm to instantaneously compute the combined similarity of all the already-selected images and reorder the displayed remaining images accordingly. Each selection thus further refines the similarity display. Finally, we use k-means clustering on the feature vectors of the displayed images to provide selectable 'source morphology categories' for a quick-select option. We conclude from the prototype that an image similarity interface can be a valuable asset to science archives and we are looking forward to discussing this work and related ideas with the ADASS community.
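A minimal sketch of the kd-tree nearest-neighbour step described above, assuming the per-image feature vectors have already been extracted by the trained network (the file name and shapes are hypothetical):

    import numpy as np
    from sklearn.neighbors import KDTree

    # Hypothetical (n_images, n_features) array of learned feature vectors.
    features = np.load("image_features.npy")

    tree = KDTree(features)                   # build once
    dist, idx = tree.query(features, k=6)     # 6 = self + 5 nearest neighbours
    neighbours = idx[:, 1:]                   # drop the trivial self-match
    print("images most similar to image 0:", neighbours[0])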

AI in Astronomy
Talks
11:15
11:15
15min
Detection and classification of radio sources with deep learning
Simone Riggi

New software developments in data post-processing are being made within the SKA precursor communities to enable extraction of science information from radio images in a mostly automated way. Many of them exploit HPC processing paradigms and machine learning (ML) methodologies for various tasks, such as source detection, object or morphology classification, or anomaly detection.
In this context, we are developing several ML-based tools to support the scientific analysis conducted within the ASKAP EMU and MeerKAT surveys. One tool employs deep neural networks to detect compact and extended radio sources and imaging artifacts from radio continuum images. Another tool uses different ML techniques to classify compact sources into different classes (galaxy, QSO, star, pulsar, HII, PN, YSO) using radio and infrared multi-band images. Furthermore, we have developed self-supervised models for radio data representation learning, and generative models to produce synthetic radio image data for data challenges or model performance boosting.
These tools have been trained and tested on different radio survey data including the ASKAP EMU survey. An overview of the results achieved will be presented at the workshop, along with details on the ongoing activities and future prospects.

AI in Astronomy
Talks
11:30
11:30
15min
Deep learning in automatic detection of radio 21 cm neutral hydrogen absorption
Xiu Liu

FLASH, the First Large Absorption Survey in HI, searches for neutral hydrogen (HI) absorption lines at intermediate redshifts (0.42 - 1.0) across the entire sky south of Declination +18 degrees, in spectra of 100,000 bright continuum sources observed with the Australian SKA Pathfinder (ASKAP) telescope. FLASHfinder is a Bayesian-method-based automated source finder used to identify absorption candidates. However, verification of true absorption detections from the candidate list is currently performed manually, and true detections are outnumbered by artifacts (false positives). We present a new deep learning (DL) based source finder to automatically distinguish true detections from artifacts in the FLASH Pilot Survey. To address the issue of limited true detections in the training dataset, we employ a 1D Deep Convolutional Generative Adversarial Network (1D DCGAN) to synthesize true detections. For spectral line classification, we employ a 1D Convolutional Variational Autoencoder (1D CVAE), which combines the power of CNNs to capture local spectral features with VAEs' capability to learn meaningful latent space representations. Our results showcase the potential of DL to detect HI absorption more accurately and robustly, which can help reduce the burden of manual verification in large all-sky surveys.
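As a hedged illustration of the kind of 1D convolutional building block used in such models (this is not the FLASH code, and the spectrum length and two-class labelling are hypothetical), a minimal PyTorch classifier might look like:

    import torch
    import torch.nn as nn

    class SpectrumClassifier(nn.Module):
        def __init__(self, n_channels=1, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.head = nn.Linear(32, n_classes)

        def forward(self, x):          # x: (batch, 1, n_spectral_channels)
            z = self.features(x).squeeze(-1)
            return self.head(z)

    model = SpectrumClassifier()
    spectra = torch.randn(8, 1, 1024)  # hypothetical batch of 8 spectra
    logits = model(spectra)            # (8, 2): true detection vs artifact
    print(logits.shape)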

AI in Astronomy
Talks
11:45
11:45
15min
A new architecture of Convolutional Neural Networks for astronomical data
Faezeh Bidjarchian

The emergence of deep Convolutional Neural Networks (CNNs) has brought about a significant shift in the realm of computer vision. There have been many successful applications of CNN architectures for various image-related tasks. Nevertheless, these architectures may not be the best choice for astronomical data. Despite the advancements in Deep Learning, the inner structure of neural networks and how they interpret and understand the world remains a black box. A deeper understanding of how exactly they distinguish objects and patterns in order to accurately predict on new data could enable us to create the more efficient and effective models we need for better processing of our data.

In this talk, we investigate the limitations of current neural network architectures when applied to astronomical data. Our research and experience have revealed that existing networks may not be well suited to handling the unique characteristics of astronomical data, and our findings suggest that multiple issues could potentially impact their performance. Despite the investment in high-quality astronomical data in FITS format, existing networks often cannot fully take advantage of it because their standard input pipelines expect JPEG-format images, so valuable data are lost early on in the pipeline. While there are ways to use the FITS files, such as converting them to NumPy arrays, several problems then need to be addressed: many surveys observe in multiple filters, whereas grayscale or JPEG formats may not fully capture the complexity of the data, and the presence of NaN values can negatively impact the performance of the network.

To address these and other problems, we use Gnuastro, a software platform for processing astronomical data. Its extensive library makes it a suitable tool for tackling the challenges encountered in working with astronomical data. We have thoroughly explored these issues and plan to use Gnuastro to implement a new architecture for neural networks, which we hope will mitigate the existing problems with handling astronomical data.
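A minimal sketch of one of the issues raised above: reading multi-filter FITS images into floating-point arrays and handling NaN values before feeding them to a network. This uses astropy and hypothetical file names, and is not the authors' Gnuastro-based approach:

    import numpy as np
    from astropy.io import fits

    bands = ["image_g.fits", "image_r.fits", "image_i.fits"]   # hypothetical files
    planes = []
    for path in bands:
        with fits.open(path) as hdul:
            data = hdul[0].data.astype(np.float32)
            data = np.nan_to_num(data, nan=0.0)   # replace NaNs before training
            planes.append(data)

    cube = np.stack(planes)   # (n_bands, ny, nx) array, no 8-bit JPEG quantisation
    print(cube.shape, cube.dtype)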

AI in Astronomy
Talks
13:30
13:30
15min
Developing an efficient large-scale machine learning pipeline to classify the millions of NASA TESS light curves in search for variable stars
Jeroen Audenaert

The NASA Transiting Exoplanet Survey Satellite (TESS) is observing millions of stars each month. The vast numbers of light curves that are being generated from these photometric observations contain a wealth of information for asteroseismology, binarity and rotation studies. However, before these light curves can be used for stellar structure and evolution studies, we first need to be able to identify the relevant stars in this massive data set. The TESS Data for Asteroseismology (T’DA) working group therefore created an automated open-source machine learning pipeline to classify the millions of light curves delivered by TESS according to their stellar variability types. The pipeline is highly parallelized and has been optimized for large-scale computing infrastructures. Furthermore, it has been developed in a modular way such that new state-of-the-art classifiers in search for other variability types can easily be added. In this contribution, we will present the pipeline and the structure of the machine learning classifiers, and explore how the pipeline can be used for other space missions and large ground-based observatories.

AI in Astronomy
Talks
13:45
13:45
15min
Learning from the Machines
Nima Sedaghat

Machine learning has been widely applied to clearly defined tasks in astronomy and astrophysics. In contrast, in a sequence of our recent works we have gone beyond such tasks and have focused on letting deep architectures "listen" to the real, raw, astrophysical data, letting it speak for itself.
During the talk, I will showcase two implementations of this idea on stellar spectra: The first work, called Astro-machines, demonstrates how a machine can start to make sense of raw numerical data and begin learning known astrophysical parameters from them, without being asked to do so! The second one, called Stellar Karaoke, shows how machines can provide us with novel insights into a long-standing problem, namely the removal of adversarial atmospheric effects, just by examining a large number of raw numerical vectors.

AI in Astronomy
Talks
14:00
14:00
0min
AI in Astronomy
Maggie Lieu

This talk is a journey exploring the cosmos through the intersection of AI and astronomy. We'll see how machine learning, specifically computer vision and parameter inference, has revolutionised the identification and analysis of celestial bodies. I'll show you how AI has enriched my own projects, from modelling clusters of galaxies to enhancing astronomical datasets, illuminating the cosmos like never before. Importantly, I'll also delve into the role of language models, such as ChatGPT, in astronomical research. By facilitating improved data interpretation and communication, these models are not only transforming how we understand the cosmos, but also how we share these discoveries. However, it’s also important to emphasise the need for careful consideration of ethics in the deployment of AI, to ensure scientific integrity and inclusivity.

AI in Astronomy
Invited
14:30
14:30
15min
Experimenting with Large Language Models and vector embeddings in NASA SciX
Sergi Blanco-Cuaresma

Open Source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users’ privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when self-reflection is used. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverage this technology while respecting the high level of trust and quality that the project holds.
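A minimal sketch of the retrieval step behind such a system: embed the text chunks, embed the question, and build the prompt from the most similar chunks. The embed() function is a hypothetical placeholder for whatever sentence-embedding model is used, not the NASA SciX implementation:

    import numpy as np

    def embed(texts):
        # Placeholder embedding: one unit vector per text (stands in for a real model).
        rng = np.random.default_rng(0)
        v = rng.normal(size=(len(texts), 384))
        return v / np.linalg.norm(v, axis=1, keepdims=True)

    chunks = ["abstract chunk 1 ...", "abstract chunk 2 ...", "full-text chunk 3 ..."]
    chunk_vectors = embed(chunks)

    question = "What is known about intermediate-mass black holes?"
    q_vec = embed([question])[0]

    scores = chunk_vectors @ q_vec                 # cosine similarity for unit vectors
    top = np.argsort(scores)[::-1][:2]             # indices of the most relevant chunks
    context = "\n".join(chunks[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    print(prompt)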

AI in Astronomy
Talks
14:45
14:45
15min
Gemini Data Reductions - Saving Legacy Software With The Cloud
Wesley Fraser

This version of the software has been deprecated - a phrase that has struck fear into the heart of every user. Naturally, scientific endeavours move at a slower pace than does software development. Moreover, research software development efforts are historically woefully underfunded, which is especially true for code modernization projects. And so, it is inevitable that every scientist will eventually be faced with having to install legacy software on modern platforms. The successful build of IRAF from scratch on a modern Linux machine need not be a rite of passage. The data reduction software platforms of even modern facilities such as the Gemini telescopes rely on software that has been deprecated for so long that installation on some popular modern platforms is impossible. Some technological developments, however, can greatly alleviate this problem. With the advent of software containers, even the most obsolete software can be made to work without too much hassle, and can even enable new compute capabilities not previously available with legacy systems. The Gemini team is undertaking the herculean effort of converting the most-used IRAF reduction routines into a modern Python environment, though support for some legacy instruments is not planned. In the interim, the popularity of OSX means that many Gemini users cannot use the Gemini-provided tools to reduce their data. In this presentation I will discuss my use of Docker to provide an environment which contains many generations of the Gemini software stack. With the choice of an appropriate OS, installation of all three versions of the Gemini software stack - IRAF, Dragons, and Dragons 3 - was a relatively painless process. Beyond the ability to launch otherwise incompatible software, the resulting container has the notable advantage of being usable in the cloud; I will demonstrate the use of the Gemini Docker container on the Canadian Advanced Network for Astronomical Research (CANFAR), accessible by the entire Canadian astronomical community. In this presentation I will advocate for the use of compute environments that utilize software containers over software environments that are common to all users, like the Rubin Science Platform. A container-based environment can provide flexibility and access to critical and common pieces of legacy software that would otherwise be impossible to maintain on common software platforms. Moreover, with a small amount of education, the burden of maintenance does not fall solely on a small underfunded development team, but can largely be shouldered by the more abundant power users, as it is on CANFAR.

Cloud infrastructures for astronomical data analysis
Talks
08:30
08:30
15min
High throughput VLA Imaging with multiple GPUs
Felipe Madsen

The increasing data volumes from present observations with the Very Large Array (VLA) and the prospect of an orders-of-magnitude increase with the next-generation VLA (ngVLA) have motivated the development of a high performance and high throughput data processing model to enable data processing rates to be compatible with data acquisition rates. The high performance component is achieved through the GPU-enabled implementation of compute-intensive operations. To further scale data processing rates, high throughput is achieved by distributing data partitions across multiple GPUs for independent processing, enabling access to computing resources at a national scale. We present the current state of the development of a high throughput image processing model for VLA data, as well as run time scaling results from our test campaign on the PATh (Partnership to Advance Throughput) facility, which provides access to multiple GPUs on supercomputing infrastructures across the USA.

GPU implementations for core astronomical libraries
Talks
08:45
08:45
15min
The Gaia AVU–GSR Parallel Solver: CUDA solutions for linear systems solving and covariances calculation toward Exascale infrastructures
Valentina Cesare

We ported to the GPU with CUDA the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) Parallel Solver, developed for the ESA Gaia mission, by optimizing a previous OpenACC porting of the code. The code finds, with a 10-100 μas precision, the astrometric parameters of ∼10^8 sources, the attitude and instrument settings of the Gaia satellite, and the parameter γ of the PPN formalism, by solving a system of linear equations, A×x=b, with the LSQR iterative algorithm. The coefficient matrix A of the final Gaia dataset is large, with ∼10^11x(5x10^8) elements, and sparse, reaching a size of ∼10–100 TB, typical of Big Data analysis, which requires an efficient parallelization to obtain scientific results in reasonable timescales. In the matrix size, 10^11 is the number of equations, i.e., of stellar observations, and 5x10^8 is the number of unknowns, Nunk. The speedup of the CUDA code over the original AVU–GSR solver, parallelized on the CPU with MPI+OpenMP, increases with the system size and the number of resources, reaching a maximum of 14x, and >9x over the OpenACC code. This result is obtained by comparing the two codes on the CINECA cluster Marconi100, with 4 16 GB V100 GPUs per node. We verified the agreement between the CUDA and OpenMP solutions for a set of production systems. The CUDA code was then put in production on Marconi100, essential for an optimal AVU–GSR pipeline and for the subsequent Gaia Data Releases. We aim to port the production of this code to the Leonardo CINECA infrastructure, expecting to obtain even higher performance, since this platform has 4x the GPU memory per node compared to Marconi100.
To solve a system of linear equations, the system solution, the errors on the unknowns (variances), and the covariances can be calculated. Whereas the solution and the variances arrays have size Nunk~5x10^8, the variances-covariances matrix has a size ~Nunk^2/2, which can occupy ~1 EB. This represents a “Big Data” problem, which cannot be solved with standard methods. To cope with this difficulty, we define a novel I/O-based strategy in a two-job pipeline, where one job is dedicated to writing the files and a second, concurrent job reads the files as they are created, iteratively computes the covariances, and deletes the files to avoid storage issues. In this way, the covariances calculation does not significantly slow down the AVU-GSR code for up to ~10^6 covariances.
These analyses represent a first step to understand the (pre-)Exascale behavior of a class of codes based on the same structure of this one.
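
For readers unfamiliar with the underlying solver, the sketch below runs the same LSQR iteration on a toy sparse system with SciPy; this is purely illustrative and many orders of magnitude smaller than the Gaia system, and is not the CUDA/MPI+OpenMP production code described above.

    import numpy as np
    from scipy.sparse import random as sparse_random
    from scipy.sparse.linalg import lsqr

    rng = np.random.default_rng(42)

    # Toy sparse system A x = b standing in for the ~1e11 x 5e8 Gaia system.
    n_obs, n_unk = 10_000, 500
    A = sparse_random(n_obs, n_unk, density=1e-3, random_state=42, format="csr")
    x_true = rng.normal(size=n_unk)
    b = A @ x_true + rng.normal(scale=1e-3, size=n_obs)

    # LSQR iteratively solves the least-squares problem min ||A x - b||.
    result = lsqr(A, b, atol=1e-10, btol=1e-10)
    x_hat, n_iters = result[0], result[2]
    print(f"converged in {n_iters} iterations, "
          f"median abs error = {np.median(np.abs(x_hat - x_true)):.2e}")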
Acknowledgments: This work is supported by the Spoke 1 “FutureHPC & BigData” of the ICSC– CN di Ricerca in HPC, Big Data and Quantum Computing–and hosting entity, funded by European Union–NextGenerationEU”. This work was also supported by ASI grant No. 2018-24-HH.0, in support of the Italian participation to the Gaia mission, and by CINI, under the project EUPEX, EC H2020 RIA, EuroHPC-02-2020 grant No. 101033975.

GPU implementations for core astronomical libraries
Talks
09:00
09:00
30min
Streaming Signal Processing on GPUs
John Romein

In this presentation, I will discuss the latest developments in radio-astronomical signal processing on GPUs.
I will present the Tensor-Core Correlator, a GPU library that combines antenna data at unprecedented speed and energy efficiency.
The library is being rapidly adopted by radio telescopes worldwide.
We are currently developing similar libraries for beamforming and filtering.

As I/O is our next bottleneck, we are exploring new methods (DPDK and RDMA) to stream digitized antenna data directly from the network into a GPU. I will show how the GPU handles 200 Gb/s Ethernet packets at line speed.

Finally, I will show how proper co-design of the digitizer FPGA firmware, the network, and the GPU correlator leads to a highly cost- and energy-efficient instrument design.

GPU implementations for core astronomical libraries
Invited
09:30
09:30
15min
From LOFAR to SKA: towards a GPU-based source extractor
Hanno Spreeuw

The Amsterdam-ASTRON Radio Transients Facility And Analysis Center (AARTFAAC) is an all-sky radio telescope and transient-detection facility. It piggybacks on raw data from a limited number of antennas of the LOFAR telescope. In 2018, the AARTFAAC 2.0 program started, which couples a planned telescope upgrade with better transient-detection capabilities and new science. The PetaFLOP AARTFAAC Data-Reduction Engine (PADRE) aims to improve the AARTFAAC processing pipeline to detect transients in real time with low latency, so that the raw samples of all LOFAR antennas (which are available for only seven seconds) can be saved for further analysis, while other instruments, observing at other wavelengths, are alerted to initiate follow-up observations immediately.

The last part of the AARTFAAC pipeline is image based. Every second, for every subband, an all-sky image is produced which may contain anything from several tens to several thousand detectable sources. The pixels constituting those sources are extracted in order to measure the properties of each source, such as peak flux density, integrated flux, position and shape parameters. These properties are inserted into a database and associated with previous measurements of the same source: a process called source association. Peak flux densities of the same source, ordered in time, form light curves which are analysed, e.g. using machine learning techniques, to find transient sources. Source extraction, measurement and association together form a subpipeline called TraP: the LOFAR Transients Pipeline.

This talk will focus on refactoring PySE, the Python Source Extractor and source measurer in TraP, in order to speed it up: from an original running time of ~20s per typical 2300²-pixel image with ~2000 sources to less than a second. We will discuss the software engineering effort to turn slow, serial Python code into fast, parallel code. There are abundant options for parallelisation on the CPU, such as Ray and Dask. These tools were used to speed up the compute-intensive task of deriving background characteristics through kappa-sigma clipping. Source measurements could be parallelised using Python's multiprocessing module. These and algorithmic improvements were not sufficient to reduce the total time for source extraction and source measurement to below 1s. To achieve further performance improvements, the sep library (based on SExtractor) was used for kappa-sigma clipping, segmentation and connected component labeling. Source measurements were sped up impressively using Numba's guvectorize decorator. This decorator opened the way to performing the source measurements on the GPU, by adding the "target='cuda'" argument. In combination with replacing NumPy arrays with CuPy arrays, all the naturally parallel workload can now be shifted to the GPU, which will make it a suitable source extractor for SKA, processing 4K x 4K images with tens of thousands of sources in less than 1s.
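
As a minimal illustration of the guvectorize pattern mentioned above (a toy background-subtraction kernel, not actual PySE code), the decorator turns a per-source loop into a NumPy generalized ufunc; switching target="cpu" to target="cuda" and feeding CuPy arrays moves the same kernel onto the GPU.

    import numpy as np
    from numba import guvectorize

    @guvectorize(["void(float64[:], float64, float64[:])"],
                 "(n),()->(n)", target="cpu")  # target="cuda" for the GPU path
    def subtract_background(pixels, background, out):
        # Toy per-source kernel: remove a scalar background level from the
        # pixels belonging to one detected source island.
        for i in range(pixels.shape[0]):
            out[i] = pixels[i] - background

    # One row per source island, one background estimate per island;
    # the gufunc broadcasts over the leading (source) dimension.
    islands = np.random.default_rng(0).normal(loc=5.0, size=(2000, 64))
    backgrounds = islands.mean(axis=1)
    cleaned = subtract_background(islands, backgrounds)
    print(cleaned.shape)  # (2000, 64)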

GPU implementations for core astronomical libraries
Talks
09:45
09:45
15min
GPU implementation in radio astronomy: a “giant leap” into the SKA era
Emanuele De Rubeis

Over the last decade, radio astronomy has entered a new era: the advent of the Square Kilometre Array (SKA), preceded by its pathfinders (such as the Low Frequency Array, LOFAR, or MeerKAT), will produce a huge amount of data that will be hard to process with a traditional approach. This means that the current state-of-the-art software for data reduction and imaging will have to be re-modelled to face such a data challenge. In order to manage such an increase in data size and computational requirements, scientists need to exploit modern HPC architectures. In particular, heterogeneous systems, based on complex combinations of CPUs, accelerators, high-speed networks and composite storage devices, need to be used in an efficient and effective way.

Our goal is to develop software for radio imaging, which is currently one of the most computationally demanding steps of radio astronomy data processing, both in terms of memory requirements and computing time. GPU porting is a key point that allows us to make the most of the accelerators' parallel computational capabilities while minimizing communication and data movement.

Starting from the original code presented in Gheller et al. (2023), I will present the implementation of the Fast Fourier Transform (FFT) on GPUs, adopting the distributed version of the NVIDIA optimized library (the so-called cuFFTMp) in order to spread the large datasets produced by the radio telescopes across multiple GPUs. This is a key point for the GPU development of the code, given that the size of the problems involved is so large that they cannot be handled by a single accelerator.
I will show the results in terms of speedup and scalability of this new accelerated version of the code on a scientific case, namely real LOFAR VLBI data, and discuss the comparison with the CPU version of the FFT presented in the original code.

Overall, we aim to set a new way to approach not only radio astronomy, but astrophysical software in general. This will represent the first example of radio imaging software enabled for GPUs, becoming a potential state-of-the-art reference for the future SKA software suite.
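
As a single-GPU illustration of the FFT step (using CuPy rather than the distributed cuFFTMp library discussed in the talk, which spreads one transform across several GPUs), gridded visibilities can be transformed on the device as follows; the grid here is a toy array, far smaller than a real LOFAR VLBI grid.

    import cupy as cp

    # Toy gridded-visibility plane; real LOFAR VLBI grids are far larger and
    # must be split across GPUs, which is what cuFFTMp provides.
    grid = cp.zeros((4096, 4096), dtype=cp.complex64)
    grid[2048, 2048] = 1.0 + 0j  # a single point source at the phase centre

    # Inverse 2D FFT from the (u, v) plane to the image plane, on the GPU.
    dirty_image = cp.fft.fftshift(cp.fft.ifft2(cp.fft.ifftshift(grid))).real

    print(float(dirty_image.max()), dirty_image.shape)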

GPU implementations for core astronomical libraries
Talks
10:15
10:15
30min
Building a production ML pipeline for gravitational wave detection
Alec Gunny

Real-time gravitational wave astronomy stands to benefit substantially from the adoption of machine learning algorithms, which have demonstrated an ability to model complex signals, even in the presence of considerable noise, with minimal run-time latency and compute requirements. Moreover, many gravitational wave event morphologies and noise sources are well understood and easily simulated, acting as physical priors which can be exploited to regularize training to produce more robust models. However, adoption of production ML systems in this setting has been impeded by a lack of software tools simplifying the development of experimental and deployment pipelines that leverage these priors in a computationally efficient manner. In this demo, we’ll introduce ml4gw and hermes, two libraries for accelerating training and inference of models in the context of gravitational waves, and show how they can be combined with other infrastructure tools to build, evaluate, and deploy a competitive model for detecting binary black hole mergers in real LIGO gravitational strain data.

AI in Astronomy
Focus Demos
11:00
11:00
15min
Multi-Spacecraft Observatory Data Analysis Techniques
Theodore Broeren

Spacecraft missions of the future, such as HelioSwarm, will consist of collections of many small spacecraft taking simultaneous in-situ measurements. An open question in the field is how to best combine these single point measurements to gain a global understanding of dynamic phenomena. We investigate how to best leverage these types of multi-point data sets to extract meaningful scientific insights, focusing on two particular kinds of analysis techniques.
For the first, we have developed a new method of reconstructing a vector field over a large volume of space using a sparse set of local measurements. This is accomplished via distance-weighted averaging in a 1D + 2D framework, where we separately weight components along the spacecraft direction of travel and in the perpendicular plane.
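
As a simplified, hedged illustration of the distance-weighted averaging idea (ignoring the 1D + 2D split along and perpendicular to the direction of travel), a scalar field sampled at a few spacecraft positions can be estimated at a grid point like this; the positions and values are invented.

    import numpy as np

    def idw_estimate(grid_point, positions, values, power=2.0):
        """Inverse-distance-weighted estimate of a field at grid_point from
        sparse spacecraft measurements (positions: (N, 3), values: (N,))."""
        d = np.linalg.norm(positions - grid_point, axis=1)
        if np.any(d == 0):                      # exact hit on a measurement
            return values[np.argmin(d)]
        w = 1.0 / d**power
        return np.sum(w * values) / np.sum(w)

    # Nine hypothetical spacecraft positions (arbitrary units) and the
    # field value each one measures locally.
    rng = np.random.default_rng(1)
    positions = rng.uniform(-1, 1, size=(9, 3))
    values = np.sin(positions[:, 0]) + 0.1 * positions[:, 2]

    print(idw_estimate(np.zeros(3), positions, values))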
The second method focuses on characterizing the wave-like structures that are often seen in astronomical data. As the direction and velocity of a wave changes, a given configuration of spacecraft will detect a wave with varying levels of accuracy. We have developed a method of quantifying the level of accuracy and precision that a configuration of 4-9 spacecraft will achieve in the detection of an arbitrary wave. The resulting uncertainty quantification allows us to select optimal configurations of spacecraft for characterizing waves in space plasmas.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
11:15
11:15
15min
Cartographic Mapping using High-Resolution Shape Models
Kris J. Becker

NASA’s OSIRIS-REx (Origins, Spectral Interpretation, Resource Identification, and Security–Regolith Explorer; OREx) spacecraft successfully accomplished its primary mission objective to retrieve a sample from the surface of the near-Earth asteroid (101955) Bennu on October 20, 2020 (Lauretta et al., 2022). The OREx team had been preparing for this event since the spacecraft’s arrival at Bennu in December of 2018 by mapping the asteroid and characterizing its geological, chemical, and physical properties. Mapping continued after sample collection to quantify surface changes resulting from contact between the spacecraft and Bennu. The spacecraft departed Bennu in May of 2021, and the sample is scheduled to return to Earth on September 24, 2023. Image processing, photogrammetric control, and mosaicking were accomplished with a modified version of the Integrated Software for Imagers and Spectrometers (ISIS) planetary cartography package, developed and maintained by the OREx team. In the control process, images acquired by the OSIRIS-REx Camera Suite were registered to and orthorectified onto tessellated shape models created by the OREx team. The shape models are global in coverage and range from 80 cm to 5 cm average ground sample distance. OREx ISIS improves the execution of critical tasks in the cartographic process such as precision ray tracing, manual and automated image measurement, accurate determination of terrain-relative photometric/observation angles and characteristics, orthorectification, and the detection of occlusions and shadows. These improvements in operational efficiency and mapping accuracy provide enhanced support not only for irregularly shaped bodies such as asteroids and comets, but for small-body mapping in general, and for bodies such as the Moon, for which high-resolution shape models and imagery may be available. Validated over more than three years of proximity operations at Bennu, ISIS enabled the generation of a global basemap and regional image mosaics at resolutions from ~6 cm/pixel to 1 mm/pixel. These products were crucial to the sample-site selection process and the sample acquisition itself. We will submit our enhancements to the public version of ISIS and release them to the scientific community in the form of a standalone shared library developed in C++ as the Planetary Shape Model Ray Tracing System (PSMRTS). Additionally, we will provide a C-like interface that provides access to many different languages such as C, Python, Rust, and others. The current version of our ISIS implementation supports three publicly available ray tracing libraries. These libraries are the NAIF DSK, the Bullet Physics SDK, and the Intel Embree ray tracing systems. This work is supported by NASA under contract NNM10AA11C issued through the New Frontiers Program. Lauretta, D.S., et al., 2022. Spacecraft sample collection and subsurface excavation of asteroid (101955) Bennu. Science. 377 (6603), 285–291. doi.org/10.1126/science.abm101

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
11:30
11:30
15min
ESA Heliophysics archives interoperability and data access enhancements
Adrian

The ESAC Science Data Center (ESDC) develops and operates the science archives for the ESA missions. Within the ESDC, a dedicated archive provides scientists with access to the data of each Heliophysics mission: Ulysses, Soho, Cluster/DoubleStar, Proba-2, ISS-Solac, and Solar Orbiter.

The ESDC is taking steps towards homogenising the interoperability mechanisms and the ways to access the data across the different Heliophysics archives. This abstract provides an insight into the enhancements implemented or planned for the ESA Heliophysics archives.

The first step towards homogenising interoperability is the IVOA TAP protocol. Due to the demanding data download requirements of the Heliophysics missions, the TAP+ extension developed at the ESDC is currently used by the Soho, Proba-2, Solar Orbiter and Cluster/DoubleStar archives. On top of that, the helio-commons library developed at the ESDC allows the download of any set of products via filters on any column, just as for usual metadata queries. The next step in the Heliophysics archives' evolution regarding the IVOA is to become EPN-TAP compliant.
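
For readers less familiar with TAP, the sketch below shows how a TAP-compliant archive can be queried from Python with pyvo; the service URL, table and column names are placeholders for illustration only, not the actual ESDC endpoints or schema.

    import pyvo

    # Placeholder endpoint and table; substitute the archive's real TAP URL
    # and schema, which are published by each ESA Heliophysics archive.
    service = pyvo.dal.TAPService("https://example.esa.int/tap")

    query = """
        SELECT TOP 10 observation_id, instrument, start_time
        FROM archive.observations
        WHERE start_time > '2022-01-01'
    """
    results = service.search(query)
    print(results.to_table())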

The Heliophysics archives data access requirements vary a lot between in-situ and remote sensing data.

For remote sensing data, a product listing approach is enough to access and download the products (for example, FITS files for Solar Orbiter, Soho, and Proba-2). Additional features are provided by some archives: HEK events overlaid on the remote sensing images (Soho archive), or visualisation of simultaneous Carrington rotation movies (Proba-2).

However, for in-situ data (Cluster/Double Star), a concatenation mechanism is used to build on-the-fly products including the metadata and data for the desired custom timeframe. This mechanism has been implemented on top of the TAP+ extension for the Cluster archive. In addition, this approach opens the door to the next step in the interoperability evolution: making the archives compliant with the HAPI standard, with the Cluster archive being the first one at ESA to integrate a test HAPI server, which will become operational in the near future.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
11:45
11:45
15min
The Python in Heliophysics Community: an overview and call to connect with the wider ADASS Python community
Julie Barnum

Since its creation in 2018, the Python in Heliophysics Community (PyHC) has striven to facilitate scientific discovery by promoting the use and development of sustainable open-source Python software across the solar and space physics community, improving communication and collaboration between disciplines, developers, and users, establishing and maintaining development standards, and fostering interoperability and reproducibility. Through the community's resources, bi-annual meetings, bi-weekly telecons, PyHC summer schools, and other meeting outreach opportunities, PyHC continues to educate scientists on the importance of open-source software in open science and demonstrates how PyHC can aid scientists in Heliophysics research.
Although PyHC has an obviously heavy focus on Heliophysics research, its impact can be broadened. PyHC seeks to connect with the wider ADASS community on open-source Python software tools and standards. In that spirit of collaboration, many of PyHC’s packages are already extensible to disciplines outside the scope of Heliophysics. PyHC packages often leverage astronomy packages such as Astropy (whose team was invited to hold a tutorial session at the inaugural PyHC 2022 summer school). Finally, the PyHC package standards are even modeled on the example set by Astropy. This talk will give a brief overview of PyHC, how PyHC can be an answer to the open science needs of today in Heliophysics and beyond, and how to get connected with the community.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
13:30
13:30
15min
Improving detection of small planets in upcoming transit surveys
Yash Gondhalekar

A major goal of space-based exoplanet transit survey missions (Kepler, K2, TESS, Plato, Roman) is to detect small Earth-sized planets. However, almost all of the detection pipelines use the Box-Least Squares (BLS) periodogram algorithm following the detrending of stellar light curves with common approaches such as Gaussian Process regression or splines. Such detrending approaches cannot effectively remove short-memory autocorrelation, resulting in a severe degradation of BLS's sensitivity to small planets. We find that a combination of AutoRegressive Integrated Moving Average (ARIMA) modeling and the Transit Comb Filter (TCF) periodogram improves sensitivity over BLS. To show this, we simulate transiting planets in stellar light curves with two different noise models: pure Gaussian and AutoRegressive Moving Average (ARMA). Two measures are used to quantify periodogram peak significance: the periodogram signal-to-noise ratio (SNR) and the False Alarm Probability (FAP) based on the generalized extreme value distribution. We compare the relative sensitivities of the BLS and TCF periodograms by varying the number of transits in the light curve and, for each case, calculating the minimum detectable depth (MDD; i.e., the lowest depth at which FAP < 0.01 or SNR > 6). Given the goal of detecting small planets reliably, we find that the combination of the ARIMA + TCF pipeline and the SNR detection metric is preferred, since it yields the lowest MDD. The application of our approach to real TESS light curves with small exoplanets agrees with the simulation results. We recommend analysts replace the BLS periodogram with the ARIMA + TCF pipeline for greater sensitivity to small planets in future transit surveys.
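
A hedged sketch of the two-stage idea follows: ARIMA whitening of a toy light curve followed by a periodogram search. The TCF periodogram has no standard library implementation, so astropy's BLS is run on the residuals purely for illustration; the light curve, cadence, and transit parameters are invented.

    import numpy as np
    from astropy.timeseries import BoxLeastSquares
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    t = np.arange(0, 27.0, 10.0 / 60.0 / 24.0)   # ~27 d at 10-min cadence

    # Toy light curve: autocorrelated noise plus a shallow box-shaped transit.
    noise = np.convolve(rng.normal(scale=3e-4, size=t.size),
                        np.ones(5) / 5, mode="same")
    flux = 1.0 + noise
    flux[(t % 3.5) < 0.1] -= 5e-4                # period 3.5 d, duration 0.1 d

    # Stage 1: ARIMA models and removes short-memory autocorrelation.
    resid = ARIMA(flux, order=(2, 1, 1)).fit().resid

    # Stage 2: periodogram search on the whitened residuals
    # (BLS here; the abstract's pipeline uses the TCF periodogram instead).
    bls = BoxLeastSquares(t, resid)
    power = bls.autopower(0.1)
    print("best period [d]:", power.period[np.argmax(power.power)])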

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
13:45
13:45
15min
Improving the visibility and citability of exoplanet research software
Alice Allen

The Astrophysics Source Code Library (ASCL, ascl.net) is a free online registry for source codes of interest to astronomers, astrophysicists, and planetary scientists. It lists, and in some cases houses, software that has been used in research that has appeared in, or been submitted to, peer-reviewed publications. It now has over 3300 software entries and is indexed by ADS and Clarivate’s Web of Science.

In 2020, NASA created the Exoplanet Modeling and Analysis Center (EMAC, emac.gsfc.nasa.gov). Housed at the Goddard Space Flight Center, EMAC serves, in part, as a catalog and repository for exoplanet research resources. EMAC currently has 223 entries, 77% of which are for downloadable software.

This presentation will cover the collaborative work the ASCL is doing with EMAC and with NASA’s Astrophysics Data System (ADS) to increase the discoverability and citability of EMAC’s software entries and to strengthen the ASCL's ability to serve the planetary science community.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
14:00
14:00
30min
FAIR approach for Low Frequency Radio Astronomy
Baptiste Cecconi

The Open Science paradigm and the FAIR principles (Findable, Accessible, Interoperable, Reusable) aim to foster scientific return and reinforce trust in science production. We present how the MASER (Measuring, Analysing and Simulating Emissions in the Radio range) service implements Open Science through a series of existing solutions that have been put together, adding new pieces only where needed.

The MASER service is a “science ready” and “open science” toolbox dedicated to time-domain low frequency radio astronomy, whose data products mostly cover Solar and planetary observations. The principal data product in this domain is the “dynamic spectrum”, i.e., a series of consecutive spectra taken with the same observing configuration. The observed physical phenomena are related to plasma instabilities and energetic particles in magnetized plasma. Hence, low frequency radio astronomy is a remote sensing tool for plasma diagnostics.

MASER covers four community needs:
1. Discovering data products,
2. Exploring data collections before downloading terabytes of files,
3. Annotating and then storing and sharing annotations on radio dynamic spectra,
4. Accessing data in Python.

MASER solutions are based on IVOA protocols for data discovery, on IHDEA tools for data exploration, and on a dedicated format developed by MASER for temporal-spectral annotations. The service also offers a data repository for sharing data collections, catalogues and associated documentation, as well as supplementary materials associated with papers. Each collection is managed through a Data Management Plan, whose purpose is twofold: supporting the provider in managing the collection content, and supporting the data centre in resource management. Each product in the repository is citable with a DOI, and the landing page contains web semantics annotations (using schema.org).

Software, tools and standards for Solar System, heliophysics, and planetary research
Invited
14:30
14:30
15min
An Adaptive-Scale Multi-Frequency CLEAN Deconvolution in CASA for Radio Interferometric Images
Genie Hsieh

Scale-sensitive solvers are widely used for accurate reconstruction of extended emission in radio astronomy. The Adaptive Scale Pixel decomposition (Asp) algorithm models the sky brightness by adaptively determining the optimal scales. It thus gives significantly better imaging performance, but at the cost of significantly increased computational time. In this report, we describe an improved Asp algorithm that can be used in both single-frequency and multi-frequency mode. It achieves a 3x-20x speedup in computational time compared to the original Asp-Clean algorithm, and also outperforms current multi-frequency imaging techniques. It is combined with the scale-insensitive Hogbom CLEAN algorithm to achieve even better computational efficiency for both compact and diffuse emission.

We implemented the algorithm in CASA and applied it to data sets from the EVLA and ALMA telescopes. We show that this algorithm performs better than the widely used MS-Clean and MS-MFS algorithms. It also achieves this imaging performance without the need for hand-tuning of scale sizes or an expensive automasking algorithm of the kind typically used in pipeline processing (such as the current ALMA imaging pipeline).

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
14:45
14:45
15min
HelioLinC3D: Software for Discovery of Solar System Objects in LSST-scale Datasets
Mario Juric

In about two years, the Vera C. Rubin Observatory, an 8m-class ground-based facility currently under construction at Cerro Pachón, Chile, will start scanning the southern sky to produce the Legacy Survey of Space and Time (LSST). Over its 10-year duration, this survey has the potential to produce on average 200 observations of each of about 4-5 million new asteroids it can discover. However, the cadence that Rubin will use -- taking pairs of observations each night -- is not suitable for moving object identification using traditional approaches, which generally require three or four observations per night. Instead, it requires an algorithm able to connect three or more pairs over a window of nights, in the presence of a substantial number of false detections (Kubica et al. 2007; Denneau et al. 2013).

In this talk, we will present the algorithm and a fast C++ implementation of Heliolinc3D. Based on the concept of heliocentric linking presented in Holman et al. (2018), Heliolinc3D can identify asteroids in large surveys taking only pairs of observations per night, with three such pairs over 2-4 week periods. This code has been extensively tested on simulated LSST data at realistic scale, as well as on real data from ATLAS. In tests with LSST simulations, we successfully linked 98.8% of potentially discoverable main-belt asteroids and 97% of NEOs. With ATLAS data, we re-discovered numerous NEOs already found by the ATLAS team, as well as an additional hitherto unknown PHA (2022 SF289). This implementation meets the requirements for Rubin's asteroid discovery system (95% completeness), but is also available to any other observatory wishing to use it (https://github.com/lsst-dm/heliolinc2).
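
To convey the heliocentric-linking idea in a few lines, here is a deliberately simplified Python sketch, not the Heliolinc3D C++ implementation: it hypothesizes a single heliocentric distance, ignores motion between exposures, and clusters raw projected positions rather than propagated orbits; all detections and positions are invented.

    import numpy as np
    from scipy.spatial import cKDTree

    def heliocentric_position(ra_deg, dec_deg, observer_au, r_hypothesis_au):
        """Project a line of sight onto the sphere of radius r_hypothesis_au
        centred on the Sun, returning the implied heliocentric position."""
        ra, dec = np.radians(ra_deg), np.radians(dec_deg)
        u = np.array([np.cos(dec) * np.cos(ra),
                      np.cos(dec) * np.sin(ra),
                      np.sin(dec)])
        o = np.asarray(observer_au)
        # Solve |o + rho*u| = r for the positive range rho along the line of sight.
        b = np.dot(o, u)
        rho = -b + np.sqrt(b * b - (np.dot(o, o) - r_hypothesis_au**2))
        return o + rho * u

    # Hypothetical detections: (ra, dec) as seen from an observer at 1 au.
    observer = np.array([1.0, 0.0, 0.0])
    detections = [(10.0, 2.0), (10.1, 2.05), (85.0, -3.0)]
    points = np.array([heliocentric_position(ra, dec, observer, 2.5)
                       for ra, dec in detections])

    # Cluster detections whose hypothesized positions nearly coincide.
    pairs = cKDTree(points).query_pairs(r=0.05)
    print(sorted(pairs))  # detections 0 and 1 link; detection 2 does not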

In addition to providing the source code, we are working to develop a Heliolinc3D linking web service where observatories could upload data to be linked without the need to run the software themselves. In the longer term, this service-based approach could significantly optimize the global asteroid discovery process. In addition to tracklets, future surveys could choose to submit individual detections to such a "hub", where linking could be performed by state-of-the-art algorithms (e.g. HelioLinC3D, pumalink, or others). This could dramatically simplify the operations of individual surveys -- especially the smaller ones -- as well as open opportunities for cross-survey linking and observation coordination.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
15:15
15:15
30min
The Rubin Science Platform: powered by IVOA standards and contemporary software deployment methodologies
Gregory Dubois-Felsmann

We present an advance look at the capabilities and implementation technologies of the Rubin Science Platform (RSP). The RSP will be the astronomical community's primary interface to the Vera C. Rubin Observatory's data products, with full operations beginning in 2025. The RSP also plays a key role in the Observatory staff’s work, providing internal access to data and a computing environment for interacting with the Observatory control system. The RSP is being used heavily in the commissioning process. It has also been made available to members of the scientific community, serving simulated datasets, for the past two years, enabling community members to gain experience with the system and provide us feedback on how its capabilities meet their needs and how it can be improved.

The RSP provides access to data through three "Aspects": Web APIs based on IVOA standards, a graphical-user-interface Portal based on IPAC Firefly, and a Notebook interface based on JupyterLab and the Python-based Rubin software stack. The RSP, itself entirely open-source, is implemented on top of a deep stack of contemporary open-source containerized software deployment tools, Rubin-developed authentication and authorization software, and infrastructure-as-code configuration mechanisms from Rubin and from the open-source community. The deployment architecture supports numerous deployments of the RSP in a variety of configurations, on commercial cloud services, in on-premises datacenters, and in the Observatory’s summit computing systems.

We will demonstrate how catalog and image data are made available through the RSP, how a pervasive use of IVOA standards underlies powerful tools to help users find data and navigate relationships between data products, and how the three Aspects of the RSP interact with each other to enable users to use all the data access methods in flexible combinations to support their needs and their varying levels of expertise.

We will also demonstrate the deployment and management tools, showing how our git-based configuration mechanisms support the many deployments and how the open-source stack we have assembled provides easily understood Web-based tooling for the management of the platform.

Other creative topics in astronomical software
Focus Demos
16:00
16:00
15min
DOIs at the ESDC and the new ESDC DOI TAP Service.
Maria Henar Sarmiento Carrión

For several decades, the European Space Agency (ESA) has been producing a wealth of datasets from the spacecraft launched throughout its history. Since 2021, the ESAC Science Data Center (ESDC) has been registering and publishing Digital Object Identifiers (DOIs), which ensure a permanent identifier for each dataset in the long term. To date, more than 30,000 DOIs have been registered by ESA for datasets accessible through the ESA Science Archives managed by the ESDC, in order to improve the traceability of the usage of those datasets.

The sheer size and diversity of the ESDC datasets make creating DOIs a challenging task. This requires not only the best possible mapping between the data granularity used by the different missions and what can be considered "DOI-worthy", but also the development of a tool that is flexible, scalable in time, and able to deal with the diversity of data formats and access methods implemented by each science archive. A third challenge is how to let the scientific community know that these DOIs exist and how to access them.

In the past, the ESA DOIs could be accessed in two main ways: via Google Dataset Search (Masson et al., 2021; https://doi.org/10.1016/j.asr.2021.01.035) and via the ESAC Data Discovery Portal, located at data.esa.int, under which all ESA data holdings have been given a dedicated DOI (C. Arviset et al., PV2023). A third way was recently added via the ESA DOI TAP Service.

The ESA DOI TAP Service allows researchers, as well as science centres and institutions, to connect their tools via the TAP protocol to this service and resolve the DOIs associated with a specific proposal or experiment in an easy, clean and fast way.

In the first part of this presentation we will offer a general overview of the tool currently used by the ESDC to generate and register DOIs, together with a brief description of the challenges faced and the solutions adopted in its development. We will then review the different ways to resolve the DOIs associated with each dataset stored at the ESDC, focusing on how the new ESA DOI TAP service works and how it can be accessed, with practical examples.

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
16:15
16:15
15min
CATCH: Finding celestial objects with Google's Spatial Indexing Library 'S2'
Daniel Darg

We present the Comet–Asteroid Telescopic Catalog Hunter (CATCH), a data archive search tool currently deployed at the NASA Planetary Data System Small Bodies Node. The Small Bodies Node is the main archive for NASA's near-Earth object survey data, such as the Catalina Sky Survey data archive (Seaman et al. 2022). These are large data sets, totaling hundreds of terabytes in volume, making them difficult for most researchers to work with. To better serve the research community, we developed the CATCH tool to search these archives for potential observations of solar system small bodies. It works by identifying the intersection between a surveyed portion of the sky at a given time and the trajectory of an object computed from its ephemerides. A core part of CATCH's architecture is the use of Google's open-source S2 library to perform spatial indexing on these data sets using a one-dimensional, space-filling Hilbert curve. Metadata for matched data products are returned to the user, along with image cutouts around the ephemeris positions. We believe that this tool and technique have great potential for other survey archives.
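
As a hedged illustration of the S2 indexing idea, the sketch below uses the pure-Python s2sphere port rather than CATCH's own code; RA/Dec are simply mapped onto the library's lat/lng convention, and the coordinates, cell level, and "survey index" are invented for the example.

    import s2sphere

    LEVEL = 10  # finer levels give smaller cells

    def cell_for(ra_deg, dec_deg, level=LEVEL):
        """Return the integer S2 cell id (at a fixed level) for one sky position."""
        ll = s2sphere.LatLng.from_degrees(dec_deg, ra_deg)   # lat=Dec, lng=RA
        return s2sphere.CellId.from_lat_lng(ll).parent(level).id()

    # Index two survey detections and one ephemeris position of a small body.
    survey_cells = {cell_for(150.1234, 2.2345), cell_for(120.0, -15.0)}
    ephemeris_cell = cell_for(150.1230, 2.2340)

    # A candidate match is an ephemeris position whose cell appears in the index;
    # positions this close usually (not always) fall in the same cell.
    print("match:", ephemeris_cell in survey_cells)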

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
16:30
16:30
15min
Comet Statistics - A graphical representation of international comet discovery and observation statistics from the NASA PDS Small Bodies Node and the Minor Planet Center
Peter Smith, James (Gerbs) Bauer

The Minor Planet Center (MPC) is the International Astronomical Union’s recognized clearing house for all positional and categorization data for small bodies, i.e., comets, asteroids, and irregular satellites. Comprehensive discovery statistics are often presented as lists and specialized data tables on the MPC’s website (https://minorplanetcenter.net). We are creating an interface to summarize the discovery and observation data available through a variety of sources. Specifically, we use the Minor Planet Center’s table containing orbital elements for all comets, the NASA JPL Small Bodies Database API, and a copy of the MPC’s live processing database hosted by the Small Bodies Node. Each of these is parsed for the relevant data by a Python script using either HTTP requests or a Python-PostgreSQL library. The data are then processed and copied to the live site files, where they are rendered as graphs in the user’s browser using the Plotly.js framework. The site is designed to work as-is and, beyond a web browser, should not require the user to install any additional software.

Comet Statistics contains three main sections: comet discoveries, orbital elements, and unique observations. Similar to the summary information provided in Bauer et al. (2023), each page also supports overlaying an arbitrary number of user-specified observatory codes onto each graph.

The comet discovery subpage shows statistics by year and comet type. Data are retrieved from the Minor Planet Center, supplemented by the NASA JPL Small Bodies Database to cover potentially missing data. The orbital elements section contains subpages for different comet types; each subpage contains five graphs relating to orbital elements. Orbital element data are retrieved from the Minor Planet Center. The third section of the site, unique observations, shows unique object observations by year and by comet type.

In addition to the three graphical data sections of the site, Comet Statistics is also standing up an engine for converting between the “old” and “new” IAU designation systems. The purpose is to provide a single reference source for comet nomenclature that will open legacy data sources to research investigations. Using this, researchers can avoid confusion and refer to comets using only one designation system, even when source data may be referred to under multiple names.

References:
Bauer et al. 2023. arXiv:2210.09400 [astro-ph.EP]
https://minorplanetcenter.net
https://sbnmpc.astro.umd.edu/cometInfo/

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
16:45
16:45
15min
MPEC Watch, a novel community reference resource based on the Minor Planet Center live data
Quanzhi Ye

MPEC Watch (https://sbnmpc.astro.umd.edu/mpecwatch/) is a utility that digests the Minor Planet Center’s publications to present statistical summaries of the reported observations of small bodies that are of high interest to the community. MPECs, or “Minor Planet Electronic Circulars” (https://minorplanetcenter.net/mpec/RecentMPECs.html), are issued by the Minor Planet Center in the form of emails and corresponding website postings, each with its own DOI, to announce discoveries of objects of interest (such as Near-Earth Objects, NEOs, irregular satellites, or comets) and updates to the MPC’s database. MPEC Watch's backend component is responsible for creating and updating a SQLite database that contains metadata for all MPECs dating back to the first Circular in September 1993. The component queries the MPC website on a daily basis to add the previous day's MPECs to the database. The community of planetary astronomers is the target audience. Observations critical to orbit updates, follow-up, first follow-up, and discoveries are all MPEC material, and MPEC Watch provides these as aggregate totals as well as for individual observing sites. The per-site statistics can be used to demonstrate the effectiveness of observing programs to the community (e.g., in proposals and program reviews). We will present the suite of products provided by MPEC Watch, along with the underlying architecture and future planned augmentations to the website.
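
A minimal sketch of the kind of metadata table such a backend might maintain is shown below; the column set and example values are invented for illustration and do not reproduce the real MPEC Watch schema.

    import sqlite3

    conn = sqlite3.connect("mpecwatch_demo.sqlite")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS mpec (
            mpec_id    TEXT PRIMARY KEY,   -- e.g. 'MPEC 2023-A12' (invented)
            issued     TEXT,               -- ISO date the circular was issued
            mpec_type  TEXT,               -- discovery / follow-up / orbit update
            object_id  TEXT,
            station    TEXT                -- MPC observatory code
        )
    """)

    # A daily job would query the MPC website and append yesterday's circulars.
    conn.execute(
        "INSERT OR IGNORE INTO mpec VALUES (?, ?, ?, ?, ?)",
        ("MPEC 2023-A12", "2023-01-05", "discovery", "2023 AB", "G96"),
    )
    conn.commit()

    # Aggregate totals per station, the kind of summary shown on the site.
    for row in conn.execute(
            "SELECT station, COUNT(*) FROM mpec GROUP BY station"):
        print(row)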

This work is supported by the NASA Planetary Data System’s Small Bodies Node (NASA 80NSSC22M0024).

Software, tools and standards for Solar System, heliophysics, and planetary research
Talks
23:14
23:14
0min
AstroDB Toolkit: A Collaborative Data Management Tool
Kelle Cruz, David Rodriguez

We present the AstroDB Toolkit, a collaborative database management tool intended for small-to-medium sized projects managing living datasets. Database management relies on a review process using GitHub, thus integrating into the existing workflows of many astronomers. The Toolkit relies on the astrodbkit2 Python package to convert back and forth between a non-relational (document-store) format and a relational database that can be used with established packages like SQLAlchemy. This facilitates a novel way of reviewing, updating, and version-controlling the contents of medium-scale databases while still integrating with existing tools. We have also developed a web app which presents the database contents as an interactive, queryable website. The proof-of-concept application is the SIMPLE Archive of low mass stars, brown dwarfs, and exoplanets (https://simple-bd-archive.org/). In this poster, we present the architecture of the AstroDB Toolkit, the document-store model, and the GitHub workflow for reviewing and approving database modifications.
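
As a hedged illustration of the relational side of such a workflow, the snippet below uses plain SQLAlchemy on an invented two-table schema; it is not the astrodbkit2 API and does not reproduce the SIMPLE schema.

    from sqlalchemy import create_engine, text

    engine = create_engine("sqlite:///:memory:")

    with engine.begin() as conn:
        # Invented schema: one table of sources, one of photometry measurements.
        conn.execute(text(
            "CREATE TABLE sources (name TEXT PRIMARY KEY, ra REAL, dec REAL)"))
        conn.execute(text(
            "CREATE TABLE photometry (name TEXT, band TEXT, mag REAL)"))
        conn.execute(text(
            "INSERT INTO sources VALUES ('2MASS J0000+0000', 0.01, 0.02)"))
        conn.execute(text(
            "INSERT INTO photometry VALUES ('2MASS J0000+0000', 'J', 15.2)"))

    with engine.connect() as conn:
        rows = conn.execute(text(
            "SELECT s.name, p.band, p.mag "
            "FROM sources s JOIN photometry p ON s.name = p.name")).fetchall()
    print(rows)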

Software, tools and standards for Solar System, heliophysics, and planetary research
Posters
08:30
08:30
15min
High performance visualization for Astronomy & Cosmology: VisIVO's pathway toward Exascale systems
Eva Sciacca, Nicola Tuccari, Valentina Cesare, Fabio Vitello

Petabyte-scale data volumes are generated by observations and simulations in modern astronomy and astrophysics. Such data volumes significantly hamper storage, access, and data analysis, and are driving the development of a new generation of software tools. The Visualization Interface for the Virtual Observatory (VisIVO) has been designed, developed and maintained by INAF since 2005 to perform multi-dimensional data analysis and knowledge discovery in multivariate astrophysical datasets. Utilizing containerization and virtualization technologies, VisIVO has already been used to exploit distributed computing infrastructures, including the European Open Science Cloud (EOSC).

We intend to adapt the VisIVO solutions for high performance visualization of data generated on (pre-)Exascale systems by HPC applications in Astrophysics and Cosmology (A&C), including GADGET (GAlaxies with Dark matter and Gas) and PLUTO simulations, thanks to the collaboration within the SPACE Centre of Excellence, the H2020 EUPEX Project, and the ICSC National Research Centre. In this work, we outline the planned evolution as well as the execution strategies designed to achieve the following goals: enhance the portability of the VisIVO modular applications and their resource requirements; foster reproducibility and maintainability; take advantage of more flexible resource exploitation over heterogeneous HPC facilities; and, finally, minimize data-movement overheads and improve I/O performance.

Acknowledgements: This work is funded by the European High Performance Computing Joint Undertaking (JU) and Belgium, Czech Republic, France, Germany, Greece, Italy, Norway, and Spain under grant agreement No 101093441, and it is supported by the spoke “FutureHPC & BigData” of the ICSC – Centro Nazionale di Ricerca in High Performance Computing, Big Data and Quantum Computing – and its hosting entity, funded by European Union – NextGenerationEU.

Other creative topics in astronomical software
Talks
08:45
08:45
14min
A brightening future for research software engineering
Nuria Lorente

It is understood that astronomy relies on research software and data engineering. From the collection of telescope proposals, through the control of telescopes and their myriad instruments, to driving the archives and simulating and processing data, research software engineering underpins almost every process in the advancement of astronomy. And yet, for a large number of projects and institutes, in planning and funding conversations the requirements of the discipline for producing the best results have at times been an afterthought, receiving little attention or funding. Although our more enlightened institutes have always valued software engineering, the community at large is only slowly coming to realise that the discipline must be supported and career paths nurtured, so that the best science can be carried out.
In this talk I will discuss some of the joy and pain of pursuing a research software engineering career within astronomy, and the problems we must tackle if we wish to continue to attract excellent creative, engineering, and scientific minds to our field. Not just attract them but retain them, in an era where flexible working conditions are no longer a perk of academia, and salary disparity between our institutions and industry is larger than ever.
I will describe the AAO's Research Data & Software section's work to provide a stable career path for its research software and data engineers, and our aims to attract a portfolio of work which both satisfies the needs of the instrumentation and data projects of the community, and the needs of our team to have a challenging, creative, and fulfilling work life.

Research software engineering as a career path
Talks
09:00
09:00
30min
From AI to L2 and beyond: A software engineer career turned into a journey through fascinating territories, landscapes and, of course, languages.
Andrea Balestra

A talk about my career as a software engineer and how I like to see it as a journey through many different territories, landscapes and, of course, languages. Do not expect leadership or big money: I have been, and still am, only a modest but curious traveler. My journey started more than thirty years ago in the land of AI with a thesis on an “Expert System”, as rule-based AI was called at the time, and continued through remote observing, the birth of the web, telescope control systems, detector controllers, programming standards, virtualization, space applications (L2), MBSE, and software quality assurance, and it still continues, even if the final station is not so far away. Different landscapes under the sky of astronomical projects: spectral classification, AI, electronics, systems engineering, control software, quality assurance… different organizations and different countries. Also different languages, some of them long forgotten: from Fortran to Occam, Ksh, Bash, C, C++, Java, Python, etc. Not a journey that will make it into any travel guide, but maybe it can give hints to other fellow travelers when the moment of picking the next destination arrives. I also hope this travel experience can be of interest to the people visited by travelers like me, in understanding what the spirit of a wandering software engineer is.

Research software engineering as a career path
Invited
09:30
09:30
30min
Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure
Alberto Accomazzi

The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. In this talk I will describe the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it.

I will begin with an overview of the ADS’s original architecture, constructed primarily around simple database models. This bespoke system allowed for the efficient indexing of metadata and citations, the digitization and archival of full-text articles, and the rapid development of discipline-specific capabilities running on commodity hardware. The move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities, a modern API, higher uptime, more reliable data retrieval, and integration of advanced visualizations and analytics.

Another crucial evolution came with the gradual and ongoing incorporation of Machine Learning and Natural Language Processing algorithms in our data pipelines. Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications, and recommendations. I will describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped.

Finally, I’ll conclude by describing the future prospects of ADS and its ongoing expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant, technology is transformative, but their trustworthiness can be elusive.

Other creative topics in astronomical software
Prize
10:15
10:15
30min
Empowering SKA Data Challenges: A homogeneous platform for enhanced collaboration and scalability fully aligned with Open Science.
Manuel Parra-Royón

The Square Kilometre Array Observatory (SKAO) is an international collaborative effort focused on constructing and operating the world's most advanced radio telescope. The SKAO Science Data Challenges (SDCs) are a series of competitions designed to help scientists and engineers develop new techniques for analysing the vast amounts of data that the SKAO will generate. These SDCs have traditionally used computing resources kindly provided by scientific institutions and facilities. The method of allocating computing resources to participants in the Data Challenges has varied among resource providers, resulting in a heterogeneous user experience in which some users have access to Virtual Machines (VMs) with differing configurations, while others are given HPC-type resources. Providing a uniform platform of computing resources for the SDCs brings fairness, scalability, enhanced collaboration and consistency: participants work with identical tools and streamlined collaboration, and a standardised setup simplifies resource management, support, and evaluation, leading to enhanced efficiency and reliable results.

JupyterHub provides a platform for provisioning compute resources through a container orchestration service such as Kubernetes, in addition to scaling with user demand and enabling centrally managed authentication. The advantages of this approach include ease of deployment through Helm, a homogeneous yet customisable software and compute environment for the SDCs, and horizontal scalability, with resources allocated to users by the Kubernetes cluster based on demand and availability.

With this contribution we present a highly portable, interactive and fully Open Science-aligned analysis service for future participants in the Science Data Challenges to develop solutions on a horizontally scalable platform within the infrastructures of the SKA Regional Centres Network (SRCNet) and other IT facilities. In this context, we will show the process of configuring the Kubernetes cluster, the installation and preparation of BinderHub/JupyterHub, and a use case for a radio astronomy data analysis workflow, using Dask (a Python library for parallel and distributed computing) to take advantage of the capabilities of large distributed clusters in the cloud on Kubernetes. To ensure portability, two SRCNet cloud platforms, ESPSRC (Spain) and CHSRC (Switzerland), have been used in addition to the infrastructure of a supercomputing centre (CESGA).
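
For reference, the kind of Dask workload such a JupyterHub session would host looks like the following sketch; a local, in-process scheduler stands in here for the Kubernetes-backed cluster, and the array sizes are arbitrary.

    import dask.array as da
    from dask.distributed import Client

    # An in-process scheduler stands in for the Kubernetes-backed cluster that
    # the SDC platform would provision; the user code is identical in both cases.
    client = Client(processes=False)

    # A toy image cube, chunked so workers process pieces independently.
    cube = da.random.random((16, 1024, 1024), chunks=(4, 256, 256))
    channel_means = cube.mean(axis=(1, 2)).compute()

    print(channel_means.shape)   # (16,)
    client.close()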

Cloud infrastructures for astronomical data analysis
Focus Demos
11:00
11:00
15min
The VLA Sky Survey
Mark Lacy

The VLA Sky Survey (VLASS) is a multi-epoch radio survey of the whole sky visible to the Very Large Array. It has a frequency range of 2-4 GHz, with 2.5-arcsecond resolution, and is taken in "on the fly" (OTF) mode with the antennas rastered across the sky in sets of 10x4 deg tiles in three epochs. The combination of the high angular resolution of VLASS and the OTF observing mode produces significant challenges for data processing. Although “Quick Look” images are made within ~1 month of observing, we are exploring new algorithms involving GPUs to speed up the gridding of the observed visibilities in order to make higher accuracy images for the final processing. The large computing resources needed for VLASS have led us to develop methods for processing on remote clusters in order to complete the survey imaging in a timely fashion. Finally, the scale of the survey also means that accessing and visualizing the 34,000 individual images per epoch is itself a challenge, both for the VLASS quality assessment (QA) team and for our users. In this talk I will discuss the data challenges associated with VLASS and the solutions we are adopting, including algorithmic and machine learning approaches to QA and VO services and applications such as HIPS, SIA2 and SODA (via CADC) for data access and visualization.

Other creative topics in astronomical software
Talks
11:15
11:15
15min
Lessons learned from building LOFAR data pipelines
Matthijs van der Wild

In this presentation I will show how automated data processing provides great opportunities for producing robust and efficient results in astronomical research. As research code and data processing pipelines grow ever more complex, it has become more important than ever that scientists have access to frameworks that facilitate the validation of their results and ensure that those results are fully reproducible.

I will demonstrate the current state of pipeline development for the processing of data from the International LOFAR Telescope, how this pipeline leverages familiarity with common software tools and community-supported frameworks, and how research software can be embedded into this pipeline to create complex but understandable and consistent processing steps that reliably produce science-ready results.

Another point I want to address is the importance of interdisciplinary communication and coding standards. Their existence allows a larger part of the scientific community to collaborate on mutually shared goals — of which data processing is a prominent example — and allows us as developers to create and maintain tools that anticipate the needs of future scaling. I will show how the pipeline that I present is partly a product of such collaboration.

Finally, during this talk I would like to reflect on the broader lessons I have learned during my time developing this pipeline as someone who had no prior experience as a scientific software developer. I hope that, by sharing my experiences, I can inspire others to build and improve on them, and that in turn I can learn from the experiences of others.

Other creative topics in astronomical software
Talks
11:30
11:30
15min
Towards automated structural analysis of galaxies in large imaging surveys
Sarah Casura

I will present our pipeline for the surface brightness fitting of galaxies using optical and near-infrared imaging data from large surveys, which we applied to ~13,000 nearby galaxies with z<0.08 from the Galaxy And Mass Assembly (GAMA) survey. We fit three models to each galaxy in each of our nine wavelength bands with a fully automated Markov chain Monte Carlo analysis using the Bayesian two-dimensional profile fitting code ProFit. For the first time, we employ ProFit's multi-frame fitting functionality, working with data at the pawprint level and fitting all exposures of the same galaxy in the same band simultaneously, thus avoiding point spread function (PSF) uncertainties due to stacking. All preparatory work, including image segmentation, background subtraction, PSF estimation, and obtaining initial guesses, is carried out using the complementary image analysis package ProFound; and we have developed additional routines for post-processing, including model selection, extensive quality control and a detailed investigation into systematic uncertainties. The resulting catalogue of robust structural parameters for the stellar components of galaxies (bulges and disks) can be used to study a variety of properties of galaxies and their components, such as colours, luminosity functions, mass-size relations and dust attenuation. At the same time, our work contributes to the advancement of image analysis, surface brightness fitting and post-processing routines for quality assurance in the context of automated large-scale bulge-disk decomposition studies. Such advancements are vital to fully exploit the high-quality data of current and upcoming large imaging surveys.
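
As a greatly simplified, hedged illustration of single-component surface brightness fitting in Python (the pipeline itself uses ProFit and ProFound, with MCMC, PSF handling, and multi-frame fitting not shown here), a Sersic profile can be fitted to a simulated image with astropy.modeling:

    import numpy as np
    from astropy.modeling import models, fitting

    # Simulate a noisy single-Sersic galaxy image (all parameters invented).
    y, x = np.mgrid[:101, :101]
    truth = models.Sersic2D(amplitude=1.0, r_eff=12.0, n=2.5,
                            x_0=50.0, y_0=50.0, ellip=0.3, theta=0.5)
    rng = np.random.default_rng(3)
    image = truth(x, y) + rng.normal(scale=0.05, size=x.shape)

    # Least-squares fit from rough initial guesses (the real pipeline derives
    # initial guesses automatically with ProFound and samples posteriors instead).
    init = models.Sersic2D(amplitude=0.5, r_eff=8.0, n=1.5,
                           x_0=48.0, y_0=52.0, ellip=0.1, theta=0.0)
    fitter = fitting.LevMarLSQFitter()
    best = fitter(init, x, y, image)
    print(best.r_eff.value, best.n.value)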

Other creative topics in astronomical software
Talks
11:45
11:45
15min
Stimela 2, kubernauts, and dask-ms: radio interferometry data reduction in the cloud
Oleg Smirnov

Radio interferometry has been slow in adopting cloud-based technologies, despite some of their apparent advantages. I argue that it has been difficult to make radio interferometry on the cloud cost-effective for a number of reasons, chief among them: (a) awkward legacy data formats ill-suited to object store, (b) complex and heterogeneous software stacks with a heavy reliance on legacy code, and (c) awkward and complicated "thick/thin" workflows with very different resource requirements at different stages of the pipeline.

Recent software developments, however, offer a way forward. I will showcase some of these, including the Stimela 2 workflow management and containerization framework, which streamlines the orchestration of complex workflows on a Kubernetes cluster, and the dask-ms library, which maps legacy data formats onto diverse storage backends, providing support for object store. A new generation of software packages leverages these technologies, providing cloud-efficient implementations of the basic processing steps, which are able to exploit the auto-scaling capabilities inherent to cloud architectures. I will demonstrate a full data reduction workflow running on AWS. I will also argue that cloud-compatible pipelines go a long way to providing fully reproducible workflows.
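
As a minimal sketch of the dask-ms interface mentioned above, the snippet below lazily opens a CASA Measurement Set ("my_obs.ms" is a placeholder path) and computes a mean unflagged visibility amplitude; the same xarray/dask interface is what dask-ms maps onto other storage backends, including object store.

    # Lazily open a Measurement Set with dask-ms; "my_obs.ms" is a placeholder
    import dask.array as da
    from daskms import xds_from_ms

    # One xarray Dataset per FIELD_ID/DATA_DESC_ID grouping, backed by dask arrays
    datasets = xds_from_ms("my_obs.ms", columns=["DATA", "FLAG"])

    for ds in datasets:
        # Build the computation lazily; nothing is read until .compute() is called
        amp = da.absolute(ds.DATA.data)
        mean_unflagged_amp = da.mean(amp[~ds.FLAG.data])
        print(float(mean_unflagged_amp.compute()))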

Cloud infrastructures for astronomical data analysis
Talks
13:30
13:30
15min
Using The NEOfixer API for NEO Follow-Up and NEO Queries
Alex R. Gibbs

NEOfixer's primary goal is to provide NEO targeting recommendations that aid in coordinating follow-up efforts. To do that effectively it creates a unique database of all the known NEOs and NEO candidates. It monitors the obvious data sources at the MPC and JPL, but incorporates other information as well, such as lists of potential radar and mission targets. It calculates orbits, ephemerides, and a variety of custom scores for each NEO based on that information. Finally, a ranked list of target recommendations is generated, customized for each subscribing telescope. Much of this information is available on the website, and more still is available via the API. You do not need to have an account to use NEOfixer and its API.

The NEOfixer API allows users to participate in its primary mission using scripts and automation, but it can be used for more than that. I will demonstrate how to use the API for everyday NEO follow-up, how to obtain details about specific NEOs, and how to generate filtered lists of NEOs for other purposes. I will give examples of how Catalina Sky Survey incorporates NEOfixer API calls into its workflow.
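
As a purely hypothetical sketch of scripting against a JSON-over-HTTP service such as the NEOfixer API, the snippet below requests a ranked target list for an observatory site; the base URL, endpoint path and parameter names are placeholders rather than the documented API, which should be consulted for the real calls.

    # Hypothetical example: the base URL, endpoint and parameter names below are
    # placeholders, not the documented NEOfixer API.
    import requests

    BASE_URL = "https://neofixer.example/api"    # placeholder, not the real URL

    def get_targets(site_code, limit=20):
        """Fetch a ranked target list for an observatory site (illustrative only)."""
        resp = requests.get(
            f"{BASE_URL}/targets",               # placeholder endpoint
            params={"site": site_code, "limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    # G96 is the MPC observatory code for the Mount Lemmon Survey
    for target in get_targets("G96"):
        print(target)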

Catalina Sky Survey would like to thank NASA's Planetary Defense Coordination Office for its continued support, including for NEOfixer, currently via grant 80NSSC21K0893-NEOO.

Ground and space mission operations software
Talks
13:45
13:45
15min
XMM-Newton Science Analysis System (SAS) on the cloud
Aitor Ibarra Ibaibarriaga

Authors:
Aitor Ibarra (Telespazio UK for ESA), Richard Saxton (Telespazio UK for ESA), Jose Marcos (Telespazio UK for ESA), Anthony Marston (ESA), Esin Gülbahar (ESA) and Peter Kretschmar (ESA)

Abstract:

The XMM-Newton satellite is one of ESA's most successful missions. It has been operating as an open X-ray observatory since the beginning of 2000 and has been producing high-quality scientific results ever since.

The XMM-Newton Science Analysis System (SAS) is the application used for processing the data obtained with the scientific instruments on board XMM-Newton, an indispensable tool that has supported the publication of nearly all refereed scientific papers based on XMM-Newton data to date. SAS is robust software that has allowed users to produce good scientific results since the beginning of the mission. This has been possible thanks to SAS's capability to evolve from a stand-alone application to a SaaS (Software as a Service) application and to adapt to the needs of the scientific community.

Today, the landscape of data analysis is evolving with the advent of cloud computing, offering new dimensions to enhance scalability and efficiency. Recently, the XMM-Newton project developed a pilot prototype to migrate the current Remote Interface for Science Analysis (RISA), available through the XMM-Newton Science Archive (XSA), to Amazon Web Services (AWS). This presentation explores the synergy between SAS and cloud processing, showcasing how this collaboration transforms the landscape of X-ray astronomy.

This presentation further explores the collaborative potential between the XMM-Newton SAS, cloud processing and the European Space Agency’s (ESA) DataLab initiative. In the future, we will also explore other collaborative data-driven science platforms, such as SciServer, which could form a synergy that revolutionizes X-ray astronomy analysis.

Furthermore, we address recent SAS developments, focused on Docker technologies, that prepare SAS for this new technology paradigm; in particular, new SAS Python interfaces that will help users run data processing threads on the ESA DataLab platform. Further developments, such as X-ray image interactivity, are needed to exploit all SAS capabilities in these cloud environments.
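
As a minimal sketch of the kind of Python-wrapped SAS call such interfaces enable, the snippet below filters an EPIC-pn event list via the pySAS Wrapper, assuming an already initialised SAS environment (SAS_CCF and SAS_ODF set); the file names are placeholders.

    # Assumes the pySAS Wrapper interface and an initialised SAS environment
    # (SAS_CCF, SAS_ODF); file names are placeholders.
    from pysas.wrapper import Wrapper

    inargs = [
        "table=pn_events.fits",
        "withfilteredset=yes",
        "filteredset=pn_filtered.fits",
        "expression=(PI>150 && PI<15000) && (PATTERN<=4)",
    ]
    # Run the standard evselect task from Python instead of the command line
    Wrapper("evselect", inargs).run()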

In conclusion, the fusion of the XMM-Newton SAS, ESA DataLab, and cloud processing represents a significant leap forward in X-ray astronomy data analysis. This symbiotic relationship not only accelerates scientific discoveries but also paves the way for innovative research methodologies, empowering astronomers to explore the depths of the universe with unprecedented efficiency and precision.

Cloud infrastructures for astronomical data analysis
Talks
14:00
14:00
30min
Open Source Science Initiative at NASA
Steven Crawford

The Open Source Science Initiative (OSSI) implements the ambitious open science vision outlined in the NASA Science Mission Directorate’s “Strategy for Data Management and Computing for Groundbreaking Science 2019-2024.” OSSI includes the recently updated Scientific Information Policy (SPD-41a), which contains updated requirements, compliant with the recent OSTP memo on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research”, for sharing data, publications, and software produced from SMD’s research activities. The Initiative further aims to increase accessibility, inclusion, and reproducibility in Earth and Space Sciences through a range of activities including training in open science, development of Open Science technologies, and grants to support Open Science. NASA’s Transform to Open Science, a program to train 20,000 scientists over the next 5 years in Open Science and broaden participation from historically excluded groups, kicked off with the Year of Open Science in 2023. NASA has provided over $6 million to sustain scientific software as well as to support innovative open science projects. The latest developments include a range of new infrastructure to support open science, including the Science Discovery Engine, providing cross-divisional data search, and the Science Explorer, an expansion of the Astrophysics Data System to include the other divisions in SMD. With the release of the core services strategy, SMD is laying out a path to enable groundbreaking science through cloud and high performance computing access and services.

Cloud infrastructures for astronomical data analysis
Invited
14:30
14:30
15min
Taking TESScut to the Cloud: Architecting for Availability, Performance and Cost
Ben Falk

We present the challenges encountered and the solutions we reached while converting the TESScut application to run in the cloud. TESScut is a web application that provides image cutouts of chronologically stacked TESS full-frame images, without requiring the user to work with the image stacks themselves. While running inside our on-premises datacenter, the application ran on a large virtual machine: 32 cores, 64 GB of memory, and nearly 400 TB of high-performance local storage for serving the image stack data. This single machine served terabytes of cutout request data to users each month. Replicating this specific environment in the cloud would have been prohibitively expensive and beyond our budget. Instead, our cloud architecture utilizes serverless tasks inside AWS ECS Fargate, performs cutouts from remote files on an open-data S3 bucket, and relies heavily on autoscaling to achieve our performance goals while keeping costs within budget. We hope that others can benefit from our experiences and lessons learned.
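
For context, a typical client-side request looks like the following astroquery call (the target is arbitrary); the work of extracting the cutout is what the architecture described above performs server-side.

    # Request a 15x15-pixel cutout for an arbitrary target via astroquery
    from astropy.coordinates import SkyCoord
    from astroquery.mast import Tesscut

    coord = SkyCoord(ra=316.63, dec=-26.69, unit="deg")
    # Returns a list of astropy HDULists, one per TESS sector covering the target
    cutouts = Tesscut.get_cutouts(coordinates=coord, size=15)
    for hdulist in cutouts:
        print(hdulist[0].header.get("SECTOR"), len(hdulist[1].data))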

Cloud infrastructures for astronomical data analysis
Talks
14:45
14:45
15min
A Good IDIA : Scientific Computing at Scale
Srikrishna Sekhar

The high data rates from current and next-generation radio interferometers (MeerKAT, JVLA; SKA, ngVLA) require the data to be processed via a highly parallelized architecture in order to complete processing on reasonable timescales.

In this talk, I will discuss the Institute for Data Intensive Astronomy (IDIA) facility in Cape Town, South Africa, a pathfinder SKA science regional data centre, and the tools and systems developed and adapted to perform processing at scale with the aim of producing high-fidelity images from radio interferometers. I present the IDIA MeerKAT pipeline, an automated, parallel, scalable full-Stokes calibration and imaging pipeline for MeerKAT data designed to operate on the ilifu cluster using off-the-shelf software. Our setup uses the IDIA platform running on hardware provided by the ilifu national facility, taking advantage of cluster-level parallelism, resource management and software containers.

I also discuss use of the CARTA software to efficiently visualize terabyte scale image products remotely, and briefly discuss some of the algorithm developments in progress to produce high fidelity widefield polarimetric maps with MeerKAT and other interferometers.

Cloud infrastructures for astronomical data analysis
Talks
15:15
15:15
30min
Navigating ESA HST and JWST Science Archives through Automated Jupyter Notebooks
Javier Espinosa Aranda, Marcos López-Caniego, Maria Arevalo Sanchez

Efficient data access and analysis are crucial in the ever-expanding realm of astrophysical research. This demonstration aims to showcase a comprehensive workflow for initiating and conducting research using the European Space Agency's (ESA) Hubble Space Telescope (HST) and James Webb Space Telescope (JWST) Science Archives. Guidance will be provided from the User Interfaces to advanced scripting, supporting researchers when navigating the vast repositories of observations and data.

Starting from scratch, participants will learn how to execute simple searches using the available User Interfaces (https://hst.esac.esa.int/ehst, https://jwst.esac.esa.int/archive). These user-friendly applications will help users to identify the desired observations and check the associated files in the quick-look viewers for images, cubes and even their footprints, using an embedded version of ESASky. The objective of this step is to construct complex queries that target specific celestial objects, time periods, and data types, among many other filters.

A step-by-step walkthrough will highlight the direct integration of these queries into automated Jupyter Notebooks generated on-the-fly in the User Interfaces, removing the need for manual data extraction. These notebooks will be readily equipped with essential code snippets for data retrieval, pre-processing, and initial analysis. Participants will gain insights into effectively handling and visualizing data directly within the notebooks.
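
As a simple example of the kind of scripted access such notebooks build on, the snippet below runs a cone search against the ESA Hubble archive through astroquery (assuming the astroquery.esa.hubble interface, with the radius given in arcminutes); the corresponding astroquery.esa.jwst module works along the same lines.

    # Cone search against the ESA Hubble archive; radius assumed in arcminutes
    from astropy.coordinates import SkyCoord
    from astroquery.esa.hubble import ESAHubble

    esahubble = ESAHubble()
    m31 = SkyCoord.from_name("M31")
    table = esahubble.cone_search(coordinates=m31, radius=7)
    print(table[:5])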

The automated notebooks serve as a foundation for attendees to embark on scientific exploration immediately, facilitating faster insights and reducing the barrier to entry for researchers new to the archives. This approach not only empowers researchers but also encourages collaborative and reproducible research practices within the astrophysical community (e.g. integrating these Notebooks into ESA Datalabs).

Science with data archives: challenges in multi-wavelength and time domain data analysis
Focus Demos
16:00
16:00
15min
NASA Archival Data in The Cloud: Service & Discovery
Abdu Zoghbi

NASA data archives have started serving data from the cloud for several Astrophysics space missions. Making these data findable and discoverable means that the archives' discovery services need to be updated to include the cloud data. It also means that clients used by scientists need to know how to process and interpret that cloud information. Here, I will present some of the work the NASA archives have been doing in serving cloud data, and tools developed that allow users to find and access the cloud data seamlessly.
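
One concrete example of such cloud-aware tooling is astroquery's MAST interface, which can be pointed at the AWS copies of the data and asked for S3 URIs; a minimal sketch with an arbitrary target is shown below.

    # Ask MAST for AWS (S3) locations of data products for an arbitrary target
    from astroquery.mast import Observations

    Observations.enable_cloud_dataset(provider="AWS")
    obs = Observations.query_criteria(obs_collection="HST", objectname="M51")
    products = Observations.get_product_list(obs[:5])
    # S3 URIs that cloud-hosted analysis code can read directly
    print(Observations.get_cloud_uris(products))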

Cloud infrastructures for astronomical data analysis
Talks
16:15
16:15
15min
Processing All-Sky Images At Scale On The Amazon Cloud: A HiPS Example
Bruce Berriman

We report here on a project that has developed a practical approach to processing all-sky image collections on cloud platforms, using as an exemplar application the creation of 3-color Hierarchical Progressive Survey (HiPS) maps of the 2MASS data set with the Montage Image Mosaic Engine on Amazon Web Services. We will emphasize issues that must be considered when using cloud platforms to perform such parallel processing, thereby providing a guide for scientists wishing to exploit cloud platforms for similar large-scale processing. A HiPS map is based on the HEALPix sky tiling scheme: progressive zooming of a HiPS map reveals an image sampled at ever smaller or larger spatial scales defined by the HEALPix standard. Briefly, the approach used by Montage involves creating a base mosaic at the lowest required HEALPix level, usually chosen to match as closely as possible the spatial sampling of the input images, then cutting out the HiPS cells in PNG format from this mosaic. The process is repeated at successive HEALPix levels to create a nested collection of FITS files, from which PNG files are created for display in HiPS viewers. Stretching FITS files to produce PNGs is based on an image histogram. For composite regions (up to and including the whole sky), the histograms for each tile can be combined into a composite histogram for the region. Using this single histogram for each of the individual FITS files puts all the PNGs on the same brightness scale, so that displaying them side by side in a HiPS viewer produces a continuous, uniform map across the entire sky.

All the processing just described can be readily performed in parallel on AWS instances. To create the HiPS maps on AWS, jobs were set up with a Docker container that contains the requisite data and software components, including modules added to streamline processing on cloud platforms, such as adjusting for inter-image background variations and developing a global model for visualization stretches. Jobs are set up and run with the Amazon Web Services (AWS) Batch processing mode, which spins up server instances as needed, pulling from a pool of pre-defined job scripts. When a job is done, the compute instance either pulls another job from the pool or shuts down. This approach minimizes idle instances, which would incur charges even when not processing. A set of script generators developed for this project creates, by design, simple scripts that are handed to the instances to run jobs inside the containers. Processing the whole sky at three wavelengths requires about ten thousand such jobs. We will discuss processing times and costs.
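
For illustration, the job-submission pattern described above can be sketched with boto3's Batch client as follows; the queue name, job definition and command are placeholders standing in for the project's generated scripts.

    # Submit containerised tile jobs to AWS Batch; queue, job definition and
    # command are placeholders for the project's generated scripts.
    import boto3

    batch = boto3.client("batch")

    def submit_tile_job(tile_id):
        """Submit one HiPS tile-processing job to an AWS Batch queue."""
        return batch.submit_job(
            jobName=f"hips-tile-{tile_id}",
            jobQueue="hips-processing-queue",        # placeholder queue name
            jobDefinition="montage-hips-container",  # placeholder job definition
            containerOverrides={"command": ["run_tile.sh", str(tile_id)]},
        )

    for tile in range(4):
        print(submit_tile_job(tile)["jobId"])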

Cloud infrastructures for astronomical data analysis
Talks
16:30
16:30
15min
Preparing a scientific data processing facility for Rubin Observatory’s LSST: the case of France’s CC-IN2P3
Fabio Hernandez, Gabriele Mainetti

Located in Lyon, France, the IN2P3 / CNRS Computing Centre (CC-IN2P3) has been preparing its contribution to the production of the Legacy Survey of Space and Time (LSST) in its role as the Rubin Observatory’s France Data Facility.

A complete copy of the raw images will be imported and stored there for the duration of the 10-year survey, and annual campaigns reprocessing 40% of the raw images recorded since the beginning of the survey will be performed on its premises. The data products of those campaigns will be sent back to the Observatory’s archive center in the USA.

As a scientific data processing facility shared by several dozen international projects in high energy physics, nuclear physics and astroparticle physics, in recent years we have observed a significant increase both in the demand for computing and storage capacity and in the complexity of the services required to support astroparticle physics projects. We expect their needs to continue increasing for the foreseeable future: major international projects like Rubin, Euclid, KM3NeT and Virgo/LIGO represent a sizeable fraction of the resources CC-IN2P3 provides to the science projects it supports, even if not yet at the level of the high energy physics projects.

In this contribution we will address how we have been preparing to perform bulk image processing for the needs of the Rubin Observatory annual data release processing campaigns for the duration of the survey. We will present the architecture of the system we deployed, with a focus on the storage, compute and data transfer components, and how we have been testing the system at significant scale. We will highlight and motivate some of the solutions we adopted, which have proven effective in our successful contributions to other large science projects such as CERN’s Large Hadron Collider. We will also cover our initial experience with components deployed for the specific needs of the scientific exploitation of Rubin data, such as the astronomical catalog database and the Rubin Science Platform.

Cloud infrastructures for astronomical data analysis
Talks
16:45
16:45
15min
Rubin Science Platform: on cloud, on-prem, all of the above
Frossie Economou

The Rubin Science Platform is already in production before system first light and is approaching 1,000 registered early-access users working with precursor data products on our outward-facing deployment on Google Cloud. In this talk I describe the architecture that allows a small team to manage over a dozen separate deployments of the platform on cloud, on-premises (including the telescope summit), and in our hybrid model for operations, a mixture of both. I will also briefly address common mistakes made when evaluating cloud economics.

Cloud infrastructures for astronomical data analysis
Talks