Astronomical Data Analysis Software & Systems XXXIV :: pretalx

To see our schedule with full functionality, like timezone conversion and personal scheduling, please enable JavaScript and go here.

Sunday, Nov. 10, 2024

Monday, Nov. 11, 2024

Tuesday, Nov. 12, 2024

Wednesday, Nov. 13, 2024

Thursday, Nov. 14, 2024

13:00

Programming the GPU on your laptop - is it easy, is it useful?

Keith Shortridge

This tutorial is aimed at ADASS attendees who may have sat through numerous talks about how GPUs make everything faster, wondered about making use of the GPUs to speed up compute tasks, and then somehow never found the time to actually try it. The aim is to give people who have no experience with GPU programming a kick-start towards trying it for themselves on their own laptops. The tutorial will be based around a small set of example C++ command line programs that perform calculations on 2D data, all of which run on MacOS, Linux, and Windows. Attendees will be able to build, run, modify, and experiment with these programs, seeing how the GPU performance compares with the CPU. The structure of the programs will be explained together with details of the code, with an emphasis on what is actually going on in the GPU when it runs. Metal and Vulkan versions of each program will be provided. Metal is Apple’s current GPU infrastructure, and Vulkan - a descendant of OpenGL - will run on almost any recent GPU. (Those wondering about CUDA should note that CUDA and systems based on it need an Nvidia GPU, while most laptops use different GPUs.) As the underlying operations they perform are the same, seeing what these two quite different systems have in common will provide some insight into the internal workings of a modern GPU. Comparing the GPU and CPU code will show which programs gain from using the GPU and which may not. There will be a bit of fun stuff at the end with graphics, and everyone will take away an amount of potentially helpful example code.

The latest release of the example code, and the latest installation instructions, can be found at:
https://github.com/KnaveAndVarlet/ADASS2024_GPU

The Advanced Scientific Data Format (ASDF)

ASDF is a language-neutral file format for serializing scientific data,
in use by JWST, DKIST and Roman. It is integrated in data analysis and
visualization tools. It has human-readable hierarchical metadata structure,
followed by binary blocks and is designed to be easily extensible and
customizable. The tutorial will be a hands-on session focusing on reading,
writing and creating ASDF files in Python. It aims to show the advantages
of ASDF over other formats by working through an example challenging
in other formats (WCS). Participants will gain understanding of how ASDF
works at the format and library level.

Metadata, Semantics and Data Models Applicable to Data Formats

15:00

15:00

30min

Coffee Break

Aula Magna

15:00

30min

Coffee Break

Aula Prima

15:30

Namespaces Outside of Containers

Linux namespaces are half of the infrastructure used to create “containers”, but it is not commonly known that namespaces can be effectively used without requiring the use of container technology such as docker or Kubernetes. This tutorial will provide an overview of the current namespaces that can be used and the tools that exist to interact with namespaces, and then provide some practical examples (primarily around named network namespaces) that can be run by learners on their own systems.

The first step when thinking about User Experience: Set-up an UX Vision

This tutorial is aimed at anybody who wants to implement High-Level UX (User Experience) thinking in their software development process. A glance at how we used to set up a UX vision group and UX templates at Astron. Gathering stakeholders and collaborating on:
* Persona(s)
* User story maps
* User journey maps

Establishing a clear UX vision is the foundational step in creating a successful User Experience. This vision serves as a guiding star for the design process, ensuring that all design decisions align with overarching goals and objectives. A well-defined UX vision helps create a cohesive and consistent user experience, considering user needs and objectives. By defining a strategic framework for designers and stakeholders, it enables teams to deliver meaningful and valuable experiences that meet user needs and organization goals.

User Experience

18:00

18:00

120min

Welcome Reception

Aula Magna

08:30

08:30

30min

Conference Welcome

Aula Magna

09:00

The Challenges of Astronomical Data Systems

Astronomical data systems are the critical link between telescopes and the scientific community. Creating and maintaining these systems requires a wide range of valuable skills, not just scientific, as unfortunately highlighted by recent security-breaches. Is ADASS the correct venue for such extended topics?

Data Management and Trusted Repository in the Open Data Era

09:30

Securing Space Science: Advanced Data Protection in the HREDA Archive

Anastasia Andres, Angela Carasa

The HREDA (Human and Robotic Exploration Data Archive) is a data archive and information portal that contains ESA funded or co-funded investigations and experiments since 1972. These experiments are performed in the International Space Experiments from different investigation fields such as growing vegetables and fluids physics studies in the space.
The archive is a joint effort by ESA’s Directorate for Human and Robotic Exploration, the Directorate of Science, and the Science Data Centre (SDC) Madrid. It became operational in 2020 and supersedes the former Erasmus Experiment Archive (EEA) and the ESA Microgravity Database (MGDB).
HREDA is developed by the ESAC Science Data Centre (ESDC). The ESDC provides services and tools to access and retrieve observations and data from ESA's space science missions (astronomy, planetary science, heliophysics and human robotic exploration).

The data archive within the system is highly heterogeneous, requiring the management of different security levels. Some data is always public, some becomes public after an initial prior access, whereas sensitive data requires special permissions for accessing like medical analysis from astronauts. This paper focuses on the advanced mechanisms developed for accessing sensitive data.

Our archive incorporates an advanced security framework for managing sensitive data, ensuring compliance with data protection standards. The system ensures that all incoming data is received in an encrypted format, safeguarding it from unauthorized access from the moment it enters the network. Each dataset is assigned an unique certificate, adding an additional layer of security and traceability. Decryption keys are securely stored in a robust Key Management Service (KMS) server, further protecting the data from breaches.

Our solution integrates two-factor authentication (2FA) with HRE-IC Internet Secured Services (HISS) to provide an extra layer of security, ensuring that only verified users can access the data. Additionally, access authorization is meticulously managed on an individual basis, with each user requiring explicit approval. This personalized authorization process guarantees that only the right personnel have access to the sensitive information.
This implementation enables the end user to securely download decrypted data without having to manage the complexities of certification and key management.

Data Management and Trusted Repository in the Open Data Era

09:45

Insights from a 30-Year international Partnership on Astronomical Archives

David Rodriguez

In an era where astronomical data is expanding at an unprecedented rate, the importance of data sharing and accessibility among astronomy archives cannot be overstated. Since the 1990s, an international partnership between the Space Telescope Science Institute (STScI), the European Space Astronomy Centre (ESAC), and the Canadian Astronomy Data Centre (CADC) has been focused on this endeavor, facilitating the exchange of data from the Hubble and James Webb Space Telescopes.

We will present how this collaboration has evolved over time, highlighting key milestones and innovations in decision-making, communication, and technology. Additionally, we will discuss some of the challenges we have encountered and the strategies we employed to overcome them, offering insights that could benefit future archive collaborations.

Data Management and Trusted Repository in the Open Data Era

10:00

10:00

60min

Coffee Break

Aula Magna

10:15

The ESA Near-Earth Objects Coordination Centre Python Interface

Eduardo Peleato

The Near-Earth Objects Coordination Centre (NEOCC), part of the ESA’s Planetary Defence Office, is dedicated to monitoring, tracking and assessing the risks associated with near-Earth objects (NEO).

The NEOCC offers a variety of services and tools, including public access to data through its portal and related APIs. The NEOCC Python Interface is a powerful tool designed to streamline access to critical data from the NEOCC API. The demonstration will highlight how this Python library simplifies the retrieval and analysis of near-Earth object data like orbital data and physical properties data, and how to efficiently integrate NEOCC data into third-party libraries like ESA’s flight dynamics software GODOT. Furthermore, it will be demonstrated how to use it for retrieval and analysis of advanced information about asteroids, ephemeris and other ancillary data such as the risk list and the close approach list. Finally, a simulation of an Apophis 2029 intercept mission will be showcased.

Data Management and Trusted Repository in the Open Data Era

Meeting Room 101

11:00

Dynamic Imaging With MeerKAT: The Time Axis As The Final Frontier

With the increased sensitivity and field of view of SKA pathfinders, dynamic radio imaging (that is, imaging the time axis) is becoming a burgeoning field, yielding rich new discoveries of transients and variable sources. MeerKAT is capable of reaching sub-150 uJy image rms in an 8s integration, which opens up studies of variability on much shorter timescales than was possible with previous radio interferometers. This also has important implications for interferometric SETI, since any potential technosignatures would be a subset of such transient events.

At the same time, imaging at such short timescales introduces its own substantial challenges. Instrumental effects that tend to average out in a traditional long synthesis observation can become limiting for dynamic imaging if not addressed correctly. I will discuss these challenges and present MeerKAT dynamic imaging of Jupiter’s radiation belts, which have led to the serendipitous discovery of a pulsar-class object named the PARROT (pulsar with abnormal refraction recurring on odd timescales).

This work has led to the development of (and given the name to) a more general dynamic imaging pipeline, developed in collaboration with the Breakthrough Listen initiative. The PARROT pipeline is capable of detecting short-duration transients in imaging data, and yielding light curves and dynamic spectra for thousands of field sources en masse. We are already starting to use it to “mine” existing archival MeerKAT data, yielding a couple of new discoveries. The longer-term plan is to develop the PARROT pipeline to a state where it can be run in real-time, commensally with any MeerKAT imaging observation. This would open the door to transient event triggers -- something that has never been done with a radio interferometer before. With 6 years of observational data in the MeerKAT archive ready to be mined, and new observations arriving daily, this has the potential to turn MeerKAT into a transient and variability discovery machine, opening up new frontiers in astrophysics and SETI.

Real-time and Near Real-time Processing Pipelines

11:15

Sub-arcsecond degree-scale imaging pipelines with LOFAR

In recent years, significant efforts have been made to automatically calibrate and image observations conducted with the Dutch high-band antennas from the Low Frequency Array (LOFAR) observing the universe at 150 MHz. These efforts have led to the LOFAR Two-metre Sky Survey (LoTSS; Shimwell et al. 2017, 2019, 2022) and the LoTSS-deep fields (Kondapally et al. 2021; Duncan et al. 2021; Tasse et al. 2021; Sabater et al. 2021), providing wide-field images of the northern sky at 144 MHz and 6” resolutions. However, 90% of the radio sources at 6” remain unresolved at 144 MHz. This necessitates higher resolutions by using data from all of LOFAR’s international stations, extending the maximum baselines to about 2000 km and resulting in sub-arcsecond resolutions.

We now present a calibration and imaging pipeline capable of producing deep sub-arcsecond resolution images, achieving the highest sensitivities and highest resolutions at the lowest frequencies radio frequencies (Morabito et al. 2022; Sweijen et al. 2022; de Jong et al. 2024). Given the challenge of working with hundreds of terabytes of data, we are now focused on reducing computational costs such that we enable the possibility of a near real-time calibration and imaging pipeline. This advancement will allow for LoTSS-type surveys with data from all of LOFAR's international stations, facilitating further research of radio sources at 150 MHz and sub-arcsecond angular scales.

Real-time and Near Real-time Processing Pipelines

11:30

Finding Fireballs in Lightning: A Daily Pipeline to Find Meteors in Weather Satellite Data

Weather satellite data contains a wealth of information well beyond its application to meteorology. The GOES weather satellite lightning mapper instruments detect millions of lightning strikes per day. Within these haystacks are a handful of bolides (exploding meteors). Through a combination of hard manual work, advanced machine learning techniques, statistical analysis and supercomputers, our multi-disciplinary team has succeeded in creating an efficient pipeline to identify the bolides. Our algorithms are also sensitive to other interesting phenomena in the data. Funded by NASA's Planetary Defense Coordination Office (PDCO), our goal is to create a rich, calibrated, and statistically consistent data set of bolide light curves to inform the planetary defense community of the risks associated with large asteroidal impacts. We utilize a three-stage detection pipeline, with successively more computationally expensive algorithms: 1) simple Hierarchical Clustering, 2) Random Forests and then 3) Convolutional Neural Networks. Detections are promptly published on a NASA hosted publicly available website, https://neo-bolide.ndc.nasa.gov. We present the evolution of our pipeline, the ML techniques utilized and how we continue to incorporate new information to improve detection performance.

Real-time and Near Real-time Processing Pipelines

11:45

Enhancing Keck Observatory Operations: The Data Services Initiative's Journey

In 2021, the W. M. Keck Observatory and the NASA Exoplanet Science Institute enhanced their collaboration through the Data Services Initiative aiming to produce and archive science-ready data. This endeavor focused on refining two pivotal aspects of observatory operations: the planning and execution of observations, and the processes of data reduction and archiving. To achieve these, the team developed new tools, including web-based GUIs, to capture observer intentions into a machine-readable format and an execution engine to carry out these observation plans. A revamped data handling architecture was also introduced to expedite the creation of reduced data products and their subsequent near-real time ingestion into the Keck Observatory Archive. As a result, raw and reduced data are now made available to users well under the requirement of 5 minutes, significantly enhancing the efficiency of data access. Additionally, the number of community-supported data reduction pipelines for Observatory instrumentation expanded from 2 to 9 (across 11 instruments), exceeding the initial project goals. Despite these advancements, the team navigated significant challenges, such as integrating automated workflows within a classically scheduled observing model and managing with limited personnel and budgetary resources. This presentation will explore the project journey, highlighting the successes achieved, obstacles faced, and lessons learned throughout the development cycle that can be applied to your future work.

Real-time and Near Real-time Processing Pipelines

12:00

High-Performance Computing in Astronomy: Triumphs and Tribulations of Pipeline Processing on Supercomputers

The advent of supercomputing has the potential to revolutionise astronomical data processing and is essential for the analysis of massive Radio Astronomy datasets of the SKA era at unprecedented speeds. This talk presents our experiences implementing a complex data reduction pipeline on a number of state-of-the-art supercomputer facilities. We demonstrate how, when optimised, our pipeline achieves remarkable throughput, reducing processing times from weeks to mere hours for large-scale surveys.

However, the transition from traditional computing environments to supercomputing infrastructures is not without challenges. We discuss several critical issues encountered, including:
1) Scratch file management in distributed file systems like Lustre
2) File handle limitations in massively parallel operations
3) Scheduler conflicts and queue optimisation with measurement sets being very different in size depending on observation time and flagging
4) Wall time constraints and job segmentation strategies

We offer some solutions for astronomers looking to leverage high-performance computing resources and tools, such as DALiuGE, to mitigate many of these issues. Our findings highlight both the transformative potential and the practical considerations of supercomputing for modern astronomical research.

Real-time and Near Real-time Processing Pipelines

12:15

High-Performance Pipeline Processing for the Australian Square Kilometre Array Pathfinder

Matthew Whiting

The Australian Square Kilometre Array Pathfinder (ASKAP) is a new-technology radio telescope operated by CSIRO at Inyarrimanha Ilgari Bundara, the CSIRO Murchison Radio-astronomy Observatory in the Western Australian outback. Its innovative receivers, with their wide field-of-view, generate very large data rates, necessitating high-performance computing to create the required calibrated images and catalogues, and deposit them in the CSIRO ASKAP Science Data Archive (CASDA) for use by astronomers.

The processing is orchestrated by the ASKAP pipeline, a scripted workflow that interfaces with the Slurm workload manager to run all necessary data preparation, calibration, imaging, and source-extraction tasks. The computationally-intensive processing is done using a custom-written imaging package called ASKAPsoft, specially designed to handle the scale of data produced by ASKAP. Crucially, the pipeline must run in near-real-time to keep up with the incoming data rate, allowing the telescope to efficiently survey the entire sky.

The ASKAP pipeline is operational, with regular survey observing resulting in large amounts of data (currently >3.8PB since full-surveys started late 2022) being made publicly available through CASDA. ASKAP processing is a demonstration of what can be possible through a large and complex nearly-autonomous supercomputing workflow, and provides important lessons for planning of even larger workflows anticipated for future instruments.

This talk will describe the design decisions that went into creating and scaling up the workflow, and describe how it has been set up to work on the supercomputers at the Pawsey Supercomputing Centre. This will include the range of different types of processing jobs and their contrasting requirements, the impact of the high I/O on overall processing efficiency, and lessons learned from both developing and running the pipeline at scale. We look ahead also to planned upgrades, as well as considerations for implementing processing for future facilities such as the SKA.

Real-time and Near Real-time Processing Pipelines

12:30

12:30

90min

Lunch

Aula Magna

14:00

Empowering Science with Good Design

Scientists rely on software to make discoveries. As the creators of science tools, our design decisions can create gigantic challenges out of the simplest tasks or enable scientists to do more science with ease. This talk will cover practical design methods for building astronomy tools, best practices for user experience and accessibility, and examples of design patterns especially relevant to science workflows.

User Experience

14:30

DARTS Timescape: Exploring 50 Years of Space Science Data Through Interactive Visualization

Miriam Sawczuck

DARTS, the data archive operated by JAXA’s Institute of Space and Astronautical Science, provides a wide range of time series data obtained from space science missions over a period of 50 years. In this paper, we present the development of “DARTS Timescape”, a system designed to provide efficient access to these time series data and allow users to explore the vast temporal landscape of missions archived in DARTS.
To build this system, we utilized InfluxDB, an open-source time series database designed for fast, high-availability storage and retrieval of time series data. InfluxDB can be easily used as a data source in the likewise open-source tool Grafana to build an interactive web application for data visualization.
Users of DARTS Timescape can compare data from various space missions side by side, analyse long-term trends with data aggregation such as calculating averages, and more.
For demonstration purposes, we will present various datasets providing insights into lunar seismology, magnetospheric phenomena, and high-energy astrophysics, specifically:

Apollo Lunar Seismic Data: This dataset includes seismic measurements from the Moon's surface, captured by instruments deployed during the Apollo missions from 1969 to 1977. These data illustrate the motion and geologic characteristics of the lunar surface.
Arase, MAXI and CALET Magnetosphere Data: Observations from the Arase satellite (launched in December 2016), the Monitor of All-sky X-ray Image (MAXI), and the Calorimetric Electron Telescope (CALET), the latter two being mounted on the International Space Station (ISS) since July 2009 and August 2015, respectively, provide data on relativistic electrons and plasma waves trapped in the Earth's magnetosphere. This data is crucial for advancing space weather research, enhancing our understanding of space environment dynamics and their impacts.
MAXI X-ray Data: MAXI furthermore offers X-ray data from various celestial bodies, enhancing our understanding of high-energy phenomena in the universe.

By integrating these diverse datasets into a unified visualization platform, DARTS Timescape aims to facilitate comprehensive research and discovery across multiple domains of space science.

User Experience

14:45

Integrating UX Design in Astronomical Software Development: A Case Study

In 2023, ASTRON took the step of incorporating a dedicated User Experience (UX) designer into its software development process. This decision aimed to enhance the accessibility and usability of services providing access to our data holdings, as well as to optimize the design of services within the SKA Regional Centres Network.

The field of astronomical software development has historically underemphasized UX design. ASTRON's initiative represents a shift, not only in improving our own tools but also in demonstrating to the broader community the value of integrating UX expertise into development teams.

This presentation will explore the impact of embedding a UX designer within our organisation. We will discuss:

The rationale behind hiring a dedicated UX professional
The integration of UX methodologies into our software development lifecycle
Challenges and lessons learned in this integration process
The potential for wider adoption of UX-focused approaches in astronomical software development

By sharing our experiences, we aim to contribute to the ongoing dialogue about best practices in software engineering within astronomy and astrophysics, emphasizing the critical role of user-centered design in creating more effective and accessible tools for the scientific community.

User Experience

15:00

15:00

60min

Coffee Break

Aula Magna

15:15

Using LSDB to enable large-scale catalog distribution, cross-matching, and analytics

In recent years, the exponential growth of large survey catalogs has introduced new challenges in the joint analysis of astronomical datasets, particularly as we move towards handling petabytes of data. Our demo will showcase the latest advancements in our Large Survey DataBase (LSDB) framework. The framework utilizes a particular hierarchically sharded spatial partitioning of large datasets, using Parquet to store the data. This approach facilitates efficient and scalable cross-matching and analysis of big datasets.

In this demo, we will explore the new features in LSDB; such as support for nested Pandas/Dask, making it easier to work on time-domain data and spectral data by storing observations from the same astronomical objects in the same dataframe row. We will demonstrate how users can start their analysis on small subsection of sky and easily scale up their analysis after initial testing. We will showcase the cross-matching ability across large datasets and demonstrate real-world applications by applying analysis functions on complete wide-sky synoptic datasets. We will highlight our collaborations with our partners (such as STScl, IPAC, Rubin, and CDS) to provide various catalogs in this format and show how we can utilize the Fornax cloud platform to work with diverse datasets in a unified cloud-based framework.

Roadblocks in Astronomical Data Analysis

Meeting Room 101

16:00

PyKOALA, a multi-instrument tool for reducing IFS data

Pablo Corcho-Caballero

Over the past two decades, the advent of Integral Field Spectroscopy (IFS) has revolutionized the field of astronomy, enabling comprehensive analysis of both the spatial and spectroscopic properties of extended objects. However, this technological leap has also introduced significant challenges in the reduction and processing of IFS data. The complex nature of IFS datasets requires meticulous correction and calibration steps, often demanding customized pipelines tailored to specific instruments.

To address these challenges, we present PyKOALA: a cutting-edge, open-source Python package designed to streamline the reduction of IFS data. Originally conceived as an expert pipeline component to complement the outputs of 2dfdr and enhance the data reduction process for the Kilofibre Optical AAT Lenslet Array (KOALA) Integral Field Unit (IFU), PyKOALA's vision has expanded over the past few years from a single-instrument focus to a versatile, multi-instrument framework. It now provides a modular and flexible framework, that allows astronomers to customize their reduction sequences and apply an arbitrary number of corrections across various IFS instruments. PyKOALA offers a streamlined interface that facilitates the ingestion of data from different IFUs, standardizing the fundamental properties of IFS data for consistent processing.

The first official release of PyKOALA is expected during early 2025, though its current beta version already features comprehensive documentation and a suite of Jupyter notebook tutorials to ease the learning curve. In this talk, I will showcase PyKOALA’s powerful capabilities, highlighting key features such as multi-instrument support and advanced correction modules, with examples from the initial results of the HI-KOALA IFS Dwarf Galaxy Survey (HI-KIDS).

Roadblocks in Astronomical Data Analysis

16:15

J-PAS early data release: unique processing challenges of an imaging sky survey in 57 optical filters

Héctor Vázquez Ramió

The Javalambre-Physics of the Accelerating Universe Astrophysical Survey (J-PAS; https://j-pas.org) is a multifilter photometric sky survey using a unique set of 56 narrow-band filters (145Å FWHM) and one broad-band, i, covering the visible range at a magnitude depth of AB~22. It is being carried out at the Observatorio Astrofísico de Javalambre (OAJ), Spain, since 2023. It aims to cover thousands of square degrees of the the observable sky from the OAJ and to determine precise photometric redshifts for around 1.3x10⁸ galaxies . The survey is conducted with the Javalambre Panoramic Camera (JPCam) attached to the 2.5m-class telescope, Javalambre Survey Telescope (JST250). JPCam consists of a mosaic of 14 identical 9.2k x 9.2k CCDs which cover 3.4deg² in total on the sky. On top of each CCD, a different narrow-band filter is positioned through a set of 5 filter trays of 14 slots each.
The J-PAS survey strategy and the instrumentation employed pose important challenges in the data processing, some of them listed hereafter.
Large FoV systems are prone to suffer from stray light that is not properly corrected with the classical flat-fielding. The needed illumination correction is obtained from the 2D residuals of the photometric calibration performed taking as reference Gaia DR3. Other remarkable difficulty is dealing with the variety of background patterns observed through the set of J-PAS narrow-band filters together with other ambient illumination gradients under the presence of the Moon. Another significant challenge is the two-dimensional mixture of PSFs in the stacked images for each filter, due to the fact that the observations are often made on different dates and seeing conditions. This has an important impact on the proper estimation of the photometric aperture corrections on those combined frames. On the electronics side, a specific procedure has to be implemented in order to correct from a time-varying bias signal of the CCDs.
In this talk I will discuss these and other challenges and the solutions we have successfully implemented. I will also announce the upcoming public early data release of J-PAS, scheduled for the end of 2024.

Roadblocks in Astronomical Data Analysis

16:30

Migrating Heterodyne Data Reduction to High-Performance Computing

Christof Buchbender

The data volume produced by astronomical experiments continues to grow with each new generation of instrumentation. This is particularly true for heterodyne receivers transitioning from single-pixel to multi-pixel arrays (hundreds of pixels), as we are doing with the CCAT Heterodyne Array Instrument (CHAI) at the upcoming CCAT observatory. Previous-generation receivers, like GREAT aboard SOFIA, with 21 pixels, generated up to 50-70 GB of data per flight. While challenging, these data volumes could still be
reduced and analyzed on local computers using traditional methods, such as the GILDAS software from IRAM. However, CHAI is expected to produce a peak data rate of 8TB per day. This volume crosses the threshold where traditional single-computer pipelines are insufficient, necessitating a migration to an automated high-performance computing (HPC) environment.

CHAI is one of two instruments at the CCAT observatory. The other instrument, Prime-Cam, a modular receiver with up to seven science modules, will yield a similar data rate. To manage these large data volumes from the CCAT observatory, we are developing the CCAT Data Center at the University of Cologne.

In this presentation, I will discuss the limitations of our traditional in-house single-dish heterodyne data reduction pipelines, such as those used at SOFIA and the NANTEN2 telescope, and how these limitations hinder migration to a distributed, fully automated computational environment. I will also present our aproach for the CCAT Data Center to overcome these challenges. Specifically, we are transitioning to a Python-based pipeline optimized for distributed computing and HPC environments where we aim to to use existing solutions where possible. By employing a central database to track data from planning through observation, data transfer, reduction, and analysis, and by using a workflow management system to orchestrate the data reduction process, we aim to minimize manual interaction and increase efficiency.
However, implementing these solutions is not without challenges. One significant
challenge is that existing solutions from other groups often meet 90% of our needs on
paper, but the specifics of our data formats and processing requirements often prevent
easy integration or native use. My hope is that by sharing our experiences, we can
foster discussions with other groups to make our solutions more general and to learn
from our respective experiences.

Roadblocks in Astronomical Data Analysis

16:45

An AI-driven system for enhancing Astronomical Research workflows

Karthik Mahesh Rathod

The rapid expansion of astronomical literature has created significant challenges for researchers aiming to increase the visibility and impact of their work. To address this challenge, we developed Stellar Forge, an AI-powered platform that optimizes the research publication workflow by integrating the NASA Astrophysics Data System (ADS) and the Unified Astronomy Thesaurus (UAT).

Stellar Forge provides several key services. First, it includes a concept tagger that uses a large language model with a retrieval pipeline built upon a novel hierarchical representation of UAT to automatically tag papers with relevant UAT concepts, improving discoverability across platforms and reducing human-induced errors. The tool also provides proper justifications, hierarchies, and concept positions within a branch, and it also suggests new concepts with branch positions if they emerge, ensuring that research remains accessible and up-to-date. The platform also includes a performance prediction tool that estimates the potential readership of papers within the ADS platform. This feature analyses abstract content and previous trends of similar publications to probabilistically forecast readership for the upcoming months, helping researchers anticipate the impact of their work. In addition, Stellar Forge supports content creation with title and abstract optimization tools that improve the visibility of research, allowing researchers to cater their writing for specific audiences and purposes with precise controls. These tools use task-specific fine-tuned large language models and dynamic one-shot prompting to craft clear, effective titles and abstracts.

Built on a modular architecture, Stellar Forge integrates with existing research systems to provide continuous support and improvements to researchers, from content creation to maximizing research visibility. By bridging AI with astronomical resources and researcher needs, Stellar Forge opens the door to more efficient and impactful advancements in astronomical research.

The Rise of AI for Science and Data Center Operations

17:15

Developing software for modern observatories brings its own challenges. Work on the design and architecture of the software may begin several years before the observatory or its instruments have been fully designed, and certainly long before they have been fully developed and commissioned.

Key parts of the observation control software may be expected to work reliably for the next 20 to 30 years or even longer.

How do experienced teams, who have worked with different telescopes and observatories, manage some of the issues that stem from the inherent unpredictability of technology, changing requirements, and the operational environment over such a long-time frame.

What lessons have been learnt from designing and building the software of the current generation of observatories, telescopes, and instruments and how are these being applied to the next generation of facilities currently in progress?

The list below reflects some of the possible shared challenges for discussion, it is expected that others may emerge during the other sessions:

Living with changing requirements - such as user needs, operational necessities, and the integration of modern technologies, while managing hardware and software obsolescence.

Evolving standards and regulations including a changing security threat landscape.

Long term maintenance and support – best practices for knowledge transfer and maintaining a healthy codebase.

Designing for the unknown – building software for a massively complex project that is itself evolving and may change radically between conceptualization and when it becomes operational.

External economic, financial, political, and environmental factors.

Proposal and Observation Preparation Tools

Meeting Room 103

Software doesn’t write itself: Prioritising Equity, Diversity, & Belonging to improve software output

You might not think that discussions around diversity and inclusion are relevant to you. You might not feel that you are in a privileged position. Yet if you are white or male, or both, you have probably had more or better opportunities than many others. What we are talking about here is not just socioeconomics, but the invisible and institutional systems that extend advantages to some groups over others. This BoF will explore different perspectives of these privileges, with a particular focus on research and software. The fundamental question we aim to address is: "how do you make sure that you and everyone in your team has equal and fair access to opportunities?" Come along to a safe, non-judgemental discussion around how we build a community where ensuring everyone has the opportunity to contribute their creativity leads to more fulfilling careers, excellence in engineering, and better research outcomes.

Meeting Room 102

Strategies for heterogeneous processing and archiving.

By pushing the boundaries of discovery, several modern astronomical observatories are daunted with the challenge of processing and storing up to many petabytes of data per year. To tackle the challenges of processing and storing those massive amounts of data, organisations are increasingly adopting heterogeneous and distributed infrastructures for computing and archiving to leverage the opportunities for scalability and collaboration. Typically, such architectures can be built based on custom-built systems, national infrastructures, up to and including commercial clouds.

In this session, representatives from various facilities will share their experiences, plans and strategies for managing those complex processing and archiving systems. After a brief introduction, we will transition into a moderated discussion, encouraging participants to share their insights, ask questions, and engage in a collaborative exchange of ideas and solutions.

The goal is to foster a productive dialogue that can help the community better understand and address the unique challenges of managing heterogeneous processing systems and archives in the context of modern astronomical research.

Possible topics of the discussions could be
* Architectural design and implementation of distributed computing systems for data processing and analysis
* Effective strategies for integrating and orchestrating heterogeneous resources
* Best practices in data management and archiving approaches
* Emerging technologies and trends that may shape the future of heterogeneous computing in astronomy
* Operational concerns, like scheduling and provisioning

Roadblocks in Astronomical Data Analysis

Meeting Room 101

08:50

08:50

10min

Morning Announcements

Aula Magna

09:00

Beyond the Data: challenges and triumphs in data reduction and analysis

While astronomy data reduction and analysis software faces significant challenges, it has also seen major successes in standardisation, automation and collaboration, ensuring that data are processed efficiently and in ways that are accessible to ever larger fractions of the community.
The future of this field lies in ever greater automation, the incorporation of machine learning techniques for near-real-time data analysis, and more seamless integration of heterogeneous datasets. As the volume of data continues to grow, our challenge is to ensure that pipelines remain scalable, robust, and flexible enough to handle both routine and unusual datasets. Of course advances in algorithms and technology are only part of the solution: collaboration across disciplines - software engineering, astronomy, data engineering and computer science - is key to the success of this field.
In this presentation I will highlight some of the successes, and a few failures, of this field and explore how as a community we are preparing to tackle the challenges of the next generation of projects.

Roadblocks in Astronomical Data Analysis

09:30

The time-series visualization tool in ESASky

The ESASky web interface primary goal is to enable users to access European Space Agency's (ESA) space astronomy mission data. In addition to the searching and downloading functionalities, the ESASky user community has expressed an eagerness for visualization tools that aid in the data inspection process. They are meant to help scientists assess whether datasets are ultimately useful to their science case (e.g. source variability, exoplanet transit, transient events) to then download and analyse them with specific research domain tools.
To that end, we have developed a photometry and spectroscopy time series data viewer API within ESASky. The ESASky time-series viewer scientific use case is the capacity to, from a tabulated catalogue or spectra/imaging view within ESASky, aggregate multi-mission or multi-object time series data and interact with them for detailed inspection. In this talk, I will tease the ESASky time-series viewer and present the challenges that such tool entailed, as well as those that its future development, scaling and exploitation will likely bring.

Roadblocks in Astronomical Data Analysis

09:45

Lowering in-memory footprint of antenna beams via polynomial approximation

With the emergence of new radio telescopes promising larger fields-of-view at lower observation frequencies (e.g., SKA), addressing direction-dependent effects (DDE) (e.g., direction-specific beam responses; sector-based ionosphere corruptions) has become all the more important. Be it through A-projection or major-cycle calibration strategies, addressing DDE often requires reliable representations of antenna/station beams; yet, these require significant amounts of computational memory as they are baseline-, frequency-, time- and polarisation-dependent. A novel prototype is reported here to approximate antenna beams suitable to SKA-MID using Zernike polynomials. It is shown that, beam kernels can be approximated up to 3 lobes with sufficiently few
coefficients, thereby replacing the memory-intensive sampled beams. It is hoped that these results facilitate more efficient beam-dependent solutions and approaches to tackling polarisation leakage; all of which are essential for large-scale radio telescopes.

Roadblocks in Astronomical Data Analysis

10:00

10:00

60min

Coffee Break

Aula Magna

11:00

Declarative Data Management with DaCHS and the VO

Markus Demleitner

Publishing data seems easy: Put it on a web page, obtain a DOI, and
you are done. In practice, this kind of “dead“ data generally is hard
to find, access and hence to reuse, not to mention interoperability.
Hence, the Virtual Observatory defines “active” interfaces to the data:
standard protocols enable uniform querying and access, rich metadata in
standard formats on standard interfaces ensure discoverability. This
means that data publishers need to run non-trivial software. Software,
however, has a fairly short half-life, in particular because of changing
platforms, but also because the standards occasionally evolve.

In this talk, I discuss how the DaCHS data publication package tries to
mitigate this specific sort of bitrot, first and formost by introducing a
declarative layer (“state the problem, not the solution“) in data
publishing from ingestion to service operation to registration. I will
show some examples for how this has enabled us and others to run
data centres over many years with low to moderate effort, while staying
up to date with the evolving VO. I will also delineate where no
suitable declarative approaches have been found and what that meant
during major platform changes like the move from Python 2 to Python 3.

Extending the Life of Software and Data

11:30

Longevity of a treasured database service

The NASA/IPAC Extragalactic Database (NED) is a comprehensive database of multiwavelength data for extragalactic objects, providing a systematic, ongoing fusion of information integrated from hundreds of large sky surveys and tens of thousands of research publications. The contents and services span the entire observed spectrum from gamma rays through radio frequencies. NED has been serving the public for 34 years. Over this period, database and software technologies have advanced by leaps and bounds. To keep providing the invaluable services the community has come to rely on, NED team continues to integrate the science data as they were published, serve the data to community via user interface and APIs, while expand and upgrade the software and database systems. This talk will give an overview of what we have done and what we are doing now to ensure the longevity of this treasured system, both database and software.

Extending the Life of Software and Data

11:45

Curating a 20th century observation log in the 21st century

Sébastien Derriere

The International Ultraviolet Explorer (IUE) was a space mission which operated between 1978 and 1996. The final merged log of IUE observations, published in 2000, contains a vast collection of 110033 spectra, which are still of scientific value decades later.

A special operation was performed in 2000 at the Strasbourg astronomical data center (CDS) to provide links between astronomical objects from the SIMBAD database and the IUE Newly Extracted Spectra hosted at VILSPA by ESA. This resulted in 65872 spectra being linked from 7392 distinct SIMBAD objects.

Despite the considerable growth of the number of objects referenced in SIMBAD between 2000 and 2024 (from less than 3 to more than 18 millions), the links to the IUE archive remained unchanged. We therefore decided to attempt a significant update, trying to provide as many links as possible between SIMBAD objects and the IUE spectra.

We describe in this paper the challenges in trying to improve the discoverability of archival data several decades after the end of the mission. The most time-consuming part is the recovery of information hindered by the use of improper object identifiers, concealed implicit information, and human errors (typos).

Ultimately, we were able to link 99.7\% of the eligible IUE spectra to SIMBAD objects (calibration spectra, and spectra of solar-system objects are not relevant for SIMBAD), compared to only 67.7\% in the 2000 operation. We also generated the space-time coverage of the corresponding spectra using the ST-MOC IVOA standard.
Our work strongly advocates for the Best Practices for Data Publication in the Astronomical Literature (Chen et al., 2022), and also stresses the needs for extra care in the observer's proposal tools and metadata management, in order to facilitate the long-term use and optimal scientific return of astronomical observations.

Extending the Life of Software and Data

12:00

Synergies Unleashed: Bridging the Gap Between Science and Computing teams in the ALMA Observatory software deployments

The ALMA Observatory has been collecting science data for more than 10 years. During the first years of operations (up to the end of Cycle 4, i.e., September 2017) the focus of the ALMA Integrated Computing Team (ICT) and the Department of Science Operation (DSO) was mainly on the data acquisition part (using what is known in the ALMA jargon online software).
Thanks to the stability of the online software, during Cycle 5 (October 2017 – September 2018), the Observatory reached a very high data acquisition performance, but with a negative impact on Data Processing (DP) and Data Delivery (DD), one of the reasons being that the software downstream data acquisition (known as offline software) was not mature enough to cope with such a large amount of incoming data.
One of the major contributors to the immaturity of the offline software was that ALMA underestimated the importance of an integration procedure. The applications were working as expected individually, but not as part of an entity which contains software components with interdependencies.
In the last years, significant resources have been allocated to consolidate the performance of the offline software. The situation was reversed, thanks to an efficient, coordinated, and collaborative plan between ICT (especially the EU part) and DSO. The outcome of this effort is a suite of regression and integration tests for the ALMA offline deployment process. This paper (a) describes in more detail the complete history behind this effort, (b) show the current regression and integration tests in place for the most important ALMA scenarios and (c) presents the cutting-edge technology behind the automation approach. We also discuss how this innovative approach in looking at science operations from the software perspective, with an enhanced and coordinated collaboration between the two mentioned teams (ICT and DSO), can become a game-changer in the improvement of an Observatory's performance. The statistics collected demonstrated that the offline software has become much more robust, as both the occurrence of bugs and the need for patches have significantly diminished during the past four cycles.

Extending the Life of Software and Data

12:15

The Chandra Data System at 25 years — What can it teach us?

The Chandra X-ray Center Data System (CXCDS) software provides the end-to-end software support for Chandra mission operations. The CXCDS Software Team develops, maintains, and supports the software that drive the CXC-based forward (proposer to spacecraft) and return (spacecraft to observer) threads necessary to perform the Chandra observing program. The Data System also includes the CIAO data analysis package and the Chandra Source Catalog (CSC) processing system that recently completed CSC 2.1. The software system consists of ~2 million logical lines of code, including C/C++, Python, SQL, Java, Perl, and a few stray algorithms written in Fortran. The Chandra Data Archive manages the operational threads and serves all data of the system. The data products are written in FITS format and are OGIP and IVOA compliant.
For twenty five years the CXCDS has served Chandra science operations and the user community well. The up-front planning and detailed design, conceived in the mid-1990’s and implemented for the operational system by launch in 1999, has paid off. The Software Team has managed operational changes to algorithms and processing, operating systems, compilers, scripting languages, and most recently a configuration management system migration to Git. All of these upgrades and many more have been possible given the structured architectural design that enabled, among other things, modularity and flexibility to manage change.
In this talk, I’ll provide insight on the longevity of the Chandra Data system, our ability to introduce change successfully, and what a new project can take away from our experience going forward.

Extending the Life of Software and Data

12:30

12:30

90min

Lunch

Aula Magna

14:00

SDHDF: A new file format for spectral-domain radio astronomy data

Lawrence Toomey

Radio astronomy file formats are now required to store wide frequency bandwidths and multiple simultaneous receiver beams and must be able to account for versatile observing modes and numerous calibration strategies. The need to capture and archive high-time and high frequency-resolution data, along with the comprehensive metadata that fully describe the data, implies that a new data format and new processing software are required. This requirement is suited to a well-defined, hierarchically-structured and flexible file format. In this paper we present the Spectral-Domain Hierarchical Data Format (‘SDHDF’) — a new file format for radio astronomy data, in particular for single dish or beam-formed data streams. Since 2018, SDHDF has been the primary format for data products from the spectral-line and continuum observing modes at Murriyang, the CSIRO Parkes 64-metre radio telescope, and we demonstrate that this data format can also be used to store observations of pulsars and fast radio bursts.

Metadata, Semantics and Data Models Applicable to Data Formats

14:30

Using Felis to Represent the Semantics and Metadata of Astronomical Data Catalogs

Jeremy McCormick

Data catalogs are a fundamental part of modern astronomical research, allowing scientists to view, search, and filter data according to their requirements. Tabular data models described by SQL Data Definition Language (DDL) are a common way to represent such catalogs. However, DDL does not provide a way to describe the semantics of the data, such as the meaning of a data column, units of measurement, or the relationships between columns. The International Virtual Observatory Alliance (IVOA) has developed several standards in this area, including VOTable and Table Access Protocol (TAP), which are widely used within astronomy for representing such information.

The Data Engineering group of the Vera C. Rubin Observatory has developed a data description language and toolset, Felis, for defining the semantics of its Science Data Model schemas, which represent its public-facing data catalogs. Felis uses a rich Pydantic data model for describing and validating catalog metadata, represented as a human-readable and -editable YAML format. Felis provides a Python library and application for working with these data models. The metadata is used to populate the TAP_SCHEMA tables for the IVOA TAP services that power the table UI of the Rubin Science Platform (RSP). The toolset is also being used to assist in data migrations and will be utilized in testing the conformance of LSST data products to the data model. Felis's current capabilities will be discussed, as well as recent developments and future plans.

Metadata, Semantics and Data Models Applicable to Data Formats

14:45

A data model to connect the ESO Data Processing System (EDPS) to ELT data archives

Hugo Buddelmeijer

C͢o͢n͢t͢e͢x͢t͢
The METIS and MICADO instruments for ESO's Extremely Large Telescope passed their Final Design Review mid 2024. That heralds an era of new data flow challenges for optical to mid-infrared astronomy. Reasons: (1) the instrumental complexity of the Adaptive Optics assisted ELT and its PSF Reconstruction, (2) nightly data rates up to several Terabytes, already during the laboratory testing phase. To address the laboratory testing phase challenge for METIS we will build an METIS AIT (Assembly Integration Testing) Archive database. The database contains the metadata of data items plus a pointer to the bulk FITS files. An interface between this database and the ESO Data Processing System would allow it to initiate processing laboratory data by querying the database. A similar approach could be adopted for MICADO.

T͢a͢l͢k͢
In this talk we present a prototype AIT Archive populated with simulated data. The archive is based on a data model. This model contains formalised descriptions of data items, of data processing recipes and pipelines (see [Buddelmeijer et al 2020]). The database tables are generated automatically from this data model.

We specify an interface between the ESO Data Processing System (EDPS) and the database. The interface is currently one-way: processed data can be automatically ingested into the database. We outline the plan to make the interface two-way such that the EDPS can also automatically retrieve necessary data.

Similarly, we specify an interface between the EDPS and the simulator. This ensures 𝗮𝗻 𝗲𝗻𝗱-𝘂𝘀𝗲𝗿 𝗼𝗻 𝘁𝗵𝗲 𝗘𝗗𝗣𝗦 𝗰𝗮𝗻 𝗿𝗲𝘀𝘁𝗿𝗶𝗰𝘁 𝗶𝘁𝘀𝗲𝗹𝗳 𝘁𝗼 𝘀𝗽𝗲𝗰𝗶𝗳𝘆𝗶𝗻𝗴 𝘁𝗵𝗲 𝗱𝗲𝘀𝗶𝗿𝗲𝗱 𝗲𝗻𝗱 𝗱𝗮𝘁𝗮 𝗽𝗿𝗼𝗱𝘂𝗰𝘁 𝗮𝗻𝗱 𝗹𝗲𝗮𝘃𝗲 𝗶𝘁 𝘁𝗼 𝘁𝗵𝗲 𝘀𝘆𝘀𝘁𝗲𝗺 𝘁𝗼 𝗱𝗲𝗰𝗶𝗱𝗲 𝗶𝘁𝘀𝗲𝗹𝗳 𝗵𝗼𝘄 𝘁𝗼 𝗮𝗰𝗾𝘂𝗶𝗿𝗲 𝘁𝗵𝗲 𝗻𝗲𝗰𝗲𝘀𝘀𝗮𝗿𝘆 𝗶𝗻𝗽𝘂𝘁 𝗱𝗮𝘁𝗮: 𝗿𝗲𝗮𝗱 𝗳𝗿𝗼𝗺 𝗹𝗼𝗰𝗮𝗹 𝗱𝗶𝘀𝗸, 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗲𝗱 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲, 𝗼𝗿 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗲𝗱 𝘁𝗵𝗿𝗼𝘂𝗴𝗵 𝘁𝗵𝗲 𝘀𝗶𝗺𝘂𝗹𝗮𝘁𝗼𝗿.

We outline the plan for these two interfaces, present the progress towards achieving it, and discuss how they will allow this AIT setup to evolve into a platform to support ELT data flow during ELT science operations (i.e., a “ELT Research Data Platform”).

Metadata, Semantics and Data Models Applicable to Data Formats

15:00

15:00

60min

Coffee Break

Aula Magna

15:15

XRADIO: Xarray Radio Astronomy Data Input Output

Jan-Willem Steeb

The advent of next-generation radio interferometers—ALMA-WSU (Atacama Large Millimeter Array Wide Band Sensitivity Upgrade), ngVLA (Next Generation Very Large Array), and SKA (Square Kilometre Array)—will increase astronomical data volumes by orders of magnitude. Current pipelines for ALMA and the VLA rely on CASA (Common Astronomy Software Applications) and store data in MSv2 (Measurement Sets v2). This approach, utilizing considerable custom software, faces maintenance challenges and scaling limitations. To address these issues, we present a new data schema, MSv4 (Measurement Set v4), implemented in the open-source Python package XRADIO (Xarray Radio Astronomy Data IO). This initiative represents a collaborative effort between the National Radio Astronomy Observatory (NRAO), European Southern Observatory (ESO), National Astronomical Observatory of Japan (NAOJ), and Square Kilometre Array Observatory (SKAO), combining expertise from leading astronomical institutions.

The MSv4 contains data for a single spectral window, polarization setup, and observation setup within a fully self-describing structure, allowing for finer partitioning as needed. Collections of MSv4, termed PS (Processing Set), facilitate deployment across distributed computing environments. Departing from MSv2's relational tables, MSv4 employs labeled n-dimensional arrays.

XRADIO leverages off-the-shelf technology to ensure scalability and maintainability. It relies on Zarr for efficient storage and serialization, while Xarray provides in-memory data representation as NumPy arrays or lazy Dask arrays, complete with dimensions, coordinates, and attribute labels.

This focus demo, presented through interactive Jupyter Notebooks, will:
- Explore the PS and MSv4 schemas
- Demonstrate how to easily convert legacy MSv2 datasets to a PS
- Demonstrate efficient data selection techniques
- Showcase data visualization methods
- Illustrate parallel processing capabilities

By adopting these modern tools and approaches, we aim to equip the radio astronomy community with a robust framework capable of handling the data challenges of the next generation of interferometers.

Metadata, Semantics and Data Models Applicable to Data Formats

Meeting Room 101

16:00

DASCH: Bringing 100+ Years of Photographic Data into the 21st Century and Beyond

Peter K. G. Williams

The Harvard College Observatory was the preeminent astronomical data center of the early 20th century: it gathered and archived an enormous collection of glass photographic plates that became, and remains, the largest in the world. For nearly twenty years DASCH (Digital Access to a Sky Century @ Harvard) has been actively digitizing this library using a one-of-a-kind plate scanner. Earlier this year, after 470,000 scans, the DASCH project finished. Now, this unique analog dataset can be integrated into 21st-century, digital analyses. The key DASCH data products include ~350 TB of plate images, ~50 TB of calibrated lightcurves, and a variety of supporting metadata and calibration outputs. Virtually every part of the sky is covered by thousands of DASCH images with a time baseline spanning more than 100 years; most stars brighter than B ~ 15 have hundreds or thousands of detections. I will present the DASCH data release and discuss some of the lessons learned while trying to make data from the previous century accessible in the next century and beyond.

Extending the Life of Software and Data

16:15

Optimized Open-Source Tools for Scalable Solar System Science

Asteroid Institute is developing a suite of open-source tools tailored for solar system science, with a focus on precision, accuracy, and scalability. Key functionalities include orbit propagation, ephemeris generation, orbit fitting, residual calculation, coordinate transformations, arc extension, Monte Carlo simulations, and impact analysis. The libraries are all rigorously unit-tested and benchmarked as well as compatible with existing Python tools like astropy and pandas.

Our science workloads run compute-intensive tasks on billions of data points across thousands of VMs with a low memory footprint. We outline the design decisions of our libraries that significantly reduce memory consumption while enabling seamless multiprocessing. Additionally, we highlight the advantages of our type-safe dataframes and integer-based datetimes over traditional approaches. We also share our Python data packages for convenient access to SPICE kernels and more.

Roadblocks in Astronomical Data Analysis

16:30

A Reproducible Science Workflow System: DALiuGE in Action

Andreas Wicenec

The DALiuGE science workflow system has been introduced to the ADASS audience in 2022 and since then it has evolved into a sophisticated tool allowing the construction, scheduling and execution of arbitrarily complex workflows on single machines and clusters with thousands of compute nodes. Almost any software package exposing a Python binding can be automatically introspected, including the in-line documentation and argument types. The extracted individual components, classes, methods and functions, can then be used to construct workflows in a graphical editor. Unlike most other workflow systems, in DALiuGE application and data components are represented as nodes on a workflow graph. Fundamentally, this concept enables the extreme scalability as well as the separation of I/O from the algorithms. Data components can reside in memory, even across a compute cluster. Application components can be as complex as full MPI applications or as small as a single line function call. Along the whole workflow design, scheduling and execution chain, DALiuGE is recording hash codes of components and data artefacts into a Merkle tree and enables complex comparisons of the equivalence of graphs, software components, data artefacts and complete execution runs. Workflows and component descriptions are stored in user configurable GitHub or GitLab repositories and are thus fully version controlled and can be shared with collaborators or the world. DALiuGE also supports workflows, containing sub-workflows. These sub-workflows can be scheduled and executed at run-time, either on the same platform as the main workflow or somewhere else. When using existing software packages, users don't need to write any at all and can fully concentrate on the workflow design. The parameterisation of existing, established graphs to run on different datasets or re-run with slightly changed configuration of the individual components has been streamlined into a single table interface for entire graphs, exposing pre-selected so-called graph configuration parameters.

Roadblocks in Astronomical Data Analysis

16:45

Bridging the Gap: Enhancing Astronomical Data Analysis with Software Engineering Best Practices

Shvetha Chynoweth

Extracting meaningful information from large amounts of astronomical data requires not only a clear understanding of how it was collected and the physics involved, but also a disciplined approach to the analysis. As an example, in my current research I work with EPRV (Extreme Precision Radial Velocity) data, which - like other areas of astronomy - requires disentangling instrumental and physical effects. Unfortunately, this process is often hindered by the absence of best practices which makes subsequent analyses and models inherit these complications, compromising their accuracy. As a part of my research, I apply my background in software engineering by building a standard framework for processing data that can be reused across research projects. In this talk, I will discuss recommendations for bridging the gap between software engineering and astronomical data analysis, how I’m applying these practices to my research, and methods for integrating these enhancements in other works.

Roadblocks in Astronomical Data Analysis

18:00

18:00

270min

Conference Dinner with Mdina Walking Tour

Aula Magna

08:50

08:50

10min

Morning Announcements

Aula Magna

09:00

From Daniel Dennett to Transformers: The Computational Evolution of Human Intelligence in AI

In this talk we explore the philosophical and technological advancements that shape our understanding of artificial intelligence. The discussion begins with an examination of the late philosopher Daniel Dennett's views, particularly his assertion that human intelligence is Turing computable and can be replicated through computational procedures. Dennett's perspective finds potential vindication in the capabilities of large language models (LLMs), such as ChatGPT and Gemini, which exhibit 'emergent properties' — complex behaviors arising from simpler underlying processes. While the mathematical foundations of these models are well-understood, the sheer scale, involving trillions of parameters, challenges our ability to predict or even explain their behaviors.

We also consider contrasting views from other philosophers, notably David Chalmers, who offers alternative insights into the nature of intelligence and consciousness. The talk culminates with a brief discussion on the applications of Transformer models in fields beyond traditional AI, such as cosmology and astronomy. These models, through their sophisticated use of the attention mechanism and deep architectures, open new avenues for understanding and exploring the universe. This talk aims to bridge philosophical theories and cutting-edge AI technologies, illustrating the computational evolution of human-like intelligence in machines.

The Rise of AI for Science and Data Center Operations

09:30

AI Agents for Ground-Based Gamma Astronomy

Dmitriy Kostunin

The Cherenkov Telescope Array Observatory (CTAO) represents the next generation in ground-based gamma astronomy, marked by a substantial increase in complexity with dozens of telescopes. This leap in scale introduces significant challenges in managing system operations and offline data analysis. Traditional methods, which depend on advanced personnel training and sophisticated software, become increasingly strained as the system's complexity grows, making it more challenging to effectively support users in such a multifaceted environment.

To address these challenges, we propose the development of AI agents based on instruction-finetuned large language models (LLMs). These agents align with specific documentation and codebases, understand the environmental context, operate with external APIs, and communicate with humans in natural language. Leveraging the advanced capabilities of modern LLMs, which can process and retain vast amounts of information, these AI agents offer a transformative approach to system management and data analysis by automating complex tasks and providing intelligent assistance.

We present prototypes aimed at integrating with CTAO pipelines for operation and offline data analysis. The first prototype is a plugin that implements chatting and function calling for the Configuration Database of Array Configuration and Data Acquisition (ACADA). This plugin enhances operational efficiency by providing intuitive, context-aware assistance and automating routine tasks. The second prototype is an open-access custom ChatGPT tailored for the gammapy-based data analysis, which offers interactive support for researchers.

The Rise of AI for Science and Data Center Operations

09:45

Goal-Oriented Stacking: an novel approach to statistical image-domain inference below the noise threshold

A commonly used approach to explore astrophysical sources below the detection threshold is image-domain stacking or co-adding. This uses known positions of a source population sample identified at one observing wavelength to make statistical measurements of the sample at a different wavelength, where the images are not sufficiently deep for direct detections of the individual objects. These samples are typically selected through intrinsic or observed properties such as stellar mass or optical colours in an attempt to limit biases, maximise completeness, or separate out sub-populations of interest. We explore the utility of an alternative approach by designing an algorithm (using a non-linear neutral controller) to select subsets of the input parent sample of galaxies based on what we refer to as “goal-orientated stacking” objectives. In this case, we set the goal as identifying a subset of galaxies with physically correlated properties (e.g. stellar mass, redshift, star formation rate) that maximise the radio continuum signal-to-noise level. We explore a few applications of this alternative approach and discuss possible extensions.

The Rise of AI for Science and Data Center Operations

10:00

10:00

60min

Coffee Break

Aula Magna

10:15

JupyterLab extension: FireFly

Data visualization plays a crucial role in the everyday work of data scientists and ML engineers. It is important to present data in different visual forms at all stages of the ML process: when data scientists prepare the training data set, inspect how well the model converges during the training, and then when it is time to validate the trained model and analyze the results of inference. Existing data visualization tools available in JupyterLab are mostly limited to a static data representation and do not provide enough interactivity.

In this demo, we will look at how FireFly integrates with JupyterLab to enhance your AI/ML and astronomical data science processes by providing interactive visualization components integrated into the JupyterLab notebook. We will see how FireFly visualizations can be helpful for preliminary data inspection and cleaning, exploratory data analysis, feature engineering, and model inference results analysis. The demo will showcase JupyterLab FireFly extensions for displaying tabular data, FITS images, plotting charts, visualizing HiPS maps, or overlaying data on the reference images.

The Rise of AI for Science and Data Center Operations

Meeting Room 101

11:00

Self-supervised learning of radio data for source detection, classification and peculiar object searches

New advancements in radio data post-processing are underway within the SKA precursor community, aiming to facilitate the extraction of scientific results from survey images through a semi-automated approach. Several of these developments leverage deep learning (DL) methodologies for diverse tasks, including source detection, object or morphology classification, and anomaly detection. Despite substantial progress, the full potential of these methods often remains untapped due to challenges associated with training large supervised models, particularly in the presence of small and class-unbalanced labeled datasets.
Self-supervised learning has recently established itself as a powerful methodology to deal with some of the aforementioned challenges, by directly learning a lower-dimensional representation from large samples of unlabeled data. The resulting model and data representation can then be used for data inspection and various downstream tasks if a small subset of labeled data is available.
In this study, we explored contrastive learning methods to learn suitable radio data representation from unlabeled images taken from the ASKAP EMU and MeerKAT GPS surveys. We evaluated trained models and the obtained data representation over smaller labeled datasets, also taken from different radio surveys, in selected analysis tasks: source detection and classification, and search for objects with peculiar morphologies.

The Rise of AI for Science and Data Center Operations

11:15

Transforming Data into Insights: AI-Driven X-Ray Source Classification within the NADC Framework

The advent of AI has revolutionized the field of astronomy, particularly in the realm of time-domain astronomy. This talk focuses on the application of AI within the framework of the National Astronomical Data Center of China (NADC), which encompasses its data infrastructure and science platform. The NADC framework plays a pivotal role in converting raw astronomical data into valuable scientific insights. The Einstein Probe (EP) serves as a case study, exemplifying the integration of AI with the NADC framework to enhance the discovery and analysis of transients and variable sources. The Time Domain Information Center (TDIC) science platform within the NADC facilitates the application of AI for science and enabling the efficient handling and interpretation of vast datasets generated by astronomical satellites like the EP.

The core of this talk focuses on the development and implementation of a classification algorithm within the NADC framework. The algorithm, a Random Forest classifier, leverages features extracted from light curves, energy spectra, and spatial information to autonomously classify observed X-ray sources. Demonstrating remarkable accuracy rates of approximately 95% on EP simulation data and an impressive 98% on observational data from the EP pathfinder Lobster Eye Imager for Astronomy (LEIA). The integration of this AI classifier into the data processing pipeline not only accelerates the manual validation process but also serves as a testament to the NADC's commitment to advancing scientific research through technological innovation. The talk concludes with an exploration of the implications of the most effective features for X-ray source classification and the broader application of these AI techniques to other X-ray telescope data, thereby setting the stage for future advancements in time-domain astronomy. By showcasing the successful application of AI within the NADC framework, this talk aims to inspire further integration of technologies in astronomical research, paving the way for new discoveries and a deeper understanding of the universe.

The Rise of AI for Science and Data Center Operations

11:30

Machine Learning Enhancements for Real-Time Scientific Analysis of Cherenkov Telescope Data

The Cherenkov Telescope Array Observatory (CTAO) will provide incredible opportunities for the future of ground-based very-high-energy gamma-ray astronomy. To optimise its scientific output, the CTAO will have a Science Alert Generation (SAG) system, which as part of the Array Control and Acquisition (ACADA) system will perform reconstruction, data quality monitoring and scientific analysis in real-time to detect and issue candidate science alerts. As part of the continuous research and development activity for improvements of future versions of the ACADA/SAG product, this work aims at implementing machine learning enhancements for the scientific analysis. In real-time technical and observational variability, as well as performance requirements, can highly impact the overall sensitivity of the automated pipelines. We developed two prototypes of Convolutional Neural Network based models with the aim of removing any a priori knowledge requirements that standard scientific tools have. The first model is an autoencoder trained to remove background noise from counts maps of a given observation, without requiring inputs on target position, background templates or instrument response functions (IRFs). The second model is a 2-dimensional regressor that extracts hotspots for the localisation of candidate sources in the field of view, without requiring inputs on the background template or IRFs. To verify both models we use the current version of ACADA/SAG (rel1), finding that they achieve comparable results with the additional benefit of not requiring a priori knowledge.

The Rise of AI for Science and Data Center Operations

11:45

Classification of HI Galaxy Profiles Using Unsupervised Learning and Convolutional Neural Networks: A Comparative Analysis and Methodological Cases of Studies

Gabriel Andres Jaimes Illanes

Hydrogen is the most abundant element in the universe, making it essential to the formation and evolution of galaxies. The 21 cm radio wavelength neutral atomic hydrogen (HI) line maps the distribution and dynamics of gas within galaxies. The emission from this spectral line is an important tracer for galaxy interaction studies and understanding galactic structure, star formation processes and general behavior of the Interstellar Medium. The application of Machine Learning (ML) and BigData tools algorithms are assets to tackle the enhancement of the quality and efficiency of scientific analysis in this field, especially when it involves large radio astronomy databases and the study of spectrum classification.

Within this context, our work aims to propose a framework for the classification of HI spectral profiles using ML techniques. Several methodologies integrating unsupervised ML techniques and Convolutional Neural Networks (CNN) have been implemented. To carry out this approach, we have focused on HI datasets used in the AMIGA (Analysis of the interstellar Medium in Isolated GAlaxies) research group with a sample of 318 CIG (Catalog of Isolated Galaxies) spectral profiles and 30780 profiles from the ALFALFA (Arecibo Legacy Fast ALFA) survey.

To design this classification framework the first step was data preprocessing, using the Busyfit package (Westmeier et al, 2014) for HI spectrum profile fitting. A second data set was generated using iterative fitting with polynomial, Gaussian, and double-Lorentzian models. This approach also involved a multi-faceted strategy for profile clustering based on temporal shapelet transformation for features detection algorithms: K-means, spectral clustering, DBSCAN, agglomerative clustering, among others, as bootstrap for the extraction of features. Furthermore, we considered a series of classification techniques that include K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Random Forest classifiers. In order to optimize the performance of such models, CNN model was probed, where we made an in-depth evaluation for various configurations of the model with regard to their impact on classification accuracy.

The second part of this work is focused on the generation of an additional dimension to the profiles in order to improve the classification. This 2D analysis is based on the application of CNN techniques to determine the degree of asymmetry by carrying out the classification of the sample of CIG galaxies. The original data was modified by adding a new dimension to the profiles in order to improve the classification. Three distinct 2D image models were generated for the symmetry study: the first is a rotation of the fitted spectrum, the second involves rotating the spectrum after subtracting its right and left profiles to accentuate asymmetry features, and the third is a normalized version of the previous image, with pixel intensity adjusted to further emphasize specific image features. We explain the methodology with current ML techniques and discuss the extrapolation to the ALFALFA survey. The resulting classification was compared with a profile classification previously made by the AMIGA scientific group (Espada, 2011).

The study presents the application of ML techniques for classifying HI profiles, including an approach to extract profile asymmetries classification with HI profiles transformation into 2D images, to improve the accuracy and depth of future analyses. With this, we also have the intention to build and verify a minimal methodology that could potentially be applied to the ongoing Square Kilometre Array (SKA) precursor surveys such as MeerKAT (MIGHTEE HI) or Apertif, where the number of detections will be higher, thus laying the foundation for building a full-scope methodology in the SKA era. All material, code and models have been produced following the FAIR principles and have been published in an open access public repository.

The Rise of AI for Science and Data Center Operations

12:00

Astronomy Data and Computing Services: Changing the way research software is developed, supported and maintained

Gregory Brian Poole

Astronomy Data and Computing Services (ADACS) was established in 2017 by the Australian Astronomy community - with the leadership of Astronomy Australia Ltd. (AAL) - to provide astronomy-focused training, support and expertise to maximise scientific return on Australia's investments in astronomical data & computing infrastructure. One of our flagship services is the ADACS Merit Allocation Program (MAP). This program provides an opportunity for any Australian astronomer to compete (in the same way they compete for computing or telescope resources) for the time of dedicated software professionals to deliver:

Bespoke training in skills related to software development, &/or
Dedicated software development or design effort for new or established codebases.

To date we have successfully delivered over 120 projects involving over 100 unique applicants across all areas of astronomy through this program. The ADACS MAP is solving problems for researchers that otherwise could not have been solved; accelerating old science and enabling new science. More broadly however, we are working to shift the culture of computing in our community; moving to a more collaborative model, where wider skill-sets of multifunctional teams can be exploited.

12:15

A Multi-Wavelength Data Viewer Realized through the Enhancement of hscMap

hscMap serves as the imaging data viewer for the HSC-SSP (Hyper Suprime-Cam Subaru Strategic Program). The SSP data is obtained using five broad-band filters and several narrow-band filters. In hscMap, users can assign any of these filters to the RGB channels to display pseudo-color images. This study aims to enhance hscMap to create a more flexible and multifunctional viewer.
To achieve this enhancement, two primary functionalities were added. Firstly, support for HiPS (Hierarchical Progressive Surveys) data in FITS (Flexible Image Transport System) format allows data obtained by instruments other than HSC to be easily managed through hscMap's simple interface. Secondly, the ability to freely increase or decrease the data to be displayed (in the case of HSC-SSP, data per band) facilitates experimentation under diverse conditions. Additionally, automating part of the color synthesis process makes it easier to generate beautiful images.
These enhancements enable hscMap to flexibly synthesize and display various wavelength data, contributing to efficiency improvements in astronomical research and data analysis. The new features provided by this study facilitate the visualization of multi-wavelength data, serving as a valuable tool for researchers to make new discoveries.

12:30

12:30

90min

Lunch

Aula Magna

14:00

How do you use yours? The Evolution of Proposal and Observing Preparation Tools.

In the past few decades the way in which astronomers use telescopes (or observatories) has changed dramatically. These changes have brought significant efficiency upgrades (science per hour) and have generally improved the professional lives of both users and observatory staff. In this review I will describe a personal view of this history highlighting some significant developments and their impacts along the way.

I will also briefly consider other, wider, impacts, and offer some thoughts on the future evolution of how we use our telescopes.

Proposal and Observation Preparation Tools

14:30

The proposal evaluation process: A unified user experience supporting different workflows.

At the beginning of 2020 ESO approved a project to create a new tool supporting its proposals evaluation processes, called P1Flow. The main idea was to upgrade a series of old and standalone tools that were used to handle ESO peer reviews, into one flexible interface that could offer a user-friendly experience to the reviewers and a complete overview of the process to the ESO operator.

Due to the pandemic and the impossibility to hold panel meetings in-person, the reference model of ESO peer review, i.e., face-to-face panel meetings with about 80 people invited every six months to work for some days altogether in the same location became unfeasible. The sudden need to hold online panel discussions put the project under significant time pressure to quickly provide basic interfaces so that the ESO peer review system could continue. This implied a staggered implementation and release of interfaces supporting different phases of the process. However, the old infrastructure had to continue working in parallel with the new system, to cover for the remaining phases of the review.

The first version of the tool was released in April 2021. The project will reach its completion by the end of 2024.

In the article we will describe the project, how we kept the legacy system and the new tools working together by migrating data from one domain to the other, the strategies we used to keep complexity low, how we introduced new workflows in the system (e.g. Distributed Peer Review), the current status of the project and its possible evolutions in the future.

The description of the technologies, the project plan, the team evolution with time according to the different needs will also be provided.

Proposal and Observation Preparation Tools

14:45

STARS: A scheduling software for Space Missions and Ground-Based Observatories

The efficient automatic planning and scheduling of astronomical observations from space and ground-based observatories has become essential for large astronomical surveys. It facilitates the coordination of multiple instruments and observatories located at different sites and enables a fast reaction to changes of the environmental conditions while maximizing the scientific return.

The main challenge for astronomical planning tools is to maximize the number of observed targets, taking into account available resources, time, and observation constraints. These constraints are mainly related to target visibility, observation cadence, the ephemerides of astronomical events, and the phase of the moon, among others. The resulting observational plans shall fulfill all the constraints and shall be optimal in terms of the telescope time used for science observations, that is to say, minimize the idle time of the instrument, and maximize the scientific return.

In this talk, we will present the STARS library (Scheduling Telescopes as Autonomous Robotic Systems), which provides the tools needed to generate optimal schedules for space and ground-based observatories. It includes the means to define the tasks to be planned, their constraints and the observational resources, the scheduling algorithms based on Artificial Intelligence techniques, and their optimization goals (figures of merit). We also provide a web-based graphical user interface to easily visualize and analyze the generated schedules. The future development plans include, among other things, the ability to generate observation schedules directly from the browser, thus providing a graphical interface throughout the scheduling process.

STARS is being successfully used in several ground and space-based observatories. The CARMENES instrument at the Calar Alto Observatory and the TJO robotic telescope at the Montsec Observatory are already in operation and using the STARS library. Additionally, the library will be employed for the operations of the CTA Observatory, the scheduler of the Ground Observations Program of the PLATO M3-ESA mission, and the planning tool for the ARIEL M4-ESA mission, all of which are currently in development.

Proposal and Observation Preparation Tools

15:00

15:00

60min

Coffee Break

Aula Magna

15:15

Exploring Space Weather connections with the ASPIS prototype archive

Advancing Space Weather Science and predictions requires the understanding of the linkage of the different physical processes that occur simultaneously or sequentially in many application domains (Sun, Interplanetary space, Earth’s environment, other planets, hazards). In order to meet the needs of the Space Weather community, the ASI SPace weather InfraStructure (ASPIS) prototype has been designed and set up by a partnership at the national level in Italy, with efforts from INAF, ASI, INGV, INFN, and seven Universities (Aquila, Calabria, Catania, Genova, Perugia, Rome Tor Vergata, Trento).
This infrastructure includes a database of many heterogeneous resources/products (calibrated data, derived and L3 data, models) pertaining to different domains, a user interface and a Python package that can be used to explore Space Weather phenomena and “chain” them into events that span from the Sun through the interplanetary space to the Earth’s atmosphere up to the ground (and other bodies in the solar system).
Connecting and exploring different kinds of data and model outputs related to different phenomena occurring in the Heliosphere and driven by solar activity is not trivial due to the different formats, scopes and resource collectors available. In this sense, the ASPIS prototype has homogenized data and metadata and provides a new way to explore and interact with Space Weather data.
The goal of this focus demo is to show the audience the various contents of ASPIS, guiding the attendees in exploring the resources and tools provided by the infrastructure.
The audience will be shown how to find, access and explore data and model resources in the Space Weather domains using a new infrastructure designed on purpose and the related tools.
Going from discovery through visualization and then access to the data, investigation of the Space Weather phenomena and events will be showcased using a graphical interface and Python scripting/coding through the dedicated ASPISpy package and general, interoperable tools.

Meeting Room 101

16:00

How did we build ours? A modern proposal tool for a modern telescope

The LOFAR (Low Frequency Array) telescope, one of the largest low frequency telescopes in the world, is currently undergoing a major upgrade dubbed LOFAR2. This includes new hardware capabilities and improved software to use it with.

However, the end-of-life status of the telescope’s old proposal tool, NorthStar, means a new one is needed for LOFAR2. Applying the lessons learned from NorthStar and other existing tools like HEDWIG, a conscious choice was made to develop the new proposal tool in a way that is also (relatively) new to the world of research software development. So that we can build it with emphasis on, and balance between, the aspects of user experience, software maintainability, and development speed.

This presentation will demonstrate how the above is achieved by means of:
* stakeholder involvement in short iterations
* design system implemented as an in-house component library
* code analysis and software monitoring tools

Last but not least, in order to showcase the resulting product, there will be a walkthrough of the proposal tool from the perspective of our users – both proposal writers and telescope operators.

Proposal and Observation Preparation Tools

16:15

Observation Scheduling Software Framework for Distributed Telescope Arrays in Time-Domain Surveys

Telescope arrays are increasingly valued for their higher resource utilization, broader survey areas, and more frequent space-time monitoring compared to single telescopes. This new observation mode poses a challenging demand for efficient coordination of distributed telescopes while coherently modeling abstract environmental constraints to achieve scientific goals.
We propose a multilevel scheduling model and a flexible software framework for distributed time-domain survey telescope array. This framework is constructed from both global and site levels, successfully solves the telescope array scheduling problem considering the projected volumes of constraints and objectives. A remarkable feature of the framework is its ability to achieve global control of generic large-scale surveys through multi-level scheduling, dynamically responding to unexpected interruptions with robustness and scalability. Also, a Python simulator is built to model telescope array observations, including the creation of scheduling blocks obtained from global scheduler, observation conditions, telescope equipment status, and observation fields, enabling the evaluation of scheduling algorithms under various settings.
Using China's Sitian project as an example, telescope array scheduling algorithms and time-domain survey evaluation metrics are designed and implemented within the proposed framework. We envision this prototype framework being used to develop automated scheduling schema that support multi-telescope, multi-site coordinated observations. By integrating novel artificial intelligence techniques and solvers, further performance optimizations can be easily supported.

Proposal and Observation Preparation Tools

16:30

Asteroid Discovery with THOR on the Noirlab Source Catalog: An Engineering Perspective

In this presentation, we discuss the deployment of the THOR asteroid discovery algorithm on the NOIRLab Source Catalog Data Release 2 (NSC DR2). We will focus on the technical challenges encountered and the software, architecture and infrastructure used to address them. We discuss the data extraction and filtering challenges associated with the NSC DR2 dataset. We cover memory management and handling the computational demands of THOR. We will also cover the execution strategy using Dagster for orchestration, Kubernetes spot instances, result validation, and our open-source tools: quivr, adam_core, and mpcq. The talk will conclude with future work implications, notably in relation to the Vera C. Rubin Observatory’s LSST.

16:45

Data processing and preservation for CTAO

We present the design and status of the data processing and data preservation software and infrastructure for CTAO, the next-generation very-high-energy (VHE) gamma-ray observatory under construction in Chile and the Canary Islands. The unprecedented size and complexity of CTAO and the need for open and reproducible data present new challenges for the storage and treatment of data products. We have developed open pipeline software to process petabytes of data generated by the observatory after on-site data volume reduction, including the treatment of tens of petabytes of simulated data required to model the instrumental response. This software produces reduced, standard data products usable by observers, in a format aimed to be compatible with other high-energy observatories. The data will be processed in parallel, data-driven workflows across six CTAO data centers (two on-site and four off-site in Europe). We will describe the big-data infrastructure we use for ensuring the preservation of the raw data and for executing the processing workflows. The technologies include Rucio, FTS, CVMFS, and significant contributions to expand the capabilities of the DIRAC middleware for complex workflows. We will also discuss our efforts to ensure both open software, standardized data models and formats, and FAIR data products. This includes also the ongoing efforts to create standard data models and formats for high-level data products that allow interoperability of CTAO data with existing X-ray, gamma-ray and neutrino observatories (i.e. GADF and VODF), as well as extending IVOA standards to meet the needs of this waveband for discoverability.

17:15

General-Purpose Spectroscopic Data Reduction and Analysis Tools

Motivated by recommendations from the Astro2020 Decadal Survey, developers across many software projects have met over the past year to discuss current and future needs for general-purpose spectroscopic data reduction and analysis tools. Specifically for data reduction software, the discussion has centered on the development of more general-purpose tools (e.g., the AstroPy-affiliated packages specutils and specreduce) to enable users to more easily build pipelines for unsupported and newly commissioned instruments. This is needed to address the fact that most available packages focus on specific instruments or surveys.

In this BoF, we will briefly review the recommendations from Astro2020, the discussions from follow-up meetings, and ongoing development efforts stemming from these discussions. We will then hold an open discussion to collect information about parallel efforts and potential new collaborations that can be formed to address specific, timely needs of the community.

Meeting Room 102

Usability and User Experience in astronomical Software

The most crucial requirement to scientific software is output correctness. Large amounts of development resources are necessary to achieve this goal, leading aspects like interface design, usability and user experience to be put in second place. Ironically, not paying attention to user-friendliness creates the monster that we swore to defeat: an error-prone system. A user that suffers from cognitive overload, confusion or being irritated due to bad software design is more likely to make careless mistakes or to misuse the system.
During this BoF session, we invite you to discuss the role of usability and user experience in astronomical software. We will point out aspects that affect the user-friendliness of a system in a positive or negative way, and review our current software portfolio regarding these aspects. Furthermore, we will speak about how to address usability and user experience in our development procedures to ensure future software pieces for astronomers to be user-friendly.

User Experience

Meeting Room 101

What New Data Formats is the Community Using? What New Data Models does the Community Need?

FITS is in its 5th decade, and ADASS has traditionally had a FITS BoF. That is not this BoF. This BoF aims to find out what other formats are out there: the good, the bad, and the ugly. Whether it is VOTable encoded in to Parquet or simulations written to HDF5, this BoF aims to discover what formats and models the community is using, thereby fostering collaborations and driving interoperability.

Metadata, Semantics and Data Models Applicable to Data Formats

Meeting Room 103

08:50

08:50

10min

Morning Announcements

Aula Magna

09:00

The MeerKAT Science Data Processing Pipeline

Ludwig Schwardt

The MeerKAT radio telescope has been producing high-quality scientific data for more than six years. Its Science Data Processor (SDP) subsystem produces signal displays, calibration products, continuum images and spectral cubes in a fully automated and pipelined fashion from the outputs of the correlator and tied-array beamformer. I review the major parts of the MeerKAT SDP pipeline with a focus on the software architecture, design choices and lessons learnt.

Real-time and Near Real-time Processing Pipelines

09:30

NEOCC’s Aegis pipeline in asteroid orbit determination and impact monitoring.

Francesco Gianotto

The Near-Earth Objects Coordination Centre (NEOCC) is the main component of the Planetary Defence Office (PDO) within ESA's Space Safety Programme (S2P). Its mission is to support and coordinate the observations of small Solar System bodies and assess and track the threats they may pose to Earth. Central to this mission is Aegis (1), an automated orbit determination and impact monitoring system developed by SpaceDyS s.r.l. under ESA contract and operated by NEOCC. Aegis plays a critical role in the daily operations of the NEOCC, providing up-to-date information on orbital properties, impact probabilities, and risk analysis for near-Earth objects.

Aegis operates on an hourly basis, continuously downloading new astrometric data from the Minor Planet Center. It is primarily based on two components: orbit determination and impact monitoring. The orbit determination component maintains a dynamic catalogue of near-Earth asteroids, which includes orbital parameters with associated uncertainties, physical properties such as visual and absolute magnitude, observation details, residuals, close approaches, and ephemerides. The impact monitoring component computes the impact probabilities of near-Earth asteroids over the next 100 years. Objects with non-zero impact probabilities are listed in the NEOCC Risk List (2). When an object’s impact probability exceeds a certain threshold, Aegis also computes the associated impact corridor, further refining the risk assessment.

This presentation will focus on the Aegis processing pipeline, which leverages Docker services for the automated download and integration of observational data. Aegis begins by downloading observational data from the Minor Planet Center, after which it employs a dedicated weighting scheme to prioritize observations. This scheme considers the observatory, technology, and program codes associated with each observation. This allows us to apply different rules based on the historical relevance of each observatory. The system’s ability to adjust the significance of data inputs ensures robust and accurate orbit determination.
A crucial output of this pipeline is the “Residual Weights Observations” (rwo) format, an ad-hoc data structure that encapsulates not only the astrometric information, but also detailed data used in the orbit computation, such as weights and observational residuals. This data is accessible through the NEOCC webportal and its APIs (3).
This presentation will also cover the Aegis processing pipeline for the impact monitoring component. This method relies on sampling the Line of Variation, a 1-dimensional differentiable curve in the orbital elements space that identifies the direction with the largest uncertainty. The output of this sampling is then propagated for 100 years from the current epoch, searching for close approaches and potential impacts with Earth. Both sets of information are published in the NEOCC webportal on a daily basis.

By leveraging real-time processing capabilities and advanced data analysis through the rwo format, Aegis provides an essential service in the continuous monitoring of NEOs, contributing to the safety and security of our planet. This presentation will delve into the technical aspects of the Aegis pipeline, emphasizing its real-time functionality and the use of the rwo format and weighting scheme in astronomical data analysis.

(1) https://ui.adsabs.harvard.edu/abs/2023sndd.confE..73F/abstract
(2) https://neo.ssa.esa.int/risk-list
(3) https://neo.ssa.esa.int/computer-access

Real-time and Near Real-time Processing Pipelines

09:45

Leveraging FPGAs as accelerators in real-time astronomical data-processing pipelines

Mitchell Mickaliger

Given the big-data regime in which we work today, real-time processing is a strict requirement to get the most out of data while it's available. Over the years, there have been many advances in processing power, from multi-core CPUs to off-the-shelf GPUs. However, both of these examples are best suited for certain situations, given their hardware architectures. FPGAs (field-programmable gate arrays), on the other hand, have an open, almost undefined architecture, allowing the user to define it, through the creation of image files that are flashed to the device. As well as having a highly-configurable architecture, FPGAs also use much less power than other accelerator boards like GPUs, while still delivering very good performance (e.g. 38 TFLOPS for an Intel Agilex 7), making FPGAs a useful resource when power is a consideration, or to reduce cooling requirements in high-packing-density situations. While these devices are becoming more commonplace and off-the-shelf versions are available, there is still some effort required to successfully integrate these seamlessly into a pipeline. In this talk, I will describe our efforts to integrate FPGAs into our real-time pipeline for pulsar and fast transient searching for the SKA (named cheetah), and show the performance benefits we have gained from doing so.

Real-time and Near Real-time Processing Pipelines

10:00

FRELLED : An Astronomical Data Visualisation Package for Blender

I present FRELLED, the FITS Realtime Explorer of Low Latency in Every Dimension. This is a data visualisation package specifically designed for examining 3D FITS files, primarily (but not exclusively) intended for HI and higher-frequency radio data sets such as those from ALMA. It provides a number of different visualisation techniques to maximise the scientific returns from the data. Users can view their data volumetrically, as isosurfaces, or as a traditional series of 2D images with the option to use displacement maps, or even in virtual reality. The display can be rapidly toggled between different viewing methods. Multi-volume rendering is possible, both by overlaying two volumetric data sets directly, or plotting contours or isosurfaces from an unlimited number of data cubes over one volumetric display. FRELLED incorporates tools to allow rapid visual cataloguing of data sets of up to 1500^3 voxels, as well as performing basic analysis tasks : comparing their data with the SDSS, querying NED, plotting integrated flux and velocity maps, and measuring the spectral features of the data. Users can create different virtual objects to mask and catalogue data, interactively rescaling and positioning them and using different colours to indicate different object types. These objects can be exported either to produce simple catalogues or for direct processing of the data. In an era where automatic techniques are increasingly dominant, I will demonstrate that the correct cools can still ensure that visual examination can still play an important role even in cataloguing large data sets.

10:15

Visualization of Astropy objects and Multi-Order Coverage Maps (MOCs) with the iPyAladin Jupyter widget

Abstract

ipyaladin allows to view astronomical data interactively within Jupyter notebook.

In this presentation, we will demonstrate the new capabilities of ipyaladin and its new compatibility with the Astropy ecosystem.

We will highlight the new features of the latest versions of the ipyaladin widget. A few lines of code allows one to visualize any catalog downloaded thanks to astroquery. It is then possible to make a visual sub-selection of this catalog by drawing a circle or a rectangle in the interactive widget. This sub-table can then be retrieved in a new astropy Table. ipyaladin also allows to display FITS files from disk or from astropy's objects. The other way around also works: the image survey currently displayed in the widget can be cutout into a new FITS file. An other new functionality is the display of sky regions -- with astropy-regions support. Approximated sky regions (Multi-Order Coverage) can also be overlayed onto the view by the supporting the Astropy-affiliated module MOCpy.

We will showcase how these new methods enable a workflow in which the programmatic approach within Python benefits from constant visual checks in a widget on the side of the notebook's cells.

Running the demo!

You can download the files:
- ADASS2024_ipya_BBgDrSO.ipynb
- requirements_24vnFxP.txt
- chandra_fQoZFZw.fits

In a folder with these files, create a new python (>=3.8, <=3.12) environment:

python -m venv ./.adass-ipyaladin
source .adass-ipyaladin/bin/activate (or .adass-ipyaladin\Scripts\activate.batin the Windows command window)

And install the required dependencies with:

pip install -r requirements_24vnFxP.txt

You can launch Jupyter and execute the cells:

python -m jupyterlab

When you're done, deactivate the environment with

deactivate

User Experience

10:30

10:30

30min

Coffee Break

Aula Magna

11:00

Celebrating SAOImageDS9

Kenny Glotfelty

In this talk we will recognize SAOImageDS9 (DS9) and its creators William Joye and Eric Mandel for their vital role in astronomy and visualization. We will start with a brief history of DS9 and its predecessors. We will review some impressive metrics that demonstrate the breadth of DS9 usage and showcase some unexpected, yet wholly remarkable, applications of DS9. (Did you know DS9 was used to support Covid-19 research?) Along the way we will provide some little know facts and helpful tips that can change the way users interact with the application. Finally we will discuss current work on DS9 and discuss the uncertain prospects for future development.

11:30

User facing tutorials as code: reproducible and reliable tutorials with CI/CD

Brigitta Sipőcz

User-facing tutorials typically combine code, narrative text, execution results, and visualization. However, the target audience of these tutorials can differ significantly, tutorials are often served to be part of the documentation to be accessed by individual users as part of their asynchronous learning, while at other times tutorials are presented at one-off workshops while they are deployed on specific science platforms.

This talk presents the best practices for reliable and reproducible tutorials assembled as part of a Scientific Python Ecosystem Coordination (SPEC) document. These practices distinguish between the different flavours of tutorials and offer guidance for each of them.
The talk specifically showcases how we have implemented and used these best practices at IRSA, and focuses on the ecosystem we adopted for maintaining, testing and deploying our tutorials to the scientific user community. In our approach, we treat these tutorials as library code, and test them in an automated and regular CI/CD setup we serve them in an aesthetically pleasing, user-friendly way.

User Experience

11:45

Processing LISA's data as a human: The GlobalFit Framework user experience

Space missions have always recorded electromagnetic signals, from infrared light to gamma rays. Expected to launch in 2037, ESA's large-class mission LISA (Laser Interferometer Space Antenna) will survey gravitational wave signals from space. As the world's first in-orbit instrument to probe space-time itself, this is one of the most ambitious science missions ever. LISA promises a wealth of new science, allowing us to test our understanding of general relativity and to open a new window for astrophysics and cosmology. The data analysis for this mission will have to disentangle superposed signals from a variety of astrophysical sources, as well as modeling the instrumental noise. At the heart of a distributed data analysis system lies a gigantic Bayesian inference pipeline: the Global Fit. The computational challenge will be massive, expected to be about an order of magnitude heavier than the data processing of the recent ESA mission Euclid, in optimistic scenarios.

The inference of the parameters of each source will require source separation, complicating the estimation of their posterior distributions which is already challenging for isolated gravitational events. When separation is not possible, the number of superimposed sources becomes an unknown and the signals themselves form a confusion background comparable to noise; trans-dimensional analysis is then required, which yields additional complexity. To tackle the challenge of the Global Fit, the currently envisioned approach relies on a Markov chain Monte Carlo (MCMC) strategy, with block Gibbs sampling across the classes of sources (and the noise level) to reduce the complexity. Even using this trick, existing pipeline prototypes are computationally expensive and scale badly. As a consequence, the scientific community is looking for technological and algorithmic breakthroughs, e.g. relying on GPUs, sparsity-based modeling or artificial intelligence.

The so-called GlobalFit Framework will provide an abstraction layer between the distributed system components in charge of pipeline orchestration and execution on one hand, and the scientific modules on the other hand. This layer will offer to the scientific development team a convenient way to interact with the underlying components and hence focus on the algorithm development. Among others, the Framework will guarantee that the scientific modules can be integrated with low coupling, thus allowing dozens of labs to contribute with various languages and technologies. It will also handle the module scheduling, including the iteration logic, scalability and adaptation to available resources. It will feature check-pointing and resuming capabilities, and communicate in real time with concurrent pipelines running in different computing centers across the world. All of those technical requirements already make the Framework development an engineering challenge. Yet, the most complex features to be implemented relate to the user experience: Given such an algorithmic and computational beast, how to unleash its scientific potential? To do so, our Framework prototype is equipped with two dashboards which will be presented in more details: an Operation Dashboard for operators and a Monitoring Dashboard for science experts.

The proposed talk will address the following questions: How to support debugging and investigation at runtime, when thousands of sources are being processed in parallel? How to actively monitor the progress of the estimators and how to display a relevant synthesis of the source catalog in live? How to provide interactivity to the operators without wasting resources?

User Experience

12:00

12:00

10min

ADASS 2025 Announcement

Aula Magna

12:10

12:10

20min

Conference Closing

Aula Magna