Astronomical Data Analysis Software & Systems XXXIV

XRADIO: Xarray Radio Astronomy Data Input Output
2024-11-12, Meeting Room 101

The advent of next-generation radio interferometers, namely ALMA-WSU (Atacama Large Millimeter/submillimeter Array Wideband Sensitivity Upgrade), ngVLA (Next Generation Very Large Array), and SKA (Square Kilometre Array), will increase astronomical data volumes by orders of magnitude. Current pipelines for ALMA and the VLA rely on CASA (Common Astronomy Software Applications) and store data in MSv2 (Measurement Set v2). This approach depends on considerable custom software and faces maintenance challenges and scaling limitations. To address these issues, we present a new data schema, MSv4 (Measurement Set v4), implemented in the open-source Python package XRADIO (Xarray Radio Astronomy Data IO). The initiative is a collaboration between the National Radio Astronomy Observatory (NRAO), the European Southern Observatory (ESO), the National Astronomical Observatory of Japan (NAOJ), and the Square Kilometre Array Observatory (SKAO).

An MSv4 contains data for a single spectral window, polarization setup, and observation setup in a fully self-describing structure, and it can be partitioned more finely as needed. A collection of MSv4 datasets, termed a Processing Set (PS), facilitates deployment across distributed computing environments. In contrast to the relational tables of MSv2, MSv4 uses labeled n-dimensional arrays.
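To make "labeled n-dimensional arrays" concrete, here is a minimal sketch built with plain Xarray and NumPy rather than XRADIO itself. The helper `make_toy_msv4`, the array sizes, and the random values are hypothetical, and the dimension and variable names (`time`, `baseline_id`, `frequency`, `polarization`, `VISIBILITY`) are illustrative assumptions rather than a statement of the full MSv4 schema.

```python
import numpy as np
import xarray as xr


def make_toy_msv4(n_time=4, n_baseline=3, n_chan=8, seed=0):
    """Build a toy dataset for one spectral window (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shape = (n_time, n_baseline, n_chan, 2)
    # Random complex numbers stand in for measured visibilities.
    vis = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return xr.Dataset(
        data_vars={
            "VISIBILITY": (
                ("time", "baseline_id", "frequency", "polarization"),
                vis,
            ),
        },
        coords={
            "time": np.arange(n_time, dtype=float),            # seconds, toy values
            "baseline_id": np.arange(n_baseline),
            "frequency": np.linspace(1.0e9, 1.1e9, n_chan),    # Hz, toy values
            "polarization": ["XX", "YY"],
        },
        attrs={"description": "toy single-spectral-window dataset"},
    )


toy = make_toy_msv4()
# Axes are addressed by name and coordinate label, not by table row or column.
xx = toy["VISIBILITY"].sel(polarization="XX")
```

Because every axis carries a name and coordinates, a selection such as `sel(polarization="XX")` needs no knowledge of the underlying memory layout.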

XRADIO leverages off-the-shelf technology to ensure scalability and maintainability. It relies on Zarr for efficient storage and serialization, while Xarray provides the in-memory data representation as NumPy arrays or lazy Dask arrays, complete with labeled dimensions, coordinates, and attributes.

This focus demo, presented through interactive Jupyter Notebooks, will:
- Explore the PS and MSv4 schemas
- Demonstrate how to easily convert legacy MSv2 datasets to a PS
- Demonstrate efficient data selection techniques
- Showcase data visualization methods
- Illustrate parallel processing capabilities
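As a rough illustration of the selection and parallel-processing items above, the sketch below treats a Processing Set as a plain Python dictionary of Xarray datasets and maps a function over its partitions. This is a stand-in under stated assumptions: `toy_partition`, `mean_amplitude`, the partition names, and the data values are hypothetical, no XRADIO calls are used, and the standard-library `ThreadPoolExecutor` is swapped in for whatever scheduler the demo itself uses.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import xarray as xr

DIMS = ("time", "baseline_id", "frequency", "polarization")


def toy_partition(start_hz, seed):
    """Create one toy partition with 8 channels of 1 MHz (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    shape = (4, 3, 8, 2)
    vis = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return xr.Dataset(
        {"VISIBILITY": (DIMS, vis)},
        coords={
            "time": np.arange(4.0),
            "baseline_id": np.arange(3),
            "frequency": start_hz + 1.0e6 * np.arange(8),
            "polarization": ["XX", "YY"],
        },
    )


# A dictionary of datasets stands in for a Processing Set (assumption).
processing_set = {
    "spw_0": toy_partition(1.000e9, seed=0),
    "spw_1": toy_partition(1.008e9, seed=1),
}


def mean_amplitude(ds):
    """Select XX and the four lowest channels by label, then reduce."""
    f0 = float(ds["frequency"].values[0])
    sub = ds.sel(polarization="XX", frequency=slice(f0, f0 + 3.0e6))
    return float(np.abs(sub["VISIBILITY"]).mean())


# Partitions are independent, so the same function can run on each concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(
        zip(processing_set, pool.map(mean_amplitude, processing_set.values()))
    )
```

The point of the design is that each partition is self-describing, so a worker needs only the dataset it is handed and no shared global table.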

By adopting these modern tools and approaches, we aim to equip the radio astronomy community with a robust framework capable of handling the data challenges of the next generation of interferometers.