Processing large Radio Astronomy data cubes within an Objectstore
2023-11-06, Posters

The future Square Kilometre Array (SKA) telescope and its current precursors, such as the Australian SKA Pathfinder (ASKAP) and the Murchison Widefield Array, are changing the way in which we handle large data. Typical ASKAP data cubes are around a terabyte in size; SKA data cubes may be larger by two orders of magnitude or more.

Reduction of these data can only occur efficiently in High Performance Computing (HPC) facilities. Modern HPC centres are moving to object storage for long-term storage of data, in place of traditional POSIX-based file systems. Objectstores offer virtually limitless scalability, greater searchability (via metadata attributes), resiliency and cost efficiency. However, virtually all algorithms used by radio astronomers assume an underlying POSIX file system, with its familiar file methods of open(), write(), seek() etc. To work with objectstores, data must first be staged out to short-term POSIX file-system storage before it can be processed. This is not a trivial exercise; staging multi-terabyte data sets may take several hours to days.

I present an alternative methodology that avoids this double-handling of data. A Python wrapper requests cutouts from the data cube in the objectstore and converts the received byte stream into arrays that are fed directly into the process (in this case the source-finder SoFiA-2). This is shown to be considerably faster than staging the data out to a scratch file system and then processing it.
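The cutout idea can be illustrated with a minimal sketch. This is not the wrapper presented in the poster: the function names `read_cutout` and `make_fake_store`, the `fetch_range(offset, length)` callable (standing in for a ranged read against an objectstore), and the data layout (big-endian 32-bit floats, x varying fastest, as in a FITS image) are all assumptions made for illustration. The sketch computes which byte ranges hold the requested sub-cube and decodes only those bytes, so the full cube is never staged.

```python
import struct
from typing import Callable, List, Tuple

ITEM_SIZE = 4  # bytes per pixel, assuming 32-bit floats

# Hypothetical ranged-read interface: (byte offset, length) -> bytes.
# A real wrapper would issue a ranged request to the objectstore here.
RangeReader = Callable[[int, int], bytes]


def read_cutout(
    fetch_range: RangeReader,
    data_offset: int,
    shape: Tuple[int, int, int],
    z_range: Tuple[int, int],
    y_range: Tuple[int, int],
    x_range: Tuple[int, int],
) -> List[List[List[float]]]:
    """Return the cutout as nested lists indexed [z][y][x].

    Ranges are half-open (start, stop). Only the bytes belonging to the
    cutout are requested, one contiguous row segment at a time.
    """
    nz, ny, nx = shape
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    if not (0 <= z0 < z1 <= nz and 0 <= y0 < y1 <= ny and 0 <= x0 < x1 <= nx):
        raise ValueError("cutout lies outside the cube")

    row_len = x1 - x0
    cutout = []
    for z in range(z0, z1):
        plane = []
        for y in range(y0, y1):
            # Byte offset of pixel (z, y, x0), with x varying fastest.
            start = data_offset + ((z * ny + y) * nx + x0) * ITEM_SIZE
            raw = fetch_range(start, row_len * ITEM_SIZE)
            # ">" = big-endian, "f" = 32-bit float (assumed layout).
            plane.append(list(struct.unpack(f">{row_len}f", raw)))
        cutout.append(plane)
    return cutout


def make_fake_store(shape: Tuple[int, int, int], data_offset: int = 0) -> RangeReader:
    """Build an in-memory stand-in for an objectstore, for demonstration only.

    Pixel values equal their flat index, preceded by `data_offset` header bytes.
    """
    nz, ny, nx = shape
    values = [float(i) for i in range(nz * ny * nx)]
    blob = b"\0" * data_offset + struct.pack(f">{len(values)}f", *values)

    def fetch_range(offset: int, length: int) -> bytes:
        return blob[offset:offset + length]

    return fetch_range


if __name__ == "__main__":
    fetch = make_fake_store((3, 4, 5), data_offset=16)
    sub = read_cutout(fetch, 16, (3, 4, 5), (1, 3), (2, 4), (1, 4))
    print(len(sub), len(sub[0]), len(sub[0][0]))  # cutout dimensions
```

In practice the decoded rows would more naturally be placed in a NumPy array (for example with `numpy.frombuffer`) before being handed to the processing code, and adjacent rows could be merged into fewer, larger requests to reduce round trips.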

See also: Poster (486.7 KB)

Dr Gordon WH German is a CSIRO Senior Engineer for the Australian SKA Regional Centre, located in Perth, Western Australia. He specialises in data reduction pipelines for ASKAP science projects, and is involved in the global SRCNet effort to provide data handling and reduction for the upcoming Square Kilometre Array Observatory.