Stimela 2, kubernauts, and dask-ms: radio interferometry data reduction in the cloud
11-09, 11:45–12:00 (US/Arizona), Talks

Radio interferometry has been slow to adopt cloud-based technologies, despite their apparent advantages. I argue that it has been difficult to make radio interferometry on the cloud cost-effective for a number of reasons, chief among them: (a) legacy data formats ill-suited to object storage, (b) complex and heterogeneous software stacks with a heavy reliance on legacy code, and (c) complicated "thick/thin" workflows with very different resource requirements at different stages of the pipeline.

Recent software developments, however, offer a way forward. I will showcase some of these, including the Stimela 2 workflow management and containerization framework, which streamlines the orchestration of complex workflows on a Kubernetes cluster, and the dask-ms library, which maps legacy data formats onto diverse storage backends, including object storage. A new generation of software packages builds on these technologies to provide cloud-efficient implementations of the basic processing steps that can exploit the auto-scaling capabilities inherent to cloud architectures. I will demonstrate a full data reduction workflow running on AWS. I will also argue that cloud-compatible pipelines go a long way toward providing fully reproducible workflows.

Distinguished Professor Oleg Smirnov holds the SARAO Research Chair in Radio Astronomy Techniques & Technologies (RATT) at Rhodes University and heads the Radio Astronomy Research Group at the South African Radio Astronomy Observatory (SARAO). He is an expert in observational radio interferometry, calibration and imaging algorithms, and data processing and software infrastructure for the new generation of radio telescopes, including South Africa’s MeerKAT telescope, a precursor to the Square Kilometre Array. His RATT group has produced some of the most spectacular MeerKAT images published to date.