Synergies Unleashed: Bridging the Gap Between Science and Computing Teams in the ALMA Observatory Software Deployments
The ALMA Observatory has been collecting science data for more than 10 years. During the first years of operations (up to the end of Cycle 4, i.e., September 2017), the focus of the ALMA Integrated Computing Team (ICT) and the Department of Science Operations (DSO) was mainly on data acquisition (using what is known in ALMA jargon as the online software).
Thanks to the stability of the online software, the Observatory reached a very high data acquisition performance during Cycle 5 (October 2017 – September 2018), but with a negative impact on Data Processing (DP) and Data Delivery (DD). One of the reasons was that the software downstream of data acquisition (known as the offline software) was not mature enough to cope with such a large amount of incoming data.
One of the major contributors to the immaturity of the offline software was that ALMA underestimated the importance of an integration procedure. The applications worked as expected individually, but not as parts of a system of interdependent software components.
In recent years, significant resources have been allocated to consolidating the performance of the offline software. The situation has been reversed thanks to an efficient, coordinated, and collaborative plan between ICT (especially the EU part) and DSO. The outcome of this effort is a suite of regression and integration tests for the ALMA offline deployment process. This paper (a) describes in more detail the complete history behind this effort, (b) shows the regression and integration tests currently in place for the most important ALMA scenarios, and (c) presents the cutting-edge technology behind the automation approach. We also discuss how this innovative approach of looking at science operations from the software perspective, with enhanced and coordinated collaboration between the two teams (ICT and DSO), can become a game-changer in improving an observatory's performance. The statistics collected demonstrate that the offline software has become much more robust, as both the occurrence of bugs and the need for patches have significantly diminished during the past four cycles.