Empowering SKA Data Challenges: A homogeneous platform for enhanced collaboration and scalability fully aligned with Open Science.
2023-11-09, Focus Demos

The Square Kilometre Array Observatory (SKAO) is an international collaborative effort to build and operate the world's most advanced radio telescope. The SKAO Science Data Challenges (SDCs) are a series of competitions designed to help scientists and engineers develop new techniques for analysing the vast amounts of data that the SKAO will generate. The SDCs have traditionally relied on computing resources kindly provided by scientific institutions and facilities. The way these resources are allocated to participants has varied among providers, resulting in a heterogeneous user experience: some providers give users access to Virtual Machines (VMs) with differing configurations, while others provide HPC-type resources. A uniform computing platform for the SDCs brings fairness, scalability, enhanced collaboration and consistency. Participants work with the same tools, and collaboration is streamlined. A standardised setup simplifies resource management, support and evaluation, leading to greater efficiency and more reliable results.

JupyterHub provides a platform for provisioning compute resources through a container orchestration service such as Kubernetes, while also scaling with user demand and enabling centrally managed authentication. The advantages of this approach include ease of deployment through Helm, a homogeneous software and compute environment customised for the SDC, and horizontal scalability, since the Kubernetes cluster allocates resources to users based on demand and availability.
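As an illustration of what a homogeneous per-user environment could look like, the sketch below builds a values structure for a Helm-based JupyterHub deployment and prints it as JSON (which is also valid YAML). It is not the configuration used by the authors. The key names follow the Zero to JupyterHub chart as commonly documented and should be checked against the chart version in use; the helper name `singleuser_values`, the image name and the resource figures are hypothetical.

```python
import json


def singleuser_values(image_name, image_tag, cpu_cores, memory):
    """Build Helm values giving every user the same image and resources.

    Key names are assumed to match the Zero to JupyterHub chart;
    verify them against the chart version you deploy.
    """
    return {
        "singleuser": {
            # One shared image means one software environment for all participants.
            "image": {"name": image_name, "tag": image_tag},
            # Guarantee equals limit, so every user gets an identical allocation.
            "cpu": {"limit": cpu_cores, "guarantee": cpu_cores},
            "memory": {"limit": memory, "guarantee": memory},
        }
    }


# Hypothetical image and sizes, for illustration only.
values = singleuser_values("example.org/sdc/notebook", "0.1.0", 4, "8G")
print(json.dumps(values, indent=2))
```

The printed document could be saved to a file and passed to Helm as a values file when installing or upgrading the chart.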

With this contribution we present a highly portable, interactive and fully Open Science-aligned analysis service that lets future participants in the Science Data Challenges develop solutions on a horizontally scalable platform within the infrastructures of the SKA Regional Centres Network (SRCNet) and other IT facilities. We will show how the Kubernetes cluster is configured, how BinderHub/JupyterHub is installed and prepared, and a use case for a data analysis workflow in radio astronomy that uses Dask (a Python library for parallel and distributed computing) to take advantage of large distributed clusters in the cloud on Kubernetes. To ensure portability, two SRCNet cloud platforms, ESPSRC (Spain) and CHSRC (Switzerland), have been used in addition to the infrastructure of a supercomputing centre (CESGA).
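To give a flavour of the kind of Dask workflow mentioned above, here is a minimal sketch that computes the per-channel RMS of a chunked data cube. It is not the use case shown in the demo: the cube is synthetic Gaussian noise standing in for a radio spectral cube, the function name `channel_rms` is hypothetical, and the code runs on Dask's default local scheduler. On a Kubernetes deployment one would typically connect a `dask.distributed.Client` to the cluster's scheduler first, and the same array code would then run on the distributed workers.

```python
import dask.array as da


def channel_rms(cube):
    """Return the RMS of each channel of a (channel, y, x) Dask array."""
    # The reduction stays lazy until .compute() is called.
    return da.sqrt((cube ** 2).mean(axis=(1, 2)))


# Synthetic stand-in for a spectral cube: 8 channels of 256 x 256 pixels,
# one chunk per channel so each channel can be processed in parallel.
cube = da.random.normal(0.0, 1.0, size=(8, 256, 256), chunks=(1, 256, 256))

# Trigger the computation; the result is a NumPy array with one value per channel.
rms = channel_rms(cube).compute()
print(rms.shape)
```

Chunking along the channel axis is a design choice for this sketch; real cubes are often chunked spatially as well so that individual chunks fit in worker memory.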

Manuel Parra Royón is a postdoctoral researcher at the Instituto de Astrofísica de Andalucía (IAA-CSIC) in Spain. His research interests include data mining, machine learning, and big data analytics. He is currently working on the development of the SKA Regional Centres Network for the Square Kilometre Array Observatory (SKAO).

Manuel received his Ph.D. in computer science from the University of Granada in 2019. His dissertation focused on the use of machine learning for data mining in cloud computing. He also holds a Master's degree in Data Science and Intelligent Systems and a degree in Computer Science.

Before joining the IAA-CSIC, Parra Royón worked as a researcher at the University of Granada, where he was involved in several projects on big data analytics, machine learning, AI and science platforms. He also worked as a data engineer at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland.

Parra Royón is passionate about using data science to solve real-world problems. He is particularly interested in using data mining to improve the efficiency and effectiveness of large-scale scientific experiments.
