Manuel Parra-Royón ADASS2023

Manuel Parra-Royón

Manuel Parra Royón is a postdoctoral researcher at the Instituto de Astrofísica de Andalucía (IAA-CSIC) in Spain. His research interests include data mining, machine learning, and big data analytics. He is currently working on the development of the SKA Regional Centres Network for the Square Kilometre Array Observatory (SKAO).

Manuel received his Ph.D. in computer science from the University of Granada in 2019. His dissertation focused on the use of machine learning for data mining in cloud computing. He also holds a Master's degree in Data Science and Intelligent Systems and a degree in Computer Science.

Before joining the IAA-CSIC, Parra Royón worked as a researcher at the University of Granada, where he was involved in several projects on big data analytics, machine learning, AI and science platforms. He also worked as a data engineer at the European Organization for Nuclear Research (CERN) in Geneva, Switzerland.

Parra Royón is passionate about using data science to solve real-world problems. He is particularly interested in using data mining to improve the efficiency and effectiveness of large-scale scientific experiments.

Sessions

11-06

08:30

0min

Prototyping access from visualisation tools to SKA science images and cubes stored in a rucio DataLake through IVOA discovery and access services

Pierre Fernique, Matthieu Baumann, Thomas Boch, Manuel Parra-Royón, Marco Molinaro, François Bonnarel, Vincenzo Galluzzi, Jesus Salgado, Susana, Caroline Bot, Mark Allen, Alessandra Zanichelli


M. Allen, R. Barnsley, M. Baumann, F. Bonnarel, T. Boch, C. Bot, R. Butora, J. Collinson, P. Fernique, V. Galluzzi, R. Joshi, M. Molinaro, M. Parra-Royon, J. Sanchez-Castaneda, S. Sanchez-Exposito, G. Tudisco, F. Vitello, A. Zanichelli.

SKA is the major low-frequency radio astronomy project of the future, with several major scientific applications. It will increase the amount of available science data by several orders of magnitude, eventually reaching more than 700 petabytes of storage per year. The SKA Observatory will perform the initial data processing to deliver observatory data products, while the SKA Regional Centre (SRC) network will provide storage for those products and the processing capabilities needed to produce and store advanced data products for the user community.
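The following is a rough illustration of what that volume implies. It assumes decimal petabytes (1 PB = 10^15 bytes) and data arriving evenly over a 365.25-day year. Under those assumptions, 700 PB per year corresponds to a sustained rate of roughly 22 GB per second:

```latex
\frac{700 \times 10^{15}\,\mathrm{B}}{365.25 \times 86\,400\,\mathrm{s}}
  = \frac{7 \times 10^{17}\,\mathrm{B}}{3.156 \times 10^{7}\,\mathrm{s}}
  \approx 2.2 \times 10^{10}\,\mathrm{B\,s^{-1}}
  \approx 22\,\mathrm{GB\,s^{-1}}
```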
Within the scope of the SRC network, the Orange (visualisation), Magenta (data management) and Coral (node implementation) teams have prototyped the discovery, access and visualisation of science data. Our visualisation tools, VisiVO and Aladin, discover, access and visualise test science data produced by SKA pathfinders and stored in the Rucio DataLake. The Magenta team has added science metadata functionality to the Rucio DataLake prototype to demonstrate a means of enabling IVOA-compliant data discovery and server-side processing.
VisiVO, Aladin Desktop and Aladin Lite can query the discovery service, which is built on the IVOA ObsCore and Simple Cone Search (SCS) standards.
They can then load DataLink responses that link to a SODA cutout service, developed by the Orange team, which extracts subcubes or images directly from the datasets stored in the Rucio DataLake.
The Rucio Storage Element and SODA developments have been deployed and configured on the Spanish SRC node, which provides the computing and storage resources and is managed by members of the Coral team. This prototype paves the way for collaborative development in the SKA Regional Centre network and shows how VO services and visualisation tools can be integrated into DataLakes and science platforms.
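The discovery-then-cutout flow can be sketched as the two HTTP GET requests a client might build. This is a minimal sketch that only constructs the URLs. The parameter names come from the IVOA standards: `RA`, `DEC` and `SR` in degrees for Simple Cone Search, and `ID`, `CIRCLE` and `BAND` for SODA, where `BAND` is a wavelength interval in metres. The endpoint URLs and the dataset identifier are hypothetical placeholders. In the prototype described above, a client would take the SODA access URL and `ID` from the DataLink response rather than hard-coding them.

```python
from typing import Optional, Tuple
from urllib.parse import quote, urlencode

# Hypothetical endpoints: the abstract does not give the prototype's service URLs.
SCS_ENDPOINT = "https://src.example.org/discovery/scs"
SODA_SYNC_ENDPOINT = "https://src.example.org/soda/sync"


def scs_query_url(ra_deg: float, dec_deg: float, radius_deg: float) -> str:
    """IVOA Simple Cone Search request: RA, DEC and SR are decimal degrees."""
    params = {"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg}
    return f"{SCS_ENDPOINT}?{urlencode(params, quote_via=quote)}"


def soda_cutout_url(
    dataset_id: str,
    ra_deg: float,
    dec_deg: float,
    radius_deg: float,
    band_m: Optional[Tuple[float, float]] = None,
) -> str:
    """Synchronous SODA cutout request.

    CIRCLE is "ra dec radius" in degrees; BAND is a wavelength interval in metres,
    which is how SODA selects a spectral sub-range of a cube.
    """
    params = {"ID": dataset_id, "CIRCLE": f"{ra_deg} {dec_deg} {radius_deg}"}
    if band_m is not None:
        params["BAND"] = f"{band_m[0]} {band_m[1]}"
    # quote_via=quote percent-encodes spaces as %20 and escapes ':', '/' and '?' in the ID.
    return f"{SODA_SYNC_ENDPOINT}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(scs_query_url(83.63, 22.01, 0.5))
    # Hypothetical dataset identifier; in practice it comes from the DataLink response.
    print(soda_cutout_url("ivo://example.org/cube?1", 83.63, 22.01, 0.1, (0.2, 0.21)))
```

Keeping URL construction separate from the HTTP call makes the requests easy to inspect. The same URLs could then be fetched with any HTTP client, or handed to a VO-aware tool.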

Science with data archives: challenges in multi-wavelength and time domain data analysis

Empowering SKA Data Challenges: A homogeneous platform for enhanced collaboration and scalability fully aligned with Open Science.

Manuel Parra-Royón

The Square Kilometre Array Observatory (SKAO) is an international collaborative effort focused on constructing and operating the world's most advanced radio telescope. The SKAO Science Data Challenges (SDCs) are a series of competitions designed to help scientists and engineers develop new techniques for analysing the vast amounts of data that the SKAO will generate. SDCs have traditionally relied on computing resources kindly provided by scientific institutions and facilities. The way these resources were allocated to participants varied among providers, resulting in a heterogeneous user experience: some providers offered Virtual Machines (VMs) with differing configurations, while others provided HPC-type resources. A uniform computing platform for the SDCs brings fairness, scalability, enhanced collaboration and consistency. Participants work with the same tools and can collaborate more easily. A standardised setup simplifies resource management, support and evaluation, leading to greater efficiency and more reliable results.

JupyterHub provides a platform for provisioning compute resources through a container orchestration service such as Kubernetes, while also scaling with user demand and enabling centrally managed authentication. The advantages of this approach include ease of deployment through Helm, a homogeneous, customisable software and compute environment for the SDC, and horizontal scalability, since the Kubernetes cluster allocates resources to users based on demand and availability.
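This is a minimal sketch, assuming the community Zero to JupyterHub (z2jh) Helm chart is used. It shows how a single values file can give every participant the same environment. The keys `singleuser.image`, `singleuser.cpu`, `singleuser.memory`, `cull` and `scheduling.userScheduler` are part of that chart's configuration. The image name, tag and resource sizes are hypothetical, and the authentication section is deliberately omitted.

```python
import json
from typing import Any, Dict

# Hypothetical image and tag: the abstract does not name the SDC environment image.
SDC_IMAGE = "registry.example.org/sdc/notebook"
SDC_TAG = "2023.11"


def z2jh_values(cpu_limit: float, memory: str, idle_timeout_s: int) -> Dict[str, Any]:
    """Minimal values for the Zero to JupyterHub Helm chart (authentication omitted)."""
    return {
        "singleuser": {
            # Every participant gets the same image: the homogeneous SDC environment.
            "image": {"name": SDC_IMAGE, "tag": SDC_TAG},
            # Guarantees drive Kubernetes scheduling; limits cap each user's pod.
            "cpu": {"limit": cpu_limit, "guarantee": cpu_limit / 2},
            "memory": {"limit": memory, "guarantee": memory},
        },
        # Shut down idle user servers so their resources return to the cluster.
        "cull": {"enabled": True, "timeout": idle_timeout_s},
        # Pack user pods densely so a cluster autoscaler can release empty nodes.
        "scheduling": {"userScheduler": {"enabled": True}},
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the output can be passed to `helm --values`.
    print(json.dumps(z2jh_values(4, "16G", 3600), indent=2))
```

As an illustrative usage, redirect the output to `sdc-values.json`. Then run `helm upgrade --install sdc-hub jupyterhub/jupyterhub --values sdc-values.json`, where `sdc-hub` is a hypothetical release name. Generating the file from code is one way to keep several SRCNet sites on identical settings.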

With this contribution we present a highly portable, interactive analysis service, fully aligned with Open Science, that allows future participants in different Science Data Challenges to develop solutions on a horizontally scalable platform within the infrastructure of the SKA Regional Centres Network (SRCNet) and other IT facilities. We show the process of configuring the Kubernetes cluster and of installing and preparing BinderHub/JupyterHub. We also present a use case for a radio astronomy data analysis workflow that uses Dask (a Python library for parallel and distributed computing) to exploit large distributed clusters in the cloud on Kubernetes. To ensure portability, two SRCNet cloud platforms, ESPSRC (Spain) and CHSRC (Switzerland), have been used in addition to the infrastructure of a supercomputing centre (CESGA).
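The chunked map-reduce pattern that lets Dask spread an analysis across many workers can be illustrated without Dask itself. This sketch does not use Dask. To stay dependency-free, standard-library threads stand in for the distributed workers a Dask cluster on Kubernetes would provide. The example task, the mean and standard deviation of an image plane, is a hypothetical illustration and is not taken from the abstract. Each chunk is reduced to partial sums independently, and those sums combine associatively. With Dask installed, one would instead wrap the data with `dask.array.from_array(..., chunks=...)` and call `.mean()` and `.std()`, leaving the scheduler to distribute the chunks.

```python
import functools
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

# (pixel count, sum, sum of squares) for one chunk
Partial = Tuple[int, float, float]


def partial_stats(chunk: Sequence[float]) -> Partial:
    """Map step: summarise one chunk independently of all the others."""
    return len(chunk), math.fsum(chunk), math.fsum(x * x for x in chunk)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: partial sums add associatively, so chunks can be merged in any order."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def chunked_mean_std(pixels: Sequence[float], chunk_size: int, workers: int = 4) -> Tuple[float, float]:
    """Mean and population standard deviation of an image plane, computed chunk by chunk."""
    if not pixels or chunk_size <= 0:
        raise ValueError("need a non-empty input and a positive chunk size")
    chunks: List[Sequence[float]] = [
        pixels[i:i + chunk_size] for i in range(0, len(pixels), chunk_size)
    ]
    # Threads stand in for the distributed workers a Dask cluster would provide.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    n, total, total_sq = functools.reduce(combine, partials)
    mean = total / n
    # Naive one-pass variance; clamp tiny negative values caused by rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    plane = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.15]
    print(chunked_mean_std(plane, chunk_size=3))
```

Only the small partial tuples travel to the reduce step, never the full chunks. This is what makes the pattern scale horizontally when chunks live on different nodes.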

Cloud infrastructures for astronomical data analysis

Focus Demos
