EuroSciPy 2024

Reproducible workflows with AiiDA - The power and challenges of full data provenance
08-29, 16:00–16:30 (Europe/Berlin), Room 6

AiiDA is a workflow manager with a strong focus on reproducibility through automated data provenance. In this talk we discuss what it means to have full “data provenance” for scientific workflows, the advantages it offers, the challenges it poses for new users, and how we address them.


AiiDA is a robust open-source Python package that helps researchers automate, manage, persist, share, and reproduce complex workflows. A defining feature of AiiDA is the automatic recording of each calculation's history, or “provenance”, including the relevant data inputs and outputs. This makes it possible to design detailed interfaces for processes and workflows, and to use advanced queries to find relevant results or share data. As a result, AiiDA is particularly suitable for building a sustainable computational infrastructure for running high-throughput workflows, and it facilitates sharing data and provenance in a FAIR way for publication.
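To make the idea of recorded provenance concrete, here is a minimal toy sketch in plain Python. It is not AiiDA's API: the `ProvenanceStore` class, its methods, and the `d0`, `d1`, ... identifiers are hypothetical names chosen for illustration. It only shows the general principle of storing every input and output alongside the process that connected them, so that the history of a result can be traced afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceStore:
    """Toy in-memory provenance graph (hypothetical; not AiiDA's API)."""

    data: dict = field(default_factory=dict)   # data id -> stored value
    calcs: list = field(default_factory=list)  # one record per executed process

    def add_data(self, value):
        """Store a value under a new unique id and return that id."""
        data_id = f"d{len(self.data)}"
        self.data[data_id] = value
        return data_id

    def run(self, func, *input_ids):
        """Run func on stored inputs and record inputs, output, and process name."""
        output = func(*(self.data[i] for i in input_ids))
        output_id = self.add_data(output)
        self.calcs.append(
            {"process": func.__name__, "inputs": list(input_ids), "output": output_id}
        )
        return output_id

    def lineage(self, data_id):
        """Return the names of the processes that led to data_id, earliest first."""
        for calc in self.calcs:
            if calc["output"] == data_id:
                history = []
                for parent in calc["inputs"]:
                    history.extend(self.lineage(parent))
                return history + [calc["process"]]
        return []  # raw input: no process created it


def add(a, b):
    return a + b


def square(x):
    return x * x


store = ProvenanceStore()
a = store.add_data(2)
b = store.add_data(3)
total = store.run(add, a, b)        # stores 5 and records the "add" step
result = store.run(square, total)   # stores 25 and records the "square" step

print(store.data[result])
print(store.lineage(result))
```

Because every step is recorded as it runs, the question "how was this number produced?" can be answered from the store alone, without relying on the user's notes or scripts.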

On the other hand, writing workflows while keeping in mind the requirements of tracking the full provenance can be cumbersome for new users. Until very recently, running a new external (i.e. non-Python) code required developing a dedicated plugin, and connecting processes in a workflow required advanced Python concepts. For high-throughput performance on HPC systems, AiiDA also depended on services that are not always trivial to install, such as a PostgreSQL database and a RabbitMQ message broker. In the past year, several changes have improved its usability, with a particular focus on getting new users up and running as quickly as possible.

In this talk, we’ll start with a brief overview of AiiDA’s philosophy and core features as a workflow manager and discuss solutions to the challenges described above. The new plugin package aiida-shell makes running any shell executable easy, without the need to develop a custom plugin, while preserving basic provenance. Support for SQLite databases and an optional RabbitMQ allow users who don’t need high performance or scalability to run AiiDA without installing and configuring these services. Furthermore, a new WorkGraph feature provides a powerful framework for designing flexible node-based workflows with basic Python knowledge. This flexibility is essential to allow users to piece together a workflow for their scientific use case quickly. The WorkGraph also allows management and visualization of workflows in web browsers and Jupyter notebooks.
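As a rough illustration of what a node-based workflow means, the sketch below wires plain Python functions into a small graph and executes them in dependency order using the standard-library `graphlib` module. It does not use the WorkGraph API: the `run_graph` helper and the task and input names are hypothetical, chosen only to show how links between nodes, rather than the order in which they are written, determine execution.

```python
from graphlib import TopologicalSorter


def run_graph(tasks, inputs):
    """Execute tasks in dependency order (toy sketch; not the WorkGraph API).

    tasks:  task name -> (function, list of source names)
    inputs: input name -> value
    A source name refers either to another task's result or to an input.
    """
    results = dict(inputs)
    # Only sources that are themselves tasks count as graph dependencies.
    graph = {
        name: {source for source in sources if source in tasks}
        for name, (_, sources) in tasks.items()
    }
    for name in TopologicalSorter(graph).static_order():
        func, sources = tasks[name]
        results[name] = func(*(results[source] for source in sources))
    return results


# "multiply" is listed first but depends on "add", so "add" runs first.
tasks = {
    "multiply": (lambda total, z: total * z, ["add", "z"]),
    "add": (lambda x, y: x + y, ["x", "y"]),
}

results = run_graph(tasks, {"x": 2, "y": 3, "z": 4})
print(results["add"], results["multiply"])
```

Describing a workflow as nodes and links keeps each step an ordinary function, which is the kind of low entry barrier the paragraph above refers to.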


Public link to supporting material

https://aiida.net/

Project Homepage / Git

https://github.com/aiidateam/aiida-core

Abstract as a tweet

New developments in the AiiDA workflow manager make it easier than ever to run complex workflows with full data provenance.

Category [Scientific Applications]

Material Sciences

Expected audience expertise: Domain

none

Expected audience expertise: Python

some

I obtained my M.Sc. with a major in nanophysics from the University of Antwerp in 2015 and continued at the same institution as a Doctoral Candidate under the supervision of Prof. Lamoen. After obtaining my PhD with distinction in 2020, I first worked as a postdoctoral researcher in the THEOS group of Prof. Marzari at EPFL, and since September 2023 I have worked in the group of Giovanni Pizzi at the Paul Scherrer Institute.

My main interests include designing materials, especially for superconductors, batteries, and solar cells, using quantum simulations run by automated high-throughput workflows. Much of my recent work has focused on developing open-source tools to facilitate fully reproducible and shareable workflows and improve their robustness in the Quantum ESPRESSO plugin for AiiDA. In addition to software development, I'm also involved in managing and running large-scale HTC projects for generating databases of materials properties.

Dr. Xing Wang is a postdoctoral researcher in chemical and materials engineering at the Paul Scherrer Institute (PSI) in Switzerland. He earned his PhD from ETH Zürich and his Bachelor's and Master's degrees from Central South University, China. Dr. Wang's research focuses on high-throughput computing, scientific data management, computational materials science, heterogeneous catalysis, and spectroscopy computations. He has led several open-source projects, such as the development of AiiDA-WorkGraph and AiiDAlab QEApp, and contributed to platforms like Materials Cloud.