EuroSciPy 2024

Building robust workflows with strong provenance
2024-08-27 , Room 5

In computational science, different software packages are often glued together with scripts to perform numerical experiments. As complexity grows, these scripts become hard to maintain, prone to crashes, and difficult to scale up and collaborate on. AiiDA addresses these problems with a powerful workflow engine that keeps the provenance of the entire workflow. In this tutorial, we learn how to create dynamic workflows that combine different executables, automatically restart from failed runs, and reuse results from completed calculations via caching.


Have you ever built a computational script for running calculations and lost track of the data you produced? Have you submitted your script to a high-performance computing (HPC) cluster, only for the job to fail so that you had to restart the whole workflow? Do you want to streamline how computational results are produced and accessed? When you write your workflow in AiiDA, intermediate and final results are stored in a structured manner in a database. In addition, you can restart from the last checkpoint and reuse the results of duplicated calculations via caching. As such, AiiDA not only helps with your personal data management but also makes it easy to share results with collaborators.
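The caching idea above can be illustrated with a small, self-contained Python sketch (assuming Python 3.9 or newer). This is not AiiDA's API: the `tracked` decorator, the in-memory `CACHE` and `PROVENANCE` stores, and the `add`/`multiply` steps are hypothetical stand-ins. The sketch assumes each step's inputs are JSON-serialisable, hashes them together with the step name, records every call in a provenance log, and reuses the stored result whenever an identical call is seen again.

```python
import functools
import hashlib
import json
from typing import Any, Callable

# Hypothetical in-memory stores standing in for a database (not AiiDA's API).
CACHE: dict[str, Any] = {}
PROVENANCE: list[dict[str, Any]] = []


def input_hash(name: str, inputs: dict[str, Any]) -> str:
    """Deterministically hash a step name together with its JSON-serialisable inputs."""
    payload = json.dumps({"name": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def tracked(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log every call (provenance) and reuse the result of identical earlier calls (caching)."""

    @functools.wraps(func)
    def wrapper(**inputs: Any) -> Any:
        key = input_hash(func.__name__, inputs)
        cached = key in CACHE
        if not cached:
            CACHE[key] = func(**inputs)  # only computed the first time
        PROVENANCE.append(
            {"step": func.__name__, "inputs": inputs, "output": CACHE[key], "cached": cached}
        )
        return CACHE[key]

    return wrapper


@tracked
def add(x: int, y: int) -> int:
    return x + y


@tracked
def multiply(x: int, y: int) -> int:
    return x * y


def workflow(a: int, b: int, c: int) -> int:
    """Two chained steps computing (a + b) * c; the output of add feeds multiply."""
    return multiply(x=add(x=a, y=b), y=c)


if __name__ == "__main__":
    print(workflow(a=1, b=2, c=3))  # first run: both steps are computed
    print(workflow(a=1, b=2, c=3))  # second run: both steps come from the cache
    for record in PROVENANCE:
        print(record)
```

Because lookups are keyed only on the step name and its inputs, rerunning this workflow after an interruption would skip every step that had already completed. AiiDA keeps the equivalent information in a database rather than in module-level dictionaries.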

This is a hands-on session, structured as follows:

Part 1: Introduction to AiiDA - what problems it can help you solve (20 mins)
- Provenance, a robust solution for process management and data traceability
- Scalability, interoperability, and high-throughput performance

Part 2: How to quickly create a workflow from a set of executables (40 mins)
- Quickly setting up a running instance
- Concatenating several scripts into one workflow
- Parsing output files to extract meaningful results

Part 3: How to create more complex workflows (30 mins)
- Implementing concurrent jobs with graph-like dependencies
- Generating a workflow on the fly from input
- Querying results from the AiiDA database
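The idea of running concurrent jobs with graph-like dependencies can be sketched in plain Python with the standard library's `graphlib.TopologicalSorter` and `concurrent.futures` (Python 3.9 or newer). This is a conceptual illustration, not how AiiDA or the WorkGraph schedule work: the `run_graph` function and the `tasks` format (task name mapped to a function and its list of dependency names) are hypothetical. A task is submitted as soon as all of its dependencies have finished, so independent branches run at the same time.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical task format: (function, names of the tasks whose results it consumes).
Task = tuple[Callable[..., Any], list[str]]


def run_graph(tasks: dict[str, Task], max_workers: int = 4) -> dict[str, Any]:
    """Run every task once all of its dependencies have finished.

    Tasks whose dependencies are satisfied are submitted together, so
    independent branches of the graph execute concurrently.
    """
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    sorter.prepare()  # also raises CycleError if the graph is not acyclic
    results: dict[str, Any] = {}
    running: dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task that has just become ready.
            for name in sorter.get_ready():
                func, deps = tasks[name]
                running[pool.submit(func, *[results[d] for d in deps])] = name
            # Block until at least one running task finishes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises if the task failed
                sorter.done(name)  # may unlock dependent tasks
    return results


if __name__ == "__main__":
    # Hypothetical diamond-shaped graph: load -> (square, double) -> combine.
    tasks: dict[str, Task] = {
        "load": (lambda: 10, []),
        "square": (lambda x: x * x, ["load"]),
        "double": (lambda x: 2 * x, ["load"]),
        "combine": (lambda a, b: a + b, ["square", "double"]),
    }
    print(run_graph(tasks)["combine"])  # 100 + 20 = 120
```

Here `square` and `double` both depend only on `load`, so they can run concurrently, while `combine` waits for both of them.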

By the end of the tutorial, you will have learned how to use AiiDA to quickly create workflows that leverage its restart and caching capabilities. You will also learn how to implement workflows with graph-like dependencies so that their calculations run concurrently, and how to access and share their results. You can follow this tutorial using the development environment provided at https://nanohub.org/tools/aiida. Because nanoHUB changes the path when making the environment publicly available, you need to run the following command in one of the Jupyter cells in order to run notebooks 2 and 3:

!echo "export PATH=$PATH:$(realpath ../../data/euro-scipy-2024/diag-wf):$(realpath ../../data/euro-scipy-2024/diag-wf/bin/default)" >> ~/.bash_profile

The support thread for the tutorial on Discourse can be found at the following link:
https://aiida.discourse.group/t/euroscipy-2024-support/456


Abstract as a tweet

Learn how to use AiiDA to create workflows from arbitrary executables that track full data provenance, can automatically restart from failed runs, and reuse completed calculations via caching.

Category [Scientific Applications]

Material Sciences

Expected audience expertise: Domain

none

Expected audience expertise: Python

some

Project Homepage / Git

https://github.com/aiidateam/aiida-core/

Public link to supporting material

https://nanohub.org/tools/aiida

See also: Development environment to follow tutorial

I currently work as a Software Engineer at the Paul Scherrer Institut in Switzerland, focusing on the development of AiiDA, a workflow engine specialized in managing high-throughput calculations. Before this, I earned my PhD in Materials Science and Engineering at the École Polytechnique Fédérale de Lausanne (EPFL), in the Laboratory of Computational Science and Modeling. My research focused on studying the features of machine learning models used to predict atomistic properties. I am passionate about developing software that helps researchers push the boundaries of materials science. In my free time, I enjoy tennis and running outdoors. Beyond research, I am skilled in programming languages such as Python and C and am interested in diving deeper into Rust and F#. I have experience managing high-performance computing systems and have contributed to several open-source software projects in computational materials science. I am always looking for opportunities to collaborate with others and learn from their experiences.

I currently work as a postdoctoral researcher / Research Software Engineer in the Materials Software and Data group led by Dr. Giovanni Pizzi at the Paul Scherrer Institute (PSI) in Switzerland. I studied chemistry at the Friedrich Alexander University Erlangen-Nuremberg, Germany, and was interested in atomistic modelling early on. After an Erasmus stay at the University of Cambridge and completing my Master's thesis at BASF in Germany, I did a PhD at the Catalan Institute of Chemical Research (ICIQ) in Tarragona, Spain. During my doctoral studies, I investigated electron transfer dynamics in ceria-based single-atom catalysts (SACs) using density functional theory (DFT). AiiDA was very helpful for orchestrating and automating the complex workflows my research required. I am therefore very happy that my current position lets me help drive the development of AiiDA and contribute to complementary tools such as the WorkGraph.

I completed a PhD in computational condensed matter physics at SISSA (Italy) and published several research papers (https://orcid.org/0000-0002-6933-3642) on different topics.
During this time, I discovered my interest in programming and decided to pursue it professionally.