Reproducibility, Julia, and Renku

Reproducibility should be a central consideration for data science processes, but it requires some support to achieve. In this talk, we present the Renku reproducibility platform and show how to take advantage of it from Julia, focusing on two examples: (1) using Renku to build reproducible workflows in Julia and (2) facilitating the teaching of Julia-based courses with Renku.


Doing data science effectively means working reproducibly. Although commitment to reproducibility as an ideal is non-controversial, in practice, it can be challenging to achieve. Many know this from their own experience, and the disconnect between ideal and reality has been well documented in recent years (see the Fall 2020 issue of the Harvard Data Science Review for just one recent example).

Renku is open-source software being developed at the Swiss Data Science Center as a solution for making reproducible data science easier to achieve. It builds upon established software such as Git, Docker, and Kubernetes to provide the tools necessary to work reproducibly, while offering the flexibility to support a variety of use cases and users working in any language they choose, including Python, R, and, of course, Julia.

In this presentation, we will explain the architecture of the Renku platform and show how it can interact with Julia tools, in particular the Pkg package manager, to provide scaffolding for portable, reproducible Julia projects. Once we have a project set up, we will work through an example of building a reproducible workflow in Julia using Renku.

The same tools that support an individual working reproducibly can be used to overcome some of the hurdles encountered when teaching a class to many students. As an example, we will present a set-up that could be used for teaching a class with Julia and compare Renku with alternatives such as Binder for this purpose.

We will conclude by looking at some more advanced topics for customizing a Julia-based environment for reproducibility by combining Renku with tools like VS Code, DrWatson.jl, or Pluto.jl.

Chandrasekhar Ramakrishnan

Chandrasekhar Ramakrishnan studied mathematics at the University of California, Berkeley (B.A. 1997) and art and computer science at the University of California, Santa Barbara (M.A. 2003). He has worked as a software developer and data-science consultant for companies, research institutions, and NGOs in the US, Germany, and Switzerland. Since 2009, he has been at ETH Zürich supporting projects by developing software solutions for data management, analysis, and visualization.

Gavin Lee