JuliaCon 2020

Reproducible data science with the RENKU platform

Communities and funding sources are increasingly demanding reproducibility in scientific work. We present RENKU (https://renkulab.io/), an open-source platform for reproducible data science. From an end-user perspective, the platform uses a single authentication mechanism to integrate an RStudio Server or Jupyter notebook, git version control, git LFS for handling data, continuous integration via GitLab, and containerization via Docker images that can be reused and shared.

With RENKU, every step of data science research that generates new code or data is preserved. This allows scientists to step backwards through the history of their research and retrieve earlier versions of their methods and results. RENKU materializes data science recipes and data lineage into a knowledge representation based on the Common Workflow Language (CWL) standard and the PROV-O ontology. Data lineage is recorded automatically and workflows are captured within and across RENKU projects, so that derived data and results can be unambiguously traced back to the original raw data sources through all intermediate processing steps.
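
To make the idea of tracing a result back to its raw sources concrete, here is a minimal sketch in Python. It is not RENKU's API or its internal representation: the `LineageGraph` class, its methods, and all file and script names are hypothetical, chosen only to illustrate a PROV-style record in which each processing step "used" some inputs and "generated" some outputs.

```python
class LineageGraph:
    """Toy provenance store (hypothetical, not part of RENKU)."""

    def __init__(self):
        # Maps each generated file to the (activity, inputs) that produced it.
        self.generated_by = {}

    def record(self, activity, inputs, outputs):
        """Record that `activity` used `inputs` and generated `outputs`."""
        for out in outputs:
            self.generated_by[out] = (activity, tuple(inputs))

    def raw_sources(self, target):
        """Return every file `target` depends on that has no recorded producer."""
        sources, seen, stack = set(), set(), [target]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in self.generated_by:
                # Derived file: keep walking back through its inputs.
                stack.extend(self.generated_by[node][1])
            else:
                # No producer recorded: treat it as a raw source.
                sources.add(node)
        return sources


# Hypothetical three-step project: clean, fit, plot.
graph = LineageGraph()
graph.record("clean.jl", ["raw/measurements.csv"], ["data/clean.csv"])
graph.record("fit.jl", ["data/clean.csv", "raw/params.toml"], ["results/model.json"])
graph.record("plot.jl", ["results/model.json"], ["figures/fit.png"])

print(sorted(graph.raw_sources("figures/fit.png")))
```

In this sketch the figure is traced through the intermediate model and cleaned data back to the two raw inputs. A real lineage store would also keep versions and commit identifiers for each file, which this example leaves out.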

A template for using RENKU with a Julia kernel in Jupyter is available at https://renkulab.io/projects/cchoirat/julia-template/.