Reproducible data science with the RENKU platform JuliaCon 2020 (times are in UTC)

Research communities and funders increasingly demand reproducibility in scientific work. We present RENKU (https://renkulab.io/), an open-source platform for reproducible data science. From the end user's perspective, a single authentication mechanism gives seamless access to an RStudio Server or Jupyter notebook session, git version control, git LFS for handling data, continuous integration via GitLab, and containerization via Docker images that can be reused and shared.

With RENKU, every step of the data science research that generates new code or data is preserved. This allows scientists to step backwards through the history of their research and retrieve earlier versions of their methods and results. RENKU materializes data science recipes and data lineage into a knowledge representation based on the Common Workflow Language (CWL) standard and the PROV-O ontology. Data lineage is recorded automatically, and workflows are captured within and across RENKU projects, so that derived data and results can be traced back unambiguously to the original raw data sources through all intermediate processing steps.
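
To make the idea of tracing a result back to its raw sources concrete, here is a minimal sketch in Python. It is not RENKU's API or data model: the `LineageGraph` class, its methods, and the file and step names are all hypothetical, and the sketch assumes each output file is produced by exactly one step and that the graph has no cycles.

```python
class LineageGraph:
    """Toy provenance record (hypothetical, not RENKU's implementation)."""

    def __init__(self):
        # Maps each output path to the (step name, input paths) that produced it.
        self._generated_by = {}

    def record(self, step, inputs, outputs):
        """Register that `step` consumed `inputs` and produced `outputs`."""
        for out in outputs:
            self._generated_by[out] = (step, tuple(inputs))

    def raw_sources(self, path):
        """Return the files `path` derives from that no recorded step produced."""
        if path not in self._generated_by:
            return {path}  # no producer recorded, so treat it as raw data
        _, inputs = self._generated_by[path]
        sources = set()
        for inp in inputs:
            sources |= self.raw_sources(inp)
        return sources

    def steps_behind(self, path):
        """Return the steps leading to `path`, most recent first."""
        steps = []
        if path in self._generated_by:
            step, inputs = self._generated_by[path]
            steps.append(step)
            for inp in inputs:
                for earlier in self.steps_behind(inp):
                    if earlier not in steps:
                        steps.append(earlier)
        return steps


# Hypothetical three-step analysis: clean the data, fit a model, plot the result.
graph = LineageGraph()
graph.record("clean", ["raw.csv"], ["clean.csv"])
graph.record("model", ["clean.csv", "params.yaml"], ["fit.json"])
graph.record("plot", ["fit.json"], ["figure.png"])

print(sorted(graph.raw_sources("figure.png")))
print(graph.steps_behind("figure.png"))
```

In this toy example, asking for the raw sources of `figure.png` walks back through `plot`, `model`, and `clean` to reach `raw.csv` and `params.yaml`. A platform such as RENKU records this kind of graph automatically rather than through explicit `record` calls.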

A template for using Renku with a Jupyter Julia kernel is available at: https://renkulab.io/projects/cchoirat/julia-template/.

Christine Choirat

Dr. Choirat is the Chief Innovation Officer of the Swiss Data Science Center (SDSC, https://datascience.ch/), an initiative to accelerate the use of data science and machine learning techniques within academic disciplines and the industrial sector, in Switzerland and internationally. She was trained as a statistician (PhD, Paris Dauphine). At SDSC, she provides leadership over the lifecycle of sponsored projects and partnerships in the domains of environmental science, health IT, health science and technology, personalized medicine, and open science. She also fosters engagement with partners to facilitate the adoption of FAIR and reproducible data science with the Renku platform (https://renkulab.io/).

https://scholar.harvard.edu/cchoirat
