PackagingCon

Serving and Managing Reproducible Conda Environments via Conda-Store
11-10, 19:55–20:15 (UTC), Room 3

End users think in terms of environments, not packages. The core philosophy of conda-store is to serve reproducible conda environments to users and services in as many ways as possible. Conda-store grew out of a significant need we found in enterprise architectures. There are many ways to serve environments, and each plays an important role, so conda-store serves the same environment as a filesystem directory, a lockfile, a pinned YAML specification, a conda-pack archive, and a Docker image. This logic could easily be extended to support VM ISOs and Singularity containers as well.

During this talk I will highlight some common problems with environments we have seen while consulting and show how conda-store aims to solve them:
- Friction between IT and end users in controlled environments where new packages are needed
- Enabling a notebook developed in JupyterLab to run reproducibly in workflows for years to come
- Helping to remove the need for specially crafted Docker containers

This talk will be full of demos, along with a site that everyone attending can try out.


End users think in terms of environments, not packages. The core philosophy of conda-store is to serve identical conda environments in as many ways as possible. Conda-store controls the environment lifecycle: management, builds, and serving of environments.

It manages conda environments by:
- watching specific files or directories for changes to environment specifications
- providing a REST API for managing environments (sketched below)
- providing a command-line utility for interacting with conda-store, e.g. conda-store env [create, list]
- providing a web UI to take advantage of many of conda-store's advanced capabilities
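
As a rough illustration of the REST workflow, the snippet below posts an environment specification for building. The host, port, endpoint path, and payload shape are assumptions for illustration, not the exact conda-store API.

```python
# Hypothetical sketch: submit a specification to a conda-store server so it
# can queue the build. Endpoint and payload shape are assumed for illustration.
import requests
import yaml

specification = yaml.safe_dump({
    "name": "analysis",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.9", "numpy", "pandas"],
})

response = requests.post(
    "http://localhost:5000/api/v1/specification/",  # placeholder URL
    json={"namespace": "research", "specification": specification},
)
response.raise_for_status()
print(response.json())
```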

It builds conda specifications in a scalable manner using N workers that communicate via Celery to keep track of queued environment builds.
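
Here is a minimal sketch of that queued-build pattern, assuming a Celery app backed by a Redis broker; it illustrates the architecture rather than conda-store's actual task code.

```python
# Illustrative sketch, not conda-store internals: a Celery app whose workers
# pull environment specifications off a queue and build them with conda.
import subprocess
import tempfile

from celery import Celery

app = Celery("environment_builds", broker="redis://localhost:6379/0")

@app.task
def build_environment(specification_yaml: str, prefix: str) -> str:
    """Write the specification to disk and build it into `prefix`."""
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
        f.write(specification_yaml)
        spec_path = f.name
    # `conda env create` resolves and installs the environment; each queued
    # build runs on whichever of the N workers picks it up.
    subprocess.run(
        ["conda", "env", "create", "--prefix", prefix, "--file", spec_path],
        check=True,
    )
    return prefix
```

A client would then enqueue work with build_environment.delay(spec_yaml, prefix), and any available worker handles the build.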

It serves conda environments via a filesystem, lockfiles, tarballs, and soon a Docker registry. Tarballs and Docker images can consume a lot of bandwidth, which is why conda-store optionally integrates with S3 to serve the actual blobs.
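
To show what consuming one of these served artifacts might look like, the sketch below fetches a pinned lockfile and recreates the environment with plain conda. The URL layout and filenames are placeholders, not real conda-store routes.

```python
# Sketch of consuming a served lockfile; the URL is a placeholder.
import subprocess
import urllib.request

# Hypothetical download location for a build's explicit lockfile.
lockfile_url = "http://conda-store.example.com/builds/1234/conda-linux-64.lock"
urllib.request.urlretrieve(lockfile_url, "environment.lock")

# An explicit lockfile lists exact package URLs, so no solve is needed and
# the resulting environment is reproducible.
subprocess.run(
    ["conda", "create", "--prefix", "./analysis-env", "--file", "environment.lock"],
    check=True,
)
```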

Below we highlight some common problems with environments we have seen while consulting and show how conda-store aims to solve them.

IT and End User Friction

We saw tension between IT/sysadmins and the end users who use the environments they build. When IT gets a request for a new package in an environment, they need to rebuild the environment and check that the package satisfies their constraints. This process may take several days and at best will not be immediate, while developers need packages in their environments as soon as possible to do interesting new research. This situation often led to a lot of frustration on both sides, for good reason. Conda-store addresses this by allowing users to control a set of environments in their own namespace while IT retains control over all environments.

Reproducibly Productionizing Environments

Another issue we saw was the need to quickly productionize workflows and ensure that they can run for many years to come. Often developers will experiment with a given environment and create a notebook to run a given workflow. They will want to "submit" this notebook with the given environment and run it on a cron job. The only problem is that this creates a huge burden on IT: how is IT supposed to ensure that the environment the notebook ran with is preserved indefinitely? Conda-store addresses this by building every environment separately (including updates). There is a unique key that identifies any given build of an environment. Furthermore, the environment is available in many different forms: YAML, lockfile, conda tarball, and Docker image. The advantage here is that the workflow orchestration framework may run in an environment significantly different from the developer's, and we need a way to ensure that the environments are the same.
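
One way to picture that unique key is as a content hash of the pinned specification; the sketch below assumes that scheme purely for illustration, and conda-store's actual key derivation may differ.

```python
# Illustrative only: derive a deterministic key from a pinned specification.
import hashlib
import yaml

def build_key(specification: dict) -> str:
    """Return a deterministic key for an environment specification."""
    canonical = yaml.safe_dump(specification, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

spec = {
    "name": "analysis",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.9", "numpy=1.21.2", "pandas=1.3.3"],
}
print(build_key(spec))  # same spec -> same key, so a notebook can pin its exact build
```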

Scientific software developer at Quansight.

DevOps-curious scientific software developer, now focusing on Python packaging.