From Implementation to Ecosystem: The Journey of Zarr
2023-08-17, Aula

Zarr is an API and cloud-optimized data storage format for large, N-dimensional, typed arrays, based on an open-source technical specification. Over the last four years it has grown from a Python implementation into a large ecosystem. In this talk, we share how this transformation happened and the lessons we learned along the way. Today, Zarr is driven by an active community and defined by an extensible specification. It has implementations in C++, C, Java, JavaScript, Julia, and Python, and is used across domains such as geospatial, bio-imaging, genomics, and other areas of data science.


This talk covers the following points:

  • What is Zarr & how does it work?
    • Illustrated Mechanisms of Zarr & Examples
    • When and Why should you use Zarr?
    • Cloud-optimized file/object-storage systems
  • Early Development of Zarr and Adoption Across Implementations & Domains
  • Implementations in C++, C, Java, JavaScript, Julia, and Python
  • Usage across Geospatial, Bio-imaging, Genomics and other Data Science domains
  • The Zarr Enhancement Proposal (ZEP) process
  • Zarr v3 & ZEP0001: From Implementation-driven to Spec-first Development
  • Lessons learned while developing Zarr v3

In this talk you will

  • understand the basics of Zarr and its specification,
  • find inspiration for processes and tools in growing projects and ecosystems, and
  • get essential takeaways regarding OSS project transitions from a young to a mature stage.

Expected audience expertise: Python

none

Category [Community, Education, and Outreach]

Other

Expected audience expertise: Domain

some

Project Homepage / Git

https://zarr.dev

Abstract as a tweet

Zarr is an API and data storage format for N-dimensional arrays. In recent years it has grown from a Python implementation into a large ecosystem. In this talk we share how this transformation happened and the lessons we learned along the way.

Jonathan is an ML software engineer at Aignostics in Berlin, Germany. He works on machine-learning pipelines for medical image analysis, with a focus on scalability and maintainability. He is also an active member of the Zarr community and one of the authors of the Zarr v3 specification.