EuroSciPy 2024

The joys and pains of reproducing research: An experiment in bioimaging data analysis
2024-08-28, Room 6

The conversation about reproducibility usually focuses on how to make research workflows (more) reproducible. Here, we consider it from the opposite perspective and ask: How feasible is it, in practice, to reproduce research which is meant to be reproducible? Is it even done or attempted? We provide a detailed account of such an attempt: trying to reproduce some segmentation results for 3D microscopy images of a developing mouse embryo. The original research is a monumental work of bioimaging and analysis at the single-cell level, published in Cell in 2018, alongside all the necessary research artifacts. Did we succeed in this attempt? As we share the joys and pains of this journey, many questions arise: How exactly do reviewers assess reproducibility claims? Incentivizing reproducible research is still an open problem, since it is so much more costly (in time) to produce. And how can we incentivize those who test reproducibility? Not only is it costly to set up computational environments and execute data-intensive scientific workflows, but it may not seem rewarding at first. In addition, there is a human factor: It is delicate to show authors that their publication does not live up to its reproducibility claims.


In this presentation, I would like to share my personal experience trying to reproduce a bioimaging data analysis workflow. The starting point was my fascination with the live imaging of developing mouse embryos in the videos accompanying a 2018 research paper [1]. These incredible images show the organism's individual cells (their nuclei marked with a fluorescent protein) dividing and migrating over time, in three spatial dimensions.

The authors of this research developed not only imaging tools but also analysis tools, in order to achieve accurate cell segmentation, cell tracking, detection of cell divisions, registration across multiple embryos, and more. As a maintainer of scikit-image, I was curious, just for starters, to try out some of our classical (as opposed to machine-learning) segmentation workflows on these 3D biomedical images.

After all, the paper is open access, it comes with supplementary materials which point to data and software repositories... and its authors are kind and helpful! The latter point proved to be extremely important, especially when it came to getting and using the data. I would like to commend these researchers, who are so conscientious and generous in their sharing.

In April 2023, the scikit-image team selected two Outreachy interns [2] to work on two different projects, including one on narrative documentation, to expand our gallery with biomedical examples. The additional workforce was welcome, considering the challenges we encountered at every step along the way. The datasets are available in the Image Data Resource (IDR) [3], which is fairly standard practice in the field, but it turns out that downloading data from the IDR is not trivial [4, 5, 6 and references therein].
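To be clear, browsing metadata is the easy part: the IDR exposes a standard OMERO JSON API. Here is a minimal sketch of querying it with requests, assuming the usual OMERO JSON response layout (project 502 is the one linked in the supporting material below); the friction lies in bulk-downloading the underlying image data, not in this kind of query.

    import requests

    # Query the IDR JSON API for basic project metadata (a sketch;
    # the response layout assumed here is the standard OMERO JSON API one).
    url = "https://idr.openmicroscopy.org/api/v0/m/projects/502/"
    response = requests.get(url)
    response.raise_for_status()
    project = response.json()["data"]
    print(project["Name"])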

For long-term cell tracking, the authors of [1] developed a framework named TGMM (Tracking with Gaussian Mixture Models). If you pass it a single frame, it computes a cell segmentation for that frame. Admittedly, comparing this segmentation result with another one, say, obtained in pure Scientific Python, goes beyond the question of reproducibility. Still, we present a segmentation workflow based on SciPy and scikit-image and compare its results with TGMM's (although this would rather fall under the "Scientific Applications" track).
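To give a flavor of such a classical workflow, here is a minimal sketch of a distance-transform watershed in 3D, run on synthetic blobs rather than the actual embryo data; the parameter values are illustrative, not the ones used in the talk.

    import numpy as np
    from scipy import ndimage as ndi
    from skimage import data, feature, filters, segmentation

    # Synthetic 3D "nuclei" stand in for the microscopy data.
    image = data.binary_blobs(length=128, n_dim=3, volume_fraction=0.1)
    smooth = filters.gaussian(image.astype(float), sigma=2)
    binary = smooth > filters.threshold_otsu(smooth)

    # Split touching objects with a distance-transform watershed.
    distance = ndi.distance_transform_edt(binary)
    coords = feature.peak_local_max(distance, min_distance=10, labels=binary)
    mask = np.zeros(distance.shape, dtype=bool)
    mask[tuple(coords.T)] = True
    markers, _ = ndi.label(mask)
    labels = segmentation.watershed(-distance, markers, mask=binary)
    print(f"{labels.max()} segmented objects")

Two label images (say, ours vs. TGMM's) can then be compared quantitatively, for instance with skimage.metrics.adapted_rand_error.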

The datasets are available in KLB format [7], which is no longer widely used. Nowadays, a similar study would probably publish its data in the Zarr format, which is popular in bioimaging [8]. This would make it much easier to load the data into various analysis tools; for example, the dask.array.from_zarr function [9] might be convenient for working in Python, as sketched below. But is it fair to ask that the data be available in Zarr, just because the ecosystem has changed since 2018? Who would take care of converting (large) published datasets, assuming there is an actual demand for re-using them?
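For illustration, this is what lazy loading could look like if the data were republished as Zarr; the URL below is hypothetical, not an actual re-release of the 2018 data.

    import dask.array as da

    # Lazily open a (hypothetical) cloud-hosted Zarr store; nothing is
    # downloaded until chunks are actually computed.
    stack = da.from_zarr("https://example.org/mouse-embryo.zarr")
    print(stack.shape, stack.chunksize)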

To read a dataset in Python, I used the pyklb package [10], which provides a Python wrapper for the KLB file format but is no longer maintained. Unsurprisingly, I had to use a custom-made environment just to be able to install pyklb. The documentation of all the tweaking that ensued lives in GitHub issues [11] for now. I will compile it, along with the rest of my logbook, and share it in a dedicated repository [12], so that the reproducibility loop is complete.
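Once installed, reading a stack is short. A sketch, assuming the readheader/readfull functions as documented in pyklb's README, with a hypothetical filename:

    import pyklb

    # Inspect the metadata, then load the full stack as a NumPy array.
    # The filename is a placeholder for one of the downloaded datasets.
    header = pyklb.readheader("embryo_t0001.klb")
    print(header)  # image size, block size, data type, etc.
    stack = pyklb.readfull("embryo_t0001.klb")
    print(stack.shape, stack.dtype)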

Getting the TGMM software to run on my PC proved painful but ultimately possible (which made it almost joyful)! Research-style documentation tends to be scattered and out-of-date (e.g., [13]), which is completely understandable: In research groups, people and funding cycles come and go, so who would really be able to maintain software published (shared) as part of a reproducible study? For example, will these tiny pull requests [14] ever be seen, let alone merged?

Should there be some kind of community responsibility here? Shifting the perspective, should we consider that research software is significantly different from 'regular' software, in the sense that it goes through an indefinite code freeze from the moment it is 'released'? It seems fair (and it is sufficient for reproducibility) that, for a given study, research artifacts be published as a snapshot only. Note that the TGMM software provides Docker images [15].

This presentation definitely brings more questions than it offers answers. We look forward to hearing what the audience could share in terms of ideas, resources, and experiences. We would love to know of other attempts at reproducing published open research. We suspect there is a special scenario in which the person reproducing the work is the future self of the original author!

[1] https://doi.org/10.1016/j.cell.2018.09.031
[2] https://www.outreachy.org/ (accessed 2024-05-25)
[3] "The Image Data Resource (IDR) is a public repository of image datasets from published scientific studies, where the community can submit, search and access high-quality bio-image data." https://idr.openmicroscopy.org (accessed 2024-05-25)
[4] https://idr.openmicroscopy.org/about/download.html (accessed 2024-05-25)
[5] https://github.com/IDR/idr.openmicroscopy.org/pull/193 (accessed 2024-05-25)
[6] https://forum.image.sc/t/permission-denied-when-trying-to-download-idr-openmicroscopy-at-fasp/88907 (accessed 2024-05-25)
[7] KLB 2.0 block-based lossless image compression file format https://bitbucket.org/fernandoamat/keller-lab-block-filetype (accessed 2024-05-25)
[8] https://gerbi-gmb.de/2023/10/02/next-generation-file-formats-for-bioimaging/ (accessed 2024-05-25)
[9] https://docs.dask.org/en/latest/generated/dask.array.from_zarr.html (accessed 2024-05-25)
[10] https://github.com/bhoeckendorf/pyklb (accessed 2024-05-25)
[11] https://github.com/bhoeckendorf/pyklb/issues/11 (accessed 2024-05-25)
[12] https://github.com/mkcor/repro-tgmm (accessed 2024-05-25)
[13] https://bitbucket.org/fernandoamat/tgmm-paper/src/master/doc/ (accessed 2024-05-25)
[14] https://bitbucket.org/fernandoamat/tgmm-paper/pull-requests/ (accessed 2024-05-25)
[15] https://bitbucket.org/fernandoamat/tgmm-paper/src/master/doc/new/docs/user-guide/docker.md (accessed 2024-05-25)


Abstract as a tweet:

Have you ever tried reproducing data-intensive scientific workflows published as open research?

Category [Data Science and Visualization]:

Reproducibility

Expected audience expertise: Domain:

some

Expected audience expertise: Python:

some

Public link to supporting material:

https://idr.openmicroscopy.org/webclient/?show=project-502

Project Homepage / Git:

https://github.com/mkcor/repro-tgmm

Marianne Corvellec is a core developer of scikit-image, a popular Python library for scientific image processing, where she specializes in biomedical applications. Her technical interests include data science workflows, data visualization, and best practices from testing to documenting. She holds a PhD in statistical physics from École normale supérieure de Lyon, France. Since 2013, she has been a regular speaker and contributor in the Python, Carpentries, and FLOSS communities.
