GigaSOM.jl: Huge-scale, high-performance flow cytometry clustering in Julia
2019-07-25, Elm A

Clustering flow cytometry data from several hundred million cells has long been hampered by the limits of existing software implementations. Julia allows us to go beyond these limits. With the high-performance GigaSOM.jl package, we gear up for huge-scale flow cytometry analysis.


Recent advances in single-cell technologies offer an unprecedented opportunity to comprehensively characterize the immune system, revealing previously unrecognized complexity in the phenotype and function of immune cells. Mass cytometry, also known as CyTOF, was recently implemented to measure up to 40 different markers in several million single cells. A typical clinical study with hundreds of patients can therefore include billions of single cells (rows) and up to 40 markers (features).
Different dimension reduction methods have been implemented in commercial and open-source software, mainly written in R. The machine learning algorithm FlowSOM [1], which is based on Kohonen's well-known self-organizing feature maps (SOM) [2], has shown various advantages over other methods.
However, all current implementations have a critical limitation on the total number of cells that can be analyzed. This limitation often blocks the analysis of large-scale clinical studies with several hundred million cells.
Here, we present the open-source, high-level, and high-performance package GigaSOM.jl (https://github.com/LCSB-BioCore/GigaSOM.jl), which is HPC-ready and written to handle very large datasets without a hard limit on the number of cells. Julia is a natural language of choice for huge-scale cytometric analyses, and the GigaSOM.jl package further broadens the possibilities for flow cytometry analysis. The quality of the software package is assured using ARTENOLIS (https://artenolis.lcsb.uni.lu) [3]. Biological validation of the results will be performed on downsampled datasets by comparison to conventional implementations of the FlowSOM package and to manual hierarchical analysis.
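
For readers new to self-organizing maps, the training loop behind SOM-based clustering can be sketched roughly as follows. This is a generic online Kohonen update written in plain Python for illustration only. It is not the GigaSOM.jl or FlowSOM implementation, and the function names (`train_som`, `best_matching_unit`), the parameters, and the linear decay schedule are all assumptions made for this sketch.

```python
import math
import random


def best_matching_unit(codebook, x):
    """Return the index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(codebook[k], x)),
    )


def train_som(data, grid_w, grid_h, epochs=10, lr0=0.5, radius0=None, seed=0):
    """Train a small online SOM (illustrative sketch, hypothetical API).

    data   : list of rows (cells), each a list of marker values
    returns: list of grid_w * grid_h codebook vectors, row-major over the grid
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(grid_w, grid_h) / 2.0
    n_nodes = grid_w * grid_h

    # Initialise each node with a copy of a randomly chosen data row.
    codebook = [list(rng.choice(data)) for _ in range(n_nodes)]
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = [(k % grid_w, k // grid_w) for k in range(n_nodes)]

    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            # Learning rate and neighbourhood radius decay linearly (assumed schedule).
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            radius = max(radius0 * (1.0 - frac), 0.5)

            bx, by = coords[best_matching_unit(codebook, x)]
            for k, w in enumerate(codebook):
                gx, gy = coords[k]
                grid_dist2 = (gx - bx) ** 2 + (gy - by) ** 2
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-grid_dist2 / (2.0 * radius * radius))
                for j in range(dim):
                    w[j] += lr * h * (x[j] - w[j])
            step += 1
    return codebook
```

After training, each cell can be assigned to its best matching unit, so the codebook vectors act as cluster prototypes. Every update in this sketch visits a single cell at a time, which hints at why naive implementations struggle once the number of cells reaches the hundreds of millions.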

References

[1] Van Gassen S, Callebaut B, Van Helden MJ, Lambrecht BN, Demeester P, Dhaene T, Saeys Y. FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A 2015;87(7):636–645
[2] Kohonen T. The self-organizing map. Proc IEEE 1990;78:1464–1480
[3] Heirendt L, Arreckx S, Trefois C, Yarosz Y, Vyas M, Satagopam VP, Schneider R, Thiele I, Fleming RMT. ARTENOLIS: Automated Reproducibility and Testing Environment for Licensed Software. arXiv:1712.05236


Co-authors:

oliver.hunewald@lih.lu, antonio.cosma@lih.lu, christophe.trefois@uni.lu, fanny.hedin@lih.lu, vasco.verissimo@uni.lu, reinhard.schneider@uni.lu, markus.ollert@lih.lu, feng.he@lih.lu

Laurent Heirendt was born in 1987 in Luxembourg City, Luxembourg (Europe). He received his BSc in Mechanical Engineering from the Ecole Polytechnique Fédérale de Lausanne, Switzerland, in 2009. A year later, he received his MSc in Advanced Mechanical Engineering from Imperial College London in the UK, where his research and thesis focused on developing a general dynamic model for shimmy analysis of aircraft landing gear that is still in use today. He received his PhD in Aerospace Science from the University of Toronto, Canada, in 2014. There he developed a thermo-tribomechanical model of an aircraft landing gear, which led to a patent-pending design of a critical aircraft landing gear component. He then worked in industry, where he oversaw the structural analysis of large aircraft docking structures.

Laurent currently works as a Research Associate at the Luxembourg Centre for Systems Biomedicine. His work focuses on responsible and reproducible research and on scientific computing applications using Julia. Besides his mother tongue, Luxembourgish, he is fluent in English, French, and German, and he is actively learning Brazilian Portuguese.