JuliaCon 2022 (Times are UTC)

A Data Integration Framework for Microbiome Research
07-27, 15:40–15:50 (UTC), Red

Standardized data objects can greatly support the collaborative development of new data science methods. In particular, commonly agreed data standards will provide improved efficiency and reliability in complex data integration tasks. We demonstrate the application of this framework in the context of microbiome research.


Microorganisms shape every aspect of our lives: from the soil of our farmland to the human gut, from the ocean to the municipal wastewater of our cities, microorganisms seem to inhabit and even dominate most ecosystems of this planet. As we expand our knowledge of the role that microbes play within and beyond our bodies, the need arises to store and analyze such information in a systematic and reproducible manner.

Standardized data objects can greatly support the collaborative development of new data science methods. In particular, commonly agreed data standards will provide improved efficiency and reliability in complex data integration tasks. We implement this approach for microbiome research in a new Julia package, MicrobiomeAnalysis.jl (MIA), which supports microbiome data integration and analysis using state-of-the-art data containers designed for robust data integration tasks: SummarizedExperiments.jl (SE) and MultiAssayExperiment.jl (MAE).

Our approach provides a general framework to study complex microbiome profiling data sets. Not only do the data containers make it intuitive to work with abundance assays, but they also integrate those assays with the corresponding metadata into a comprehensive data object. We demonstrate the approach on common analysis tasks in microbial ecology, including alpha and beta diversity analysis and visualization of microbial community dynamics. The proposed approach is inspired by closely related and active efforts in R/Bioconductor. Developing a similar framework in the Julia language is a promising endeavour that can provide substantial performance improvements in certain computational tasks, such as dimension reduction and time series analysis, while taking advantage of a shared conceptual framework.
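To make the alpha and beta diversity tasks concrete, here is a minimal sketch in plain Python. It is not the MicrobiomeAnalysis.jl API: the function names `shannon` and `bray_curtis`, the `assay` dictionary, and the toy counts are all assumptions made for illustration. Shannon entropy and Bray-Curtis dissimilarity were chosen as common examples of alpha and beta diversity measures; the talk does not state which indices the package provides.

```python
import math

def shannon(counts):
    """Shannon alpha diversity (natural log) of one sample's taxon counts."""
    total = sum(counts)
    # Taxa with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis beta dissimilarity between two samples over the same taxa."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / (sum(a) + sum(b))

# Hypothetical toy abundance assay: one list of taxon counts per sample.
assay = {
    "sample_A": [10, 10, 0],
    "sample_B": [5, 5, 10],
}

# Alpha diversity is computed within each sample,
# beta diversity between a pair of samples.
alpha = {name: shannon(counts) for name, counts in assay.items()}
beta = bray_curtis(assay["sample_A"], assay["sample_B"])
```

In a container-based workflow, the assay would live alongside its sample and taxon metadata in a single object rather than in a bare dictionary, so per-sample results such as `alpha` could be stored back next to the samples they describe.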

Overall, our environment offers a starting point for developing effective standardized methods for microbiome research. Because the methodology is general, it can readily be applied to other multi-source study designs and data integration tasks.

Bioengineering student at Rhine-Waal University, Germany, and former research trainee at Turku University, Finland, where he contributed to FdeSolver.jl and MicrobiomeAnalysis.jl under the supervision of the Turku Data Science Group. Interested in bioinformatics and data science.