JuliaCon 2024

Scientific Data Minisymposium
2024-07-10, Function (4.1)

Julia's elegance and speed lend themselves to applications in scientific computing that require a strong scientific data ecosystem. Building upon prior successful minisymposia on tabular data, the Scientific Data minisymposium will extend the scope to annotated, hierarchical, and n-dimensional data. Additionally, the minisymposium will invite talks on processing large datasets using advanced chunking techniques and on the distribution of big data via cloud computing and data formats.


A robust scientific data ecosystem is critical to Julia's applications in scientific computing. The ability to handle scientific data interchange allows Julia to interoperate with existing scientific projects that may use code in other languages such as MATLAB, Python, C, C++, or Rust. Prior minisymposia on data have emphasized the use of Julia with tabular data in CSV files or web-friendly data formats such as JSON. Julia's strong support for tabular data and JSON provides a solid foundation to build upon. Scientific data also extends to n-dimensional arrays whose dimensions describe space, time, and other parameters such as wavelength, and such arrays may not be well served by these data formats. These datasets are organized into nested hierarchical groups and are annotated with a standardized, regular vocabulary.

Potential topics for the scientific data minisymposium are as follows.
1. Advanced input and output APIs for scientific data. This includes HPC interfaces such as Message Passing Interface (MPI), cloud data APIs, and interfaces with databases.
2. Data formats for scientific data interchange. Discussion of data formats such as HDF5, NetCDF, FITS, Zarr, ADIOS2, and ASDF, or their incorporation into data schemes such as scverse, would be welcome.
3. Data exchange between threads and (distributed) processes. This especially includes techniques that transfer data while retaining annotations and context.
4. The implementation of abstract interfaces over similar data interfaces. For example, implementing a general n-dimensional chunk-based interface that supports multiple data formats.

Software Engineer, Scientific Computing Software at the Howard Hughes Medical Institute Janelia Research Campus


I am a physicist by training, based at the Max Planck Institute for Biogeochemistry in Jena, Germany, where I study global biogeochemical cycles in the Earth system using remote sensing, meteorological, and other data sets.
My first commit to my first Julia package dates back to 2012, and since then I have authored and contributed to packages in the Julia geodata and processing ecosystem, such as NetCDF.jl, Zarr.jl, DiskArrays.jl, YAXArrays.jl, EarthDataLab.jl, and others. Some may know me by my GitHub handle, @meggart.
