2025-07-24, Main Room 6
Functions like map or filter in Julia perform well on containers with concrete element types, such as Vectors-of-NamedTuples or StructArrays, used in typical tabular data. However, dealing with hundreds or thousands of columns can overwhelm the compiler. DictArrays aims to get the best of both worlds by delivering the familiar, efficient collection API to type-unstable collections, optimizing both compilation and runtime performance.
Julia code often leverages map or filter for processing collections, and these functions perform well with containers that have concrete element types. Common tabular data structures include Vectors-of-NamedTuples for row-based storage and StructArrays for columnar storage. Yet, with a large number of columns, concretely typed collections become unwieldy for the compiler.
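As a small illustration of the concretely typed case, here is a minimal sketch using only Base Julia. The table contents and the variable names (rows, doubled, large) are made up for the example.

```julia
# Row-based table: a Vector of NamedTuples (illustrative data).
rows = [(name = "a", x = 1.0), (name = "b", x = 2.5), (name = "c", x = 4.0)]

# Every row has the same NamedTuple type, so the element type is concrete
# and map/filter can be compiled into specialized code for it.
@assert isconcretetype(eltype(rows))

# Elementwise transformation and selection with the usual collection API.
doubled = map(r -> 2 * r.x, rows)
large = filter(r -> r.x > 2, rows)
```

The column names and types are part of the element type here, which is why this layout gets harder on the compiler as the number of columns grows.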
DictArrays is an attempt to provide the same performant collection API Julia users know and love for type-unstable collections. Think of it like StructArrays, but with Dictionaries instead of NamedTuples.
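To make the idea concrete, the sketch below stores columns in a plain Base Dict and uses a function barrier for the elementwise work. This is a hypothetical stand-in written for illustration, not the DictArrays API, and it may not reflect how the package is implemented; cols and double_column are assumed names.

```julia
# Hypothetical stand-in, NOT the DictArrays API: columns live in a Dict,
# so the container type does not encode the column names or their types.
cols = Dict{Symbol,AbstractVector}(
    :name => ["a", "b", "c"],
    :x => [1.0, 2.5, 4.0],
)

# Function barrier: the column lookup is type-unstable, but once a column
# is passed to this function, Julia dispatches on its concrete type and
# compiles the elementwise loop for that type (here Vector{Float64}).
double_column(col::AbstractVector) = map(v -> 2 * v, col)

doubled = double_column(cols[:x])
```

Under these assumptions, adding more columns changes the contents of the Dict but not its type, while each per-column operation still runs on a concretely typed vector.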
In this talk, I'll explore the design and the implementation strategy that ensure compilation speeds akin to Dictionaries or DataFrames while maintaining fast runtime performance for elementwise operations, like in Vectors or StructArrays. I'll also address fundamental challenges, current limitations, and potential future enhancements, inviting community feedback for further refinement.
Astrophysicist – Postdoctoral Fellow at Harvard University.