JuliaCon 2025

Using arrays as lightweight tables: Base and DataManipulation.jl
2025-07-24, Main Room 3

Julia stands out by enabling convenient tabular data manipulation without specialized types: built-in arrays are versatile enough to serve as tables. This approach extends beyond flat tables, and to out-of-memory datasets, while maintaining simplicity and performance. In this talk, I'll explore the table-oriented functions available in Julia, from the foundational map to advanced pivoting and joins, and discuss their design.


Base Julia provides foundational tools for data processing, such as map and filter. DataManipulation.jl, along with its companion packages, builds upon these with more general mapping, grouping, reshaping, pivoting, and more. These functions remain composable and support datasets represented as basic Julia collections, most commonly arrays.
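As a rough illustration of the Base-only starting point, the sketch below treats a Vector of NamedTuples as a small table. The data and the variable names (people, over_30, summaries) are made up for this example, and only Base functions are used; it does not show the DataManipulation.jl API.

```julia
# A "table" here is just a Vector of NamedTuples (hypothetical sample data).
people = [
    (name = "Ada",   age = 36, city = "London"),
    (name = "Linus", age = 28, city = "Helsinki"),
    (name = "Grace", age = 45, city = "New York"),
]

# Row selection: Base.filter with an ordinary predicate.
over_30 = filter(p -> p.age > 30, people)

# Column selection and transformation: Base.map returning new NamedTuples.
summaries = map(p -> (name = p.name, decade = 10 * (p.age ÷ 10)), over_30)

# Sorting and aggregation are also plain Base functions.
by_age = sort(people; by = p -> p.age)
total_age = sum(p -> p.age, people)
```

Each step returns another ordinary array, so the steps compose without any table-specific type.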

Using simple data structures like arrays and tuples for data manipulation has several benefits. Pipelines are built from generic Julia functions, which keeps code readable and avoids container-specific solutions. Given the diversity of Julia array types, decisions on storage format, such as row- or column-based, can be made independently of data pipeline design. Additionally, the collection interface is highly generic, enabling natural extensions beyond flat in-memory tables to nested structures and SQL tables.
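To make the genericity point concrete, here is a minimal sketch, again Base-only and with hypothetical data: the rows contain a nested NamedTuple and a Vector, and the same function (order_totals, a name invented here) runs unchanged on a Vector and on a view of it. A column-based container would be another possible backing store, but that is not shown.

```julia
# Hypothetical nested records: each row holds a NamedTuple and a Vector.
orders = [
    (id = 1, customer = (name = "Ada",   country = "UK"), items = [3.5, 1.0]),
    (id = 2, customer = (name = "Linus", country = "FI"), items = [10.0]),
    (id = 3, customer = (name = "Grace", country = "US"), items = Float64[]),
]

# A pipeline step written once against the generic collection interface.
order_totals(rows) = map(rows) do r
    (id = r.id, name = r.customer.name, total = sum(r.items; init = 0.0))
end

# Filter on a nested field, then apply the step to a plain Vector.
nonempty = filter(r -> !isempty(r.items), orders)
totals = order_totals(nonempty)

# The same function applied to a different container type (a SubArray view).
totals_view = order_totals(view(orders, 1:2))
```

Nothing in order_totals refers to the concrete container, which is what lets the storage choice stay separate from the pipeline.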

I'll also address some rough edges in the current landscape, such as missing functionality or suboptimal performance in specific cases. Many of these issues are not intrinsic to the design and can be improved while maintaining generality and extensibility.