2021-07-28, 13:30–14:00 (UTC), Purple
Dictionaries.jl presents an alternative interface for dictionaries in Julia, for improved productivity and performance. During this talk we'll learn how to use Julia's data manipulation tools (such as indexing, broadcasting,
reduce, etc) with dictionaries and explore some implementation decisions made in this package. We will end with applications, including recent work on tabular data with primary and/or grouping keys.
This talk will be divided into roughly three sections.
Julia is an awesome language for manipulating data, with excellent built-in functionality that is easy to extend by packages and users. Arrays, sets and dictionaries form the basis of core data structures necessary for a wide range of workloads. Of the three, Julia's
AbstractArray interface is most extensive — supporting a wide range of data structures and a rich set of operations, designed around the same core set of functionality (primarily, indexing and iteration).
On the other hand,
AbstractDict do not support such a wide range of operations (like broadcasting,
reduce) and the interface for a user to create a new, fully-functional
AbstractDict is not as clear cut or simple as it is for an array. Dictionaries.jl is an attempt to remedy this situation, by applying the learnings of the
AbstractArray interface to create a new
AbstractDictionary interface. By using Dictionaries.jl, users can experience improved programmer productivity as well as signficantly faster execution for many operations (especially with analytics-style workloads).
AbstractDictionary differs from Julia's
AbstractDict in three main ways:
- Dictionaries iterate values, like an array, instead of key-value pairs.
- The indices (or
keys) of dictionary are a special kind of dictionary, much like the indices (or
keys) of an array is a special kind of array.
- Dictionaries (and their indices) by default iterate in a well-defined order based on insertion, rather than quasi-randomly based on how the hashmap was constructed.
Since the indices of a dictionary are distinct, they naturally form a set (in the mathematical sense). In Dictionaries.jl this type is represented by
AbstractIndices <: AbstractDictionary (unlike for
Dict, there was no obvious alternative spelling of
Set available). Every
AbstractIndices has the special property that it's values are the same as it's keys, so if
i ∈ indices then
indices[i] is just
From these alone, natural definitions of
reduce follow, for both indices (i.e. "sets") or dictionaries. For example, the
map operation preserves the indices and maps the values to new values. You can even
map a set of indices/keys into a new dictionary.
This property leads to Dictionaries.jl's primary efficiency gain. The indices of dictionaries (i.e. the expensive part) can be shared between different dictionaries. For example, the provided hashmap
Dictionary can share its hash
Indices with other dictionaries. The values are stored in a (mostly) dense array, so operating on all the values with an operation like
reduce is as fast as it is for a similarly sized array. One can use
broadcast with multiple similar dictionaries (i.e. those that share compatible "tokens"), co-iterating values together at speed and with zero hash lookups.
We will turn our attention to some example applications of this interface, first highlighting the convenience and speed of
Dictionary for some common analytics tasks.
We will then see how Dictionaries.jl is used in conjunction with other packages, for example how SplitApplyCombine.jl's
group operation now supports a simple and easy split-apply-combine workflow.
Finally, we will explore recent work in TypedTables.jl which uses Dictionaries.jl to provide a table with a primary key (enabling easy lookup of rows based on data rather than an array index). The idea is similarly extended to grouped/partitioned data tables and their grouping keys.
I first used Julia to perform research during postdocs in quantum simulation, before using it in industry with a team working on various geospatial applications. Nowadays we use Julia at ELARA AI to simulate, analyze and optimize real-world businesses. I have contributed packages including StaticArrays, CoordinateTransformations, Rotations, TypedTables, SplitApplyCombine, AcceleratedArrays and Dictionaries.