Shaped Data with Acsets
07-29, 17:00–17:30 (UTC), Blue

Acsets are a novel infrastructure for handling data of different shapes, based on category theory and implemented in Catlab.jl. Acsets generalize both graphs and dataframes, and allow a much more general approach to data manipulation than was previously available. We will discuss both the mathematics of acsets and some of the metaprogramming techniques we used to implement them in Julia. Finally, we will give examples of how acsets have been key in developing many projects in AlgebraicJulia.


Any practicing data scientist can tell you that the munging between data acquisition and the mathematical algorithm is a huge time sink. This is especially evident when the data does not fit the traditional dataframe model. If one is lucky, the data is shaped like a graph, and one can use a graph data structure and graph algorithms to analyze it. More generally, however, data comes in many other "shapes," which must either be put into ad hoc data structures or shoehorned into general-purpose ones.

In Catlab, we have built a general infrastructure for differently shaped data. It is based on a category-theoretic framework that models databases as functors; we call the resulting structures "Attributed C-Sets" (acsets for short).
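To make the "databases as functors" idea concrete, here is a minimal Python sketch of the concept. It is not Catlab's API or implementation: the `ACSet` class and its `add_part` and `incident` methods are hypothetical names chosen for illustration. The assumption is that a schema lists objects (tables), homs (foreign-key columns between tables), and attrs (data columns), and that an instance stores one list per column.

```python
class ACSet:
    """Toy attributed C-set: one table per object, one column per arrow.

    Illustrative only; not the data layout or API used by Catlab.jl.
    """

    def __init__(self, objects, homs, attrs):
        self.homs = dict(homs)      # hom name -> (domain object, codomain object)
        self.attrs = dict(attrs)    # attribute name -> domain object
        self.nparts = {ob: 0 for ob in objects}
        self.columns = {name: [] for name in [*self.homs, *self.attrs]}

    def _columns_of(self, ob):
        """Names of all columns (homs and attributes) whose domain is `ob`."""
        hom_cols = [h for h, (dom, _) in self.homs.items() if dom == ob]
        attr_cols = [a for a, dom in self.attrs.items() if dom == ob]
        return hom_cols + attr_cols

    def add_part(self, ob, **values):
        """Add one row to table `ob` and return its index."""
        expected = self._columns_of(ob)
        if set(values) != set(expected):
            raise ValueError(f"{ob} needs exactly the columns {expected}")
        # A hom value must point at an existing row of the codomain table.
        for name, value in values.items():
            if name in self.homs:
                codom = self.homs[name][1]
                if not (isinstance(value, int) and 0 <= value < self.nparts[codom]):
                    raise ValueError(f"{name}={value!r} is not a part of {codom}")
        for name, value in values.items():
            self.columns[name].append(value)
        part = self.nparts[ob]
        self.nparts[ob] += 1
        return part

    def incident(self, part, hom):
        """Rows of the hom's domain table that point at `part`."""
        return [i for i, x in enumerate(self.columns[hom]) if x == part]


# A weighted graph: two tables (V, E), two foreign keys, one attribute.
graph = ACSet(
    objects=["V", "E"],
    homs={"src": ("E", "V"), "tgt": ("E", "V")},
    attrs={"weight": "E"},
)
a, b, c = (graph.add_part("V") for _ in range(3))
graph.add_part("E", src=a, tgt=b, weight=1.5)
graph.add_part("E", src=b, tgt=c, weight=0.5)
graph.add_part("E", src=a, tgt=c, weight=2.0)

# A dataframe-like table: one object, no homs, only attributes.
table = ACSet(objects=["Row"], homs={}, attrs={"name": "Row", "age": "Row"})
table.add_part("Row", name="Ada", age=36)
```

In this sketch the same class covers both cases: with two objects and two homs it behaves like a graph (`graph.incident(a, "src")` lists the edges leaving `a`), and with one object and only attributes it behaves like a dataframe.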

The acset infrastructure is made possible by a novel use of Julia's macros and type system, which would be difficult or even untenable in most other languages. First, "schemas" for acsets are generated by macros. Then, further macros transform these schemas into custom structs. Finally, we use @generated functions to specialize generic operations to these custom structs.
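As a loose analogy for the schema-to-struct step, the sketch below builds a record type from a list of column names. It is written in Python and runs at runtime, so it does not reproduce Julia's compile-time macros or @generated functions, and it is not how Catlab does it. The helper `acset_type` and the type name `WeightedEdges` are hypothetical.

```python
from dataclasses import field, make_dataclass


def acset_type(name, columns):
    """Build a record type with one list-valued field per schema column.

    Hypothetical helper for illustration; assumes `columns` is non-empty.
    """
    columns = list(columns)
    fields = [(col, list, field(default_factory=list)) for col in columns]

    def add_part(self, **values):
        # The column list is fixed when the type is built, so this method is
        # specialized to the schema rather than inspecting it on every call.
        if set(values) != set(columns):
            raise ValueError(f"expected exactly the columns {columns}")
        for col in columns:
            getattr(self, col).append(values[col])
        return len(getattr(self, columns[0])) - 1

    return make_dataclass(name, fields, namespace={"add_part": add_part})


# Generate a custom type from a schema, then use it like a handwritten class.
WeightedEdges = acset_type("WeightedEdges", ["src", "tgt", "weight"])
edges = WeightedEdges()
first = edges.add_part(src=0, tgt=1, weight=2.5)
second = edges.add_part(src=1, tgt=2, weight=0.5)
```

The point of the sketch is the division of labor: the schema is consumed once when the type is created, and the resulting type carries operations tailored to it.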

This approach gives us performance comparable to popular data solutions like DataFrames.jl and LightGraphs.jl, while remaining fully generic. The acset infrastructure is used pervasively throughout the AlgebraicJulia ecosystem because of its flexibility, expressivity, and performance.

In our talk, we will give an overview of the mathematical and computational innovations necessary to implement the acset infrastructure, as well as examples of practical applications of acsets, and a reflection on how acsets have become an essential part of AlgebraicJulia.

I am a master's student at Utrecht University and a contributor to Catlab.jl.