JuliaCon 2022 (Times are UTC)

Declarative data transformation via graph transformation
07-29, 17:00–17:30 (UTC), Blue

SQL is far from the only declarative paradigm for specifying the dynamics of data! The field of graph transformation formalizes a generalization of term rewriting systems that is visual, intuitive, and applicable to a wide array of data structures, including Catlab's ACSet datatypes. We will describe the basic theory of graph transformation and show how our implementation in Catlab.jl can be applied to e-graph equality saturation and general agent-based model simulations.

What data structure is able to characterize the process of transforming data? Certainly a pointer to a function is sufficient, but this is opaque, making static analysis difficult and lowering code intelligibility. SQL statements offer more transparency at the cost of some expressivity, yet this is still difficult to read and interpret as complexity scales. The graph transformation paradigm expresses data update, addition, and deletion in terms of rewrite rules, which are combinatorial structures characterizing patterns of a data for matching or replacement.

Graph rewriting techniques are specified in the notoriously abstract language of category theory, although historically concrete implementations have been restricted to labelled graphs. Catlab.jl is an applied category theory package which has recently added a performant implementation of graph rewriting (double, single, and sesqui pushout paradigms) at a novel level of generality, allowing a generic interface to the major application areas of graph rewriting: graph languages (rewriting defines a graph grammar), graph relations (rewriting is a relation between input and output data structures), and graph transition systems (rewriting evolves a system in time).

The scientific community particularly benefits from its code being interpretable and having a straightforward semantics. Just like many low level data manipulations are made safer and more transparent when redescribed as SQL operations, we argue that many data transformation applications would benefit from replacing arbitrary code with a set of declarative rewrite rules. We demonstrate this through some complex applications constructed from these building blocks. This includes an extended example of an epidemiological agent based model (in collaboration with epidemiologist Sean Wu, a research scientist at the Institute for Health Metrics and Evaluation) and an example of equational laws defining an e-graph data structure, with rewrite rules inducing an equality saturation procedure for free. The pattern-based API is easy to use and interpret - no category theory is needed to understand or use these tools!

Postdoctoral researcher at UF working with James Fairbanks.