Transducers: data-oriented abstraction for sequential and parallel algorithms on containers
Transducers are composable algorithms that operate on collections of inputs. This concept is first introduced in Clojure language by Rich Hickey for a fully reusable code for mapping, filtering, concatenation, and similar operations that can be modeled a succession of steps. By this nature, transducers superficially look like iterators that are used by the majority of programming languages for a similar purpose. However, the protocol used by transducers is quite different from iterators and results in different characteristics:
(1) Transducers are driven by a "generalized" foldl
function. It can implement a specialized looping strategy that is most friendly to the way the data is laid out in memory for a given collection (e.g., two nested loops for vector-of-vectors).
(2) Some transducers like Map
, Filter
, Cat
and Scan
can support parallel execution. Importantly, this is done without re-writing any of the code for those transducers.
(3) The code composed by transducers is close to the way code is written manually using raw loops. It seems to result in a good machine code generation. This also means that enabling SIMD using the @simd
macro is straight forward.
In this talk, I explain the formalism of the transducers and discuss the pros and cons for Julia ecosystem based on my experience in implementing Transducers.jl.