2021-07-28, Blue
Modern databases can choose between two approaches to evaluating queries with high performance: Query Compilation compiles each query to optimized machine code, while Vectorization interprets queries using BLAS-style primitives.
Query compilation offers more optimization potential for LLVM, while vectorization doesn’t require runtime compilation.
We explain how these techniques work and how we combine them, showcasing how Julia lets us have the best of both.
Modern (SQL) database query engines use two major approaches to evaluate user-provided queries with high performance (see e.g. [1]):
Query Compilation: Each pipeline of a query plan gets compiled into a single function that effectively fuses operators into a single (nested) for-loop. This function is then compiled to highly-optimized machine code. Operators process data tuple-at-a-time.
Vectorization: The query plan is interpreted, and each operator in the plan is mapped to a pre-compiled function. To amortize the resulting interpretation overhead, each operator processes a batch ("vector") of, say, 1000 values per call.
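As a rough illustration of the two styles described above, the following Julia sketch evaluates a hypothetical query, `SELECT sum(price * qty) FROM items WHERE qty > threshold`, in both ways. All names here (`fused_query`, `select_gt!`, `mul_selected!`, `sum_batch`, `vectorized_query`) are invented for this example and are not taken from the RelationalAI code base; a real engine would generate the fused loop from a query plan rather than have it written by hand.

```julia
# Hypothetical query: SELECT sum(price * qty) FROM items WHERE qty > threshold

# Compilation style: selection, multiplication and aggregation are fused into
# one loop that handles one tuple per iteration.
function fused_query(price::Vector{Float64}, qty::Vector{Int}, threshold::Int)
    total = 0.0
    for i in eachindex(price, qty)
        if qty[i] > threshold
            total += price[i] * qty[i]
        end
    end
    return total
end

# Vectorized style: small primitives that each process a whole batch.

# Selection primitive: store positions of qualifying values in `sel`
# and return how many qualified.
function select_gt!(sel::Vector{Int}, col::AbstractVector{Int}, threshold::Int)
    n = 0
    for i in eachindex(col)
        if col[i] > threshold
            n += 1
            sel[n] = i
        end
    end
    return n
end

# Map primitive: multiply two columns at the first `n` selected positions.
function mul_selected!(out::Vector{Float64}, a::AbstractVector{Float64},
                       b::AbstractVector{Int}, sel::Vector{Int}, n::Int)
    for k in 1:n
        i = sel[k]
        out[k] = a[i] * b[i]
    end
    return out
end

# Aggregation primitive: sum the first `n` entries of a batch.
function sum_batch(vals::Vector{Float64}, n::Int)
    s = 0.0
    for k in 1:n
        s += vals[k]
    end
    return s
end

# Interpreter loop: walks the input in batches and calls one primitive per
# operator, so the dispatch cost is paid once per batch, not once per tuple.
function vectorized_query(price, qty, threshold; batch_size = 1000)
    sel = Vector{Int}(undef, batch_size)
    out = Vector{Float64}(undef, batch_size)
    total = 0.0
    for start in 1:batch_size:length(price)
        stop = min(start + batch_size - 1, length(price))
        p = view(price, start:stop)
        q = view(qty, start:stop)
        n = select_gt!(sel, q, threshold)
        mul_selected!(out, p, q, sel, n)
        total += sum_batch(out, n)
    end
    return total
end

price = [1.5, 2.0, 3.0, 4.0]
qty   = [5, 20, 11, 3]
fused_query(price, qty, 10)                       # 2.0*20 + 3.0*11 = 73.0
vectorized_query(price, qty, 10; batch_size = 2)  # same result, two batches
```

In the fused version the running sum never leaves the loop, whereas the vectorized version materializes the intermediate `sel` and `out` buffers between primitives. In exchange, the primitives are generic and can be compiled once and reused by every query.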
Query compilation offers more optimization potential for LLVM and is often effective at keeping values in registers, while vectorization enables shorter compilation times and therefore better supports interactive queries. As part of the production-grade RelationalAI Knowledge Graph Management System, we implemented both approaches in Julia.
In this presentation, we explain in greater detail how both of these fundamentally different techniques work, why we are implementing them, and how we aim to combine them. We showcase where Julia enabled us to implement highly performant code with ease, but also reveal where we had to spend non-trivial amounts of engineering effort to arrive at the desired performance.
Richard Gankema is a Computer Scientist at RelationalAI, working on various systems-related topics such as data structures, memory management and query execution. Before joining RelationalAI, he worked as a PhD candidate at CWI’s Database Architectures group in Amsterdam, which sparked his interest in vectorization and other techniques for optimizing the performance of data processing.