2025-08-29, Room 1 (Main Room)
ScramVM compiles SQL to bytecode and executes queries via a specialized virtual machine with morsel-based parallelism. We'll explore how this Rust-built engine achieves high performance through intelligent work distribution, core-pinned thread pools, and optimized memory access patterns, and share lessons learned from implementing a production-grade query engine.
When we started building ScramDB, we knew that the heart of any database system is its query execution engine. Rather than following a traditional interpreter approach, we designed ScramVM, a bytecode-based execution engine written in Rust that prioritizes throughput and CPU utilization.
ScramVM transforms SQL queries through multiple compilation stages: parsing, logical planning, physical planning, and finally bytecode generation. This approach gives us two major advantages. First, we can cache the compiled bytecode, dramatically speeding up repeated queries. Second, we can optimize execution at multiple levels, from query planning to the way we schedule work across CPU cores.
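To make that pipeline concrete, here is a minimal sketch of a compiler front end with a bytecode cache in front of it. Every type, function, and opcode name below is hypothetical and stands in for ScramVM's internal representations rather than reproducing them; the stubs only show the data flow through the four stages.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the intermediate representations; every name
// in this sketch is an assumption, not ScramVM's actual API.
struct Ast;           // parse tree
struct LogicalPlan;   // relational operators
struct PhysicalPlan;  // operators with concrete algorithms chosen

#[derive(Clone)]
enum OpCode {
    ScanTable(String),
    FilterGt { column: usize, value: i64 },
    Project(Vec<usize>),
}

#[derive(Clone)]
struct Bytecode {
    ops: Vec<OpCode>,
}

#[derive(Default)]
struct QueryCompiler {
    // Compiled programs keyed by SQL text, so a repeated query skips
    // parsing and planning entirely.
    cache: HashMap<String, Bytecode>,
}

impl QueryCompiler {
    fn compile(&mut self, sql: &str) -> Bytecode {
        if let Some(cached) = self.cache.get(sql) {
            return cached.clone(); // cache hit: reuse the bytecode
        }
        let ast = Self::parse(sql);
        let logical = Self::plan_logical(ast);
        let physical = Self::plan_physical(logical);
        let bytecode = Self::generate(physical);
        self.cache.insert(sql.to_owned(), bytecode.clone());
        bytecode
    }

    // The four stages, collapsed to stubs that only show the shape.
    fn parse(_sql: &str) -> Ast { Ast }
    fn plan_logical(_ast: Ast) -> LogicalPlan { LogicalPlan }
    fn plan_physical(_plan: LogicalPlan) -> PhysicalPlan { PhysicalPlan }
    fn generate(_plan: PhysicalPlan) -> Bytecode {
        Bytecode {
            ops: vec![
                OpCode::ScanTable("users".into()),
                OpCode::FilterGt { column: 2, value: 42 },
                OpCode::Project(vec![0, 1]),
            ],
        }
    }
}

fn main() {
    let mut compiler = QueryCompiler::default();
    let sql = "SELECT id, name FROM users WHERE age > 42";
    let first = compiler.compile(sql);  // runs the full pipeline
    let second = compiler.compile(sql); // served from the cache
    println!("compiled {} opcodes (cached: {})", first.ops.len(), second.ops.len());
}
```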
The most interesting challenge we tackled was parallelization. We implemented a morsel-driven execution model where the system breaks data into right-sized chunks ("morsels") and distributes them across worker threads. Our custom thread pool pins workers to specific CPU cores for better cache locality and NUMA awareness. This approach let us maximize hardware utilization without the complexity of managing distributed state across nodes.
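A stripped-down sketch of that model over a single column follows. It assumes the `core_affinity` crate for pinning and a shared atomic counter for claiming morsels; both are illustrative choices for this sketch, not necessarily what ScramVM does internally.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// A morsel is a contiguous chunk of a column, sized to stay cache-resident
// while a worker processes it. 4096 values is an arbitrary choice here.
const MORSEL_SIZE: usize = 4096;

fn main() {
    let data: Arc<Vec<i64>> = Arc::new((0..1_000_000).collect());
    let n_morsels = (data.len() + MORSEL_SIZE - 1) / MORSEL_SIZE;
    let next_morsel = Arc::new(AtomicUsize::new(0));

    // One worker per core; pinning via the `core_affinity` crate is an
    // assumption of this sketch.
    let core_ids = core_affinity::get_core_ids().unwrap_or_default();
    let n_workers = core_ids.len().max(1);

    let mut handles = Vec::new();
    for w in 0..n_workers {
        let pin = core_ids.get(w).copied();
        let data = Arc::clone(&data);
        let next_morsel = Arc::clone(&next_morsel);
        handles.push(thread::spawn(move || {
            if let Some(core) = pin {
                // Keep this worker on one core for cache locality.
                core_affinity::set_for_current(core);
            }
            let mut local_sum: i64 = 0;
            loop {
                // Claim the next morsel with one atomic increment: no locks,
                // and faster workers naturally pick up more morsels.
                let m = next_morsel.fetch_add(1, Ordering::Relaxed);
                if m >= n_morsels {
                    break;
                }
                let start = m * MORSEL_SIZE;
                let end = (start + MORSEL_SIZE).min(data.len());
                local_sum += data[start..end].iter().sum::<i64>();
            }
            local_sum
        }));
    }

    let total: i64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
    println!("sum over {n_morsels} morsels = {total}");
}
```

The single fetch_add per morsel keeps coordination overhead negligible, and because faster workers simply claim more morsels, the model load-balances itself without any explicit rebalancing.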
One unexpected lesson was the importance of balancing abstraction with performance. For our CPU-bound query execution, a specialized design with careful thread management gave us better performance characteristics than a more general-purpose abstraction would have. We tested several synchronization approaches and ultimately settled on an architecture where a coordinator dispatches work to workers through a custom scheduler that stays aware of the underlying hardware.
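As a contrast to the pull-based sketch above, here is roughly what a push-based coordinator looks like, with one queue per worker so the scheduler can decide placement deliberately. The round-robin dispatch and the `Task` shape below are stand-ins, not ScramVM's actual hardware-aware policy.

```rust
use std::sync::mpsc;
use std::thread;

// A unit of work the coordinator hands out. In a real engine this would
// carry a morsel reference plus the bytecode fragment to run over it;
// the shape here is purely illustrative.
struct Task {
    morsel_id: usize,
}

fn main() {
    let n_workers = 4;
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    // Each worker owns its own queue, so the scheduler can place work
    // deliberately (e.g. keep a morsel on the NUMA node that holds its data).
    for worker_id in 0..n_workers {
        let (tx, rx) = mpsc::channel::<Task>();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            let mut processed = 0usize;
            for task in rx {
                // Here the worker would execute the compiled bytecode
                // over the morsel identified by task.morsel_id.
                let _ = task.morsel_id;
                processed += 1;
            }
            (worker_id, processed)
        }));
    }

    // Coordinator: dispatch 100 morsels round-robin, then close the queues
    // so the workers' receive loops terminate.
    for morsel_id in 0..100 {
        senders[morsel_id % n_workers]
            .send(Task { morsel_id })
            .expect("worker exited unexpectedly");
    }
    drop(senders);

    for handle in handles {
        let (worker_id, processed) = handle.join().unwrap();
        println!("worker {worker_id} processed {processed} morsels");
    }
}
```

Per-worker queues make placement an explicit scheduler decision, which is useful for NUMA-aware assignment, at the cost of the coordinator having to keep the queues balanced itself.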
I'll share our experience with real-world queries, showing how these design decisions impact performance across different workload types. I'll also discuss the hardest bugs we encountered (stack management issues, anyone?) and how Rust's safety guarantees helped us avoid entire classes of common database engine errors while still allowing the low-level control we needed for optimal performance.
As a Rust pioneer since 2011, I've been deeply involved with the language from its earliest days, well before version 1.0. My compiler contributions have focused on SIMD optimizations and Miri improvements, helping strengthen Rust's performance and safety guarantees. I've also worked extensively on Rust's internationalization (i18n) efforts, helping make the project website's translations available to users worldwide and expanding Rust's accessibility.
I have authored numerous open source Rust libraries focused on distributed systems and database management, providing the community with battle-tested solutions for production environments. These libraries reflect my deep expertise in building reliable, high-performance distributed systems in Rust.