In Layman’s TermsQuery: A walk through the life of a Search Query
06-14, 10:40–11:20 (Europe/Berlin), Palais Atelier

Developers usually approach Apache Lucene with a black-box mindset: queries go in, ranked search results come out. Most of us start simple with term queries, then move to boolean queries made up of many sub-queries. It doesn't take long for a well-intentioned Search Engineer to end up delivering painfully slow search queries that return horrifyingly irrelevant results.

In this talk, I will walk us through the internal execution of a Lucene search query from start to finish.

First, we'll learn the internal data structures unique to Lucene. We'll go over inverted indices, columnar storage, Finite State Transducers, and more. We'll dive into how they're optimized and stored for maximum performance.

Then we'll put our newly acquired knowledge to the test. Starting with an IndexSearcher, we'll see how Lucene optimizes our query through query rewriting. Next, we'll see how concurrent query execution takes advantage of the "embarrassingly parallel" task of iterating over index segments. Finally, we'll learn how result collection, ranking, and re-ranking of the top-N search results give us what we were looking for in a nicely organized list.

After this talk, you will better understand and appreciate the moving parts behind modern search engines. You can bring this knowledge to your work, creating faster and more relevant search results of your own.

Conor is a Senior Software Engineer in Montreal, Canada, currently working on real-time ledgers at Wealthsimple. Previously, they worked on scaling search infrastructure and building discovery experiences at Shopify.

Conor enjoys diving deep into database internals, learning as much as possible, then telling the world about it.