The future of Lucene's MMapDirectory: Why use it and what's coming with Java 19 and later?
2022-06-13, Maschinenhaus

Since version 3 of Apache Lucene and Solr, and since the early days of Elasticsearch, the general recommendation has been to use MMapDirectory as the implementation for accessing indexes on disk. But why is this so important?

This talk will first introduce the technical details of memory mapping and explain why other techniques slow down index access significantly. Of course, we no longer need to talk about 32-bit versus 64-bit Java VMs: everybody now uses 64 bits with Elasticsearch and Solr. But with current Java versions, Lucene still has some 32-bit-like limitations when accessing the on-disk index through memory mapping, because a single mapped buffer is indexed with an int and can therefore cover at most 2 GiB. We will discuss those limitations, especially as index sizes grow to terabytes. Afterwards, Uwe will give an introduction to the new Java Foreign Memory Access API (JEP 370, JEP 383, JEP 393, JEP 412, JEP 419), which first appeared in Java 14 but is still incubating.
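The 32-bit-like limitation exists because `FileChannel.map` returns a `MappedByteBuffer`, which is int-indexed and cannot be larger than `Integer.MAX_VALUE` bytes. The following is a minimal sketch, not Lucene's actual code, of the usual workaround: map a large file as several fixed-size chunks and translate a long file position into a chunk index and an offset. This is similar in spirit to how MMapDirectory splits large files into multiple mappings. The class name `ChunkedMappedFile` and its `chunkPower` parameter are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch: a read-only file mapped as several MappedByteBuffer chunks,
 * because a single buffer is int-indexed and limited to Integer.MAX_VALUE bytes.
 */
public class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final int chunkPower; // each chunk covers 2^chunkPower bytes
    private final long chunkMask; // 2^chunkPower - 1, extracts the offset inside a chunk
    private final long length;

    public ChunkedMappedFile(Path path, int chunkPower) throws IOException {
        if (chunkPower < 1 || chunkPower > 30) {
            throw new IllegalArgumentException("chunkPower must be in [1, 30]");
        }
        this.chunkPower = chunkPower;
        this.chunkMask = (1L << chunkPower) - 1;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            this.length = ch.size();
            long chunkSize = 1L << chunkPower;
            int nrChunks = (int) ((length + chunkSize - 1) >>> chunkPower);
            MappedByteBuffer[] mapped = new MappedByteBuffer[nrChunks];
            for (int i = 0; i < nrChunks; i++) {
                long start = (long) i << chunkPower;
                // The last chunk may be shorter than chunkSize.
                long size = Math.min(chunkSize, length - start);
                mapped[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
            // Mappings stay valid after the channel is closed.
            this.chunks = mapped;
        }
    }

    /** Reads one byte at a long file position by locating its chunk and offset. */
    public byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        int chunk = (int) (pos >>> chunkPower);
        int offset = (int) (pos & chunkMask);
        return chunks[chunk].get(offset);
    }

    public int chunkCount() {
        return chunks.length;
    }

    public long length() {
        return length;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked-mmap", ".bin");
        tmp.toFile().deleteOnExit();
        byte[] data = new byte[10];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        Files.write(tmp, data);
        // Tiny 4-byte chunks, only to make the chunking visible in the demo.
        ChunkedMappedFile f = new ChunkedMappedFile(tmp, 2);
        System.out.println(f.chunkCount() + " chunks, byte at 9 = " + f.readByte(9));
    }
}
```

Running `main` should print `3 chunks, byte at 9 = 9`. A real index would use chunks close to the 2 GiB ceiling. For a multi-terabyte index this still means thousands of separate mappings, and each read has to do the extra chunk lookup. An API with long-indexed memory segments is meant to remove exactly that overhead.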

This talk will give an overview of the foreign memory API, which graduates from incubation to a preview API in Java 19 on its way to being finalized, and will present the current state of its implementation in Lucene 10. Uwe will show how future versions of Lucene will be backed by next-generation memory mapping and what needs to be done to make this usable in Solr and Elasticsearch, bringing you memory mapping for indexes with tens or maybe hundreds of terabytes in the future!

The Search track is presented by OpenSource Connections

Uwe is a committer and PMC member of Apache Lucene and Apache Solr. His main focus is on the development of Lucene Core. He implemented fast numerical search and maintains the new attribute-based text analysis API. He studied Physics at the University of Erlangen-Nuremberg and works as managing director of SD DataSolutions GmbH in Bremen, Germany, a company that provides consulting and support for Apache Lucene, Elasticsearch, and Apache Solr. He also works for “PANGAEA – Publishing Network for Geoscientific & Environmental Data”, where he implemented the portal's geospatial retrieval functions with Lucene Java. Uwe has given talks about Lucene at various international conferences, including previous editions of Berlin Buzzwords, ApacheCon EU/US, Lucene Revolution, Lucene Eurocon, and various local meetups.