Neural Search Comes to Apache Solr: Approximate Nearest Neighbor, BERT and More (Buzzwords)!
2022-06-13, Kesselhaus

The first integrations of machine learning techniques with search made it possible to improve the ranking of your search results (Learning To Rank), but one limitation has always been that documents had to contain the keywords that the user typed in the search box in order to be retrieved.
For example, the query “tiger” won’t retrieve documents containing only the terms “panthera tigris”.
This is called the vocabulary mismatch problem and over the years it has been mitigated through query and document expansion approaches.
Neural search is an Artificial Intelligence technique that allows a search engine to retrieve documents that are semantically similar to the user’s query even when they do not contain the query terms. It avoids the need for long lists of synonyms by automatically learning the similarity of terms and sentences in your collection using deep neural networks and numerical vector representations.
This talk explores the first official Apache Solr contribution on this topic, available from Apache Solr 9.0.
During the talk we will give an overview of neural search (don’t worry, we will keep it simple!): we will describe vector representations for queries and documents, and how Approximate K-Nearest Neighbor (KNN) vector search works.
We will show how neural search can be used along with deep learning techniques (e.g., BERT) or directly on vector data, and how we implemented this feature in Apache Solr, giving usage examples!
Join us as we explore this exciting new Apache Solr feature and learn how you can leverage it to improve your search experience!
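
To make the idea of vector search concrete, here is a minimal sketch in Python. It is not the talk's material or Solr's implementation: the three-dimensional vectors are invented for illustration (a model such as BERT would produce hundreds of dimensions), the search is an exact brute-force scan rather than an approximate one, and the field name `vector` is a hypothetical example. The last helper only formats a query string in the style of the `knn` query parser that ships with Apache Solr 9.0; it does not contact a Solr server.

```python
import math

# Toy embeddings, invented for illustration only.
DOCS = {
    "doc_tiger": [0.9, 0.1, 0.0],         # text: "tiger"
    "doc_panthera": [0.85, 0.15, 0.05],   # text: "panthera tigris"
    "doc_bicycle": [0.0, 0.2, 0.95],      # text: "bicycle"
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query, docs, k):
    """Return the ids of the k documents most similar to the query.

    This scans every document (exact KNN). Approximate KNN trades a
    little accuracy for speed by avoiding the full scan.
    """
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]),
                    reverse=True)
    return ranked[:k]


def solr_knn_query(field, vector, top_k):
    """Format a query string for Solr's knn query parser.

    The field name passed in is an assumption; it must match a dense
    vector field defined in your own schema.
    """
    values = ", ".join(str(v) for v in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)


# Pretend this is the embedding of the user query "tiger".
query_vector = [0.9, 0.1, 0.0]
top2 = exact_knn(query_vector, DOCS, 2)
solr_q = solr_knn_query("vector", query_vector, 2)
```

With these toy vectors, the document about "panthera tigris" ranks right behind the one about "tiger" even though the two share no keywords, which is the behaviour the vocabulary mismatch example above asks for.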

The Search track is presented by OpenSource Connections


Alessandro Benedetti is director and R&D Software Engineer at Sease Ltd.
His focus is on information retrieval, information extraction, natural language processing, and machine learning.
At Sease, Alessandro works on search and machine learning R&D and consultancy.
When he isn't on clients' projects, he is actively contributing to the open-source community and presenting the applications of leading-edge techniques in real-world scenarios at meet-ups and conferences such as ECIR, Lucene/Solr Revolution, ApacheCon, Haystack, FOSDEM, and Open Source Summit.