Daniele Antuzi

Daniele Antuzi is a software engineer passionate about high-performance data structures and algorithms. He worked for 4 years in finance (List S.p.A.) and 2 years in cloud services (Amazon Web Services) before his curiosity about information retrieval led him to join Sease Ltd.
He enjoys studying and experimenting with new technologies, trying to narrow the gap between academia and industry.


Session

06-14
14:40
40min
Word2Vec model to generate synonyms on the fly in Apache Lucene
Daniele Antuzi, Ilaria Petreti

If you want to expand your queries or documents with synonyms in Apache Lucene, you need a predefined file listing the terms that share the same meaning.
It's not always easy to find a list of basic synonyms for a language and, even if you find one, it doesn't necessarily match your domain.
In articles about operating systems, for example, the term "daemon" is not a synonym of "devil"; it is much closer to the term "process".
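To make the limitation concrete, here is a minimal Python sketch of dictionary-based expansion. It handles only the comma-separated "equivalent terms" lines (plus `#` comments) of the Solr synonyms format that Lucene can read, and ignores `=>` mappings and multi-word synonyms. The functions `parse_synonyms` and `expand_query` and the sample list are hypothetical illustrations, not Lucene's API.

```python
from typing import Dict, List, Set


def parse_synonyms(lines: List[str]) -> Dict[str, Set[str]]:
    """Build a term -> equivalents map from comma-separated synonym lines."""
    synonyms: Dict[str, Set[str]] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        terms = [t.strip() for t in line.split(",") if t.strip()]
        for term in terms:
            # Every term on a line is equivalent to every other term on it.
            synonyms.setdefault(term, set()).update(t for t in terms if t != term)
    return synonyms


def expand_query(tokens: List[str], synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the original tokens followed by any synonyms found for them."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(sorted(synonyms.get(token, set())))
    return expanded


# Hypothetical general-purpose list that knows nothing about operating systems.
general_file = [
    "# general English synonyms",
    "daemon, devil, demon",
    "fast, quick",
]
table = parse_synonyms(general_file)
print(expand_query(["kill", "daemon"], table))  # ['kill', 'daemon', 'demon', 'devil']
```

The query about stopping a background process gets expanded with "demon" and "devil", and never with "process": the file, not the corpus, decides what counts as a synonym.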

Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in its vocabulary.
Words with similar meanings are mapped to vectors that are close to each other, typically measured by cosine similarity.
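The idea of "close vectors as synonyms" can be sketched in a few lines of Python. The 3-dimensional vectors below are hand-picked for illustration, not the output of a trained model, and the `max_synonyms` top-k cut-off and `min_similarity` threshold are assumptions about how such a component could select synonyms, not the parameters of the Lucene contribution.

```python
import math
from typing import Dict, List, Tuple

# Hand-picked toy vectors for illustration only; a real Word2Vec model
# learns its vectors (often hundreds of dimensions) from a corpus.
VECTORS: Dict[str, List[float]] = {
    "daemon": [0.9, 0.1, 0.2],
    "process": [0.8, 0.2, 0.1],
    "thread": [0.7, 0.3, 0.2],
    "devil": [0.1, 0.9, 0.3],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_terms(term: str, vectors: Dict[str, List[float]],
                  max_synonyms: int = 2,
                  min_similarity: float = 0.9) -> List[Tuple[str, float]]:
    """Return up to max_synonyms other terms whose similarity passes the threshold."""
    query = vectors[term]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != term]
    # Most similar terms first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(t, s) for t, s in scored if s >= min_similarity][:max_synonyms]


for synonym, score in nearest_terms("daemon", VECTORS):
    print(f"{synonym}: {score:.3f}")
```

With these toy values "process" and "thread" pass the threshold while "devil" does not, so the synonyms come from how words are used in the corpus rather than from a hand-written list.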

This talk explores our contribution to Apache Lucene, which integrates this technique into the text analysis pipeline.
We will show how you can automatically generate synonyms on the fly from an Apache Lucene index, and how you can use this new feature with Apache Solr, through practical examples!

The Search track is presented by OpenSource Connections

Search
Kesselhaus