PyCon Lithuania 2025

David Batista

I’m an experienced machine learning engineer and software developer with a strong background in Natural Language Processing. I completed my Ph.D. in 2016, focusing on semantic relationship extraction. I'm currently based in Berlin, where I work as an NLP Engineer and Software Developer at deepset and contribute to the development of Haystack, an open-source framework for building end-to-end, production-ready LLM-based applications.


Notable open source projects I contribute to:

https://github.com/deepset-ai/haystack
https://github.com/deepset-ai/haystack-core-integrations
https://github.com/deepset-ai/haystack-experimental
https://github.com/MantisAI/nervaluate


Session

04-24
14:00
25min
Smarter Retrieval, Better Generation: Improving RAG Systems
David Batista

Good retrieval is key to an effective RAG (retrieval-augmented generation) system: it determines which information is selected, and that directly affects the quality of augmentation and generation. This talk focuses on RAG indexing and retrieval. It explores methods for converting text into searchable formats, compares techniques, and analyzes their advantages, disadvantages, and performance on an annotated dataset, with the goal of improving document retrieval for user queries.
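
To make the idea of indexing, retrieving, and scoring against annotations concrete, here is a minimal sketch in plain Python. It is not taken from the talk: TF-IDF with cosine similarity is just one simple way to turn text into a searchable format, and the documents, queries, relevance labels, and function names (`build_index`, `retrieve`, `recall_at_k`) are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize more carefully.
    return text.lower().split()


def build_index(docs):
    """Index step: turn each document into a sparse TF-IDF vector (term -> weight)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, idf, vectors, k=1):
    """Retrieval step: return the indices of the k most similar documents."""
    # Terms unseen at indexing time are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    ranked = sorted(range(len(vectors)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    return ranked[:k]


def recall_at_k(retrieved, relevant):
    """Fraction of the annotated relevant documents that were retrieved."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy corpus and toy annotations (query -> set of relevant document indices).
docs = [
    "haystack is an open-source framework for building llm applications",
    "bm25 ranks documents by term overlap with the query",
    "dense retrieval embeds queries and documents into vectors",
]
annotated = [
    ("how does bm25 rank documents", {1}),
    ("framework for llm applications", {0}),
]

idf, vectors = build_index(docs)
scores = [recall_at_k(retrieve(q, idf, vectors, k=1), rel) for q, rel in annotated]
mean_recall = sum(scores) / len(scores)
print(f"mean recall@1 = {mean_recall:.2f}")
```

Swapping `build_index` and `retrieve` for another technique (for example, an embedding-based retriever) while keeping `recall_at_k` and the annotated queries fixed is one way to compare approaches on equal footing.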

Data Day - Apr 24
203