PyConDE & PyData Berlin 2024

RAG for a medical company: the technical and product challenges
2024-04-22 , B05-B06

RAG (Retrieval Augmented Generation) is the process of querying a (large) set of documents with natural language, leveraging vector search and LLMs. While it has recently become easy to build a proof-of-concept RAG using OpenAI and one of the various open-source libraries (e.g. LangChain), building a performant RAG that brings value to users is challenging.
This talk will focus on learnings from building a RAG for a medical company, to allow doctors to query drug documentation with natural language, using tools like Chainlit, Qdrant and LangSmith.
Naturally, a product question emerged: how can LLMs, which can never guarantee 100% accuracy, be used effectively in the health sector?
We will explain how we addressed this challenge, as well as the various technical improvements implemented to enhance both the retrieval (vector search) and generation (LLM) metrics of our RAG.


RAG works as follows:

  • An embedding model is used to create representations of all documents. These representations are then stored in a vector database.
  • A user poses a question. The same embedding model is used to create a representation of this question, enabling the retrieval of the most similar documents through a similarity search.
  • These documents are incorporated into a prompt along with the question to generate an answer based on the documents' content.
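The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline presented in the talk: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database such as Qdrant, the sample documents are invented, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Step 1: embed every document and store (vector, text) pairs."""
    return [(embed(doc), doc) for doc in documents]


def retrieve(index, question, k=2):
    """Step 2: embed the question with the same model, return the k most similar documents."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(question, documents):
    """Step 3: combine the retrieved documents and the question into one prompt."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical documents, invented for illustration only.
DOCS = [
    "Drug A: the recommended adult dose is one tablet twice a day.",
    "Drug B: store below 25 degrees and keep away from light.",
    "Drug A: common side effects include headache and nausea.",
]

QUESTION = "What is the recommended dose of drug A?"
index = build_index(DOCS)
top_docs = retrieve(index, QUESTION, k=2)
prompt = build_prompt(QUESTION, top_docs)
```

In a real system, `prompt` would then be sent to an LLM, and the numbered sources make it possible to ask the model to cite which document supports each statement.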

Many open-source tools, such as LangChain, enable the creation of such pipelines in just a few lines of code. However, without specific adjustments, such systems often do not perform well enough to gain user adoption.

In this talk, we will cover the challenges and learnings encountered while building a RAG for the drug documentation of a medical company. More specifically, we will:

  • Cover the basics of RAGs.
  • Present the use case we faced and showcase the resulting product.
  • Show how we significantly improved our retrieval and generation metrics with techniques such as leveraging LLMs to add extra context to the user's question to enhance retrieval accuracy.
  • Discuss how we designed the product to effectively utilize LLMs while ensuring that doctors are not misled by potentially erroneous information, such as hallucinations. We achieved this mostly by displaying the sources: while many RAG pipelines cite their sources, we went a step further by embedding the HTML of the sources directly within the generated answers, along with highlighted citations.
  • Highlight the tooling aspect of the project. For example, LangSmith (a logging tool for LLMs) allowed us to easily augment our initial dataset and check that users were interacting correctly with the product. Furthermore, the ability to replay or alter a prompt in the interface allowed the product owner to iterate on prompt engineering and support technical iterations with their field knowledge.
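Two of the techniques listed above, adding LLM-generated context to the user's question and highlighting the cited passage inside the displayed source, could look roughly like the sketch below. It is an illustration under assumptions, not the implementation from the talk: `expand_question`, `highlight_citation` and `fake_llm` are hypothetical names, the LLM is any callable that maps a prompt string to a string, and the source is treated as plain text that gets escaped into HTML.

```python
import html


def expand_question(question, llm):
    """Append LLM-generated context to the question before it is embedded for retrieval.

    `llm` is assumed to be any callable taking a prompt string and returning a string.
    """
    hint = llm(f"List synonyms and related medical terms for: {question}")
    return f"{question} {hint}".strip()


def highlight_citation(source_text, quote):
    """Return the source as escaped HTML, with the first verbatim occurrence
    of the quoted passage wrapped in <mark> tags."""
    start = source_text.find(quote) if quote else -1
    if start == -1:
        # Quote not found verbatim: show the source without any highlight.
        return html.escape(source_text)
    end = start + len(quote)
    return (
        html.escape(source_text[:start])
        + "<mark>" + html.escape(quote) + "</mark>"
        + html.escape(source_text[end:])
    )


def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a canned hint for the demo."""
    return "posology dosage"


# Hypothetical question and source text, invented for illustration only.
expanded = expand_question("What is the dose of drug A?", fake_llm)
snippet = highlight_citation(
    "Drug A: take one tablet twice a day.", "one tablet twice a day"
)
```

Matching the quote verbatim is a deliberate simplification: if the model paraphrases instead of quoting, nothing is highlighted, which is safer in a medical setting than highlighting an approximate match.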

Expected audience expertise: Domain:

Advanced

Expected audience expertise: Python:

Novice

Abstract as a tweet (X) or toot (Mastodon):

While developing a Proof-Of-Concept RAG is widely accessible, creating a performant version that truly adds value remains a challenge. We will share our learnings from building a RAG for a medical company, aiding doctors with drug documentation.

I am a Lead Data Scientist at Sicara, where I have worked on a wide range of projects, mostly related to vector databases, computer vision, prediction with structured data and, more recently, LLMs.
I am currently leading the GenAI development in the company.

Here is a list of the talks I have given:

Great Practices for RAG in Production @GenAI London Meetup

How to Choose a Vector Database in 2023 @DVC Meetup

Advanced Visual Search Engine with Self-Supervised Learning (SSL) @PyConDE & PyData Berlin 2023

Great Practices for RAG in Production @GenAI Paris meetup

Generating Millions of text boxes with a GAN @Meetup Computer Vision Paris