PyCon DE & PyData 2025

Lessons learned in bringing a RAG chatbot with access to 50k+ diverse documents to production
2025-04-24, Titanium3

Retrieval-Augmented Generation (RAG) chatbots are a key use case of GenAI in organizations, allowing users to conveniently access and query internal company data. A first RAG prototype can often be created in a matter of days. So why are the majority of prototypes still stuck in the pilot stage? [1]

In this talk we share our insights from developing a production-grade chatbot at Merck. Our RAG chatbot for R&D experts accesses over 50,000 documents across numerous SharePoint sites and other sources. We identified three key success factors:
1. Developing a robust data pipeline that syncs documents from source systems and handles enterprise requirements such as replicating user permissions.
2. Establishing a comprehensive evaluation framework with a clear optimization metric.
3. Driving adoption through onboarding training and ongoing user engagement, such as regular office hours.

We think that many of these lessons are broadly applicable to RAG chatbots, making this talk valuable for practitioners aiming to implement GenAI solutions in business contexts.


Building a prototype RAG chatbot with frameworks like LangChain can be straightforward. However, scaling it into a production-grade application introduces complex challenges. In this talk, we share our lessons learned from developing a RAG chatbot designed to assist research and development (R&D) experts.

Our chatbot was developed to effectively handle and provide access to a large collection of unstructured knowledge, consisting of over 50,000 documents stored across more than 20 SharePoint sites and other sources. We faced significant hurdles in:
- Data Pipeline Engineering: Crafting a modular and scalable pipeline capable of periodically syncing documents, handling dynamic user permissions, and efficiently processing large volumes of unstructured data.
- Evaluation Framework Development: Implementing an effective testing strategy without static ground-truth data. We employed automated testing with frameworks like pytest, used LLM-as-a-judge scoring, and integrated tracing to iteratively refine our dataset and maintain high answer quality (see the first sketch after this list).
- RAG Design and Prompting Strategies: Addressing challenges in document chunking, citation integration, reranking retrieved results, and applying permission and PII filters to ensure compliance and accuracy in responses (the second sketch below illustrates the permission filter).
- User Adoption: Driving user adoption through onboarding training and ongoing engagement, such as regular office hours and feedback mechanisms.
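
To make the evaluation approach concrete, here is a minimal sketch of an LLM-as-a-judge test in pytest. The evaluation cases, the `ask_chatbot` helper, and the judge prompt are illustrative assumptions rather than our production setup.

```python
# Minimal sketch: LLM-as-a-judge answer-quality tests in pytest.
# `ask_chatbot` and EVAL_CASES are hypothetical placeholders; the judge uses
# the OpenAI chat completions API to grade each answer against pass criteria.
import pytest
from openai import OpenAI

judge = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical evaluation cases: a question plus the criteria a good answer must meet.
EVAL_CASES = [
    ("What is the shelf life of compound X?",
     "states the shelf life and cites the source document"),
    ("Who approved protocol 123?",
     "names the approver or clearly says the information is not available"),
]


def ask_chatbot(question: str) -> str:
    """Placeholder: call the deployed RAG chatbot under test."""
    return "stub answer"


@pytest.mark.parametrize("question,criteria", EVAL_CASES)
def test_answer_quality(question: str, criteria: str) -> None:
    answer = ask_chatbot(question)
    verdict = judge.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nAnswer: {answer}\n"
                f"Criteria: {criteria}\n"
                "Does the answer satisfy the criteria? Reply with PASS or FAIL."
            ),
        }],
    )
    assert "PASS" in verdict.choices[0].message.content
```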

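The permission filtering mentioned under RAG design can be sketched as a simple post-retrieval step. The `allowed_groups` metadata field and the `get_user_groups` lookup are assumed, illustrative names for permissions replicated from the source systems during sync, not a description of our actual implementation.

```python
# Sketch: enforce replicated source-system permissions on retrieved chunks
# before they reach the LLM prompt. Field and function names are illustrative.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str
    allowed_groups: frozenset[str]  # groups copied from the source system during sync


def get_user_groups(user_id: str) -> set[str]:
    """Placeholder: resolve the user's groups, e.g. via the identity provider."""
    return {"rnd-analytics"}


def filter_by_permission(chunks: list[Chunk], user_id: str) -> list[Chunk]:
    """Keep only chunks the requesting user is allowed to see."""
    groups = get_user_groups(user_id)
    return [c for c in chunks if c.allowed_groups & groups]


# Usage: apply the filter between retrieval and generation.
retrieved = [
    Chunk("Stability data ...", "sharepoint://rnd/stability.docx", frozenset({"rnd-analytics"})),
    Chunk("HR policy ...", "sharepoint://hr/policy.docx", frozenset({"hr-internal"})),
]
visible = filter_by_permission(retrieved, user_id="u123")  # only the R&D chunk remains
```
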
We emphasize the importance of applying data science principles to GenAI projects:
- Start Simple and Iterate: Begin with a basic implementation as a baseline and iteratively enhance functionality based on testing and user feedback.
- Test-Driven Development: Identify key test scenarios early and use them to drive development, ensuring that improvements are measurable and aligned with evolving user needs.
- Focus on Key Metrics: Establish clear metrics to optimize against, aiding in making informed decisions throughout the development process.

Main Takeaways for the Audience:
- Understand the critical role of robust, modular data pipelines in handling dynamic and unstructured data sources for LLM applications.
- Learn strategies for developing effective evaluation frameworks in complex domains where traditional ground truth data may be lacking.
- Gain insights into advanced RAG design techniques that enhance chatbot performance and reliability.
- Recognize the substantial data engineering and software development efforts required to transition a prototype to a production-grade LLM solution.

By sharing our experiences, we aim to give attendees practical insights into deploying robust RAG chatbots and into transforming a functional prototype into a reliable, scalable application that fulfills enterprise requirements.


Expected audience expertise (Domain): Advanced

Expected audience expertise (Python): Novice

Bernhard is a Senior Data Scientist at Merck. For more information, you can connect with him on LinkedIn. :-)