2026-03-21 – Yuchengco Hall 5th Flr. Y509 (Workshop Room 3)
Running AI inference locally, that is, processing AI models on an organization's own hardware (on-premises servers or devices) rather than relying on cloud-based services, has become an increasingly popular choice across industries. The primary appeal is the enhanced control and security it offers over sensitive data. In this workshop session, we will learn how to build a completely local Agentic RAG system without any external APIs, using an open-source LLM such as Gemma or Mistral along with the Qdrant vector search engine, and monitor the application using Comet OPIK.
With the rapid pace of improvements and releases in AI, should your data ever leave your machine? For many Python developers with an ML background, the answer is a resounding "no". Likewise, many organizations struggle with privacy and security when building applications on Large Language Models.
With open-source and open-weights LLMs steadily improving, in this workshop session we will build a complete hands-on prototype of a local-first Agentic RAG system in which you retain full sovereignty over your data, your models, and your entire tech stack. The goal of the workshop is to understand how to work with sensitive data without relying on external APIs. I will also cover Qdrant's memory-tradeoff techniques, and add a monitoring and tracing pipeline for the same application using Comet OPIK.
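To make the architecture concrete before the session, here is a minimal, standard-library-only sketch of the retrieve-then-generate loop that a local RAG system performs. It is not the workshop's actual stack: the toy bag-of-words "embedding", the in-memory document list, and the `retrieve`/`answer` helpers are stand-ins for a real embedding model, the Qdrant vector store, and a local LLM such as Gemma or Mistral.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding': token counts stand in for a dense vector
    that a real embedding model would produce."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Tiny in-memory "vector store"; in the workshop this role is played by Qdrant.
documents = [
    "Qdrant is an open source vector search engine",
    "Gemma and Mistral are open weights language models",
    "Comet OPIK traces and monitors LLM applications",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the top-k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query: str) -> str:
    """Stub generation step: a locally served LLM would condition on the
    retrieved context here instead of echoing it."""
    context = retrieve(query, k=1)[0]
    return f"Context: {context}"

print(answer("open source vector search engine"))
```

The agentic layer built in the session extends this loop with tool use and multi-step reasoning, and Comet OPIK attaches tracing around each retrieve and generate call.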
Tarun Jain is a Founding Engineer, a Google Developer Expert in AI, and a Qdrant Distinguished Ambassador. He contributed to Google Summer of Code 2024 at Red Hen Lab and Google Summer of Code 2023 at caMicroscope. He also creates content on YouTube on his channel, AI with Tarun.
