PyData Boston 2025

No Cloud? No Problem. Local RAG with Embedding Gemma
2025-12-10, Thomas Paul

Running Retrieval-Augmented Generation (RAG) pipelines often feels tied to expensive cloud APIs or large GPU clusters, but it doesn't have to be. This session explores how Embedding Gemma, Google's lightweight open embedding model, enables powerful RAG and text classification workflows entirely on a local machine. With the Sentence Transformers framework and models from Hugging Face, high-quality embeddings can be generated efficiently for retrieval and classification tasks. Real-world examples involving call transcripts and agent remark classification illustrate how robust results can be achieved without the cloud or the budget.


Large language model workflows often rely on expensive cloud services or powerful GPUs, which can be a barrier for smaller teams and individual practitioners. Embedding Gemma changes that by offering a compact, high-quality embedding model that runs efficiently on local machines.

This session demonstrates how to build practical Retrieval-Augmented Generation (RAG) and text classification pipelines using Embedding Gemma, with no cloud infrastructure required. The Sentence Transformers library loads the model from Hugging Face and generates embeddings locally, making it simple to plug Embedding Gemma into existing Python workflows, as the short sketch below illustrates.
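As a minimal sketch of the local workflow: the snippet loads the model with Sentence Transformers and encodes a few sentences. The "google/embeddinggemma-300m" checkpoint name is an assumption about the published Hugging Face model ID, and the sample sentences are hypothetical.

    # Minimal local embedding sketch. Assumes sentence-transformers is
    # installed and "google/embeddinggemma-300m" is the Hugging Face model ID.
    from sentence_transformers import SentenceTransformer

    # Downloaded once from the Hugging Face Hub, then runs fully on-device.
    model = SentenceTransformer("google/embeddinggemma-300m")

    sentences = [
        "Customer called about a billing discrepancy.",
        "Agent escalated the ticket to tier-two support.",
    ]

    # encode() returns a NumPy array with one embedding vector per sentence.
    embeddings = model.encode(sentences)
    print(embeddings.shape)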

This talk will cover:
• Introduction to Embedding Gemma and how it differs from larger generative models in the Gemma family.
• Local embedding generation using Sentence Transformers with Hugging Face.
• Practical applications for RAG and classification on call transcripts and agent remarks (sketched in the code after this list).
• Comparative insights on performance, memory usage, and trade-offs versus larger cloud-based models.
• Practical takeaways for designing cost-effective NLP systems without relying on the cloud.
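To make the RAG bullet concrete, here is a hedged sketch of local retrieval over transcript chunks using cosine-similarity search; the model ID and the example texts are illustrative assumptions, not data from the talk.

    # Local semantic search over call-transcript chunks (toy corpus).
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("google/embeddinggemma-300m")  # assumed model ID

    chunks = [
        "The customer reported intermittent outages after the router upgrade.",
        "Agent offered a loyalty discount and the customer accepted.",
        "Billing shows a duplicate charge on the May invoice.",
    ]
    chunk_embeddings = model.encode(chunks, convert_to_tensor=True)

    query = "Why was the customer charged twice?"
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Cosine-similarity search; the top chunks become context for a local LLM.
    hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=2)[0]
    for hit in hits:
        print(round(hit["score"], 3), chunks[hit["corpus_id"]])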
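For the classification bullet, a similar sketch: embeddings serve as fixed features for a lightweight scikit-learn classifier, so no GPU or fine-tuning is needed. The remarks and labels below are hypothetical placeholders.

    # Agent-remark classification on top of local embeddings (toy data).
    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    model = SentenceTransformer("google/embeddinggemma-300m")  # assumed model ID

    remarks = [
        "Customer upset about repeated service drops.",
        "Resolved on first call, customer satisfied.",
        "Requested cancellation due to pricing.",
        "Praised the quick turnaround on the repair.",
    ]
    labels = ["complaint", "resolved", "churn_risk", "praise"]

    # Embeddings act as fixed features; a classical model handles the labels.
    X = model.encode(remarks)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    print(clf.predict(model.encode(["Customer wants to switch providers."])))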


Prior knowledge expected: yes

As a Principal Data Scientist at Verizon, I deliver innovative and impactful data solutions for various business units and functions. I have over seven years of experience in data science, with a focus on Machine Learning, Artificial Intelligence, NLP, Gen AI, Time Series analysis, Visualization, Geospatial analysis, and Statistical Analysis (A/B Testing).

My mission is to leverage data and analytics to solve complex and challenging problems, optimize processes and performance, and generate actionable insights and recommendations. I use Python, SQL, GCP, Tableau, and Git as my main tools to develop, deploy, and monitor data models and pipelines. I also collaborate with cross-functional teams and stakeholders to understand their needs, communicate results, and provide data-driven guidance. I am passionate about learning new skills and technologies, and sharing my knowledge and expertise with others.