PyData Boston 2025

LLMOps in Practice: Building Secure, Governed Pipelines for Large Language Models
2025-12-10 , Thomas Paul

As organizations move from prototyping LLMs to deploying them in production, the biggest challenges are no longer about model accuracy - they’re about trust, security, and control. How do we monitor model behavior, prevent prompt injection, track drift, and enforce governance across environments?

This talk presents a real-world view of how to design secure and governed LLM pipelines, grounded in open-source tooling and reproducible architectures. We’ll discuss how multi-environment setups (sandbox, runner, production) can isolate experimentation from deployment, how to detect drift and hallucination using observability metrics, and how to safeguard against prompt injection, data leakage, and bias propagation.

Attendees will gain insight into how tools like MLflow, Ray, and TensorFlow Data Validation can be combined for ** version tracking, monitoring, and auditability**, without turning your workflow into a black box. By the end of the session, you’ll walk away with a practical roadmap on what makes an LLMOps stack resilient: reproducibility by design, continuous evaluation, and responsible governance across the LLM lifecycle.


As Large Language Models move from experimentation to enterprise deployment, organizations face new challenges around governance, observability, and trust. Traditional MLOps pipelines aren’t designed for the dynamic, prompt-driven behavior of LLMs - making them prone to drift, compliance risks, and unmonitored model changes.

This talk introduces a three-tier LLMOps framework that integrates experimentation, monitoring, and governance into a single, reproducible system. We’ll discuss how to isolate model environments for safe iteration, track lineage through MLflow, detect drift with TensorFlow Data Validation, and scale inference using Ray. We’ll also examine how to integrate security and compliance layers, including prompt-injection prevention, bias checks, and access controls.

By the end, attendees will have a practical blueprint for operationalizing LLMs responsibly—ensuring each deployment is transparent, auditable, and production-ready.

Outline

  • Overview of the three-tier LLMOps environment model (sandbox → runner → production) for controlled experimentation and deployment.

  • The observability stack: MLflow for experiment tracking and lineage, TFDV for drift detection, Ray for distributed scaling.

  • The governance and security layer: prompt-injection prevention, access control, bias monitoring, and compliance logging.

  • Case illustration showing data flows, feedback loops, and monitoring dashboards in a live LLM lifecycle.

  • Key takeaways and best practices for responsible, reproducible deployment.

Prior Knowledge Expected

Intermediate Python and machine learning familiarity; basic understanding of model deployment workflows or MLOps concepts.

Keywords

LLMOps, observability, governance, prompt injection, model monitoring, MLflow, Ray, TensorFlow Data Validation, responsible AI, compliance


Prior Knowledge Expected: No previous knowledge expected

Siddharth Shankar is a Machine Learning Engineer working at Mphais.AI. His current work focuses on multimodal fine-tuning for mortgage and investment banking. Before entering financial AI, he worked on optimization modeling for aviation operations and developed MLOps pipelines that enabled scalable, reproducible machine learning deployment across complex systems.

He earned his Master’s in Computer Science and Information Systems from the University of Maryland, where his research interests lied in the intersection between Machine Learning and Human Computer Interaction.

Siddharth is passionate about designing AI systems that are not just accurate or efficient, but also trustworthy, compliant, and production-ready.