PyData London 2026

Arghyadeep Sarkar

Arghyadeep Sarkar is a Senior Data Scientist at Red Hat with ~8 years of experience in data science and artificial intelligence. His career has evolved from traditional machine learning to architecting large-scale Generative AI and LLM-based production systems.

He built strong foundations in statistical modeling, ML pipelines, and applied AI, later specializing in deep learning, NLP, transformers, and Generative AI. He has designed and deployed LLM agents, RAG-based systems, and enterprise conversational platforms, covering the full lifecycle from training and fine-tuning to scalable deployment.

Current Focus
  • Building reliable agentic AI systems
  • Improving retrieval grounding and RAG quality
  • Deploying LLMs and SLMs in production
  • Delivering scalable, cost-efficient enterprise AI solutions

He brings a system-first engineering mindset, translating cutting-edge AI research into robust real-world products.


Session

06-07
11:00
45min
The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You
Hitendri Bomble, Arghyadeep Sarkar

We rely on dashboards to tell us whether our RAG system is working. But most standard metrics (cosine similarity, BLEU, and even BERTScore) are fundamentally broken for measuring factual correctness: they measure text overlap or semantic similarity, not truth.

This means you can have a "90% Accurate" system on paper that hallucinates dangerous misinformation in production. This talk dismantles the current state of RAG evaluation. We will look at why "Golden Datasets" are often contaminated, why "LLM-as-a-Judge" is biased towards its own output, and how to build a robust, adversarial evaluation pipeline that actually catches failures before your users do.

Grand Hall 1