PyData London 2026

Arghyadeep Sarkar

Arghyadeep Sarkar is a Senior Data Scientist at Red Hat with ~8 years of experience in data science and artificial intelligence. His career has evolved from traditional machine learning to architecting large-scale Generative AI and LLM-based production systems.

He built strong foundations in statistical modeling, ML pipelines, and applied AI, later specializing in deep learning, NLP, transformers, and Generative AI. He has designed and deployed LLM agents, RAG-based systems, and enterprise conversational platforms, covering the full lifecycle from training and fine-tuning to scalable deployment.

Current Focus
  • Building reliable agentic AI systems
  • Improving retrieval grounding and RAG quality
  • Deploying LLMs and SLMs in production
  • Delivering scalable, cost-efficient enterprise AI solutions

He brings a system-first engineering mindset, translating cutting-edge AI research into robust real-world products.


Session

06-07
11:00
45min
The Silent Crash: Why Your RAG Evaluation Metrics Are Lying to You
Hitendri Bomble, Arghyadeep Sarkar

We rely on dashboards to tell us whether our RAG system is working. But most standard metrics (cosine similarity, BLEU, and even BERTScore) are fundamentally broken for measuring factual correctness: they measure text overlap or semantic similarity, not truth.

This means you can have a "90% Accurate" system on paper that hallucinates dangerous misinformation in production. This talk dismantles the current state of RAG evaluation. We will look at why "Golden Datasets" are often contaminated, why "LLM-as-a-Judge" is biased towards its own output, and how to build a robust, adversarial evaluation pipeline that actually catches failures before your users do.

Grand Hall 1