Ophelie Bleu
I am a Senior Machine Learning Scientist at Monzo, where I focus on Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and sophisticated data augmentation strategies. With 6 years of experience specializing in Natural Language Processing (NLP), I have a proven track record of building scalable AI systems for high-stakes environments.
Prior to joining Monzo, I was a Machine Learning Engineer at Bumble, leading Trust and Safety initiatives by developing LLM-powered moderation pipelines to ensure platform safety at scale. I also worked as a Senior Data Scientist at ComplyAdvantage, where I applied NLP to financial crime detection, and as a consultant at Sia, focusing on complex question-answering tasks.
I am passionate about the intersection of LLM infrastructure and practical data engineering, specifically solving the "cold-start" problem for niche domains through synthetic data and rigorous validation frameworks.
Session
Learn practical techniques for using LLMs to solve the data scarcity problem that plagues real-world ML projects. This talk demonstrates three production-ready approaches to augmenting training datasets: synthetic generation, LoRA fine-tuning, and LLM-powered annotation, for situations where you have abundant data for common cases but almost nothing for edge cases or emerging categories. Using a food review classification scenario, you'll see how to generate high-quality training data, when each technique works best, and, critically, how to validate synthetic data to avoid amplifying errors. Perfect for practitioners facing the "we have 10k examples of X but zero for Y" problem.
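As a flavor of the validation step the abstract emphasizes, here is a minimal sketch in the talk's food-review setting. It is purely illustrative: the example strings, the `validate` helper, and the keyword heuristic are assumptions standing in for whatever checks the talk actually covers, not material from the session itself.

```python
# Hypothetical synthetic reviews for a rare "delivery damage" class,
# written here by hand to mimic what an LLM might return.
synthetic = [
    "The box arrived crushed and the cake inside was ruined.",
    "The box arrived crushed and the cake inside was ruined.",  # duplicate
    "Great pizza!",  # off-topic for the target class
    "Half the soup had leaked out of the container by delivery.",
]

# Toy stand-in for a real relevance check (an NLI model, a classifier's
# confidence, or human spot-checks would be used in practice).
KEYWORDS = {"arrived", "delivery", "box", "container", "leaked", "crushed"}

def validate(examples, keywords=KEYWORDS):
    """Drop exact duplicates, then keep only examples that mention at
    least one class-relevant keyword, so obviously off-topic generations
    never reach the retraining set."""
    seen, kept = set(), []
    for text in examples:
        if text in seen:
            continue  # skip verbatim repeats, a common LLM failure mode
        seen.add(text)
        if keywords & set(text.lower().split()):
            kept.append(text)
    return kept

clean = validate(synthetic)  # duplicate and "Great pizza!" are filtered out
```

Even a filter this crude illustrates the point of the abstract's warning: validating synthetic data before retraining keeps generation errors from being amplified into the model.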
Target Audience: Data scientists and ML engineers working on classification, NLP, or content moderation tasks who struggle with imbalanced or incomplete training datasets.
Takeaway: A decision framework for choosing between synthetic generation, fine-tuning, and LLM annotation, plus validation strategies to ensure data quality before retraining models.