PyData Boston 2025

Daina Bouquin

Daina Bouquin is Senior Developer Relations Engineer at Anaconda with over 12 years of experience bridging technical and non-technical contexts across astrophysics, library science, data science, and federal government work. Her past work as a Data Scientist for the US Administration for Children and Families gave her firsthand experience with the gap between technical AI evaluation and real-world impact, particularly in high-stakes government systems affecting vulnerable populations. This work fundamentally changed how she thinks about responsible AI development and sparked her ongoing interest in how Python developers can build systems that truly help people. At Anaconda, she works to strengthen connections between Anaconda's engineering teams and the broader developer community, creating resources and fostering relationships that help people solve important problems with AI and open source tools.


Session

12-10
13:30
40min
Is Your LLM Evaluation Missing the Point?
Daina Bouquin

Your LLM evaluation suite shows 93% accuracy. Then domain experts point out it's producing catastrophically wrong answers for real-world use cases. This talk explores the collaboration gap between AI engineers and domain experts that technical evaluation alone cannot bridge. Drawing from government, healthcare, and civic tech case studies, we'll examine why tools like PromptFoo, DeepEval, and RAGAS are necessary but insufficient, and how structured collaboration with domain stakeholders reveals critical failures invisible to standard metrics. You'll leave with practical starting points for building cross-functional evaluation that catches problems before deployment.

Horace Mann