Susan Shu Chang
Susan Shu Chang is a Principal Data Scientist at Elastic (Elasticsearch). She has spoken at six PyCons around the world and is the author of Machine Learning Interviews (O'Reilly).
Session
12-10
14:15
40min
Evaluating AI Agents in production with Python
Susan Shu Chang
This talk covers methods for evaluating AI Agents, using as an example the Python-based evaluation framework the speakers built for a user-facing AI Agent system that has been in production for over a year. We share the tools and Python frameworks used, along with their tradeoffs and alternatives, and discuss evaluation methods such as LLM-as-Judge, rules-based evaluations, and ML metrics, as well as the tradeoffs in choosing among them.
Thomas Paul