PyCon DE & PyData 2025

Dr. Homa Ansari

Lead AI/ML scientist at ZEISS Meditec with 10+ years of experience in algorithm design for multimodal unstructured data (image, time series, geospatial data). Expert in developing innovative algorithms with statistical methods, shallow and deep machine learning, and pre-trained Large Language Models (LLMs), specifically for satellite data and niche medical sensors. Recipient of innovation awards from the German Aerospace Center (DLR) and IEEE for novel algorithms and data products for satellite missions. Previously worked at the German Aerospace Center (DLR) and DataRobot Inc.


LinkedIn

https://www.linkedin.com/in/homaansari/


Session

04-23
17:10
30min
Generative-AI: Usecase-Specific Evaluation of LLM-powered Applications
Dr. Homa Ansari

This talk addresses the critical need for usecase-specific evaluation of Large Language Model (LLM)-powered applications, highlighting the limitations of generic evaluation benchmarks in capturing domain-specific requirements. It proposes a workflow for designing more reliable evaluations to optimize LLM-based applications, consisting of three key activities: human-expert evaluation and benchmark dataset curation, creation of evaluation agents, and alignment of these agents with human evaluations using the curated datasets. The workflow produces two key outcomes: a curated benchmark dataset for testing LLM applications and an evaluation agent that scores their responses. The presentation also covers limitations and best practices for enhancing the reliability of evaluations, so that LLM applications are better tailored to specific use cases.
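
As a rough illustration of the alignment step described above, the sketch below compares an evaluation agent's verdicts with human-expert labels on a small curated benchmark. It is not taken from the talk: the benchmark items, the `toy_evaluation_agent` function (a keyword check standing in for an LLM-based judge), and the choice of Cohen's kappa as the agreement measure are all assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical curated benchmark: each item holds an application response,
# the terms a correct answer should mention, and a human-expert verdict.
benchmark = [
    {"response": "Calibrate the sensor before each use.",
     "required_terms": ["calibrate", "before each use"], "human_label": "pass"},
    {"response": "The sensor works out of the box.",
     "required_terms": ["calibrate"], "human_label": "fail"},
    {"response": "Calibrate the sensor, or skip it if you are in a hurry.",
     "required_terms": ["calibrate"], "human_label": "fail"},
    {"response": "I cannot answer that.",
     "required_terms": ["calibrate"], "human_label": "fail"},
]


def toy_evaluation_agent(item):
    """Stand-in for an LLM judge (assumption): pass if all terms appear."""
    text = item["response"].lower()
    return "pass" if all(t in text for t in item["required_terms"]) else "fail"


human_labels = [item["human_label"] for item in benchmark]
agent_labels = [toy_evaluation_agent(item) for item in benchmark]

agreement = sum(h == a for h, a in zip(human_labels, agent_labels)) / len(benchmark)
kappa = cohen_kappa(human_labels, agent_labels)

print(f"raw agreement: {agreement:.2f}, Cohen's kappa: {kappa:.2f}")
```

In this toy data the agent wrongly passes the third response, which mentions the required term but gives unsafe advice. Disagreements of this kind are what an alignment loop would surface, prompting a revision of the agent's criteria before it is trusted to score responses at scale.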

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3