DevConf.US

Building Trust with LLMs
08-14, 16:40–17:15 (US/Eastern), Conference Auditorium (capacity 260)

Have you ever questioned the reliability of Large Language Models (LLMs)? In today's open source world, LLMs are revolutionizing how we innovate and build applications. However, before fully embracing them in our projects and applications, it's essential to evaluate their performance. This talk walks you through the intricate process of LLM evaluation, equipping you with practical insights for implementing LLMs in real-world applications.

We will go over the fundamentals of LLM evaluation, beginning with traditional metrics such as ROUGE and BLEU scores and their significance in assessing model efficacy. We will then delve into more specialized techniques, such as model-based evaluation using LangChain criteria metrics, as well as human-based evaluation and common evaluation benchmarks. Using a text generation demo application, we'll compare the different evaluation techniques, highlighting their pros and cons. Throughout the session, we will address common challenges you may face when assessing the quality of your LLMs and how to overcome them.
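As a taste of the techniques the demo compares, here is a minimal sketch of the traditional reference-based metrics, assuming the rouge-score and nltk packages; the reference/prediction pair is purely illustrative:

```python
# Minimal sketch of traditional reference-based metrics (ROUGE and BLEU).
# Assumes: pip install rouge-score nltk
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "The cat sat on the mat."         # illustrative ground truth
prediction = "A cat was sitting on the mat."  # illustrative model output

# ROUGE: n-gram overlap, recall-oriented; widely used for summarization.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)
print({name: round(s.fmeasure, 3) for name, s in rouge.items()})

# BLEU: n-gram precision; widely used for translation. Smoothing avoids
# zero scores on short single sentences.
bleu = sentence_bleu(
    [reference.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)
print(round(bleu, 3))
```

And a hedged sketch of model-based evaluation using LangChain's built-in criteria evaluators; this assumes an LLM provider is configured (by default LangChain grades with an OpenAI chat model, so an API key is required), and "conciseness" is just one of the built-in criteria:

```python
# Sketch of LangChain criteria (model-graded) evaluation.
# Assumes: pip install langchain langchain-openai, with OPENAI_API_KEY set.
from langchain.evaluation import load_evaluator

evaluator = load_evaluator("criteria", criteria="conciseness")
result = evaluator.evaluate_strings(
    prediction=(
        "Mitochondria generate most of the cell's ATP. They are often "
        "called the powerhouse of the cell, a phrase many people repeat."
    ),
    input="What is the function of mitochondria?",
)
# Returns a dict with the grader's reasoning, a Y/N value, and a 0/1 score.
print(result)
```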

By the end of the talk, attendees will gain a comprehensive understanding of LLM evaluation techniques.

Hema Veeradhi is a Principal Data Scientist on the Emerging Technologies team, part of the Office of the CTO at Red Hat. Her work focuses on implementing innovative open source AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch supporter of open source, firmly believing in its ability to propel AI advancements to new heights. She has previously spoken at Open Source Summit NA, KubeCon NA, DevConf.CZ, and FOSSY.
