Surya Pathak
Sessions
Have you ever questioned the reliability of Large Language Models (LLMs)? In today’s open source world, LLMs are revolutionizing how we innovate and build applications. However, before fully embracing them in our projects, it is essential to evaluate their performance. This talk is designed to be your guide through the process of LLM evaluation, equipping you with practical insights to navigate the complexities of implementing LLMs in real-world applications.
We will go over the fundamentals of LLM evaluation, beginning with traditional metrics such as ROUGE and BLEU scores and their significance in assessing model efficacy. We will then delve into more specialized techniques such as model-based evaluation using LangChain criteria metrics. In addition, we will cover human-based evaluation and common evaluation benchmarks. Using a text generation demo application, we’ll compare the different evaluation techniques and highlight their pros and cons. Throughout the session, we will address common challenges you may face when assessing the quality of your LLMs and how to overcome them.
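To make the traditional metrics concrete: BLEU scores a candidate text against a reference by combining clipped n-gram precisions with a brevity penalty. The sketch below is a simplified, self-contained illustration of that idea (not the full corpus-level BLEU from the original paper, and not what a library like NLTK or sacrebleu computes exactly):

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """Return a Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions, scaled by a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate too short to form n-grams
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical text scores 1.0
```

A paraphrase that preserves meaning but changes wording can score poorly under such surface-overlap metrics, which is one motivation for the model-based and human-based techniques covered later in the talk.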
By the end of the talk, attendees will gain a comprehensive understanding of LLM evaluation techniques.
Are you curious to learn about Large Language Models (LLMs), but unsure how and where to begin? This workshop is designed specifically with you in mind. LLMs have emerged as powerful tools in natural language processing, yet their implementation poses challenges, particularly in managing computational resources effectively.
During this workshop, we will delve into the fundamentals of LLMs and guide you in selecting the appropriate open source models for your requirements. We will discuss the concept of self-hosted LLMs and introduce containerization technologies such as Kubernetes, Docker, and Podman. Through illustrative use cases such as a RAG (retrieval-augmented generation) application, text generation, and speech recognition, you will learn how to set up LLMs locally on your laptop and build container images for the models using Podman. We will also explore model serving and inference methods, including interaction with the model via a simple UI application. Moreover, the workshop will cover model evaluation techniques and introduce various metrics that can be used to effectively measure the performance and quality of model outputs.
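As a rough illustration of the containerized serving workflow described above, a Containerfile for Podman might package a local model together with an inference server. Everything here is a hypothetical sketch: the base image, model file name, and the choice of llama-cpp-python's bundled HTTP server are illustrative assumptions, not the workshop's actual setup:

```dockerfile
# Hypothetical Containerfile: serve a locally downloaded GGUF model with
# llama-cpp-python's built-in OpenAI-compatible HTTP server.
# Base image and model path are placeholders.
FROM quay.io/fedora/fedora:latest
RUN dnf install -y python3-pip gcc gcc-c++ && \
    pip3 install "llama-cpp-python[server]"
COPY models/example-model.Q4_K_M.gguf /models/model.gguf
EXPOSE 8000
CMD ["python3", "-m", "llama_cpp.server", "--model", "/models/model.gguf", "--host", "0.0.0.0"]
```

With a file like this, `podman build -t llm-server .` builds the image and `podman run -p 8000:8000 llm-server` exposes the inference endpoint locally, which a simple UI application can then call.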
Attendees will gain practical knowledge and skills to effectively harness the capabilities of LLMs in real-world applications. They will understand the challenges associated with managing computational resources and learn how to overcome them. By the end of the workshop, participants will be equipped with the tools to set up and deploy LLMs, evaluate model performance, and implement them in various natural language processing tasks.