Extracting Structured Data from LLMs with LangChain and Pydantic | PyCon JP 2024

Extracting Structured Data from LLMs with LangChain and Pydantic
2024-09-27 13:50–14:20, 4F Track3

This talk shows how to get structured data, not just plain text, out of your Large Language Model (LLM) interactions. We'll explore how LangChain, together with Pydantic, lets you retrieve reusable Python objects such as lists, dictionaries, and even pandas DataFrames from LLM responses.

Join me on this journey to understand and implement structured data extraction. You'll learn how to define data models with Pydantic's BaseModel so that they integrate with LangChain's output parser; how to extract information from LLM responses in structured formats (lists, DataFrames) for further analysis and manipulation; and finally how to build LLM applications that require structured data transformations, parsing, or integration with machine learning models.
Whether you're a data scientist, developer, or just curious about the possibilities of LLMs, this talk equips you with the skills to unleash the structured power of LLMs and build innovative applications.

Introduction:
Highlighting limitations of plain text LLM responses.
Introducing structured data extraction from LLMs.
Exposure to LangChain and Pydantic's power.
Building the Data Model:
Demo of defining data models with Pydantic's BaseModel.
Exploring lists, dictionaries, and pandas DataFrames.
Understanding data model interaction with LangChain's parser.
Structured Data Extraction:
Live examples of querying LLMs for structured data.
Transforming data for analysis.
Integrating data with ML models.
Real-World Applications:
Practical use cases of structured data extraction.
Benefits and potential applications discussion.
Conclusion:
Recap of key learnings and future directions.
Exciting possibilities ahead.
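
As a rough illustration of the "Building the Data Model" and "Transforming data for analysis" items in the outline, the sketch below validates a JSON response with nested Pydantic models and flattens it into rows. It is not taken from the talk: the `Order` and `LineItem` models and the sample JSON are hypothetical, and only `pydantic` is assumed to be installed. The constructs used (`Model(**data)`, `dict(model)`) are chosen because they work in both Pydantic v1 and v2.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


# Hypothetical models describing the structure we expect from the LLM.
class LineItem(BaseModel):
    name: str
    quantity: int
    unit_price: float


class Order(BaseModel):
    customer: str
    line_items: List[LineItem]


# Assumption: this string stands in for an LLM response.
raw = (
    '{"customer": "Aiko", "line_items": ['
    '{"name": "notebook", "quantity": 2, "unit_price": 3.5}, '
    '{"name": "pen", "quantity": 10, "unit_price": 0.5}]}'
)

# Validation turns the nested dictionaries into typed Python objects.
order = Order(**json.loads(raw))

# Flatten the validated items into a list of dictionaries (one per row).
rows = [dict(item) for item in order.line_items]

# Typed fields can be used directly in calculations.
total = sum(item.quantity * item.unit_price for item in order.line_items)

# A response with the wrong type is rejected instead of silently accepted.
bad = '{"customer": "Aiko", "line_items": [{"name": "pen", "quantity": "many", "unit_price": 0.5}]}'
try:
    Order(**json.loads(bad))
    rejected = False
except ValidationError:
    rejected = True

print(rows, total, rejected)
```

If pandas is available, `pandas.DataFrame(rows)` turns the list of dictionaries into a DataFrame for further analysis.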

Why did you choose this topic?:

Extracting valuable information from text by hand is tedious, time-consuming, and error-prone, especially with large amounts of data. Combining large language models (LLMs) with tools like LangChain and Pydantic offers an automated way to extract structured data from text.

My passion for streamlining data extraction and my experience with the challenges of manual processing led me to explore this cutting-edge approach with LLMs and LangChain. I believe it has the potential to revolutionize how we handle unstructured text data.

Knowledge and know-how the audience can gain from your talk:

Whether you're a data scientist, developer, or just curious about the possibilities of LLMs, this talk equips you with the skills to unleash the structured power of LLMs and build innovative applications.

Prior knowledge the speaker assumes the audience has:

Prerequisites:
Basic understanding of Python programming.
Familiarity with the concept of Large Language Models (optional, but beneficial).
Prior exposure to Pydantic and its features.

Audience experience: Intermediate
Language of presentation: English
Language of presentation material: English

Kalyan Prasad

Hello, this is Kalyan from India. I started my career as a newspaper delivery boy and, through hard work and determination, became a self-taught data scientist and analytics manager. I now lead a talented data science and analytics team at my workplace.
I'm deeply passionate about open-source communities and actively contribute to them. Over time, I've established myself as a respected global speaker and influential community leader, delivering talks at prestigious conferences and educational institutions such as PyData Global, Data Science Global Summit 2022, JupyterCon, PyCon JP, PyCon India, Devfest Hyderabad, PyCon APAC, PyCon Hong Kong, PyCon ZA, Pyjamas, Conf42, Developer Conference Telangana 2021, BelPy, and KLS Gogte Institute of Technology, Belagavi, Karnataka, India.
I have also served as a reviewer and mentor for reputed conferences and hackathons including EuroPython, SciPy, PyData, PyData Seattle, JupyterCon, PyCon US, PyCon India, PyConf Hyderabad, and many others. At the moment, I am assisting the EuroPython 2024 Proposal Mentorship program.
I also contribute to various open-source communities, and I enjoy being involved with them and helping them grow. I am currently associated with the following organizations:
NumFOCUS - Small Development Grants Review Committee
PyCon India - Conference Co-chair
PyConf Hyderabad - Conference Co-chair
KaggleX BIPOC Mentorship - Mentor
PyData Global Impact Mentoring Program - Mentor
Hyderabad Python Users Group - Core Member / Meetups Organizer
Humans for AI - Program Manager for AI Learning Community
