PyCon Lithuania 2025

Open-source Multimodal AI
2025-04-25, 101

Multimodal AI is booming this year, with models capable of seeing, reading, and hearing. Advances in this field unlock many production use cases in robotics, document AI, computer and web automation, and more!

In this talk we will go through everything multimodal and open-source: a bit of background, libraries, very basic APIs to get you started with open-source models, popular open-source models, and use cases such as multimodal agents, multimodal RAG, automated browser use, and more!

Merve works on the open-source team at Hugging Face, focusing on computer vision and multimodal AI and, at times, agents, where she contributes to and develops Hugging Face libraries. Prior to this, she worked as a machine learning engineer on NLP, chatbots, and information retrieval.