Merve Noyan
Merve works on the open-source team at Hugging Face, focusing on computer vision and multimodal AI and, at times, agents, where she contributes to and develops Hugging Face libraries. Prior to this, she worked as a machine learning engineer on NLP, chatbots, and information retrieval.
All the Hugging Face libraries. I also have my own project, smol-vision, with 1.2k+ stars: https://github.com/merveenoyan/smol-vision
Twitter handle: @mervenoyann
Session
Multimodal AI is booming this year, with models capable of seeing, reading, and hearing. Advances in this field unlock many production use cases in robotics, document AI, computer and web automation, and more!
In this talk we will go through everything multimodal and open-source: a bit of background, libraries, very basic APIs to get you started with open-source models, popular open-source models, and use cases (multimodal agents, multimodal RAG, automated browser use, and more!).