PyCon GR 2025

Decoding Emotions from Art to Memes: A Multimodal Image and Text System
2025-08-29, Innovathens - Main stage

The world around us is full of information of many types, and although many of us communicate verbally, we often lose context or intent when words are not accompanied by images. The same thing happens with models: when we try to solve very complex problems with models that can only handle one type of data, information is lost. For this reason, we need solutions that can integrate different types of data, such as images, text, video, or audio: multimodal models. These systems combine inputs of different data types, making them more robust and useful in areas such as health, security, criminology, business, and more.

The objective of this talk is to show how to create a system capable of identifying emotions from various types of images and, in parallel, extracting emotions from text, then combining both results to reach a more global understanding of the emotion in a scene (one system without GenAI and another with GenAI).

In this talk, attendees will learn, step by step, how to implement a multimodal system covering images and text using open-access data, and will see why multimodal systems are a good option for solving complex problems. The role of Python in the process will also be covered, along with external modules that keep the multimodal implementation simple.
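To give a flavor of the combination step described above, here is a minimal late-fusion sketch in plain Python. It is an illustrative example, not the speaker's implementation: the emotion labels, scores, and the `fuse_emotions` helper are all assumptions standing in for the outputs of real image and text emotion classifiers.

```python
# Hypothetical late-fusion step: combine emotion scores from an image
# model and a text model into one scene-level distribution.
# Labels and scores below are illustrative placeholders.

def fuse_emotions(image_scores, text_scores, image_weight=0.5):
    """Weighted average of two per-emotion score dictionaries,
    renormalized so the fused scores sum to 1."""
    labels = set(image_scores) | set(text_scores)
    fused = {
        label: image_weight * image_scores.get(label, 0.0)
               + (1 - image_weight) * text_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values()) or 1.0
    return {label: score / total for label, score in fused.items()}

# Pretend outputs of two separate classifiers:
image_scores = {"joy": 0.7, "sadness": 0.2, "anger": 0.1}  # from the image branch
text_scores = {"joy": 0.5, "sadness": 0.4, "anger": 0.1}   # from the text branch

scene = fuse_emotions(image_scores, text_scores)
dominant = max(scene, key=scene.get)
print(dominant)  # joy
```

A real system would replace the placeholder dictionaries with model predictions, but the fusion logic, weighting each modality and picking the dominant emotion, stays this simple.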

Important:
General knowledge of Data Science is recommended to follow the topic easily, although the explanation will be kept simple, walking through all the steps needed to understand how it is done.

She is a former scientist who has worked in Data Science and AI for the last 7 years.
In her spare time she loves to develop projects that can help people around our planet.
She is a co-organizer of the PyladiesCon conference and tries to lend a hand at other conferences!