Deep Learning for Understanding Human Multi-modal Behavior
09-05, 15:45–16:00 (UTC), Track 1 (Mitxelena)

Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.

Automatic learning from multimedia has drawn attention from companies and governments for a significant number of applications, such as automated recommendation, classification, and understanding of the human brain. In recent years, an increasing amount of research has explored deep neural networks for multimedia-related tasks.
Government security and surveillance applications include automated detection of illegal and violent behavior, child pornography, and traffic infractions. Companies worldwide are looking for content-based recommendation systems that personalize clients' consumption and interactions by understanding human perception of memorability, interestingness, attractiveness, and aesthetics. Fields such as event detection and multimedia affective and perceptual analysis are turning towards artificial neural networks. In this talk, I will present the theory behind multi-modal fusion using deep learning, along with some open challenges and their state of the art.
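The abstract does not say which fusion architectures the talk covers. As a minimal sketch only, assuming two modalities already reduced to fixed-length feature vectors (hypothetical "visual" and "audio" features) and hand-set linear weights standing in for trained networks, the two most commonly contrasted strategies, early (feature-level) fusion and late (decision-level) fusion, can be written as:

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(weights: Sequence[float], features: Sequence[float], bias: float = 0.0) -> float:
    """Dot product plus bias; a stand-in for a trained network (assumption)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + bias


def early_fusion(visual: Sequence[float], audio: Sequence[float],
                 weights: Sequence[float], bias: float = 0.0) -> float:
    """Feature-level fusion: concatenate both modalities, then score them with ONE model."""
    fused = list(visual) + list(audio)
    return sigmoid(linear_score(weights, fused, bias))


def late_fusion(visual: Sequence[float], audio: Sequence[float],
                w_visual: Sequence[float], w_audio: Sequence[float],
                alpha: float = 0.5) -> float:
    """Decision-level fusion: score each modality separately, then average the probabilities.

    alpha is a hypothetical weight given to the visual model's prediction.
    """
    p_visual = sigmoid(linear_score(w_visual, visual))
    p_audio = sigmoid(linear_score(w_audio, audio))
    return alpha * p_visual + (1.0 - alpha) * p_audio


if __name__ == "__main__":
    visual, audio = [1.0, 0.0], [0.0, 1.0]  # toy feature vectors (hypothetical)
    print(early_fusion(visual, audio, weights=[0.5, 0.5, 0.5, 0.5]))
    print(late_fusion(visual, audio, w_visual=[1.0, 0.0], w_audio=[0.0, -1.0]))
```

Early fusion lets a single model learn interactions between modalities but needs aligned features for every sample; late fusion keeps the per-modality models independent and degrades more gracefully when one modality is missing, at the cost of not modeling cross-modal interactions.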

Abstract as a tweet

Understanding multi-modalities with deep learning approaches

Machine Learning

I am a former hotel manager; nowadays I am a student pursuing a B.S. in Computer Engineering and an M.S. in Computer Science. I am a researcher and developer working in the fashion industry with Dafiti Group, and a reviewer for Udacity's Machine Learning and Deep Learning Nanodegrees.