Memory maps to accelerate machine learning training
08-31, 14:15–14:30 (Europe/Zurich), HS 118

Memory-mapped files are an underused tool in machine learning projects. They offer very fast I/O, which makes them well suited for storing training datasets that do not fit into memory.
In this talk, we will discuss the benefits of using memory maps, their downsides, and how to address them.


When working on a machine learning project, one of the most time-consuming parts is training the model.

But a large share of training time is often spent not on computation but on filesystem I/O, which is orders of magnitude slower than memory access, especially in computer vision, where every batch requires reading and decoding many image files.

In this talk, we will focus on using memory maps to store datasets during training, which can significantly reduce the training time of your model.
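To make the idea concrete, here is a minimal sketch using NumPy's np.memmap (the file name, shapes, and random data are purely illustrative placeholders; the mmap.ninja library linked below builds higher-level tooling around the same mechanism):

```python
import numpy as np

# One-time conversion: write the whole dataset into a single binary file.
# Shapes, dtype, and the random data are illustrative placeholders.
n_samples, height, width, channels = 1_000, 224, 224, 3
images = np.memmap("images.dat", dtype=np.uint8, mode="w+",
                   shape=(n_samples, height, width, channels))
for i in range(n_samples):
    # In a real project this would be a decoded image, e.g. loaded via Pillow.
    images[i] = np.random.randint(0, 256, (height, width, channels),
                                  dtype=np.uint8)
images.flush()

# During training: reopen the file without loading it into RAM.
# Indexing lazily reads only the pages that are actually touched,
# and repeated epochs are served from the OS page cache.
images = np.memmap("images.dat", dtype=np.uint8, mode="r",
                   shape=(n_samples, height, width, channels))
batch = images[:32]
```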

We will also compare memory maps with other ways of storing a dataset during training, such as in-memory datasets, one image per file, and HDF5 files, and describe the strengths and weaknesses of each approach. Colab notebooks will be provided, and we will show practical examples of significant performance improvements over popular online tutorials.
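As a hedged sketch of how such a comparison can be set up (pure NumPy, with .npy files standing in for the one-image-per-file layout; absolute numbers will vary with disk, OS page cache, and dataset size):

```python
import time
import numpy as np

n, shape = 500, (224, 224, 3)
data = np.random.randint(0, 256, (n, *shape), dtype=np.uint8)

# Approach 1: one file per sample.
for i in range(n):
    np.save(f"sample_{i}.npy", data[i])

# Approach 2: a single memory-mapped file.
mm = np.memmap("all.dat", dtype=np.uint8, mode="w+", shape=(n, *shape))
mm[:] = data
mm.flush()

# Time random access with each layout.
idx = np.random.permutation(n)

start = time.perf_counter()
for i in idx:
    _ = np.load(f"sample_{i}.npy")
t_files = time.perf_counter() - start

mm = np.memmap("all.dat", dtype=np.uint8, mode="r", shape=(n, *shape))
start = time.perf_counter()
for i in idx:
    _ = np.array(mm[i])  # copy out to force the actual read
t_mmap = time.perf_counter() - start

print(f"per-file: {t_files:.3f}s, memmap: {t_mmap:.3f}s")
```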

We will also show how to address common shortcomings and pain points of using memory maps in machine learning projects.
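For example, a plain memory map requires fixed-size samples, while real datasets are often ragged (images of varying resolution, texts of varying length). A common workaround, sketched below in plain NumPy as an illustration of the technique rather than any library's API, is to flatten all samples into one buffer and keep a separate array of offsets:

```python
import numpy as np

# Variable-length samples (e.g. flattened images of different sizes).
samples = [np.random.randint(0, 256, size, dtype=np.uint8)
           for size in (100, 250, 37)]

# Flatten everything into one buffer and record where each sample starts.
offsets = np.cumsum([0] + [len(s) for s in samples])
buf = np.memmap("ragged.dat", dtype=np.uint8, mode="w+",
                shape=(int(offsets[-1]),))
for s, start in zip(samples, offsets):
    buf[start:start + len(s)] = s
buf.flush()
np.save("offsets.npy", offsets)

# Reading back: slice between consecutive offsets.
offsets = np.load("offsets.npy")
buf = np.memmap("ragged.dat", dtype=np.uint8, mode="r",
                shape=(int(offsets[-1]),))
sample_1 = buf[offsets[1]:offsets[2]]  # the second sample, read lazily
```

Keeping the offsets in a small side file means random access to any sample is a single slice, with no per-file open() overhead.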


Public link to supporting material

https://colab.research.google.com/drive/1-WMtVyfxx2aUMeV7vlG48Ia27-5cxnrS?usp=sharing

Expected audience expertise: Domain

some

Expected audience expertise: Python

some

Abstract as a tweet

Learn how to use memory-mapped files to accelerate the training of your machine learning model

Project Homepage / Git

https://github.com/hristo-vrigazov/mmap.ninja

Domains

General-purpose Python, Machine Learning, Open Source Library

Machine Learning Engineer with an interest in robotics, natural language processing, and computer vision.
Has worked on various projects, such as driver assistance systems, recommender systems, and word sense disambiguation.
Author of several small open source packages, with small contributions to open source projects such as ANTLR and TensorFlow.