Memory maps to accelerate machine learning training
08-31, 14:15–14:30 (Europe/Zurich), HS 118

Memory-mapped files are an underused tool in machine learning projects. They offer very fast I/O, which makes them well suited for storing training datasets that do not fit into memory.
In this talk, we will discuss the benefits of using memory maps, their downsides, and how to address them.


When working on a machine learning project, one of the most time-consuming parts is training the model.

But a large share of training time is often spent not on computation but on filesystem I/O, which is orders of magnitude slower than memory access, especially in computer vision, where every batch requires reading and decoding many image files.

In this talk, we will focus on using memory maps to store datasets during training, which can significantly reduce the training time of your model.
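To make the idea concrete, here is a minimal sketch using NumPy's np.memmap (the file name, shapes, and random data are purely illustrative placeholders; the mmap.ninja library linked below builds higher-level tooling around the same mechanism):

```python
import numpy as np

# One-time conversion: write the whole dataset into a single binary file.
# Shapes, dtype, and the random data are illustrative placeholders.
n_samples, height, width, channels = 1_000, 224, 224, 3
images = np.memmap("images.dat", dtype=np.uint8, mode="w+",
                   shape=(n_samples, height, width, channels))
for i in range(n_samples):
    # In a real project this would be a decoded image, e.g. loaded via Pillow.
    images[i] = np.random.randint(0, 256, (height, width, channels),
                                  dtype=np.uint8)
images.flush()

# During training: reopen the file without loading it into RAM.
# Indexing lazily reads only the pages that are actually touched,
# and repeated epochs are served from the OS page cache.
images = np.memmap("images.dat", dtype=np.uint8, mode="r",
                   shape=(n_samples, height, width, channels))
batch = images[:32]
```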

We will also compare memory maps with other ways of storing a dataset during training, such as in-memory datasets, one image per file, and HDF5 files, and describe the strengths and weaknesses of each approach. Colab notebooks will be provided, and we will show practical examples of significant performance improvements over popular online tutorials.
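As a hedged sketch of how such a comparison can be set up (pure NumPy, with .npy files standing in for the one-image-per-file layout; absolute numbers will vary with disk, OS page cache, and dataset size):

```python
import time
import numpy as np

n, shape = 500, (224, 224, 3)
data = np.random.randint(0, 256, (n, *shape), dtype=np.uint8)

# Approach 1: one file per sample.
for i in range(n):
    np.save(f"sample_{i}.npy", data[i])

# Approach 2: a single memory-mapped file.
mm = np.memmap("all.dat", dtype=np.uint8, mode="w+", shape=(n, *shape))
mm[:] = data
mm.flush()

# Time random access with each layout.
idx = np.random.permutation(n)

start = time.perf_counter()
for i in idx:
    _ = np.load(f"sample_{i}.npy")
t_files = time.perf_counter() - start

mm = np.memmap("all.dat", dtype=np.uint8, mode="r", shape=(n, *shape))
start = time.perf_counter()
for i in idx:
    _ = np.array(mm[i])  # copy out to force the actual read
t_mmap = time.perf_counter() - start

print(f"per-file: {t_files:.3f}s, memmap: {t_mmap:.3f}s")
```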

We will also show how to address common shortcomings and pain points of using memory maps in machine learning projects.
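For example, a plain memory map requires fixed-size samples, while real datasets are often ragged (images of varying resolution, texts of varying length). A common workaround, sketched below in plain NumPy as an illustration of the technique rather than any library's API, is to flatten all samples into one buffer and keep a separate array of offsets:

```python
import numpy as np

# Variable-length samples (e.g. flattened images of different sizes).
samples = [np.random.randint(0, 256, size, dtype=np.uint8)
           for size in (100, 250, 37)]

# Flatten everything into one buffer and record where each sample starts.
offsets = np.cumsum([0] + [len(s) for s in samples])
buf = np.memmap("ragged.dat", dtype=np.uint8, mode="w+",
                shape=(int(offsets[-1]),))
for s, start in zip(samples, offsets):
    buf[start:start + len(s)] = s
buf.flush()
np.save("offsets.npy", offsets)

# Reading back: slice between consecutive offsets.
offsets = np.load("offsets.npy")
buf = np.memmap("ragged.dat", dtype=np.uint8, mode="r",
                shape=(int(offsets[-1]),))
sample_1 = buf[offsets[1]:offsets[2]]  # the second sample, read lazily
```

Keeping the offsets in a small side file means random access to any sample is a single slice, with no per-file open() overhead.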


Public link to supporting material

https://colab.research.google.com/drive/1-WMtVyfxx2aUMeV7vlG48Ia27-5cxnrS?usp=sharing

Expected audience expertise: Domain

some

Expected audience expertise: Python

some

Abstract as a tweet

Learn how to use memory-mapped files to accelerate the training of your machine learning model

Project Homepage / Git

https://github.com/hristo-vrigazov/mmap.ninja

Domains

General-purpose Python, Machine Learning, Open Source Library

Machine Learning Engineer with an interest in robotics, natural language processing, and computer vision.
Has worked on various projects, such as driver assistance systems, recommender systems, and word sense disambiguation.
Author of several small open source packages, with small contributions to open source projects such as ANTLR and TensorFlow.