Driving down the Memray lane - Profiling your data science work
04-17, 15:45–16:15 (Europe/Berlin), A1

When handling large amounts of data, memory profiling the data science workflow becomes more important. It shows you which parts of the process consume the most memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.


In this talk, we will explore what memory profiling is and how it can help with data science work. We will start with a basic explanation of how Python arranges memory for various objects. This lays the foundation for explaining why we need a special tool to memory profile Python programs.
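As a rough illustration of why object sizes alone are not enough, the sketch below compares the shallow size of a list with the memory actually allocated while building it. It uses only the standard library (`sys.getsizeof` and `tracemalloc`), not Memray, and the function name and data are invented for this example rather than taken from the talk.

```python
import sys
import tracemalloc


def shallow_and_traced_sizes(n):
    """Return (shallow size of a list, bytes allocated while building it)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Each element is a separate str object; the list itself only stores
    # references to those objects.
    data = [str(i) * 10 for i in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # sys.getsizeof counts the list's own storage, not the strings it refers to.
    return sys.getsizeof(data), after - before


shallow, traced = shallow_and_traced_sizes(100_000)
print(f"sys.getsizeof(list): {shallow:,} bytes")
print(f"traced allocations:  {traced:,} bytes")
```

The traced figure is much larger than the shallow one because the strings live outside the list. Tracking allocations as they happen, rather than measuring individual objects, is the kind of job a memory profiler takes on.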

Then we will go through a data science use case in which we memory profile part of the process with the Memray Jupyter plugin. The use case will be one that a data science practitioner or learner is familiar with, so they can see how memory profiling could be useful.

We will then explain how to interpret the flame graph in Memray, a diagram commonly used in memory profiling to understand how much memory a process and its sub-processes use. A new user can find it hard to understand and may not know what to look for. From this example, the audience can see what they can learn from the flame graph.
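To make the diagram described above more concrete (Memray's reporter for it is named `flamegraph`), here is a minimal sketch of the idea behind reading one: the size of each box reflects the memory allocated by that function plus everything it calls. The allocation records and function names are hypothetical, and this is not how Memray itself is implemented.

```python
from collections import defaultdict


def inclusive_bytes(samples):
    """Sum allocated bytes for every call-stack prefix.

    `samples` is an iterable of (stack, nbytes) pairs, where `stack` is a
    tuple of function names from the outermost caller to the function that
    made the allocation.
    """
    totals = defaultdict(int)
    for stack, nbytes in samples:
        # An allocation counts towards the allocating function and every caller.
        for depth in range(1, len(stack) + 1):
            totals[stack[:depth]] += nbytes
    return dict(totals)


# Hypothetical allocation records, invented for illustration.
samples = [
    (("main", "load_csv", "parse_rows"), 600),
    (("main", "load_csv"), 100),
    (("main", "build_features"), 300),
]

totals = inclusive_bytes(samples)
for stack, nbytes in sorted(totals.items()):
    indent = "  " * (len(stack) - 1)
    print(f"{indent}{stack[-1]}: {nbytes} bytes")
```

In this made-up data, `load_csv` accounts for 700 of the 1000 bytes, and most of that comes from `parse_rows`. In a real diagram, the widest boxes are the natural place to start looking in the same way.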

Goal

This talk is for data scientists, learners, or anyone interested in memory profiling their Python programs. Although the talk uses a data science use case as its example, the explanation and the tool apply to any Python program. For data science practitioners and learners who already use Python to process data, this may be a step towards improving their data workflow and preventing memory leaks in their programs.

Outline

  • Introduction (5 mins)
  • Why we need a special tool for memory profiling (5 mins)
  • How to use Memray in a Jupyter notebook (5 mins)
  • Demonstration of using Memray in data science work (5 mins)
  • How to interpret a flame graph (5 mins)
  • Conclusion (5 mins)

Expected audience expertise: Domain

Intermediate

Expected audience expertise: Python

Intermediate

Abstract as a tweet

You should profile your data science work. In this talk, we will introduce Memray and its new Jupyter plugin.

After having a career in data science, Cheuk now brings her knowledge of data and passion for the tech community as the developer advocate for Anaconda. Cheuk constantly contributes to the open-source community by giving free talks and tutorials and organising sprints to encourage diverse contributions.