PyCon UK 2022

Vectorise all the things! How basic linear algebra can speed up your data science code
2022-09-17, Assembly Room

Do you feel like your data science code is horribly inefficient, but you don’t know how to make things faster? Fear not! In this talk, we’ll speed up some common operations using tricks from linear algebra, all within the comfort of the Python ecosystem.


Have you found that your data science code works beautifully on a few dozen test rows, but leaves you wondering how to spend the next couple of hours after you start looping through your full data set? Are you only familiar with Python, and wish there were a way to speed things up without subjecting yourself to learning C? In this talk, I will show you some simple tricks, borrowed from linear algebra, which can give you significant performance gains in your Python data science code. I will gently take you through the basics of linear algebra, explaining core operations such as matrix addition, subtraction and multiplication, scalar multiplication and the dot product. I will then show you some examples of how you can easily use these concepts in your machine learning code to speed up common data science operations such as distance calculations, classification tasks and finding nearest neighbours.
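
The abstract itself contains no code, so the following is only an illustrative sketch of the kind of speed-up described, not material taken from the talk. It assumes NumPy as the array library (the abstract says only "the Python ecosystem"), and the function names and random sample data are hypothetical. It computes Euclidean distances first with an explicit loop, then with matrix multiplication via the identity ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, and uses the result to find nearest neighbours.

```python
import numpy as np


def pairwise_distances_loop(X, Y):
    """Euclidean distance between every row of X and every row of Y,
    computed one pair at a time (the slow baseline)."""
    out = np.empty((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            diff = X[i] - Y[j]
            out[i, j] = np.sqrt(np.dot(diff, diff))
    return out


def pairwise_distances_vectorised(X, Y):
    """Same distances, using ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2."""
    x_sq = np.sum(X * X, axis=1)[:, np.newaxis]  # shape (n, 1)
    y_sq = np.sum(Y * Y, axis=1)[np.newaxis, :]  # shape (1, m)
    cross = X @ Y.T                              # all dot products, shape (n, m)
    # Clip tiny negative values caused by floating-point rounding.
    squared = np.maximum(x_sq - 2.0 * cross + y_sq, 0.0)
    return np.sqrt(squared)


def nearest_neighbours(X, Y):
    """Index of the closest row of Y for each row of X."""
    return np.argmin(pairwise_distances_vectorised(X, Y), axis=1)


# Hypothetical sample data: 50 query points and 30 reference points in 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y = rng.normal(size=(30, 4))

loop_result = pairwise_distances_loop(X, Y)
vectorised_result = pairwise_distances_vectorised(X, Y)
neighbours = nearest_neighbours(X, Y)
```

Both functions return the same matrix up to floating-point rounding; the vectorised version replaces the two Python loops with a single matrix product, which is typically where the performance gain comes from on larger data sets.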


Suitable for beginners: yes

Dr. Jodie Burchell is the Developer Advocate in Data Science at JetBrains, and was previously the Lead Data Scientist in audiences generation at Verve Group Europe. After finishing a PhD in Psychology and a postdoc in biostatistics, she has worked in a range of data science and machine learning roles across search improvement, recommendation systems, NLP and programmatic advertising. She is also the author of two books, "The Hitchhiker's Guide to Ggplot2" and "The Hitchhiker's Guide to Plotnine", and writes a data science blog.