PyCon UK 2019

Get to grips with pandas and scikit-learn
09-14, 10:30–12:00 (Europe/London), Room A

This session will be an exposition of data wrangling with pandas and machine learning with scikit-learn. It will cover a classification project, from importing the data to evaluating model performance.
This hands-on workshop is aimed at beginners in Data Science, though some experience with Python is preferable.


We hear a lot about Machine Learning, but it’s just one part of a bigger process. Before applying any algorithm to a data set, discovery and preparation are needed. This hands-on workshop will cover an end-to-end classification project, from importing the data to evaluating model performance. By the end of this tutorial, you will have completed a step-by-step Machine Learning workflow.

Part one: Grab your spade and dig in!

pandas is a popular tool that will allow us to conduct Exploratory Data Analysis efficiently. After loading the data set we’ll use in this workshop, we’ll take a first look at it with pandas and start cleaning it. We’ll also use visualisation to gain more insight and continue preparing our data.
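As a rough idea of what this first look and clean-up can involve, here is a minimal sketch. The toy data, the column names (age, income, segment, churned) and the choice to fill gaps with medians are all assumptions made for illustration; they are not the workshop's data set or its exact steps.

```python
import pandas as pd

# Hypothetical toy data standing in for the workshop data set.
raw = pd.DataFrame(
    {
        "age": [22, 35, None, 58, 41, 35],
        "income": [28000, 52000, 39000, None, 61000, 52000],
        "segment": ["a", "b", "b", "a", None, "b"],
        "churned": [0, 1, 0, 1, 0, 1],
    }
)

# First look: size of the table and number of missing values per column.
n_rows, n_cols = raw.shape
missing_per_column = raw.isna().sum()

# Basic cleaning: drop exact duplicate rows, then fill the gaps.
# Median / "unknown" filling is one simple option among many.
clean = raw.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["income"] = clean["income"].fillna(clean["income"].median())
clean["segment"] = clean["segment"].fillna("unknown")

# How balanced is the (hypothetical) target column?
class_balance = clean["churned"].value_counts()
```

In a notebook, displaying raw.head(), missing_per_column and class_balance in separate cells gives a quick overview before any plotting.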

Part two: Where the Ma(th)gic happens.

In this part, we’ll introduce the powerful scikit-learn library. We'll split the data into training and testing sets and start pre-processing. Then we’ll choose, tune and train a Machine Learning model and finally evaluate its performance using cross-validation and a confusion matrix.
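The steps in this part could look roughly like the sketch below. It uses synthetic data from make_classification, a StandardScaler for pre-processing and a LogisticRegression tuned over a small grid of C values; these are assumptions chosen to keep the example short, not necessarily the data, model or parameters used in the workshop.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data standing in for the prepared data set (assumption).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out a quarter of the rows for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model in one pipeline, so that the scaler is fitted
# only on the training folds during cross-validation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune the regularisation strength with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best setting, then a confusion
# matrix on the held-out test set.
cv_accuracy = search.best_score_
cm = confusion_matrix(y_test, search.predict(X_test))
```

Rows of cm are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions.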

During this workshop, we will fill in a pre-prepared Jupyter notebook together, explaining each step to get a good understanding of the process. You will also have a guided exercise notebook to reinforce your learning on unseen data.

To get the most out of this workshop, you will need Python 3, pandas, matplotlib, scikit-learn and jupyter installed. Please refer to the documentation for your operating system of choice, or search online for instructions on installing the packages.
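One possible way to check your setup before the session is the short script below. It assumes Python 3.8 or later (for importlib.metadata) and that the packages were installed under the distribution names listed in REQUIRED; the helper installed_versions is a hypothetical name used only for this sketch.

```python
from importlib import metadata

# Distribution names as used by pip (assumed; adjust if you installed
# e.g. only "notebook" or "jupyterlab" rather than the "jupyter" metapackage).
REQUIRED = ["pandas", "matplotlib", "scikit-learn", "jupyter"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


status = installed_versions(REQUIRED)
missing = [name for name, version in status.items() if version is None]
```

If missing is empty, everything in the list was found; otherwise it names the packages still to install.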


Is your proposal suitable for beginners? – yes

From Paris import Sandrine as SP

SP is a French Mathematician turned Data Scientist. She is currently working in financial services and is active in the London tech scene as an open source community leader.

Tags: Machine Learning, Basketball, Cooking, Numpy, Badminton, Family, Pandas, Cat, Travelling, scikit-learn, Friends, Discovering, Data Science, Gardening, Python, Comics