2019-09-14, 10:30–12:00, Room A
This session will be an exposition of data wrangling with pandas and machine learning with scikit-learn. It will cover a classification project, from importing the data to evaluating model performance.
This hands-on workshop is aimed at a beginner Data Science audience, though some prior experience with Python is preferable.
We hear a lot about Machine Learning, but it’s just one part of a bigger process. Before applying any algorithm to a data set, discovery and preparation are needed. This hands-on workshop will cover an end-to-end classification project, from importing the data to evaluating model performance. By the end of this tutorial, you will have completed a step-by-step Machine Learning workflow.
Part one: *Grab your spade and dig in!*
Pandas is a popular tool that will allow us to efficiently conduct Exploratory Data Analysis. After loading the data set we’ll use in this workshop, we’ll have a first look at it with Pandas and start cleaning it. We’ll also use visualisation to gain more insights and continue to prepare our data.
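As a rough illustration of that first look and clean-up, here is a minimal sketch. The toy table, its column names (`age`, `income`, `city`, `bought`) and the choice to fill gaps with the median are all assumptions made for this example; they are not the workshop's data set or its cleaning strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the workshop data set.
df = pd.DataFrame({
    "age": [22, 35, np.nan, 58, 41, 35],
    "income": [28000, 52000, 61000, np.nan, 47000, 52000],
    "city": ["Leeds", "York", "Leeds", None, "York", "York"],
    "bought": [0, 1, 1, 0, 1, 1],
})

# First look: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
missing = df.isna().sum()
print(missing)

# Basic cleaning: drop exact duplicate rows, then fill the gaps.
# Median / "unknown" are illustrative choices, not a recommendation.
clean = df.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["income"] = clean["income"].fillna(clean["income"].median())
clean["city"] = clean["city"].fillna("unknown")

# A simple per-class summary, a common starting point for exploration.
summary = clean.groupby("bought")["income"].mean()
print(summary)
```

From here, `clean.hist()` or `clean.plot()` (both backed by matplotlib) would be one way to move on to visual exploration.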
Part two: *Where the Ma(th)gic happens.*
In this part, we’ll introduce the powerful scikit-learn library. We'll split the data into training and testing sets and start pre-processing. Then we’ll choose, tune and train a Machine Learning model and finally evaluate its performance using cross-validation and a confusion matrix.
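The steps above could look roughly like the following sketch. It assumes the iris data set bundled with scikit-learn, a logistic regression model and a small grid over `C`; the workshop's actual data, model and hyperparameters may well differ.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: the iris data set ships with scikit-learn (an assumption,
# not the workshop's data set).
X, y = load_iris(return_X_y=True)

# Hold out a test set; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model in one pipeline, so the scaler is fitted only on
# the training folds during cross-validation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune the regularisation strength C with 5-fold cross-validation.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best setting

# Final check on the held-out test set: rows are true classes,
# columns are predicted classes.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Wrapping the scaler and the classifier in a single pipeline is a design choice made here to avoid leaking information from validation folds into pre-processing; doing the scaling by hand before the split is a common beginner mistake.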
During this workshop, we will fill in a pre-prepared Jupyter notebook together, explaining each step to get a good understanding of the process. You will also have a guided exercise notebook to reinforce your learning on unseen data.
To get the most out of this workshop, you will need Python 3, pandas, matplotlib, scikit-learn and Jupyter installed. Please refer to the documentation for your operating system of choice, or search online for how to install these packages.