Get to grips with pandas and scikit-learn

This session introduces data wrangling with pandas and machine learning with scikit-learn for Python programmers. This hands-on workshop will cover a classification project, from importing the data to evaluating model performance.


We hear a lot about Machine Learning, but it’s just one part of a bigger process. Before applying any algorithm to a data set, discovery and preparation are needed. This hands-on workshop will cover an end-to-end classification project, from importing the data to evaluating the model’s performance. By the end of this tutorial, you will have completed a step-by-step Machine Learning workflow.

Part one: Grab your spade and dig in!
pandas is a popular library that will allow us to conduct Exploratory Data Analysis efficiently. After loading the data set we’ll use in this workshop, we’ll take a first look at it with pandas and start cleaning it. We’ll also use visualisation to gain more insight and continue preparing our data.
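To give a flavour of this first part, here is a minimal sketch of a first look and a first cleaning pass with pandas. The workshop's actual data set is not named in this abstract, so the small table below (columns `age`, `income` and `label`) is purely hypothetical, and the cleaning choices (dropping duplicate rows, filling gaps with the median) are assumptions rather than the steps the workshop will necessarily take.

```python
import pandas as pd

# Hypothetical toy data standing in for the workshop data set.
df = pd.DataFrame({
    "age": [22, 35, None, 58, 41, 35],
    "income": [28000, 52000, 61000, None, 45000, 52000],
    "label": ["no", "yes", "yes", "no", "yes", "yes"],
})

# First look: a few rows, column types, and summary statistics.
print(df.head())
df.info()
print(df.describe())

# Count missing values in each column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# One possible cleaning pass: drop exact duplicate rows, then fill
# missing numeric values with each column's median.
clean = df.drop_duplicates().copy()
numeric_cols = ["age", "income"]
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# Check how balanced the target classes are.
label_counts = clean["label"].value_counts()
print(label_counts)
```

In a notebook with matplotlib installed, `clean[numeric_cols].hist()` would be one way to start visualising the numeric columns.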

Part two: Where the Ma(th)gic happens.
In this part, we’ll introduce the scikit-learn library. We'll split the data into training and testing sets and start pre-processing. Then we’ll choose, tune and train a Machine Learning model and finally evaluate its performance using a confusion matrix.
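The steps in this second part can be sketched as follows. This is an illustration under assumptions, not the workshop's notebook: it uses the breast cancer data set bundled with scikit-learn as a stand-in, standard scaling as the pre-processing step, logistic regression as the model, and a small grid search over the regularisation strength `C` as the tuning step.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data set (an assumption; the workshop data is not named).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; stratify to keep class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model chained together, so the scaler is fitted
# on training folds only during cross-validation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Tune one hyperparameter with 5-fold cross-validation.
param_grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test set.
y_pred = search.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print("Best parameters:", search.best_params_)
print("Confusion matrix (rows: true class, columns: predicted class):")
print(cm)
```

Wrapping the scaler and the classifier in one pipeline is a design choice that keeps information from the test data out of the pre-processing step.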

During this workshop, we will fill in a pre-prepared Jupyter notebook together, explaining each step to get a good understanding of the process. You will also have a guided exercise notebook to reinforce your learning on unseen data.

To get the most out of this workshop, you will need Python 3, pandas, matplotlib, scikit-learn and Jupyter installed. Please refer to the documentation for your operating system, or search online, for instructions on installing these packages.
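One way to check your setup beforehand is the short script below. It is only a sketch: the helper name `installed_versions` is made up for this example, and the distribution names are assumed to match the ones used on PyPI. It needs Python 3.8 or later for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as assumed to be published on PyPI.
REQUIRED = ["pandas", "matplotlib", "scikit-learn", "jupyter"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


result = installed_versions(REQUIRED)
for name, ver in result.items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```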


Domains:

Algorithms, Data Science, Machine Learning

Domain Expertise:

none

Python Skill Level:

basic

Abstract as a tweet:

Get to grips with pandas and scikit-learn: a first contact with data science using Python