Introduction to scikit-learn
2023-08-15, HS 120

Update: I have prepared a Jupyter notebook for you to fill with code during the tutorial: https://github.com/StefanieSenger/Talks/blob/main/2023_EuroSciPy/2023_EuroSciPy_Intro_to_scikit-learn_fillout-notebook.ipynb. Please download it and have it at hand when the tutorial starts. You can still download it during the introductory part of the tutorial.

This tutorial provides a beginner's introduction to scikit-learn, a Python package for machine learning. We will talk about what machine learning is and how scikit-learn implements it. In the practical part we will learn how to create a predictive modeling pipeline and how to fine-tune its hyperparameters to improve the model's score.


Workshop Outline

  • Machine Learning 101 (10 min.)
  • What is scikit-learn? (5 min.)
  • Practical Part (+60 min.)
    • Predictive modeling pipeline
    • Evaluation of models
    • Hyperparameter tuning

Description

We will start by covering the main ideas behind machine learning and then introduce scikit-learn as a machine learning library. There will be plenty of room to ask questions.

The practical part of this tutorial is divided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data, such as numerical and categorical columns in the same table. Then, we will look in more detail at the evaluation of models and the trade-offs to consider. Finally, we will show how to tune the hyperparameters of the pipeline.
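As a rough preview of those three steps, here is a minimal sketch. It is not the tutorial's notebook: the toy data, the column names (`age`, `hours_per_week`, `sector`), the choice of `LogisticRegression`, and the grid of `C` values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic toy data (hypothetical; not the dataset used in the tutorial).
rng = np.random.default_rng(0)
n_samples = 200
X = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=n_samples),
        "hours_per_week": rng.integers(10, 60, size=n_samples),
        "sector": rng.choice(["public", "private", "self-employed"], size=n_samples),
    }
)
# Binary target that depends on one numerical column plus some noise.
y = (X["hours_per_week"] + rng.normal(0, 5, size=n_samples) > 35).astype(int)

# Step 1: a pipeline for heterogeneous data. Numerical columns are scaled,
# the categorical column is one-hot encoded, then a classifier is fitted.
numerical_columns = ["age", "hours_per_week"]
categorical_columns = ["sector"]
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numerical_columns),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_columns),
    ]
)
model = Pipeline(
    [
        ("preprocess", preprocessor),
        ("classifier", LogisticRegression(max_iter=1000)),
    ]
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Step 2: evaluation with 5-fold cross-validation on the training set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("Cross-validation accuracy per fold:", cv_scores)

# Step 3: hyperparameter tuning. "classifier__C" addresses the parameter C
# of the pipeline step named "classifier".
param_grid = {"classifier__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Accuracy on held-out test data:", test_accuracy)
```

Keeping the preprocessing inside the pipeline means it is re-fitted on each training fold during cross-validation and grid search, so no information from the validation folds leaks into the scaler or the encoder.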

You are encouraged to code along with me.

Prerequisites

This workshop will serve you best if you have some basic knowledge of Python and know how to use a Jupyter Notebook. We will start from a prepared notebook and add code at every step.

Bring your laptop.

Have a virtual environment with numpy, pandas and scikit-learn installed.

Have the prepared notebook at hand.
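
One possible way to check the environment before the tutorial is to run the following sketch in a notebook cell. It only assumes that the three packages listed above are installed; note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import each required package and record its version string.
# An ImportError here means the package is missing from the environment.
versions = {}
for module_name in ("numpy", "pandas", "sklearn"):
    module = importlib.import_module(module_name)
    versions[module_name] = module.__version__

for module_name, version in versions.items():
    print(f"{module_name}: {version}")
```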


Abstract as a tweet:

Introduction to machine learning using scikit-learn

Category [Machine and Deep Learning]:

Supervised Learning

Expected audience expertise: Domain:

none

Expected audience expertise: Python:

some

Category [High Performance Computing]:

Scalability

Category [Community, Education, and Outreach]:

Learning and Teaching Scientific Python

Category [Scientific Applications]:

Other

Category [Data Science and Visualization]:

Data Analysis and Data Engineering

Public link to supporting material:

https://inria.github.io/scikit-learn-mooc/

Historian (PhD) who went astray. I teach Data Science to career changers at Le Wagon and have started contributing to scikit-learn in recent months.