2023-08-15, HS 120
Update: I have prepared a Jupyter notebook for you to fill with code during the tutorial: https://github.com/StefanieSenger/Talks/blob/main/2023_EuroSciPy/2023_EuroSciPy_Intro_to_scikit-learn_fillout-notebook.ipynb. Please download it and have it at hand when the tutorial starts. You can still download it during the introduction part of the tutorial.
This tutorial will provide a beginner's introduction to scikit-learn, a Python package for machine learning. We will talk about what machine learning is and how scikit-learn implements it. In the practical part, we will learn how to create a predictive modeling pipeline and how to fine-tune its hyperparameters to improve the model's score.
Workshop Outline
- Machine Learning 101 (10 min.)
- What is scikit-learn? (5 min.)
- Practical Part (+60 min.)
  - Predictive modeling pipeline
  - Evaluation of models
  - Hyperparameter tuning
Description
We will start by covering the main ideas behind machine learning and then introduce scikit-learn as a machine learning library. There will be plenty of room to ask questions.
The practical part of this tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go into more detail on the evaluation of models and the trade-offs to consider. Finally, we will show how to tune the hyperparameters of the pipeline.
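To give a rough idea of what these three parts can look like in code, here is a minimal sketch. It is not the tutorial notebook: the data set is synthetic and made up for illustration, the column names (`age`, `hours`, `sector`) and the step names (`preprocess`, `classifier`) are assumptions, and logistic regression is just one possible choice of model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic, made-up data with numerical and categorical columns
# (hypothetical column names, for illustration only).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "age": rng.integers(18, 70, size=n),
        "hours": rng.integers(10, 60, size=n),
        "sector": rng.choice(["public", "private", "self"], size=n),
    }
)
y = (X["hours"] + rng.normal(0, 5, size=n) > 35).astype(int)

# Part 1: a pipeline that handles heterogeneous column types.
preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "hours"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["sector"]),
    ]
)
model = Pipeline(
    [
        ("preprocess", preprocessor),
        ("classifier", LogisticRegression(max_iter=1000)),
    ]
)

# Part 2: evaluate the whole pipeline with 5-fold cross-validation.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Part 3: tune a hyperparameter of the final step with a grid search.
param_grid = {"classifier__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, f"{search.best_score_:.3f}")
```

Wrapping the preprocessing and the classifier in one `Pipeline` means that the scaler and the encoder are refit on each training fold, so cross-validation and the grid search do not leak information from the held-out data.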
You are encouraged to code along with me.
Prerequisites
This workshop will serve you best if you have some basic knowledge of Python and know how to use a Jupyter Notebook. We will start from a prepared notebook and add code at every step.
Bring your laptop.
Have a virtual environment with numpy, pandas and scikit-learn installed.
Have the prepared notebook at hand.
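One possible way to check the environment before the tutorial is the small sketch below. It only assumes that the three packages listed above are needed; note that scikit-learn is imported under the name `sklearn`.

```python
from importlib import import_module

# Import names of the required packages
# (scikit-learn is imported as "sklearn").
required = ["numpy", "pandas", "sklearn"]

versions = {}
for name in required:
    try:
        versions[name] = import_module(name).__version__
    except ImportError:
        versions[name] = None  # package is missing from this environment

for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the virtual environment you plan to use. If any line reports `NOT INSTALLED`, install that package before the session.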
Introduction to machine learning using scikit-learn
Category [Machine and Deep Learning] – Supervised Learning
Expected audience expertise: Domain – none
Expected audience expertise: Python – some
Category [High Performance Computing] – Scalability
Category [Community, Education, and Outreach] – Learning and Teaching Scientific Python
Category [Scientific Applications] – Other
Category [Data Science and Visualization] – Data Analysis and Data Engineering
Public link to supporting material –
Historian (PhD) who went astray. I teach Data Science to career changers at Le Wagon and started contributing to scikit-learn in recent months.