JupyterLab is very widely used in the Python scientific community. Most, if not all, of the other tutorials will use Jupyter as a tool. Therefore, a solid understanding of the basics is very helpful for the rest of the conference as well as for your later daily work.
This tutorial provides an overview of important basic Jupyter features.
Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in Jax, the breadth of applications is as vast as the quality of contributions.
This tutorial aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. The reproducibility aspect will make it easy for fellow researchers to use and iterate on a publication, increasing citations of the published work. Appropriate validation techniques and improved code quality accelerate the review process during publication and help avoid rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of the work and enables easier collaboration on future work.
This work to make machine learning applications reproducible has an outsized impact compared to the limited additional work that is required using existing Python libraries.
This tutorial will provide an introduction to Python intended for beginners.
It will notably introduce the following aspects (a short illustrative sketch follows the list):
- built-in types
- control flow (e.g. conditions, loops)
- built-in functions
- basic Python classes
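A minimal, self-contained sketch touching each of these topics (the names and values are purely illustrative):

```python
numbers = [1, 2, 3, 5, 8]          # a built-in list type
for n in numbers:                  # control flow: a loop ...
    if n % 2 == 0:                 # ... and a condition
        print(n, "is even")

print(len(numbers), sum(numbers))  # built-in functions

class Greeter:                     # a basic Python class
    def __init__(self, name):
        self.name = name

    def hello(self):
        return f"Hello, {self.name}!"

print(Greeter("world").hello())
```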
This tutorial will introduce how to leverage scikit-learn's powerful histogram-based gradient-boosted regression trees with various loss functions (least squares, Poisson, and the pinball loss for quantile estimation) on a time series forecasting problem. We will see how to use pandas to build lag and windowing features, and how to use scikit-learn's time-series cross-validation and other model evaluation tools.
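A minimal sketch of this workflow (the series, feature names and quantile level are illustrative; the quantile loss requires a recent scikit-learn version):

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_pinball_loss, make_scorer
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# hypothetical hourly demand series
df = pd.DataFrame({"demand": range(500)},
                  index=pd.date_range("2022-01-01", periods=500, freq="H"))
# lag and windowing features built with pandas
df["lag_24"] = df["demand"].shift(24)
df["rolling_mean_24"] = df["demand"].shift(1).rolling(24).mean()
df = df.dropna()

X, y = df[["lag_24", "rolling_mean_24"]], df["demand"]
model = HistGradientBoostingRegressor(loss="quantile", quantile=0.95)  # pinball loss
cv = TimeSeriesSplit(n_splits=5)
pinball = make_scorer(mean_pinball_loss, alpha=0.95, greater_is_better=False)
print(cross_val_score(model, X, y, cv=cv, scoring=pinball).mean())
```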
This tutorial will guide you towards good evaluation of machine-learning models, choosing metrics and procedures that match the intended usage, with code examples using the latest scikit-learn features. We will discuss how good metrics should characterize all aspects of error, e.g. on the positive and negative class: the probability of a detection, or the probability of a true event given a detection; and how they may need to cater for class imbalance. Metrics may also evaluate confidence scores, e.g. calibration. Model-evaluation procedures should gauge not only the expected generalization performance, but also its variations.
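A short sketch of the kind of evaluation covered, on a made-up imbalanced dataset:

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(max_iter=1000)

# complementary metrics, evaluated over repeated splits to see their variation
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["balanced_accuracy", "average_precision", "neg_brier_score"])
for name, values in scores.items():
    if name.startswith("test_"):
        print(name, round(values.mean(), 3), "+/-", round(values.std(), 3))

# calibration of the predicted probabilities on a held-out part of the data
clf.fit(X[:1500], y[:1500])
prob_true, prob_pred = calibration_curve(y[1500:], clf.predict_proba(X[1500:])[:, 1], n_bins=10)
```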
This tutorial will provide an introduction to the NumPy library intended for beginners.
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
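For instance, a few lines already show several of these capabilities:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)        # a multidimensional array
print(a.sum(axis=0))                   # fast operations along an axis
print(a.T @ a)                         # basic linear algebra
rng = np.random.default_rng(0)         # random simulation
print(np.sort(rng.normal(size=5)))     # sorting
```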
The audio (& speech) domain is going through a massive shift in terms of end-user performance. It is at the same tipping point as NLP was in 2017, before the Transformers revolution took over. We’ve gone from needing copious amounts of data to create Spoken Language Understanding systems to just needing a 10-minute snippet.
This tutorial will help you create strong code-first & scientific foundations in dealing with audio data and build real-world applications like Automatic Speech Recognition (ASR), Audio Classification, and Speaker Verification using backbone models like Wav2Vec2.0, HuBERT, etc.
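As a taste of the code-first approach, a hedged sketch of speech recognition with a Wav2Vec2.0 backbone via the Hugging Face pipeline API (the model name and audio file are illustrative):

```python
from transformers import pipeline

# "facebook/wav2vec2-base-960h" is one publicly available Wav2Vec2.0 checkpoint
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
result = asr("sample.wav")   # any local audio file
print(result["text"])
```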
This tutorial is an introduction to pandas intended for beginners.
pandas is one of Python's core packages for data science. pandas organizes data into DataFrames and provides powerful methods for manipulating them. The library is built on top of NumPy. It'll be helpful for the tutorial if you have some experience with NumPy arrays, for example, by following the Introduction to NumPy tutorial.
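A tiny example of the DataFrame workflow (the data are made up):

```python
import pandas as pd

df = pd.DataFrame({"city": ["Basel", "Basel", "Bern"],
                   "temp": [21.5, 23.1, 19.8]})
print(df.groupby("city")["temp"].mean())   # split-apply-combine in one line
print(df["temp"].to_numpy())               # columns are NumPy arrays underneath
```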
In this tutorial we will go through the main features of the PyTorch framework for deep learning. We will start by learning how to build a neural network from the ground up, diving deep into torch.tensor, Dataset and optimisers. We will analyse data from different domains (e.g. numerical, images), introducing different neural network layers and architectures. Last but not least, a few tips from a pure data-science perspective will be shared, to appreciate the wonderful integration PyTorch has with the Python data model!
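A compact sketch of the kind of training loop built during the tutorial (data and architecture are illustrative):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# made-up regression data
X = torch.randn(256, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.1 * torch.randn(256)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
optimiser = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for xb, yb in loader:
        optimiser.zero_grad()
        loss = loss_fn(model(xb).squeeze(-1), yb)
        loss.backward()
        optimiser.step()
```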
This tutorial will provide an introduction to SciPy intended for beginners.
SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.
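For example, two of its submodules in action:

```python
from scipy import optimize, stats

# optimization: minimum of a simple one-dimensional function
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(res.x)

# statistics: fit a normal distribution to (simulated) data
sample = stats.norm.rvs(loc=5, scale=2, size=1000, random_state=0)
print(stats.norm.fit(sample))
```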
This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your workflow (importing GIS data, visualizing, joining and preparing for analysis, exploring spatial relationships, …).
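A minimal sketch of such a workflow (the file name and column are placeholders):

```python
import geopandas as gpd

gdf = gpd.read_file("districts.geojson")   # import GIS data (any vector format)
gdf = gdf.to_crs(epsg=3857)                # reproject before measuring
gdf["area_km2"] = gdf.geometry.area / 1e6  # prepare for analysis
gdf.plot(column="area_km2", legend=True)   # quick choropleth for exploration
```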
This tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning.
This tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go more into detail in the evaluation of models and the type of trade-off to consider. Finally, we will show how to tune the hyperparameters of the pipeline.
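A sketch of such a pipeline (the column names are illustrative):

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# heterogeneous data: scale the numerical columns, encode the categorical ones
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country", "job"]),
])
pipe = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# hyperparameter tuning of the whole pipeline
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
# search.fit(X_train, y_train)   # X_train / y_train are assumed to exist
```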
Image data are used in many scientific fields such as astronomy, life sciences or material sciences. This tutorial will walk you through image processing with the scikit-image library, which is the numpy-native image processing library of the scientific python ecosystem.
The first hour of the tutorial will be accessible to beginners in image processing (some experience with NumPy arrays is a prerequisite), and will focus on basic concepts of digital image manipulation and processing (filters, segmentation, measures). In the last half hour, we will focus on more advanced aspects, and in particular Emma will speak about performance and acceleration of image processing.
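The basic concepts fit in a few lines, here on a sample image shipped with scikit-image:

```python
from skimage import data, filters, measure

image = data.coins()                                 # sample image
smooth = filters.gaussian(image, sigma=1)            # filtering
binary = smooth > filters.threshold_otsu(smooth)     # segmentation by thresholding
labels = measure.label(binary)                       # connected components
print(len(measure.regionprops(labels)), "objects")   # measures per object
```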
This tutorial explains the fundamental ideas and concepts of matplotlib. It's suited for complete beginners to get started as well as existing users who want to improve their plotting abilities and learn about best practices.
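For instance, the object-oriented interface that the tutorial builds on:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()               # Figure and Axes objects
ax.plot(x, np.sin(x), label="sin")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```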
This workshop is for data scientists and other programmers who want to add another tool to their data science toolkit: modelling, analysing and visualising data as networks! Network Science deals with analysing network data, and the data can come from different fields like politics, finance, computer science, law and even Game of Thrones!
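A tiny NetworkX example in that spirit (the edges are made up):

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Jon", "Sansa"), ("Jon", "Arya"),
                  ("Sansa", "Arya"), ("Tyrion", "Jon")])
print(nx.degree_centrality(G))           # who is most connected?
print(list(nx.connected_components(G)))  # community structure at a glance
```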
Modern accelerators (graphics processing units and tensor processing units) allow for high performance computing at massive scale. JAX traces computation in Python programs through the familiar numpy API, and uses XLA to compile programs that run efficiently on these accelerators. A set of composable function transformations allows for expressing versatile scientific computing with an elegant syntax.
Flax provides abstractions on top of JAX that make it easy to handle weights and other state that is required for solving problems using neural networks.
This talk first presents the basic JAX API that allows for computing gradients, compiling functions, or vectorizing computation. It then proceeds to cover other parts of the JAX ecosystem commonly used for neural network programming, such as basic building blocks and optimizers.
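The core transformations mentioned above compose freely, as in this small sketch:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)                    # automatic differentiation
fast_loss = jax.jit(loss)                     # XLA compilation
batched = jax.vmap(loss, in_axes=(None, 0))   # vectorise over a batch

w = jnp.ones(3)
x = jnp.arange(12.0).reshape(4, 3)
print(grad_loss(w, x), fast_loss(w, x), batched(w, x))
```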
Automatic image processing is a common task in many scientific and technological fields such as life sciences (with medical imaging), satellite imaging, etc. While machine learning is often used for efficient processing of such data sets, building a high-quality training set is an important task. Specialized software (such as rootpainter or ilastik) exists in different communities to build such training sets from user annotations drawn on images.
In this talk, I will show how to use the open-source libraries plotly and dash to build custom interactive applications for interactive image annotation, and how to combine these tools with libraries such as scikit-image or machine learning/deep learning libraries for building a whole image processing pipeline.
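A minimal sketch of such an annotation app (Dash >= 2.0; the sample image and drawing mode are illustrative):

```python
from dash import Dash, dcc, html
import plotly.express as px
from skimage import data

fig = px.imshow(data.camera(), binary_string=True)
fig.update_layout(dragmode="drawclosedpath")   # let the user draw annotation outlines

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(figure=fig,
              config={"modeBarButtonsToAdd": ["drawclosedpath", "eraseshape"]}),
])

if __name__ == "__main__":
    app.run(debug=True)   # drawn shapes arrive through the graph's relayoutData callback
```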
Python is the most popular programming language in the data space and is one of the major drivers of many advancements in machine learning. However, it is much less known that the Python library Pyomo is a great tool for solving mathematical optimization problems common in operations research.
In this talk I will demonstrate how Pyomo can be used to find optimal decisions when data is uncertain and how to combine data-driven forecasts with optimal decision making.
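To give a flavour of Pyomo, a toy deterministic model (the numbers are made up; a solver such as GLPK or CBC must be installed separately):

```python
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize, SolverFactory)

m = ConcreteModel()
m.x = Var(within=NonNegativeReals)   # units of product A
m.y = Var(within=NonNegativeReals)   # units of product B
m.profit = Objective(expr=40 * m.x + 30 * m.y, sense=maximize)
m.capacity = Constraint(expr=m.x + 2 * m.y <= 100)
m.demand = Constraint(expr=m.x <= 40)

# SolverFactory("glpk").solve(m)
# print(m.x(), m.y())
```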
This session focuses on issues related to education in the ecosystem from three different angles, covering recent advances as well as existing and upcoming challenges.
- Materials: how projects are dealing with documentation and educational materials
- Methods: What should we do to make our materials more accessible to underrepresented and/or historically marginalised groups?
- Tools: What are the existing tools in the ecosystem helping us achieve the above goals, and what do we need to develop?
We will give an overview of these different aspects.
In my current work as a contributor experience lead, I am supporting and growing Matplotlib’s and Pandas’ communities by organizing events, meetings, and proactive engagement with a focus on equity and inclusion of historically marginalized groups. In my talk I’ll give an introduction to this new role, the grant that supports it, and some of the work done so far…
I will share takeaways for maintainers and contributors: from simple changes that can be implemented relatively easily, to bigger topics one might want to learn more about in order to slowly yet proactively facilitate changes that tweak the contributor experience for a project.
The Pythran compiler is used to speed up generic Python scientific kernels across the world. Through ten code samples taken from the scipy and scikit-image codebases and Stack Overflow snippets, this talk will demonstrate the major features of the compiler, as well as some technical nits!
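As a flavour of how Pythran is used, a hedged sketch of a kernel (the function is illustrative; it stays valid Python/NumPy when not compiled):

```python
# kernel.py -- compile with `pythran kernel.py` to obtain a native extension module
# pythran export pairwise_distance(float64[:,:])
import numpy as np

def pairwise_distance(X):
    """Naive pairwise Euclidean distances; Pythran turns the loops into fast native code."""
    n = X.shape[0]
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sqrt(np.sum((X[i] - X[j]) ** 2))
    return out
```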
The conda-forge project is one of the fastest growing Open Source communities out there – and most data scientists have probably heard of it. In this talk we explain the inner workings of conda-forge, its relationship to conda and PyPI, and we will explain how everyone can package software with conda-forge.
Today, state-of-the-art scientific research strongly depends on open source libraries. The demographic of the contributors to these libraries is predominantly white and male [1][2][3][4]. In recent years there have been a number of recommendations and initiatives to increase the participation in open source projects of groups who are underrepresented in this domain [1][3][5][6]. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities [2][3][7]. This talk highlights the underlying problems and explores how we can overcome them.
This is part of the maintainers track.
In this session, we want to share some updates on the DataFrame ecosystem: the DataFrame interchange protocol (https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) and Arrow C Data interface (https://arrow.apache.org/docs/format/CDataInterface.html), and the integration of those interoperability protocols with different libraries. Further, we want to have an open conversation about challenges and requirements related to DataFrame interoperability and supporting multiple DataFrame libraries in projects.
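As a quick illustration of the interchange protocol with pandas (>= 1.5); other libraries expose the same `__dataframe__` entry point:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
interchange_object = df.__dataframe__()   # the protocol entry point
print(interchange_object.num_columns(), list(interchange_object.column_names()))

# a consumer library rebuilds its own DataFrame from any protocol-compliant object
roundtrip = pd.api.interchange.from_dataframe(df)
```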
Mamba is a fast, cross-platform and language independent package manager that is fully compatible with conda packages.
It has enabled the conda-forge project to scale way beyond what was previously possible.
In this talk we present further innovations in the mamba ecosystem, including boa, a new build tool based on mamba, and quetz, an open-source and extensible package server for conda packages.
SymPy is an open source computer algebra system (CAS) written in Python.
The recent addition of the array expression module provides an alternative to the matrix expression module, with generalized support for higher dimensions (matrices are constrained to 2 dimensions).
Given the importance of multidimensional arrays in machine learning and mathematical optimization problems, this talk will illustrate examples of tensorial expressions in mathematics and how they can be manipulated using either module, or in index-explicit form.
Conversion tools have been provided to SymPy to allow users to switch an expression between the array form and either the matrix or index-explicit form. In particular, the conversion from array to matrix form attempts to represent contractions, diagonalizations and axis-permutations with operations commonly used in matrix algebra, such as matrix multiplication, transposition, trace, Hadamard and Kronecker products.
A gradient algorithm for array expressions has been implemented, returning a closed-form array expression equivalent to the derivative of arrays by arrays. The derivative algorithm for matrix expressions now uses this algorithm, attempting to convert the array back to matrix form if trivial dimensions can be dropped.
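A small, hedged illustration of a matrix-expression derivative backed by this machinery (the exact helper names for the array/matrix conversions vary between SymPy versions):

```python
from sympy import MatrixSymbol, Trace

X = MatrixSymbol("X", 3, 3)
A = MatrixSymbol("A", 3, 3)
expr = Trace(X.T * A * X)
print(expr.diff(X))   # a closed-form matrix expression, e.g. A*X + A.T*X
```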
We will explain a mechanism for generating neural network glyphs, like the glyphs we use in human languages. Glyphs are purposeful marks, images with 2D structures used to communicate information. We will use neural networks to generate those structured images, by optimizing for robustness.
How can a discrete event simulation help mining companies reduce their dependence on diesel as a fuel for their large haulage trucks? Using open source software, mining environments are modelled to support decision making for building an all-electric mine, where diesel-powered vehicles are made obsolete.
Memory-mapped files are an underused tool in machine learning projects. They offer very fast I/O operations, making them suitable for storing datasets that do not fit into memory during training.
In this talk, we will discuss the benefits of using memory maps, their downsides, and how to address them.
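A minimal NumPy memory-map sketch (the file name and shape are placeholders):

```python
import numpy as np

# create a memory-mapped array backed by a file on disk
arr = np.memmap("features.dat", dtype="float32", mode="w+", shape=(100_000, 128))
arr[:10] = np.random.rand(10, 128)
arr.flush()

# later, reopen read-only without loading everything into RAM
ro = np.memmap("features.dat", dtype="float32", mode="r", shape=(100_000, 128))
batch = ro[0:256]   # only the touched pages are read from disk
```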
Deep learning can assist radiology doctors in interpreting and analyzing radiology images. We will present use cases which are used today in clinical practice. These range from organ segmentation to image classification.
This talk will take state-of-the-art Python models and show how, through advanced inference techniques, we can drastically increase the performance of the models at runtime. You’ll learn about the open source MLServer project and see live how easily it helps serve Python-based machine learning models.
This session is part of the maintainers track.
Recently it became possible to run Python and the scientific Python packages in the browser thanks to WebAssembly and Emscripten. This is done in particular in the Pyodide and emscripten-forge projects. It allows for a scientific Python application, or a compute environment such as JupyterLite, to be seamlessly accessible to a large number of users with very little effort or infrastructure requirements.
At the same time, the scientific Python ecosystem did not evolve with the web in mind. We will discuss some of the challenges package maintainers may face when trying to run their package in the browser, and what could be done to overcome these.
Computer chips are created using photolithography. Today's lithography machines are highly complex machines containing ultra-high precision optics. How do you create and in particular measure these optics? That's easy, you build the world's best interferometer. But what if that's not enough?
Identifying the right tools for high-performance machine learning can be overwhelming as the ecosystem continues to grow at break-neck speed. This becomes particularly pronounced when dealing with the increasingly popular large language and image-generation models such as GPT2, OPT and DALL-E, among others. In this session we will dive into a practical showcase where we will productionise the large image-generation model DALL-E, and demonstrate some optimizations that can be introduced, as well as considerations as the use cases scale. By the end of this session practitioners will be able to run their own DALL-E powered applications, as well as integrate them with functionality from other large language models like GPT2. We will be leveraging key tools in the Python ecosystem to achieve this, including Pytorch, HuggingFace, FastAPI and MLServer.
Privacy is becoming an increasingly pressing topic in data collection and data science. Thankfully, Privacy Enhancing Technologies (or PETs) are maturing alongside the growing demand and concern. In this keynote, we’ll explore what possibilities emerge when using Privacy Enhancing Technology like differential privacy, encrypted computation and federated learning and investigate how these technologies could change the face of data science today.
This talk explains why Python is a good choice for research and development. It spans the arc from a conceptual, almost philosophical, understanding of the software needs of research and development up to concrete organizational strategies.
What would the world look like if Russia had won the cold war? If the Boston Tea Party never happened? And where would we all be if Guido van Rossum had decided to pursue a career in theatre? Unfortunately we don't have the technology to slide into parallel worlds and explore alternative histories. However it turns out we do have the tools to simulate parallel realities and give decent answers to intriguing 'what if' questions. This talk will provide a gentle introduction to these tools, professionally known as Causal Inference.
The Scientific Python project aims to better coordinate the ecosystem and grow the community. This session focuses on our efforts to better coordinate project development, and to improve shared infrastructure. In this session together we will discuss project goals and recent technical work.
The Scientific Python project’s vision is to help pave the way towards a unified, expanded scientific Python community. It focuses its efforts along two primary axes: (i) to create a joint community around all scientific projects and (ii) to support maintainers by building cross-cutting technical infrastructure and tools. In this session we mostly focus on the second aspect.
The project has already launched a process whereby projects can, voluntarily, adopt reference guidelines; these are known as SPECs or Scientific Python Ecosystem Coordination documents. SPECs are similar to project-specific guidelines like PEPs, NEPs, SLEPs, and SKIPs, to name a few. The distinction is that SPECs have a broader scope, targeted at all (or most) projects in the scientific Python ecosystem.
The project also provides and maintains tools to help maintainers. This includes a theme for the project websites (used on, e.g., numpy.org and scipy.org), a self-hosted privacy-friendly web analytics platform, a community discussions forum, a technical blog, and project development statistics.
We present these tools, discuss various upcoming SPECs, and highlight the project’s future potential.
This talk will cover how to build predictive models that handle missing values well, using scikit-learn. It will give, on the one side, the statistical considerations, both classic statistical missing-values theory and recent developments in machine learning, and on the other side how to efficiently code solutions.
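A brief sketch of the two main coding strategies (toy data for illustration):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

X = np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.nan], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# option 1: impute (keeping a missingness indicator), then fit a linear model
linear = make_pipeline(SimpleImputer(strategy="mean", add_indicator=True), Ridge()).fit(X, y)

# option 2: gradient-boosted trees in scikit-learn handle NaN natively
trees = HistGradientBoostingRegressor().fit(X, y)
```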
The Mission Support System Software (MSS) is a client/server application developed in the community to collaboratively create flight plans based on model data. Through conda-forge, the components of MSS can be used on different platforms.
Extreme weather events are a well-known source of human suffering, loss of life, and financial hardship. Amongst these, tropical cyclones are notoriously impactful, leading to significant interest in predicting the genesis, tracks, and intensity of these storms - a task which continues to present significant challenges. In particular, tropical cyclogenesis (TCG) can be described as a "needle in a haystack" problem, and steps must be taken to make predictions tractable. Previously, the filtering of non-genesis points by thresholding predictive variables has been described, with thresholds being selected to reduce the number of discarded TCG cases. In practice, this thresholding has often been carried out empirically, which, while effective, relies on domain knowledge. This talk instead proposes a systematic, machine-learning-based approach implemented in Python. The method is designed to be interpretable to the point of becoming transparent machine learning. Threshold values that minimize the false-alarm rate and maintain a high recall are found, and then combined in a forward selection algorithm. As other extreme events in the geosciences are also needle-in-a-haystack problems, the described approach can be of use in reducing the variable space in which to study and predict such events. Finally, the transparent nature of the proposed approach can provide simple insight into the conditions in which these events occur.
Mathematical optimization is the selection of the best alternative with respect to some criterion, among a set of candidate options.
There are multiple applications of mathematical optimization. For example, in investment portfolio optimization, we search for the best way to invest capital given different alternatives. In this case, an optimization problem will allow us to choose a portfolio that minimizes risk (or maximizes profit), among all possible allocations that meet the defined requirements.
In most cases, mathematical optimization is used as a tool to facilitate decision-making. Sometimes these decisions can be made automatically in real-time.
This talk will explore how to formulate and solve mathematical optimization problems with Python, using different optimization libraries.
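For instance, a toy minimum-risk portfolio can be written with scipy.optimize (one of several possible libraries; the numbers are made up):

```python
import numpy as np
from scipy.optimize import minimize

cov = np.array([[0.10, 0.02, 0.04],    # made-up covariance of asset returns
                [0.02, 0.08, 0.01],
                [0.04, 0.01, 0.12]])
mu = np.array([0.05, 0.07, 0.06])      # made-up expected returns

def risk(w):
    return w @ cov @ w                 # portfolio variance

constraints = [
    {"type": "eq", "fun": lambda w: w.sum() - 1},       # fully invested
    {"type": "ineq", "fun": lambda w: w @ mu - 0.06},   # minimum expected return
]
res = minimize(risk, x0=np.ones(3) / 3, bounds=[(0, 1)] * 3, constraints=constraints)
print(res.x)   # minimum-variance weights meeting the return requirement
```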
This is part of the maintainers track.
Most of us have been hearing about Diversity, Equity and Inclusion (DEI) for some years now, and have even had access to many resources by now. Our projects have codes of conduct, and some have been doing sprints and mentorships. But how much has fundamentally changed?
Let’s meet for an honest conversation about the challenges of DEI actions, and culture change. How do we achieve long-term impact? What are low-hanging fruit? We can share hard-to-ask questions, effective tools, experiences that shaped our approach, and see if we can all nudge each other forward a little.
Inclusion happens at the community level, also when we want to address DEI itself. So, we will need to create a safe space for hard questions and leave judgment at the door.
Thanks to our grant to advance an inclusive culture in the scientific Python ecosystem, we have created the contributor experience lead role. We have been working with NumPy, SciPy, Matplotlib, and pandas to learn how to integrate this new role into a project, and how to introduce contributor hospitality techniques. We are working on creating widely available resources, and we would benefit from hearing from maintainers from the wider community.
At present, energy prices are continuously rising, so it is important to optimize the use of heat pumps, both in domestic and industrial environments. Using a suitably labelled dataset of accelerometer, speed, or relative-position measurements over time from a cheap sensor, it is possible to estimate the I/O state of any heating or cooling engine. This new real-time measurement then makes it possible to compute the energy consumption and to study the cheapest usage scheme.
In this presentation we will show a real-case implementation of some fast binary classifiers, from basic statistics to machine learning, assessing the performance of each method in terms of computational time, precision and accuracy levels.
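A hedged sketch of the comparison, with synthetic stand-ins for the sensor features and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                    # e.g. RMS and variance per window
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # stand-in for labelled on/off states

# baseline: a single threshold on one statistic
baseline = (X[:, 0] > 0).astype(int)
print("threshold accuracy:", (baseline == y).mean())

# machine-learning classifier for comparison
print("logistic regression accuracy:",
      cross_val_score(LogisticRegression(), X, y, cv=5).mean())
```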
In this talk, I'll give an overview of software quality and why it's important - especially for scientists. I'll provide best practices and libraries to dive deeper into, hype to ignore, and simple guidelines to follow to write code that your peers will love.
After the talk, the audience will have a guide on how to develop better code and be aware of potential blind spots.
In this talk, we will look at the growing Python in the browser ecosystem, with a focus on the Pyodide project. We will discuss the remaining challenges as well as new possibilities it offers for scientific computing, education, and research.
JupyterLite is a Jupyter distribution that runs entirely in the web browser, backed by in-browser language kernels including WebAssembly powered Jupyter Xeus kernels and Pyodide.
JupyterLite enables data science and interactive computing with the PyData scientific stack, directly in the browser, without installing anything or running a server.
JupyterLite leverages the Emscripten and Conda Forge infrastructure, making it possible to easily install custom packages with binary extensions in the browser, such as numpy, scipy and scikit-learn.
Fairness, accountability, and transparency in machine learning have become a major part of the ML discourse. Since these issues have attracted attention from the public, and certain legislation is being put in place to regulate the usage of machine learning in certain domains, the industry has been catching up with the topic and a few groups have been developing toolboxes to allow practitioners to incorporate fairness constraints into their pipelines and make their models more transparent and accountable. Some examples are fairlearn, AIF360, LiFT, fairness-indicators (TF), ...
This talk explores some of the tools existing in this domain and discusses work being done in scikit-learn to make it easier for practitioners to adopt these tools.
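For example, fairlearn's MetricFrame disaggregates any metric by a sensitive attribute (the data here are made up):

```python
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
group = ["a", "a", "a", "b", "b", "b", "b", "a"]

mf = MetricFrame(metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
                 y_true=y_true, y_pred=y_pred, sensitive_features=group)
print(mf.by_group)        # per-group metrics
print(mf.difference())    # largest gap between groups
```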
Since the announcement of PyScript, it has gained a lot of attention and sparked the imagination about how we can run Python applications in the browser. Out of everything that I have come across, most of the use cases are data visualisation. Let's see how we can up our data viz game with PyScript.
scikit-learn is an open-source scientific library for machine learning in Python. In this talk, we will present the recent work carried over by the scikit-learn core-developers team to improve its native performance.
We all know and love our carefully designed CI pipelines, which test our code and make sure that adding some code or fixing a bug doesn’t introduce a regression in the codebase. But we often don’t give benchmarking the same treatment as we give to correctness. The benchmarking tests are usually one-off scripts written to test a specific change. In this talk, we will discuss various strategies to test our code for performance regressions using ASV (airspeed velocity) for Python projects.
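A minimal, hypothetical ASV benchmark file, run with `asv run` and compared across commits with `asv compare`:

```python
# benchmarks/bench_example.py
import numpy as np

class TimeSuite:
    """ASV collects methods prefixed with time_ and reports their runtime."""
    def setup(self):
        self.data = np.random.rand(10_000)

    def time_sort(self):
        np.sort(self.data)

class MemSuite:
    """Methods prefixed with peakmem_ track peak memory usage instead."""
    def peakmem_copy(self):
        np.random.rand(10_000).copy()
```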
Panel is one of the leading choices for building dashboards in Python. In this talk, we discuss the practical aspects of complex data-driven dashboards. There are tutorials and guides available which help teach new users the basics, but this talk focuses on the challenges of building more complex, industry-ready, deployed dashboards. There are a variety of niche issues which arise when you push the limits of complexity, and we will share the solutions we have developed. We will demonstrate these solutions as we walk through the entire lifecycle from data ingestion, through exploratory analysis, to deployment as a finished website.
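Even a complex dashboard starts from the same small core, sketched here (the content is illustrative; `panel serve app.py` deploys it):

```python
import numpy as np
import matplotlib.pyplot as plt
import panel as pn

pn.extension()
freq = pn.widgets.FloatSlider(name="Frequency", start=0.1, end=5, value=1)

def plot(f):
    # the bound function re-renders whenever the slider value changes
    fig, ax = plt.subplots()
    x = np.linspace(0, 10, 500)
    ax.plot(x, np.sin(f * x))
    return fig

dashboard = pn.Column("# Sine explorer", freq, pn.bind(plot, freq))
dashboard.servable()
```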