08:00
08:00
30min
Registration
Aula
08:30
08:30
90min
Getting started with JupyterLab
Mike Müller

JupyterLab is very widely used in the Python scientific community. Most, if not all, of the other tutorials will use Jupyter as a tool. Therefore, a solid understanding of the basics is very helpful for the rest of the conference as well as for your later daily work.
This tutorial provides an overview of important basic Jupyter features.

HS 118
08:30
90min
Increase citations, ease review & collaboration – Making machine learning in research reproducible
Jesper Dramsch

Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, the breadth of applications is as vast as the quality of contributions.

This tutorial aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. The reproducible aspect will make it easy for fellow researchers to use and iterate on a publication, increasing citations of published work. The use of appropriate validation techniques and increase in code quality accelerates the review process during publication and avoids possible rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of work and enables easier collaboration on future work.

This work to make machine learning applications reproducible has an outsized impact compared to the limited additional work that is required using existing Python libraries.

HS 120
10:00
10:00
30min
Break
HS 120
10:00
30min
Break
HS 118
10:30
10:30
90min
Introduction to Python for scientific programming
Mojdeh Rastgoo

This tutorial will provide an introduction to Python intended for beginners.

It will notably introduce the following aspects:

  • built-in types
  • control flow (conditions, loops, etc.)
  • built-in functions
  • basic Python classes
HS 118
10:30
90min
Time Series Forecasting with scikit-learn's Quantile Gradient Boosted Regression Trees
Olivier Grisel

This tutorial will introduce how to leverage scikit-learn's powerful histogram-based gradient boosted regression trees with various loss functions (least squares, Poisson, and the pinball loss for quantile estimation) on a time series forecasting problem. We will see how to leverage pandas to build lag and windowing features, and how to use scikit-learn's time-series cross-validation tools and other model evaluation tools.
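
As a rough illustration of two ideas named in this abstract, the sketch below builds lag features and computes the pinball loss in plain Python. The toy series and the helper names `make_lag_features` and `pinball_loss` are made up for this example; the tutorial itself relies on pandas and scikit-learn rather than hand-written helpers.

```python
# Pure-Python sketch; the series and helper names are hypothetical.

def make_lag_features(series, lags):
    """Return (rows, targets): each row holds the values `lag` steps before its target."""
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return rows, targets


def pinball_loss(y_true, y_pred, quantile):
    """Mean pinball loss: under-prediction is weighted by `quantile`, over-prediction by `1 - quantile`."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        total += quantile * diff if diff >= 0 else (quantile - 1) * diff
    return total / len(y_true)


series = [10, 12, 13, 15, 14, 16]
X, y = make_lag_features(series, lags=[1, 2])

# Naive forecast: predict the previous value (the lag-1 feature).
naive = [row[0] for row in X]
loss_median = pinball_loss(y, naive, quantile=0.5)
loss_upper = pinball_loss(y, naive, quantile=0.9)
```

On this rising series the naive forecast mostly under-predicts, so it is penalised more heavily at the 0.9 quantile than at the median.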

HS 120
12:00
12:00
90min
Lunch
HS 120
12:00
90min
Lunch
HS 118
13:30
13:30
90min
Evaluating your machine learning models: beyond the basics
Gaël Varoquaux, Arturo Amor

This tutorial will guide attendees towards sound evaluation of machine-learning models, choosing metrics and procedures that match the intended usage, with code examples using scikit-learn's latest features. We will discuss how good metrics should characterize all aspects of error, e.g. on the positive and negative class, the probability of a detection, or the probability of a true event given a detection, as they may need to cater for class imbalance. Metrics may also evaluate confidence scores, e.g. calibration. Model-evaluation procedures should gauge not only the expected generalization performance, but also its variations.
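
To make the class-imbalance point concrete, here is a small pure-Python sketch with made-up labels. It reads "the probability of a detection" as recall and "the probability of a true event given a detection" as precision; the helper names are hypothetical and not part of scikit-learn.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Accuracy, recall and precision; ratios with an empty denominator are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Imbalanced toy data: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2

always_negative = summarize(y_true, [0] * 10)   # never detects anything
one_hit = summarize(y_true, [0] * 8 + [1, 0])   # finds one of the two positives
```

The classifier that never predicts the positive class still reaches 80% accuracy here while its recall is zero, which is why accuracy alone says little on imbalanced data.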

HS 120
13:30
90min
Introduction to NumPy
Maria Teleńczuk

This tutorial will provide an introduction to the NumPy library intended for beginners.

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
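
A brief sketch of a few of the operations listed above (array creation, shape manipulation, selecting, sorting, basic statistics and linear algebra). The array contents are arbitrary and chosen only for illustration; it assumes NumPy is installed.

```python
import numpy as np

# Build a 2x3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

col_sums = a.sum(axis=0)                     # sum down each column
evens = a[a % 2 == 0]                        # boolean-mask selection
sorted_desc = np.sort(a, axis=None)[::-1]    # flatten, sort, reverse
mean_value = a.mean()                        # basic statistics
gram = a @ a.T                               # basic linear algebra: 2x2 matrix product
```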

HS 118
15:00
15:00
30min
Break
HS 120
15:00
30min
Break
HS 118
15:30
15:30
90min
Introduction to Audio & Speech Recognition
Vaibhav Srivastav

The audio (& speech) domain is going through a massive shift in terms of end-user performance. It is at the same tipping point as NLP was in 2017 before the Transformers revolution took over. We’ve gone from needing a copious amount of data to create Spoken Language Understanding systems to just needing a 10-minute snippet.

This tutorial will help you create strong code-first & scientific foundations in dealing with audio data and build real-world applications like Automatic Speech Recognition (ASR), Audio Classification, and Speaker Verification using backbone models like Wav2Vec2.0, HuBERT, etc.

HS 120
15:30
90min
Introduction to pandas
Geir Arne Hjelle

This tutorial is an introduction to pandas intended for beginners.

pandas is one of Python's core packages for data science. pandas organizes data into DataFrames and provides powerful methods for manipulating them. The library is built on top of NumPy. It'll be helpful for the tutorial if you have some experience with NumPy arrays, for example, by following the Introduction to NumPy tutorial.

HS 118
08:30
08:30
90min
Introduction to PyTorch
Valerio Maggio

In this tutorial we will go through the main features of the PyTorch framework for Deep Learning.
We will start by learning how to build a neural network from the ground up, deep diving into torch.tensor, Dataset and optimisers.
We will analyse data cases from different domains (e.g. numerical, images), introducing different neural network layers and architectures. Last but not least, a few tips from a pure Data science-y perspective will be shared, to appreciate the wonderful integration PyTorch has with the Python Data model!

HS 120
08:30
90min
Introduction to SciPy
Vincent Maladiere

This tutorial will provide an introduction to SciPy intended for beginners.

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. It adds significant power to the interactive Python session by providing the user with high-level commands and classes for manipulating and visualizing data.

HS 118
10:00
10:00
30min
Break
HS 120
10:00
30min
Break
HS 118
10:30
10:30
90min
Introduction to geospatial data analysis with GeoPandas
Joris Van den Bossche

This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your workflow (importing GIS data, visualizing, joining and preparing for analysis, exploring spatial relationships, …).

HS 120
10:30
90min
Introduction to scikit-learn I
Arturo Amor, Arkadiusz Trawiński, PhD

This tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning.

This tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go more into detail in the evaluation of models and the type of trade-off to consider. Finally, we will show how to tune the hyperparameters of the pipeline.

HS 118
12:00
12:00
90min
Lunch
HS 120
12:00
90min
Lunch
HS 118
13:30
13:30
90min
Image processing with scikit-image
Emmanuelle Gouillart, Lars Grüter

Image data are used in many scientific fields such as astronomy, life sciences or material sciences. This tutorial will walk you through image processing with the scikit-image library, which is the numpy-native image processing library of the scientific python ecosystem.

The first hour of the tutorial will be accessible to beginners in image processing (some experience with NumPy arrays is a prerequisite), and will focus on some basic concepts of digital image manipulation and processing (filters, segmentation, measures). In the last half hour, we will focus on more advanced aspects and in particular Emma will speak about performance and acceleration of image processing.

HS 120
13:30
90min
Introduction to scikit-learn II
Arturo Amor, Arkadiusz Trawiński, PhD

This tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning.

This tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go more into detail in the evaluation of models and the type of trade-off to consider. Finally, we will show how to tune the hyperparameters of the pipeline.

HS 118
15:00
15:00
30min
Break
HS 120
15:00
30min
Break
HS 118
15:30
15:30
90min
Effectively using matplotlib
Tim Hoffmann

This tutorial explains the fundamental ideas and concepts of matplotlib. It's suited for complete beginners to get started as well as existing users who want to improve their plotting abilities and learn about best practices.

HS 118
15:30
90min
Network Science with Python
Mridul Seth

This workshop is for data scientists and other programmers who want to add another tool in their data science toolkit. Modelling, analysing and visualising data as networks! Network Science deals with analysing network data, and the data can come from different fields like politics, finance, computer science, law and even Game of Thrones!

HS 120
08:30
08:30
30min
Registration
Aula
09:00
09:00
60min
JAX and Flax: Function Transformations and Neural Networks
Andreas Steiner

Modern accelerators (graphics processing units and tensor processing units) allow for high performance computing at massive scale. JAX traces computation in Python programs through the familiar numpy API, and uses XLA to compile programs that run efficiently on these accelerators. A set of composable function transformations allows for expressing versatile scientific computing with an elegant syntax.

Flax provides abstractions on top of JAX that make it easy to handle weights and other state that is required for solving problems using neural networks.

This talk first presents the basic JAX API that allows for computing gradients, compiling functions, or vectorizing computation. It then proceeds to cover other parts of the JAX ecosystem commonly used for neural network programming, such as basic building blocks and optimizers.

Aula
10:00
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:00
30min
Break
HS 118
10:30
10:30
30min
Interactive Image Annotation with plotly and Dash
Emmanuelle Gouillart

Automatic image processing is a common task in many scientific and technological fields such as life sciences (with medical imaging), satellite imaging, etc. While machine learning is often used for efficient processing of such data sets, building a high-quality training set is an important task. Specialized software (such as rootpainter, ilastik) exist in different communities to build such training sets thanks to user annotations drawn on images.

In this talk, I will show how to use the open-source libraries plotly and dash to build custom interactive applications for interactive image annotation, and how to combine these tools with libraries such as scikit-image or machine learning/deep learning libraries for building a whole image processing pipeline.

Aula
11:05
11:05
30min
Decision making under uncertainty
Christian Barz

Python is the most popular programming language in the data space and is one of the major drivers of many advancements in machine learning. However, it's much less known that the Python library Pyomo is a great tool for solving mathematical optimization problems common in operations research.

In this talk I demonstrate how Pyomo can be used to find optimal decisions when data is uncertain and how to combine data-driven forecasts with optimal decision making.

HS 118
11:05
45min
Education - Materials, methods, tools
Mx Chiin-Rui Tan

This session addresses education in the ecosystem from three different aspects, focusing on recent advances as well as existing and upcoming challenges.

  • Materials: How are projects dealing with documentation and education materials?
  • Methods: What should we do to make our materials more accessible to underrepresented and/or historically marginalised groups?
  • Tools: What are the existing tools in the ecosystem helping us achieve the above goals, and what do we need to develop?

We will give an overview of these different aspects.

HS 119
11:05
30min
What is Contributor Experience?
Noa Tamir

In my current work as a contributor experience lead, I am supporting and growing Matplotlib’s and Pandas’ communities by organizing events, meetings, and proactive engagement with a focus on equity and inclusion of historically marginalized groups. In my talk I’ll give an introduction to this new role, the grant that supports it, and some of the work done so far…

I will share takeaways for maintainers, and contributors; from simple changes that can be implemented relatively easily, to bigger topics, which one might want to learn more about, and slowly yet proactively, facilitate changes to tweak the contributor experience for a project.

HS 120
11:40
11:40
30min
Discover Pythran through 10 code samples
Serge « sans » Paille

The Pythran compiler is used to speed up generic Python scientific kernels across the world. Through ten code samples taken from the scipy and scikit-image codebases and Stack Overflow snippets, this talk is going to demonstrate the major features of the compiler, as well as some technical nits!

HS 120
11:40
30min
conda-forge: supporting the growth of the volunteer-driven, community-based packaging project
Jannis Leidel, Wolf Vollprecht, Jaime Rodríguez-Guerra

The conda-forge project is one of the fastest growing Open Source communities out there – and most data scientists have probably heard of it. In this talk we explain the inner workings of conda-forge, its relationship to conda and PyPI, and we will explain how everyone can package software with conda-forge.

HS 118
12:10
12:10
85min
Lunch
HS 120
12:10
85min
Lunch
HS 118
13:35
13:35
15min
How to increase diversity in open source communities
Maren Westermann

Today, state-of-the-art scientific research strongly depends on open source libraries. The demographic of the contributors to these libraries is predominantly white and male [1][2][3][4]. In recent years there have been a number of recommendations and initiatives to increase the participation in open source projects of groups who are underrepresented in this domain [1][3][5][6]. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities [2][3][7]. This talk highlights the underlying problems and explores how we can overcome them.

HS 120
13:35
45min
[Maintainers track] Interoperability in the DataFrame landscape: DataFrame API & PyArrow Update
Joris Van den Bossche

This is part of the maintainers track.

In this session, we want to share some updates on the DataFrame ecosystem: the DataFrame interchange protocol (https://data-apis.org/dataframe-protocol/latest/purpose_and_scope.html) and Arrow C Data interface (https://arrow.apache.org/docs/format/CDataInterface.html), and the integration of those interoperability protocols with different libraries. Further, we want to have an open conversation about challenges and requirements related to DataFrame interoperability and supporting multiple DataFrame libraries in projects.

HS 119
13:35
15min
conda-forge, mamba, boa and quetz - the evolution of package management for data science and beyond
Wolf Vollprecht

Mamba is a fast, cross-platform and language independent package manager that is fully compatible with conda packages.
It has enabled the conda-forge project to scale way beyond what was previously possible.
In this talk we present further innovations in the mamba ecosystem, including boa, a new build tool based on mamba and quetz, an open-source and extensible package server for conda packages.

HS 118
13:55
13:55
15min
Array expressions and symbolic gradients in SymPy
Francesco Bonazzi

SymPy is an open source computer algebra system (CAS) written in Python.

The recent addition of the array expression module provides an alternative to the matrix expression module, with generalized support for higher dimensions (matrices are constrained to 2 dimensions).

Given the importance of multidimensional arrays in machine learning and mathematical optimization problems, this talk will illustrate examples of tensorial expressions in mathematics and how they can be manipulated using either module or in the index-explicit way.

Conversion tools have been provided to SymPy to allow users to switch an expression between the array form and either the matrix or index-explicit form. In particular, the conversion from array to matrix form attempts to represent contractions, diagonalizations and axis-permutations with operations commonly used in matrix algebra, such as matrix multiplication, transposition, trace, Hadamard and Kronecker products.

A gradient algorithm for array expressions has been implemented, returning a closed-form array expression equivalent to the derivative of arrays by arrays. The derivative algorithm for matrix expressions now uses this algorithm, attempting to convert the array back to matrix form if trivial dimensions can be dropped.

HS 118
13:55
15min
Emergent structures in noisy channel message-passing
Iliya Zhechev

We will explain a mechanism for generating neural network glyphs, like the glyphs we use in human languages. Glyphs are purposeful marks, images with 2D structures used to communicate information. We will use neural networks to generate those structured images, by optimizing for robustness.


HS 120
14:15
14:15
15min
Discrete event simulations of 'all electric' mines
Nicholas Hall

This talk shows how a discrete event simulation can help mining companies reduce their dependence on diesel as a fuel for their large haulage trucks. Mining environments are modeled using open source software, which supports decision making for building an all-electric mine where diesel-powered vehicles are made obsolete.
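
For readers unfamiliar with the technique, a discrete event simulation advances a clock from one scheduled event to the next instead of ticking in fixed steps. The sketch below is a deliberately tiny model with invented parameters (cycle time, charge time, cycles per charge) and a hypothetical function name; it is not the speaker's model.

```python
import heapq


def simulate_haulage(num_trucks, shift_minutes, cycle_minutes,
                     cycles_per_charge, charge_minutes):
    """Count loads delivered in one shift by trucks that recharge after a fixed number of cycles."""
    # Min-heap of (delivery_time, truck_id, cycles_since_last_charge).
    events = []
    for truck_id in range(num_trucks):
        heapq.heappush(events, (cycle_minutes, truck_id, 1))

    delivered = 0
    while events:
        time, truck_id, cycles = heapq.heappop(events)
        if time > shift_minutes:
            continue  # this delivery would finish after the shift ends
        delivered += 1
        next_time = time + cycle_minutes
        if cycles == cycles_per_charge:
            next_time += charge_minutes  # battery is empty: charge before the next cycle
            cycles = 0
        heapq.heappush(events, (next_time, truck_id, cycles + 1))
    return delivered


with_charging = simulate_haulage(1, 480, 30, 4, 45)
without_charging = simulate_haulage(1, 480, 30, 4, 0)
```

Comparing the two runs shows how much haulage capacity charging stops cost in this toy setup, which is the kind of question such a simulation is meant to answer.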

HS 120
14:15
15min
Memory maps to accelerate machine learning training
Hristo Vrigazov

Memory-mapped files are an underused tool in machine learning projects. They offer very fast I/O operations, making them suitable for storing datasets that don't fit into memory during training.
In this talk, we will discuss the benefits of using memory maps, their downsides, and how to address them.
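
A minimal sketch of the idea using only the standard library's `mmap` and `struct` modules. The record layout (four float32 features per sample), the file name and the helper names are assumptions made for this example; the talk may well use other tooling.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one sample = four little-endian float32 features.
RECORD = struct.Struct("<4f")


def write_dataset(path, samples):
    """Write fixed-size binary records back to back."""
    with open(path, "wb") as f:
        for sample in samples:
            f.write(RECORD.pack(*sample))


def read_sample(mm, index):
    """Decode only the requested record; the OS pages in just the bytes that are touched."""
    return RECORD.unpack_from(mm, index * RECORD.size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "train.bin")
    write_dataset(path, [(i, i + 0.5, i * 2.0, -i) for i in range(1000)])

    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            num_samples = len(mm) // RECORD.size
            sample_742 = read_sample(mm, 742)  # random access without loading the file
```

Fixed-size records make random access a simple offset computation, which is what lets a training loop shuffle indices without reading the whole file.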

HS 118
14:35
14:35
15min
Deep learning at the Radiology & Nuclear Medicine Clinic / University Hospital Basel
Joshy Cyriac, Jakob Wasserthal

Deep learning can assist radiology doctors in interpreting and analyzing radiology images. We will present use cases which are used today in clinical practice. These range from organ segmentation to image classification.

HS 120
14:35
15min
Optimizing inference for state of the art Python models
Ed Shee

This talk will take state of the art Python models and show how, through advanced inference techniques, we can drastically increase the performance of the models at runtime. You’ll learn about the open source MLServer project and see live how easily it helps serve Python-based machine learning models.

HS 118
14:35
45min
[Maintainers track] Python in the browser
Roman Yurchak, Thorsten Beier

This session is part of the maintainers track.

Recently it became possible to run Python and the scientific Python packages in the browser thanks to WebAssembly and Emscripten. This is done in particular in the Pyodide and emscripten-forge projects. It allows for a scientific Python application, or a compute environment such as JupyterLite, to be seamlessly accessible to a large number of users with very little effort or infrastructure requirements.

At the same time, the scientific Python ecosystem did not evolve with the web in mind. We will discuss some of the challenges package maintainers may face when trying to run their package in the browser, and what could be done to overcome these.

HS 119
14:55
14:55
15min
How to make the most precise measurement
Markus Gruber

Computer chips are created using photolithography. Today's lithography machines are highly complex machines containing ultra-high precision optics. How do you create and in particular measure these optics? That's easy, you build the world's best interferometer. But what if that's not enough?

HS 120
14:55
15min
Industrial Strength DALL-E: Scaling Complex Large Text & Image Models
Alejandro Saucedo

Identifying the right tools for high-performance machine learning may be overwhelming as the ecosystem continues to grow at break-neck speed. This is particularly pronounced when dealing with increasingly popular large language and image generation models such as GPT2, OPT and DALL-E, among others. In this session we will dive into a practical showcase where we productionise the large image generation model DALL-E, and showcase some optimizations that can be introduced as well as considerations as the use cases scale. By the end of this session practitioners will be able to run their own DALL-E powered applications as well as integrate these with functionalities from other large language models like GPT2. We will be leveraging key tools in the Python ecosystem to achieve this, including PyTorch, Hugging Face, FastAPI and MLServer.

HS 118
15:10
15:10
30min
Break
Aula
15:10
30min
Break
HS 120
15:10
30min
Break
HS 118
15:40
15:40
80min
Poster Session: Software Packages
Aula
09:00
09:00
60min
Supercharging Open Data with Open Privacy
Katharine Jarmul

Privacy is becoming an increasingly pressing topic in data collection and data science. Thankfully, Privacy Enhancing Technologies (or PETs) are maturing alongside the growing demand and concern. In this keynote, we’ll explore what possibilities emerge when using Privacy Enhancing Technology like differential privacy, encrypted computation and federated learning and investigate how these technologies could change the face of data science today.

Aula
10:00
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:00
30min
Break
HS 118
10:30
10:30
30min
Lessons learned from 10 years of Python in industrial research and development
Tim Hoffmann

This talk explains why Python is a good choice for research and development. It spans the arc from a conceptual, almost philosophical, understanding of the software needs of research and development up to concrete organizational strategies.

HS 120
10:30
30min
Sliding into Causal Inference, with Python!
Alon Nir

What would the world look like if Russia had won the cold war? If the Boston Tea Party never happened? And where would we all be if Guido van Rossum had decided to pursue a career in theatre? Unfortunately we don't have the technology to slide into parallel worlds and explore alternative histories. However it turns out we do have the tools to simulate parallel realities and give decent answers to intriguing 'what if' questions. This talk will provide a gentle introduction to these tools, professionally known as Causal Inference.

HS 118
10:30
45min
[Maintainer Track] Scientific Python / SPECs
Jarrod Millman

The Scientific Python project aims to better coordinate the ecosystem and grow the community. This session focuses on our efforts to better coordinate project development, and to improve shared infrastructure. In this session together we will discuss project goals and recent technical work.

The Scientific Python project’s vision is to help pave the way towards a unified, expanded scientific Python community. It focuses its efforts along two primary axes: (i) to create a joint community around all scientific projects and (ii) to support maintainers by building cross-cutting technical infrastructure and tools. In this session we mostly focus on the second aspect.

The project has already launched a process whereby projects can, voluntarily, adopt reference guidelines; these are known as SPECs or Scientific Python Ecosystem Coordination documents. SPECs are similar to project-specific guidelines like PEPs, NEPs, SLEPs, and SKIPs, to name a few. The distinction is that SPECs have a broader scope, targeting all (or most) projects from the scientific Python ecosystem.

The project also provides and maintains tools to help maintainers. This includes a theme for the project websites (used on, e.g., numpy.org and scipy.org), a self-hosted privacy-friendly web analytics platform, a community discussions forum, a technical blog, and project development statistics.

We present these tools, discuss various upcoming SPECs, and highlight the project’s future potential.

HS 119
11:05
11:05
15min
Machine learning with missing values
Gaël Varoquaux

This talk will cover how to build predictive models that handle missing values well, using scikit-learn. On the one hand, it will give the statistical considerations, both the classic statistical missing-values theory and recent developments in machine learning, and on the other hand, how to efficiently code solutions.
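
One common way to prepare data with missing values for a predictive model is to impute each gap and add a flag recording that the value was missing. The pure-Python sketch below shows that idea on made-up rows; the function name is hypothetical, it assumes every column has at least one observed value, and it is not necessarily the approach the talk recommends.

```python
def impute_with_indicator(rows):
    """Replace None by the column mean and append one 0/1 'was missing' flag per column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))  # assumes at least one observed value

    completed = []
    for row in rows:
        values = [means[j] if v is None else v for j, v in enumerate(row)]
        flags = [1 if v is None else 0 for v in row]
        completed.append(values + flags)
    return completed


rows = [[1.0, 10.0], [None, 20.0], [3.0, None]]
completed = impute_with_indicator(rows)
```

Keeping the flags lets a downstream model learn from the fact that a value was missing, not just from the imputed number.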

HS 118
11:05
15min
Open Source Mission Support System for research aircraft missions
Reimar Bauer

The Mission Support System Software (MSS) is a client/server application developed in the community to collaboratively create flight plans based on model data. Through conda-forge, the components of MSS can be used on different platforms.

HS 120
11:25
11:25
15min
Data-Driven Thresholding for Extreme Event Detection in Geosciences
Milton Gomez

Extreme weather events are a well-known source of human suffering, loss of life, and financial hardship. Amongst these, tropical cyclones are notoriously impactful, leading to significant interest in predicting the genesis, tracks, and intensity of these storms - a task which continues to present significant challenges. In particular, tropical cyclogenesis (TCG) can be described as a “needle in a haystack” problem, and steps must be taken to make predictions tractable. Previously, the filtering of non-genesis points by thresholding predictive variables has been described, with thresholds being selected to reduce the number of discarded TCG cases. In prior work, this thresholding has often been carried out empirically, which, while effective, relies on domain knowledge. This talk instead proposes the development of a systematic, machine-learning-based approach implemented in Python. The method is designed to be interpretable to the point of becoming transparent machine learning. Threshold values that minimize the false-alarm rate and maintain a high recall are found, and then combined in a forward selection algorithm. As other extreme events in the geosciences are considered needle-in-a-haystack problems, the described approach can be of use in reducing the variable space in which to study and predict the events. Finally, the transparent nature of the proposed approach can provide simple insight into the conditions in which these events occur.
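
The threshold-selection step can be pictured with a toy example for a single predictor: among all cuts of the form `value >= threshold`, keep the one with the fewest false alarms that still meets a recall target. The data, the function name and the exact selection rule below are assumptions for illustration only; the forward selection over several variables described in the abstract is not shown.

```python
def best_threshold(values, labels, min_recall):
    """Return (threshold, false_alarms, recall) for the cut with the fewest false alarms at recall >= min_recall."""
    positives = sum(labels)
    best = None
    for threshold in sorted(set(values)):
        kept = [lab for v, lab in zip(values, labels) if v >= threshold]
        true_pos = sum(kept)
        false_alarms = len(kept) - true_pos
        recall = true_pos / positives
        if recall >= min_recall and (best is None or false_alarms < best[1]):
            best = (threshold, false_alarms, recall)
    return best


# Made-up predictor values; label 1 marks the rare event.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

keep_all_events = best_threshold(values, labels, min_recall=1.0)
allow_some_misses = best_threshold(values, labels, min_recall=0.75)
```

Relaxing the recall requirement moves the threshold up and removes the remaining false alarm, which is the trade-off the abstract describes.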

HS 120
11:25
15min
Discovering Mathematical Optimization with Python
Pamela Alejandra Bustamante Faúndez

Mathematical optimization is the selection of the best alternative with respect to some criterion, among a set of candidate options.

There are multiple applications of mathematical optimization. For example, in investment portfolio optimization, we search for the best way to invest capital given different alternatives. In this case, an optimization problem will allow us to choose a portfolio that minimizes risk (or maximizes profit), among all possible allocations that meet the defined requirements.
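
The portfolio example can be sketched as a tiny search problem: enumerate candidate allocations, discard those that miss the required return, and keep the one with the lowest risk. All numbers below are invented, the risk measure is a simple weighted score rather than a real risk model, and a dedicated optimization library would replace the brute-force loop in practice.

```python
from itertools import product

# Hypothetical assets: expected return in basis points and a unitless risk score.
EXPECTED_RETURN_BPS = [200, 500, 900]
RISK_SCORE = [1, 3, 8]


def best_allocation(min_return_bps, step=10):
    """Search allocations in `step`% increments; minimise weighted risk subject to a return floor.

    Returns (weights_in_percent, weighted_risk) or None if no allocation is feasible.
    Integer arithmetic is used throughout to avoid rounding issues at the constraint boundary.
    """
    best = None
    shares = range(0, 101, step)
    for a, b in product(shares, repeat=2):
        c = 100 - a - b
        if c < 0:
            continue  # allocations must sum to 100%
        weights = (a, b, c)
        weighted_return = sum(w * r for w, r in zip(weights, EXPECTED_RETURN_BPS))
        if weighted_return < min_return_bps * 100:
            continue  # violates the required return
        weighted_risk = sum(w * k for w, k in zip(weights, RISK_SCORE))
        if best is None or weighted_risk < best[1]:
            best = (weights, weighted_risk)
    return best


choice = best_allocation(min_return_bps=600)
infeasible = best_allocation(min_return_bps=1000)
```

Here the decision variables are the three weights, the objective is the weighted risk, and the constraints are the budget and the return floor.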

In most cases, mathematical optimization is used as a tool to facilitate decision-making. Sometimes these decisions can be made automatically in real-time.

This talk will explore how to formulate and solve mathematical optimization problems with Python, using different optimization libraries.

HS 118
11:25
45min
[Maintainers Track] Contributor Experience & Diversity
Noa Tamir

This is part of the maintainers track.

Most of us have been hearing about Diversity Equity and Inclusion (DEI) for some years now, and even had access to many resources by now. Our projects have codes of conduct, and some have been doing sprints and mentorships. But how much has fundamentally changed?

Let’s meet for an honest conversation about the challenges of DEI actions, and culture change. How do we achieve long-term impact? What are low-hanging fruit? We can share hard-to-ask questions, effective tools, experiences that shaped our approach, and see if we can all nudge each other forward a little.

Inclusion happens at the community level, also when we want to address DEI itself. So, we will need to create a safe space for hard questions and leave judgment at the door.

Thanks to our grant to advance an inclusive culture in the scientific Python ecosystem, we have created the contributor experience lead role. We have been working with NumPy, SciPy, Matplotlib, and pandas to learn how to integrate this new role into a project, and how to introduce contributor hospitality techniques. We are working on creating widely available resources, and we would benefit from hearing from maintainers from the wider community.

HS 119
11:45
11:45
30min
Increase citations, ease review & collaboration – Making machine learning in research reproducible
Jesper Dramsch

Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, the breadth of applications is as vast as the quality of contributions.

This tutorial aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. The reproducible aspect will make it easy for fellow researchers to use and iterate on a publication, increasing citations of published work. The use of appropriate validation techniques and increase in code quality accelerates the review process during publication and avoids possible rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of work and enables easier collaboration on future work.

This work to make machine learning applications reproducible has an outsized impact compared to the limited additional work that is required using existing Python libraries.

HS 118
11:45
30min
Real-time estimation of a heat pump I/O state with IoT data.
Davide Poggiali

We are currently facing a continuous rise in energy prices. It is therefore important to optimize the use of heat pumps, in both domestic and industrial environments. Using a suitably labeled dataset of accelerometer, speed or relative position readings over time coming from a cheap sensor, it is possible to estimate the I/O state of any heating or cooling engine. This new real-time measure then makes it possible to compute the energy consumption and to study the cheapest usage scheme.
In this presentation we will show a real-case implementation of some fast binary classifiers, from basic statistics to machine learning, assessing the performance of each method in terms of computational time, precision and accuracy levels.
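
At the "basic statistics" end of that range, a binary classifier can be as simple as a threshold on a vibration statistic. The sketch below assumes that the I/O state means on/off and that a running pump produces a larger spread in accelerometer readings; the readings, labels and function names are all invented for illustration.

```python
from statistics import mean, pstdev


def window_feature(window):
    """Vibration level of one window of accelerometer readings (population standard deviation)."""
    return pstdev(window)


def fit_threshold(windows, labels):
    """Midpoint between the mean feature of OFF (0) and ON (1) training windows."""
    on = [window_feature(w) for w, lab in zip(windows, labels) if lab == 1]
    off = [window_feature(w) for w, lab in zip(windows, labels) if lab == 0]
    return (mean(on) + mean(off)) / 2


def predict(window, threshold):
    """Classify a window as ON (1) when its vibration level exceeds the threshold."""
    return 1 if window_feature(window) > threshold else 0


# Made-up labeled windows: ON windows oscillate, OFF windows are nearly flat.
train_windows = [
    [1.0, -1.0, 1.0, -1.0],
    [0.5, -0.5, 0.5, -0.5],
    [0.0, 0.1, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
train_labels = [1, 1, 0, 0]

threshold = fit_threshold(train_windows, train_labels)
state_running = predict([0.8, -0.8, 0.8, -0.8], threshold)
state_idle = predict([0.02, 0.0, 0.02, 0.0], threshold)
```

Such a rule costs one pass over each window, which makes it a useful baseline for comparing computational time against heavier machine learning models.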

HS 120
12:15
12:15
60min
Lunch
HS 120
12:15
60min
Lunch
HS 118
13:15
13:15
30min
A Primer to Maintainable Code
Alexander CS Hendorf

In this talk, I'll give an overview of software quality and why it's important - especially for scientists. I'll provide best practices and libraries to dive deeper into, hypes to ignore, and simple guidelines to follow to write code that your peers will love.

After the talk, the audience will have a guide on how to develop better code and be aware of potential blind spots.

HS 120
13:15
30min
Scientific Python in the browser with Pyodide
Roman Yurchak

In this talk, we will look at the growing Python in the browser ecosystem, with a focus on the Pyodide project. We will discuss the remaining challenges as well as new possibilities it offers for scientific computing, education, and research.

HS 118
13:50
13:50
30min
Interactive Data Science in the browser with JupyterLite and Emscripten Forge
Jeremy Tuloup, Thorsten Beier, Martin Renou

JupyterLite is a Jupyter distribution that runs entirely in the web browser, backed by in-browser language kernels including WebAssembly-powered Jupyter Xeus kernels and Pyodide.

JupyterLite enables data science and interactive computing with the PyData scientific stack, directly in the browser, without installing anything or running a server.

JupyterLite leverages the Emscripten and Conda Forge infrastructure, making it possible to easily install custom packages with binary extensions in the browser, such as numpy, scipy and scikit-learn.

HS 118
13:50
30min
scikit-learn and fairness, tools and challenges
Adrin Jalali

Fairness, accountability, and transparency in machine learning have become a major part of the ML discourse. Since these issues have attracted attention from the public, and legislation is being put in place to regulate the usage of machine learning in certain domains, the industry has been catching up with the topic, and a few groups have been developing toolboxes that allow practitioners to incorporate fairness constraints into their pipelines and make their models more transparent and accountable. Some examples are fairlearn, AIF360, LiFT, fairness-indicators (TF), ...

This talk explores some of the tools existing in this domain and discusses work being done in scikit-learn to make it easier for practitioners to adopt these tools.

HS 120
14:20
14:20
30min
Break
HS 120
14:20
30min
Break
HS 118
14:50
14:50
30min
Revolutionise Data Visualization with PyScript
Cheuk Ting Ho

Since its announcement, PyScript has attracted a lot of attention and sparked the imagination about how we can run Python applications in the browser. Of everything that I have come across, most of the use cases are data visualisation. Let's see how we can up our data viz game with PyScript.

HS 118
14:50
30min
Scaling scikit-learn: introducing new computational foundations
Julien Jerphanion

scikit-learn is an open-source scientific library for machine learning in Python. In this talk, we will present the recent work carried out by the scikit-learn core developers team to improve its native performance.

HS 120
15:25
15:25
15min
Continuous and on demand benchmarking
Mridul Seth

We all know and love our carefully designed CI pipelines, which test our code and make sure that by adding some code or fixing a bug we aren’t introducing a regression in the codebase. But we often don’t give the same treatment to benchmarking as we give to correctness. The benchmarking tests are usually one-off scripts written to test a specific change. In this talk, we will discuss various strategies to test our code for performance regressions using ASV (airspeed velocity) for Python projects.
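For readers who have not used ASV, a benchmark suite is an ordinary Python module. The sketch below follows ASV's naming convention, in which methods whose names start with `time_` are timed and `setup` runs beforehand so that data creation is excluded from the measurement. The class name, the file location and the sorting workload are made up for this example and are not taken from the talk.

```python
# benchmarks/benchmarks.py (file name and location are illustrative)
import random


class TimeSorting:
    """ASV times every method whose name starts with ``time_``."""

    def setup(self):
        # Runs before each benchmark, so building the data is not timed.
        rng = random.Random(42)
        self.data = [rng.random() for _ in range(10_000)]

    def time_sorted_builtin(self):
        # Sort into a new list, leaving self.data untouched.
        sorted(self.data)

    def time_sort_copy_in_place(self):
        # Copy first so repeated runs always sort unsorted input.
        self.data.copy().sort()
```

Assuming the project also has an `asv.conf.json`, commands such as `asv run` and `asv continuous` can then collect timings for such a module across commits, which is what turns a one-off script into a regression check.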

HS 120
15:25
15min
Pragmatic Panel: Build and Deploy Complex Data-Driven WebApps
Pierre-Olivier Simonard

Panel is one of the leading choices for building dashboards in Python. In this talk, we discuss the practical aspects of complex data-driven dashboards. There are tutorials and guides available which help teach new users the basics, but this talk focuses on the challenges of building more complex, industry-ready, deployed dashboards. There are a variety of niche issues which arise when you push the limits of complexity, and we will share the solutions we have developed. We will demonstrate these solutions as we walk through the entire lifecycle from data ingestion, through exploratory analysis, to deployment as a finished website.

HS 118
16:00
16:00
30min
Lightning talks
Aula
16:30
16:30
30min
Closing notes
Aula
09:00
09:00
180min
EuroSciPy Sprint
Rosshof S01
09:00
180min
EuroSciPy Sprint
Rosshof S02
12:00
12:00
90min
Lunch
Rosshof S01
12:00
90min
Lunch
Rosshof S02
13:30
13:30
240min
EuroSciPy Sprint
Rosshof S01
13:30
240min
EuroSciPy Sprint
Rosshof S02