EuroSciPy 2024

09:00
09:00
90min
Introduction to Python
Mojdeh Rastgoo

This tutorial will provide an introduction to Python intended for beginners.

It will notably introduce the following aspects:

  • built-in types
  • control flow (conditions, loops, etc.)
  • built-in functions
  • basic Python classes
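The four bullet points above fit in a short, self-contained sketch (the names and values are invented for illustration, not the tutorial's actual material):

```python
# Built-in types
numbers = [3, 1, 4, 1, 5]          # list
ages = {"ada": 36, "alan": 41}     # dict

# Control flow: a loop with a condition
total = 0
for n in numbers:
    if n % 2 == 1:                 # keep odd numbers only
        total += n

# Built-in functions
print(len(numbers), max(numbers), sorted(numbers))

# A basic Python class
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"

print(Greeter("EuroSciPy").greet())
```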
Community, Education, and Outreach
Room 6
09:00
90min
What is the magic of magic methods in the Python language?
Paweł Żal

Welcome to this tutorial on Python's magic methods, often underestimated or overlooked in programming practice. And yet, magic methods make it easy to write readable code that implements even very complex algorithms.

In this tutorial, we will learn how to create and use magic methods and talk about some interesting non-trivial magic methods.
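As a taste of what such methods look like, here is a small illustrative class (this `Vector` example is ours, not the speaker's) implementing a few common magic methods:

```python
class Vector:
    """A tiny 2-D vector demonstrating a few common magic methods."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):                      # controls how the object prints
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):                # enables v + w
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):                 # enables v == w
        return (self.x, self.y) == (other.x, other.y)

    def __abs__(self):                       # enables abs(v)
        return (self.x ** 2 + self.y ** 2) ** 0.5

v = Vector(1, 2) + Vector(2, 2)
print(v, abs(v))
```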

This tutorial will include classroom exercises, post-class homework, and complementary readings. All presentation materials, code, and exercises will be shared in advance (~ 2 - 3 days), and the solutions to the exercises will be shared after the tutorial is completed.

Community, Education, and Outreach
Room 5
10:30
10:30
30min
Break
Room 6
10:30
30min
Break
Room 5
11:00
11:00
90min
Decorators - A Deep Dive
Mike Müller

Python offers decorators to implement reusable code for cross-cutting tasks.
They support the separation of cross-cutting concerns such as logging, caching,
or permission checks.
This can improve code modularity and maintainability.

This tutorial is an in-depth introduction to decorators.
It covers the usage of decorators and how to implement simple and more advanced
decorators.
Use cases demonstrate how to work with decorators.
In addition to showing how functions can use closures to create decorators,
the tutorial introduces callable class instances as an alternative.
Class decorators can solve problems that used to be tasks for metaclasses.
The tutorial provides use cases for class decorators.

While the focus is on best practices and practical applications, the tutorial
also provides deeper insight into how Python works behind the scenes.
After the tutorial, participants will feel comfortable with functions that take
functions and return new functions.
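For a flavor of the material, here is a minimal sketch of the two styles mentioned above, a closure-based decorator and a callable class instance (these examples are illustrative, not the tutorial's actual code):

```python
import functools

# A decorator built from a closure
def log_calls(func):
    @functools.wraps(func)          # preserve the wrapped function's name/docstring
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}{args}")
        return func(*args, **kwargs)
    return wrapper

# The same idea as a callable class instance, which can also keep state
class CountCalls:
    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.count = 0

    def __call__(self, *args, **kwargs):
        self.count += 1
        return self.func(*args, **kwargs)

@log_calls
def add(a, b):
    return a + b

@CountCalls
def square(x):
    return x * x

print(add(2, 3))
print(square(4), square.count)
```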

Scientific Applications
Room 5
11:00
90min
Introduction to NumPy
Sarah Diot-Girard

Are you starting to use Python for scientific computing? Join this tutorial to learn more about NumPy, the building block for nearly all libraries in the scientific ecosystem.
You will learn how to manipulate NumPy arrays, understand how they store data, and discover how to get optimal performance. By the end of this tutorial, you will be able to start working with NumPy and know the main pitfalls to avoid.
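For a taste of what the tutorial covers, a short illustrative sketch of array manipulation, including the classic view-versus-copy pitfall (the numbers are arbitrary):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)        # a 3x4 array of 0..11

# Vectorized arithmetic: no Python loop needed
doubled = a * 2

# Slicing returns a *view*, not a copy -- a classic pitfall
row = a[0]
row[0] = 99
print(a[0, 0])        # the original array changed too

# Reductions along an axis
print(a.sum(axis=0))  # column sums
```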

Data Science and Visualisation
Room 6
12:30
12:30
90min
Lunch
Room 6
12:30
90min
Lunch
Room 5
14:00
14:00
90min
Introduction to matplotlib for Data Visualization with Python
Nefta Kanilmaz

matplotlib is a library for creating visualizations with Python which "...makes easy things easy and hard things possible" (https://matplotlib.org/). This tutorial, intended for beginners, will introduce the library and explain core concepts as well as the main interfaces. Starting with styling simple point data plots, we will explain how to work with several dimensions, shared axes and advanced styling options using rcParams. After completing this tutorial, participants will hopefully be equipped with a thorough understanding of matplotlib to navigate the "hard things" in the world of data visualization.
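As a small illustration of the object-oriented interface and shared axes mentioned above (the data and styling choices are ours, not the tutorial's):

```python
import matplotlib
matplotlib.use("Agg")          # headless backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One Figure, two Axes sharing the x axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(x, np.sin(x), label="sin")
ax2.plot(x, np.cos(x), color="tab:orange", label="cos")
ax1.legend()
ax2.legend()
ax2.set_xlabel("x")
fig.savefig("waves.png")
```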

Data Science and Visualisation
Room 6
14:00
90min
Probabilistic classification and cost-sensitive learning with scikit-learn
Guillaume Lemaitre, Olivier Grisel

Data scientists are repeatedly told that it is absolutely critical to align their model training methodology with a specific business objective. While this is good advice, it usually falls short on the details of how to achieve it in practice.

This hands-on tutorial aims to introduce helpful theoretical concepts and concrete software tools to help practitioners bridge this gap. The approach will be illustrated on a worked practical use case: optimizing the operations of a fraud detection system for a payment processing platform.

More specifically, we will introduce the concepts of calibrated probabilistic classifiers, how to evaluate them and fix common causes of mis-calibration. In a second part, we will explore how to turn probabilistic classifiers into optimal business decision makers.
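A minimal sketch of the two steps described above, calibrating a classifier and turning its probabilities into decisions with an assumed cost matrix (the dataset, model choice, and cost numbers below are invented; see the linked material for the actual tutorial code):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap the model so its predicted probabilities are calibrated
clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0), method="isotonic")
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

# Turn probabilities into decisions with an (assumed) cost matrix:
# blocking a fraudulent payment gains 90, blocking a legit one costs 10.
gain_true_positive, cost_false_positive = 90.0, 10.0
threshold = cost_false_positive / (cost_false_positive + gain_true_positive)
flag = proba >= threshold
print(f"threshold={threshold:.2f}, flagged={flag.sum()} of {len(flag)}")
```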

The tutorial material is available at the following URL: https://github.com/probabl-ai/calibration-cost-sensitive-learning

Machine and Deep Learning
Room 5
15:30
15:30
30min
Break
Room 6
15:30
30min
Break
Room 5
16:00
16:00
90min
Image analysis in Python with scikit-image
Lars Grüter, Marianne Corvellec, Stéfan van der Walt

Scientists are producing more and more images with telescopes, microscopes, MRI scanners, etc. They need automatable tools to measure what they've imaged and help them turn these images into knowledge. This tutorial covers the fundamentals of algorithmic image analysis, starting with how to think of images as NumPy arrays, moving on to basic image filtering, and finishing with a complete workflow: segmenting a 3D image into regions and making measurements on those regions.
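The workflow described above can be sketched on a tiny synthetic image (this example is ours, not the tutorial's): threshold, label connected regions, then measure them.

```python
import numpy as np
from skimage import filters, measure

# An "image" with two bright blobs on a dark background
image = np.zeros((64, 64))
image[10:20, 10:20] = 1.0
image[40:55, 40:55] = 1.0
image = filters.gaussian(image, sigma=1)      # smooth, as real data would be

threshold = filters.threshold_otsu(image)     # automatic threshold
labels = measure.label(image > threshold)     # label connected regions

regions = measure.regionprops(labels)
print(len(regions), [r.area for r in regions])
```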

Data Science and Visualisation
Room 6
16:00
90min
Using the Array API to write code that runs with NumPy, CuPy and PyTorch
Tim Head, Sebastian Berg

Python code that works with NumPy, CuPy, and PyTorch arrays? Code that uses a GPU when possible but falls back to the CPU if there is none? We will show you how to write Python code that does all of the above. The not-so-secret ingredient is the Array API. In this workshop you will learn what the Array API is and how to use it to write programs that can take any compatible array as input.

High Performance Computing
Room 5
09:00
09:00
90min
Building robust workflows with strong provenance
Alexander Goscinski, Julian Geiger, Ali Khosravi

In computational science, different software packages are often glued together as scripts to perform numerical experiments. With increasing complexity, these scripts become unmaintainable, prone to crashes, and hard to scale up and collaborate on. AiiDA solves these problems with a powerful workflow engine and by keeping provenance for the entire workflow. In this tutorial, we learn how to create dynamic workflows that combine different executables, can automatically restart from failed runs, and reuse results from completed calculations via caching.

Scientific Applications
Room 5
09:00
90min
Introduction to Polars: Fast and Readable Data Analysis
Geir Arne Hjelle

Polars is a new, powerful library for analyzing structured data. The library focuses on processing speed and a consistent, intuitive API. This tutorial will help you get started with Polars by showing you how to read and write data and manipulate it with Polars' powerful expression syntax. You'll learn how the lazy API is an important key to Polars' efficiency.

Data Science and Visualisation
Room 6
10:30
10:30
30min
Break
Room 6
10:30
30min
Break
Room 5
11:00
11:00
90min
Combining Python and Rust to create Polars Plugins
Marco Gorelli

Polars is a dataframe library taking the world by storm. It is very runtime and memory efficient and comes with a clean and expressive API. Sometimes, however, the built-in API isn't enough. And that's where its killer feature comes in: plugins. You can extend Polars, and solve practically any problem.

No prior Rust experience is required; intermediate Python or general programming experience is expected. By the end of the session, you will know how to write your own Polars plugin! This talk is aimed at data practitioners.

Data Science and Visualisation
Room 5
11:00
90min
Using Wikipedia as a language corpus for NLP
Jakub B. Jagiełło

Learning NLP often requires a corpus of sample texts, and a common choice is Wikipedia. The project is open source and has huge amounts of natural-language content in dozens of languages. Helpfully, the Wikimedia Foundation publishes data dumps in XML format, which can be easily parsed. In this tutorial you will learn how to do that in Python.
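As a sketch of the idea, the standard library's ElementTree can stream-parse page elements from a tiny made-up excerpt (real dumps use the same page/title/text layout, wrapped in a MediaWiki XML namespace):

```python
import io
import xml.etree.ElementTree as ET

# A miniature, invented stand-in for a Wikipedia dump file
dump = io.BytesIO(b"""<mediawiki>
  <page>
    <title>Python (programming language)</title>
    <revision><text>Python is a programming language...</text></revision>
  </page>
  <page>
    <title>NumPy</title>
    <revision><text>NumPy is a library for arrays...</text></revision>
  </page>
</mediawiki>""")

# iterparse streams the file, so multi-gigabyte dumps fit in memory
articles = {}
for event, elem in ET.iterparse(dump, events=("end",)):
    if elem.tag == "page":
        title = elem.find("title").text
        text = elem.find("revision/text").text
        articles[title] = text
        elem.clear()                 # free memory as we go

print(list(articles))
```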

Scientific Applications
Room 6
12:30
12:30
90min
Lunch
Room 6
12:30
90min
Lunch
Room 5
14:00
14:00
90min
Introduction to Machine Learning with scikit-learn and Pandas
Justyna Szydłowska-Samsel

With Machine Learning becoming a topic of high interest in the scientific community, many different programming languages and environments have been used for Machine Learning research and system development over the years. Python is known as an easy-to-learn yet powerful programming language and has become a popular choice among professionals and amateurs alike. This tutorial will provide instructions on the usage of two popular Python libraries, scikit-learn and Pandas, in Machine Learning modeling.
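A minimal sketch of the kind of workflow the tutorial covers, a pandas DataFrame with mixed column types fed into a scikit-learn pipeline (the data and model choice are invented):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "city": ["Basel", "Paris", "Basel", "Berlin", "Paris", "Berlin"],
    "bought": [0, 1, 0, 1, 1, 0],
})

# Scale the numeric column, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(df[["age", "city"]], df["bought"])
print(model.predict(df[["age", "city"]]))
```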

Machine and Deep Learning
Room 6
14:00
90min
Multi-dimensional arrays with Scipp
Mridul Seth

Inspired by xarray, Scipp enriches raw NumPy-like multi-dimensional data arrays by adding named dimensions and associated coordinates. For an even more intuitive and less error-prone user experience, Scipp adds physical units to arrays and their coordinates. Through this tutorial, participants will learn the basics of modelling their data with the Scipp library and using Scipp's built-in tools for scientific data analysis.

One of Scipp's key features is the possibility of using multi-dimensional non-destructive binning to sort record-based "tabular"/"event" data into arrays of bins. This provides fast and flexible binning, rebinning, and filtering operations, all while preserving the original individual records.

Scipp ships with data display and visualization features for Jupyter notebooks, including a powerful plotting interface. Named Plopp, this tool uses a graph of connected nodes to provide interactivity between multiple plots and widgets, requiring only a few lines of code from the user.

Scipp is available via pip and conda and runs on Linux, Mac and Windows.

High Performance Computing
Room 5
15:30
15:30
30min
Break
Room 6
15:30
30min
Break
Room 5
16:00
16:00
90min
A Hitchhiker's Guide to Contributing to Open Source
Sebastian Berg, Nikoleta E. Glynatsi

Open-source projects are essential for scientific programming. They provide many tools and resources that can be customized for different scientific needs. However, sometimes the existing tools in a package don't meet all the requirements of a project. This is when contributing to open-source packages becomes important. By contributing, you can implement new functionalities, improve the software and help keep the open-source community strong.

This workshop will make contributing to open-source projects easier to understand. It will guide participants from just using the software to actively contributing to it. The workshop will address technical challenges such as interacting with web-based hosting services (like GitHub and GitLab), branching, and opening pull requests. Additionally, it will cover how to contribute documentation and ensure the correctness of the code.

We will use the following repository during the workshop: https://github.com/Nikoleta-v3/HitchCos.

You can find a checklist of prerequisites and installation notes here: https://github.com/Nikoleta-v3/HitchCos/wiki/Prerequisites.

Community, Education, and Outreach
Room 6
16:00
90min
sktime - python toolbox for time series – introduction and new features 2024: foundation models, deep learning backends, probabilistic models, hierarchical demand forecasting, marketplace features
Franz Kiraly, Felipe Angelim, Muhammad Armaghan Shakir, Benedikt Heidrich

sktime is the most widely used scikit-learn-compatible framework library for learning with time series. sktime is maintained by a neutral non-profit under a permissive license, easily extensible by anyone, and interoperable with the Python data science stack.

This tutorial gives a hands-on introduction to sktime, for common time series learning tasks such as forecasting, starting with a general overview of the package and forecasting interfaces for uni- and multivariate forecasts with endo-/exogenous data, probabilistic forecasts, and forecasting in the presence of hierarchical data.

The tutorial then proceeds to showcase some of the newest features in 2024, based on a hierarchical demand forecasting use case example: support for foundation models, hugging face connectors, advanced support for hierarchical and global forecasts, and integration features for creating API compatible algorithms and sharing them via the sktime discoverability tools.

Machine and Deep Learning
Room 5
09:00
09:00
60min
10 Years of Open Source: Navigating the Next AI Revolution
Ines Montani

A lot has been happening in the field of AI and Natural Language Processing: there's endless excitement about new technologies, sobering post-hype hangovers, and also uncertainty about where the field is heading next. In this talk, I'll share the most important lessons we've learned in 10 years of working on open source software, our core philosophies that helped us adapt to an ever-changing AI landscape, and why open source and interoperability still win over black-box, proprietary APIs.

Community, Education, and Outreach
Room 7
10:00
10:00
30min
Coffee Break
Room 7
10:00
30min
Coffee Break
Room 6
10:30
10:30
30min
Federated Learning: Where we are and where we need to be
Katharine Jarmul

In this talk, we'll review the landscape of open-source federated learning libraries with a lens on actual real world data problems, use cases and actors who could benefit from federated learning. We'll then analyze gaps, weaknesses and explore new ways we could formulate federated learning problems (and their associated libraries!) to build more useful software and use decentralized machine learning in real world use cases.

Machine and Deep Learning
Room 7
10:30
30min
From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints
Janos Gabler

Many scientific Python packages are "stringly typed": they use strings to select algorithms or methods and dictionaries for configuration. While easy for beginners and convenient for authors, these libraries miss out on static typing benefits like error detection before runtime and autocomplete. This talk shares insights from redesigning the optimagic library from the ground up with static typing in mind. Without compromising on simplicity, we achieve better static analysis, autocomplete, and fewer runtime errors. The insights are not specific to numerical optimization and apply to a wide range of scientific Python packages.
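To illustrate the contrast (this is our sketch, not optimagic's actual API), compare selecting an algorithm by string with selecting it via a typed configuration object:

```python
from dataclasses import dataclass

# "Stringly typed": a typo like algorithm="newtn" only fails at runtime,
# and nothing documents which keys the options dict accepts.
def minimize_stringly(algorithm: str, options: dict):
    ...

# Strongly typed: the type checker and autocomplete know the valid
# algorithms and each algorithm's options.
@dataclass
class Newton:
    max_iterations: int = 100
    tolerance: float = 1e-8

@dataclass
class NelderMead:
    max_iterations: int = 200

def minimize_typed(algorithm: "Newton | NelderMead"):
    ...

minimize_typed(Newton(tolerance=1e-6))   # checked before runtime
```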

Community, Education, and Outreach
Room 6
10:30
45min
OpenGL is dying, let's talk about WebGPU
Almar Klein

OpenGL is old and on a path to deprecation. Modern GPU APIs like Vulkan and Metal solve most of the problems that plague OpenGL, and higher abstractions like wgpu / WebGPU provide a modern interface to control GPU hardware. These APIs are much more pleasant to work with and also provide performance benefits, especially for Python.

Data Science and Visualisation
Room 5
11:05
11:05
30min
Helmholtz Blablador and the LLM models' ecosystem
Alexandre Strube

Helmholtz Blablador is the LLM inference server of the Helmholtz Foundation. This talk explores Blablador's role in hosting open-source LLM models and models developed in-house at the Juelich Supercomputing Centre (JSC), and surveys the wider open-source LLM ecosystem.

Machine and Deep Learning
Room 7
11:05
30min
Understanding NetworkX's API Dispatching with a parallel backend
Erik Welch, Aditi Juneja

Hi! Have you ever wished your pure Python libraries were faster? Or wanted to fundamentally improve a Python library by rewriting everything in a faster language like C or Rust? Well, wish no more... NetworkX's backend dispatching mechanism redirects your plain old NetworkX function calls to a FASTER implementation present in a separate backend package by leveraging Python's entry_point specification!

NetworkX is a popular, pure Python library used for graph (aka network) analysis. But when the graph size increases (like a network of everyone in the world), NetworkX algorithms can take days to solve a simple graph analysis problem. So, to address these performance issues, this backend dispatching mechanism was recently developed. In this talk, we will unveil this dispatching mechanism and its implementation details, and how we can use it just by specifying a backend kwarg like this:

>>> nx.betweenness_centrality(G, backend="parallel")

or by passing the backend graph object (type-based dispatching):

>>> H = nxp.ParallelGraph(G)
>>> nx.betweenness_centrality(H)

We'll also go over the limitations of this dispatch mechanism. Then we'll use the example of nx-parallel as a guide to building our own custom NetworkX backend, and test the backend we build using NetworkX's existing test suite, ending with a quick dive into the details of the nx-parallel backend.

Community, Education, and Outreach
Room 6
11:15
11:15
45min
Scientific Python
Jarrod Millman, Stéfan van der Walt

Learn more about the Scientific Python project (https://scientific.python.org): what it aims to achieve (helping the developer community), recent progress that has been made, and how to become involved.

Community, Education, and Outreach
Room 5
11:40
11:40
20min
Data augmentation with Scikit-LLM
Claudio Giorgio Giancaterino

Scikit-LLM is an innovative Python library that seamlessly integrates Large Language Models into the scikit-learn framework. Scikit-LLM becomes a powerful tool for natural language processing (NLP) tasks within the scikit-learn pipeline, and I'll showcase a data augmentation workflow that builds features using zero-shot text classification and text vectorization.

Machine and Deep Learning
Room 7
11:40
20min
Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains
Ilya Komarov

Bayesian optimization is a powerful technique for optimizing black-box, costly-to-evaluate functions, widely applicable across diverse fields. However, Gaussian process (GP) models commonly used in Bayesian optimization struggle with functions defined on categorical or mixed domains, limiting optimization in scenarios with numerous categorical inputs. In this talk, we present a solution by leveraging ensemble models for probabilistic modelling, providing a robust approach to optimize functions with categorical inputs. We showcase the effectiveness of our method through a Bayesian optimization setup implemented with the BoTorch library, utilizing probabilistic models from the XGBoostLSS framework. By integrating these tools, we achieve efficient optimization on domains with categorical variables, unlocking new possibilities for optimization in practical applications.

Data Science and Visualisation
Room 6
12:00
12:00
80min
Lunch Break
Room 7
12:00
80min
Lunch Break
Room 6
13:20
13:20
30min
Skrub: prepping tables for machine learning
Guillaume Lemaitre, Vincent Maladiere, Jérôme Dockès

When it comes to designing machine learning predictive models, it is reported that data scientists spend over 80% of their time preparing the data to input to the machine learning algorithm.

Currently, no automated solution exists to address this problem. However, the skrub Python library is here to alleviate some of the daily tasks of data scientists and offer an integration with the scikit-learn machine learning library.

In this talk, we provide an overview of the features available in skrub.

First, we focus on the preprocessing stage closest to the data sources. While predictive models usually expect a single design matrix and a target vector (or matrix), in practice, it is common that data are available from different data tables. It is also possible that the data to be merged are slightly different, making it difficult to join them. We will present the skrub joiners that handle such use cases and are fully compatible with scikit-learn and its pipeline.

Then, another issue widely tackled by data scientists is dealing with heterogeneous data types (e.g., dates, categorical, numerical). We will present the TableVectorizer, a preprocessor that automatically handles different types of encoding and transformation, reducing the amount of boilerplate code to write when designing predictive models with scikit-learn. Like the joiner, this transformer is fully compatible with scikit-learn.

Machine and Deep Learning
Room 7
13:20
30min
The joys and pains of reproducing research: An experiment in bioimaging data analysis
Marianne Corvellec

The conversation about reproducibility is usually focused on how to make research workflows (more) reproducible. Here, we consider it from the opposite perspective and ask: How feasible is it, in practice, to reproduce research which is meant to be reproducible? Is it even done or attempted? We provide a detailed account of such an attempt, trying to reproduce some segmentation results for 3D microscopy images of a developing mouse embryo. The original research is a monumental work of bioimaging and analysis at the single-cell level, published in Cell in 2018, alongside all the necessary research artifacts. Did we succeed in this attempt? As we share the joys and pains of this journey, many questions arise: How exactly do reviewers assess reproducibility claims? Incentivizing reproducible research is still an open problem, since it is so much more costly (in time) to produce. And how can we incentivize those who test reproducibility? Not only is it costly to set up computational environments and execute data-intensive scientific workflows, but it may not appear rewarding at first glance. In addition, there is a human factor: It is thorny to show authors that their publication does not hold up to their reproducibility claims.

Data Science and Visualisation
Room 6
13:55
13:55
20min
From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale
Frank Sauerburger

The availability of AI models and packages in the Python ecosystem has revolutionized many applications across domains. This talk discusses infrastructural decisions and best practices that bridge the gap between interactive data analyses in notebooks and production applications at a reasonable scale, suitable for both commercial and scientific contexts. In particular, the talk introduces the on-premises, Python-based AI architecture employed at MDPI, one of the largest open-access publishers. The presentation emphasizes the impact of the design on reproducibility, decoupling of different resources, and ease of use during the development and exploration phases.

Machine and Deep Learning
Room 7
13:55
30min
Mostly Harmless Fixed Effects Regression in Python with PyFixest
Alexander Fischer

This session introduces PyFixest, an open-source Python library inspired by the "fixest" R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster-robust standard errors, as well as the wild cluster bootstrap and randomization inference. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption.

PyFixest aims to faithfully replicate the core design principles of "fixest", offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of jit-compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects.

The presentation will argue why there is a need for another regression package in Python, cover PyFixest's functionality and design philosophy, and discuss future development prospects.

Data Science and Visualisation
Room 6
13:55
50min
[CHANGE OF PROGRAM] Informal discussions about switching build backends
Ralf Gommers

Goals:

  • Share tips, tricks and best practices for configuring the build backend of a Python package with compiled (Cython/C/C++/Rust/Fortran) code
  • Identify shared needs between packages, and discuss gaps in current build backends, documentation, or shared infrastructure

Topics:

  • Goals to aim for in your build config (and how to achieve them):
      • Faster builds and relevant tooling like profiling
      • Build logs that actually help when diagnosing issues
      • How to debug build failures effectively
      • How to check for and visualize build dependencies
      • Ensuring builds are reproducible
      • Approaches to reducing binary size
      • CI config ideas to guard against regressions
  • Recent build-related developments & a post-distutils world
  • What are the most pressing pain points for maintainers?
Scientific Applications
Room 5
14:25
14:25
20min
A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia
Daniele Raimondi

This presentation introduces a framework based on the Qdrant vector DB and the Specter2 model, used to identify whether a rejected academic manuscript is later published in a competing journal. Our method combines AI, data science, and analytics to ensure reliable identification of manuscripts and authors. The findings offer insights into resubmission patterns, enhancing our understanding of academic publishing dynamics. The system is implemented in Python.

Data Science and Visualisation
Room 7
14:25
20min
Conformal Prediction with MAPIE: A Journey into Reliable Uncertainty Quantification
Claudio Giorgio Giancaterino

In the ever-evolving landscape of data science, accurate uncertainty quantification is crucial for decision-making processes. Conformal Prediction (CP) stands out as a powerful framework for addressing this challenge by providing reliable uncertainty estimates alongside predictions. In this talk, I'll delve into the world of Conformal Prediction, with a focus on the MAPIE Python library, offering a comprehensive understanding of its advantages and practical applications.

Data Science and Visualisation
Room 6
15:00
15:00
30min
Coffee Break
Room 7
15:00
30min
Coffee Break
Room 6
15:30
15:30
60min
Poster Spotlight+Lightning Session
Room 7
16:30
16:30
90min
Poster Session
Room 7
09:00
09:00
60min
Just contribute?!
Wolf Vollprecht

Open source software is here for everyone - but how are we making sure that everyone has equal access?
In this keynote I will discuss how to lower barriers of entry for new contributors - and the many facets to this: documentation, community, guidelines, and tools.
I will share my personal motivations for contributing to open-source software, my journey over the past five years, and the lessons learned along the way.

Community, Education, and Outreach
Room 7
10:00
10:00
30min
Coffee Break
Room 7
10:00
30min
Coffee Break
Room 6
10:30
10:30
30min
Optimagic: Can we unify Python's numerical optimization ecosystem?
Janos Gabler

Python has many high-quality optimization algorithms, but they are scattered across many different packages. Switching between packages is cumbersome and time-consuming. Other languages are ahead of Python in this respect. For example, Optimization.jl provides a unified interface to more than 100 optimization algorithms and is widely accepted as a standard interface for optimization in Julia.

In this talk, we take stock of the existing optimization ecosystem in Python and analyze pain points and reasons why no single package has emerged as a standard so far. We use these findings to derive desirable features a Python optimization package would need to unify the ecosystem.

We then present optimagic, a NumFOCUS-affiliated project with the goal of unifying the Python optimization ecosystem. Optimagic provides a common interface to optimization algorithms from SciPy, NLopt, pygmo, and many other libraries. The minimize function feels familiar to users of scipy.optimize who are looking for a more extensive set of supported optimizers. Advanced users can use optional arguments to configure every aspect of the optimization, create a persistent log file, turn local optimizers global with a multistart framework, and more.

Finally, we discuss an ambitious roadmap for improvements, new features, and planned community activities for optimagic.

Community, Education, and Outreach
Room 7
10:30
30min
The Parallel Universe in Python - A Time Travel to Python 3.13 and beyond
Mike Müller

Parallel computing is essential for many performance-critical applications. Python provides many solutions for this problem. New versions of Python will support sub-interpreters and a currently experimental free-threaded build without the Global Interpreter Lock (GIL).

This talk starts with a short overview of the topic, clarifying terms such as parallel, concurrent, and distributed computing as well as CPU-bound, memory-bound, and IO-bound problems. The presentation explains how Python and its standard library support parallel programming tasks. In addition, many Python libraries provide very useful approaches and tools for parallel computing. An overview of important libraries provides guidance on which library can be used for which type of parallel problem.

How do Python's new features such as sub-interpreters and free-threading without the Global Interpreter Lock (GIL) affect parallel programming in Python? This talk addresses this question with examples where these features might help to make programs simpler and/or faster.
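As a small illustration of the standard library's side of this story, a thread pool overlapping IO-bound tasks (the workload is a toy; for CPU-bound work one would swap in ProcessPoolExecutor to sidestep the GIL, or try the free-threaded build):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_double(n):
    time.sleep(0.1)          # stands in for waiting on disk or network
    return 2 * n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_double, [1, 2, 3, 4]))
elapsed = time.perf_counter() - start

print(results)               # the four 0.1s sleeps ran concurrently
print(f"{elapsed:.2f}s")
```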

High Performance Computing
Room 6
11:00
11:00
30min
LPython: Novel, Fast, Retargetable Python Compiler
Naman Gera

Python is one of the most used languages today, known for its simplicity and versatile ecosystem. For performance-critical applications such as High Performance Computing (HPC) or other kinds of numerical computing, the standard CPython implementation is often not fast enough. To address this, enter the fascinating world of LPython, a Python compiler designed to give you the best possible performance for numerical, array-oriented code, which can also generate code using multiple backends such as LLVM, C, C++, and WASM.

High Performance Computing
Room 6
11:00
45min
NumPy's new DType API and 2.0 transition
Sebastian Berg

NumPy 2 had some significant changes in its API and required many downstream libraries and users to adapt.
One of the larger new features is that the new DType API is now public. This C-API allows more powerful user defined DTypes, for which the new StringDType is an example. In the first part, I will give a brief overview of this API.

Since many downstream projects needed to adapt and publish new versions, in the second part I recap the current and past difficulties in transitioning to NumPy 2. This part of the session will be a forum for open discussion to gauge the challenges faced by users in making this transition.

Scientific Applications
Room 5
11:00
30min
forecasting foundation models: evaluation and integration with sktime – challenges and outcomes
Franz Kiraly, Benedikt Heidrich

Foundation models are here for forecasting! This will conclusively solve all forecasting problems with a one-model-fits-all approach! Or … maybe not?

The fact is, an ever-growing number of foundation models for time series and forecasting are hitting the market.

To innocent end users, this situation raises various challenges and questions. How do I integrate the models as candidates into existing forecasting workflows? Are the models performant? How do they compare to more classical choices? Which one to pick? How to know whether to “upgrade”?

At sktime, we have tried so you don't have to! Though you will probably be forced to anyway; even then, it's worth sharing experiences.

Our key challenges and findings are presented in this talk – for instance, the unexpected fragmentation of the ecosystem, difficulties in evaluating the models fairly, and more.

(sktime is an openly governed community with a neutral point of view. You may be surprised to hear that this talk will not try to sell you a foundation model.)

Machine and Deep Learning
Room 7
11:30
11:30
20min
The Array API Standard in SciPy
Lucas Colley

The array API standard is unifying the ecosystem of Python array computing, facilitating greater interoperability between array libraries, including NumPy, CuPy, PyTorch, JAX, and Dask. Find out how we are using it in SciPy to bring support for hardware-accelerated (e.g. GPU) and distributed arrays to our users, and how you can do the same in your library.
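The pattern the standard enables can be sketched with toy classes (a hedged illustration, not SciPy's actual code): a library function retrieves the array namespace from its input via `__array_namespace__` and performs all operations through it, so the same code runs unchanged on any conforming array library.

```python
import math

class ToyArray:
    """Hypothetical stand-in for an array-API-conforming array."""
    def __init__(self, data):
        self.data = list(data)

    def __mul__(self, other):
        return ToyArray(a * b for a, b in zip(self.data, other.data))

    def __array_namespace__(self, api_version=None):
        # Real libraries return their array API namespace here
        # (e.g. numpy, cupy, torch); we return a toy one.
        return toy_namespace

class _ToyNamespace:
    """Hypothetical namespace exposing two array API functions
    (simplified: the reduction returns a plain float, not a 0-d array)."""
    def sum(self, x):
        return sum(x.data)

    def sqrt(self, x):
        return math.sqrt(x)

toy_namespace = _ToyNamespace()

def euclidean_norm(x):
    # Library code never imports a specific array library: it asks the
    # input array for its namespace and dispatches through it.
    xp = x.__array_namespace__()
    return xp.sqrt(xp.sum(x * x))

print(euclidean_norm(ToyArray([3.0, 4.0])))  # → 5.0
```

With real array-API-conforming inputs, the same `euclidean_norm` would run on CPU NumPy arrays, GPU CuPy arrays, or distributed Dask arrays without modification.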

High Performance Computing
Room 6
11:30
20min
The Mission Support System and its use in planning an aircraft campaign
Reimar Bauer

The Mission Support System (MSS) is an open-source software package that has been used for planning flight tracks of scientific aircraft in multiple measurement campaigns during the last decade. It consists of several components: a data-retrieval tool chain; a WMS server, which creates 2-D figures from 4-D meteorological data; a client application for displaying the figures in combination with the planned flight track and other data; and a collaboration server used for data exchange between participants. The talk describes how we used these components for a campaign.

Scientific Applications
Room 7
12:00
12:00
80min
Lunch Break
Room 7
12:00
80min
Lunch Break
Room 6
13:20
13:20
30min
Accelerating Python on HPC with Dask
Jacob Tomlinson

Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.

In this session we will dive into the many ways to deploy Dask workloads on HPC, and how to choose the right method for your workload. Then we will dig into the accelerated side of Dask and how you can leverage GPUs with RAPIDS and Dask-CUDA and use UCX to take advantage of accelerated networking like InfiniBand and NVLink.
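The task-based idea can be sketched with a stdlib-only stand-in (a toy mimicking the spirit of `dask.delayed`, not Dask itself): calls build a lazy task graph, and `compute()` runs independent child tasks in parallel threads.

```python
from concurrent.futures import ThreadPoolExecutor

class Delayed:
    """A lazily evaluated task: a function plus (possibly lazy) arguments."""
    def __init__(self, fn, args):
        self.fn, self.args = fn, args

    def compute(self):
        # Resolve child tasks concurrently, then apply this task's function.
        with ThreadPoolExecutor() as pool:
            resolved = list(pool.map(
                lambda a: a.compute() if isinstance(a, Delayed) else a,
                self.args))
        return self.fn(*resolved)

def delayed(fn):
    """Decorator: calling the wrapped function builds a graph node
    instead of running the function immediately."""
    def wrapper(*args):
        return Delayed(fn, args)
    return wrapper

@delayed
def square(x):
    return x * x

@delayed
def add(a, b):
    return a + b

total = add(square(3), square(4))  # builds the graph; nothing runs yet
print(total.compute())             # → 25
```

Dask's real `delayed` additionally handles keyword arguments, shared intermediate results, and scheduling across processes or whole clusters; the lazy-graph-then-compute shape is the common thread.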

High Performance Computing
Room 6
13:20
100min
Dispatching, Backend Selection, and Compatibility APIs
Guillaume Lemaitre, Joris Van den Bossche, Tim Head, Erik Welch, Marco Gorelli, Sebastian Berg, Aditi Juneja, Stéfan van der Walt

Scientific Python libraries struggle with the existence of several array and dataframe providers. Many important libraries currently support mainly NumPy arrays or pandas dataframes.
However, as library authors we wish to allow users to smoothly use other array providers and to simplify, for example, the use of GPUs without requiring explicit use of CUDA-enabled libraries.

This session will be split into three related discussions around efforts to tackle this situation:
* Dispatching and backend selection discussion
* Array API adoption progress and discussion
* Dataframe compatibility layer discussion

High Performance Computing
Room 5
13:20
30min
wgpu and pygfx: next-generation graphics for Python
Almar Klein

This talk introduces a new render engine for Python, called pygfx (pronounced "py-graphics"). Its purpose is to bring powerful and reliable visualization to the Python world. Since pygfx is built on wgpu, it has superior performance and reliability compared to OpenGL-based solutions. It is also designed to be versatile: with its modular architecture, one can assemble graphical scenes for diverse applications, ranging from scientific visualization to video games.

Data Science and Visualisation
Room 7
13:55
13:55
30min
Regularizing Python using Structured Control Flow
Valentin Haenel

In this talk we will present applied research and working code to regularize
Python programs using a Structured Control Flow Graph (SCFG). This is a novel
approach to rewriting programs at the source level such that the resulting
(regularized) program is potentially more amenable to compiler optimizations,
for example when using Numba[1] to compile Python. The SCFG representation of
a program is simpler to analyze and thus significantly easier to optimize
because the higher order semantic information regarding the program structure
is explicitly included. This can be of great benefit to many scientific
applications such as High Performance Computing (HPC), a discipline that relies
heavily on compiler optimizations to turn user source code into highly
performant executables. Additionally, the SCFG format is a first step to
representing Python programs as Regionalized Value State Dependence Graphs
(RVSDGs). This is another recently proposed program representation which is
expected to unlock even more advanced compiler optimizations at the
Intermediate Representation (IR) level. The talk will cover an introduction to
the theory of SCFGs and RVSDG and demonstrate how programs are transformed. We
will start with simple Python programs containing control-flow constructs and
then show both the SCFG representation and the resulting regularized result to
illustrate the transformations.

High Performance Computing
Room 6
13:55
30min
fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs
Kushal Kolar, Caitlin Lewis

Fast interactive visualization remains a considerable barrier in analysis pipelines for large neuronal datasets. Here, we present fastplotlib, a scientific plotting library featuring an expressive API for very fast visualization of scientific data. Fastplotlib is built upon pygfx, which utilizes the GPU via WGPU, allowing it to interface with modern graphics APIs such as Vulkan for fast rendering of objects. Fastplotlib is non-blocking, allowing for interactivity with data after plot generation. Ultimately, fastplotlib is a general-purpose scientific plotting library that is useful for the fast and live visualization and analysis of complex datasets.

Data Science and Visualisation
Room 7
14:30
14:30
20min
Building optimized packages for conda-forge and PyPI
Wolf Vollprecht, Bas Zalmstra

In this talk we're introducing a new tool to build conda packages. It has been adopted by the conda community and is being rolled out in the widely used conda-forge distribution. The new recipe format has been vetted in multiple Conda Enhancement Proposals (CEPs). We are going to introduce the exciting new features of rattler-build (reproducible builds, high speed build execution, etc.). Using some examples, we will then discuss how you can use rattler-build & conda-forge to build highly optimized packages with SIMD and CUDA support. We will also take a look at cibuildwheel and recent improvements in the PyPI space for CUDA.

High Performance Computing
Room 6
14:30
30min
napari: multi-dimensional image visualization, annotation, and analysis in Python
Grzegorz Bokota, Wouter-Michiel Vierdag

Napari is an interactive n-dimensional image viewer for Python. It is able to rapidly render and interactively visualize almost any array-like image data. Additionally, napari can overlay derived data, such as segmentations, points, polygons, surfaces, and more. Each of these datasets exists as a layer in the napari viewer, which allows fine control over how the data is displayed. Furthermore, derived data can be edited. Together with the capability of writing plugins, napari lets you seamlessly weave exploration, computation, and annotation into common and custom image analysis workflows.

Data Science and Visualisation
Room 7
15:00
15:00
30min
Coffee Break
Room 7
15:00
30min
Coffee Break
Room 6
15:30
15:30
30min
Free-threaded (aka nogil) CPython in the Scientific Python ecosystem: status and road ahead
Loïc Estève

CPython 3.13 will be released in October 2024 and has been in beta since May 2024. One of its most awaited features is the ability to remove the GIL (Global Interpreter Lock) via a compile-time flag.

In this talk we will explain the relevance of free-threaded CPython for the Scientific Python ecosystem, what already works, some of the caveats, and how to try it out on your favourite use case.

In particular we will discuss:
- the historic effort in the scikit-learn project to add Continuous Integration for the nogil fork of CPython 3.9, and the kind of issues that were surfaced
- the ongoing effort in the Scientific Python ecosystem (NumPy, SciPy, scikit-learn, etc.) to test free-threaded CPython 3.13 and fix issues along the way
- how a typical scikit-learn grid-search use case can benefit from free-threaded CPython
- how to try out free-threaded CPython on your favourite use case
- possible future developments
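As an illustrative sketch of the workload shape that stands to gain (the function and numbers are our own, not from the talk): CPU-bound tasks fanned out over a thread pool. On a GIL build the threads effectively serialize; on a free-threaded build they can run on separate cores, with identical results either way.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(limit):
    """Deliberately naive, CPU-bound work: count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

# Fan the same CPU-bound task out over four threads. Only a
# free-threaded (nogil) interpreter can run these truly in parallel;
# a GIL build executes them one at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(count_primes, [10_000] * 4))

print(results)  # four identical counts; only the wall time differs by build
```

This is the same shape as a grid search: many independent, CPU-heavy evaluations that today typically require multiprocessing to scale.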

Machine and Deep Learning
Room 7
15:30
30min
Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI
Jan Boelts (Teusen)

Simulators play a crucial role in scientific research, but accurately determining their parameters to reproduce observed data remains a significant challenge. Classical parameter inference methods often struggle due to the stochastic or black-box nature of these simulators. Simulation-based inference (SBI) offers a solution by enabling Bayesian parameter inference for simulation-based models: It only requires simulated data as input and returns a posterior distribution over suitable model parameters, including uncertainty estimates and parameter interactions. In this talk, we introduce SBI and present sbi, an open source library that serves as a central resource for SBI practitioners and researchers, offering state-of-the-art SBI algorithms, comprehensive documentation and tutorials.
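The core idea can be sketched with a toy rejection-sampling scheme (classic rejection ABC, far simpler than the neural methods the sbi library implements; all names and numbers below are illustrative): draw parameters from a prior, run the simulator, and keep the parameters whose simulated data resemble the observation.

```python
import random
import statistics

def simulator(theta, n=50, rng=random):
    """Black-box simulator: Gaussian observations with unknown mean theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def rejection_sbi(observed, n_draws=5000, eps=0.1, seed=0):
    """Toy posterior sampling: accept prior draws whose simulations
    match the observed summary statistic (here, the sample mean)."""
    rng = random.Random(seed)
    obs_mean = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)         # draw from the prior
        simulated = simulator(theta, rng=rng)  # run the simulator
        if abs(statistics.fmean(simulated) - obs_mean) < eps:
            accepted.append(theta)             # keep: a posterior sample
    return accepted

observed = simulator(2.0, rng=random.Random(42))  # "data" with true mean 2
posterior = rejection_sbi(observed)
print(len(posterior), statistics.fmean(posterior))
```

The accepted draws approximate the posterior over theta, including its spread (uncertainty). Rejection sampling wastes most simulations; the neural density estimators in sbi exist precisely to make this far more simulation-efficient.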

Scientific Applications
Room 6
16:00
16:00
30min
A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying
Thomas Fraunholz, Tim Köhler

The world of open source computer vision has never been so exciting - and so challenging. With so many options available to you, what's the best way to solve your real world problem? The questions are always the same: Do I have enough data? Which model should I choose? How can I fine-tune and optimize the hyperparameters?

In collaboration with the German Aerospace Center, we investigated these questions to develop a model for quality assurance of CFRP tape laying, with only a small real data set fresh from production. We are very pleased to present a machine learning setup that can empirically answer these questions. Not only for us, but also for you - our setup can easily be transferred to your application!

Dive with us into the world of Open Source machine learning tools that are perfectly tailored for your next project. Discover the seamless integration of Hugging Face Model Hub, DvC and Ray Tune. You'll also gain unique insights into the fascinating world of CFRP tape laying, specifically how well different architectures of open source models perform on our small dataset.

If you want to level up your MLOps game and gain practical knowledge of the latest computer vision models and practices, this talk is a must for you. Don't miss the opportunity, and look forward to your next computer vision projects!

Machine and Deep Learning
Room 7
16:00
30min
Reproducible workflows with AiiDA - The power and challenges of full data provenance
Marnik Bercx, Xing Wang

AiiDA is a workflow manager with a strong focus on reproducibility through automated data provenance. In this talk we discuss what it means to have full “data provenance” for scientific workflows, the advantages it offers, but also the challenges it represents for new users and how we deal with them.

Scientific Applications
Room 6
16:30
16:30
60min
Sprints Orientation + Lightning Talks Day 2
Room 7
17:30
17:30
30min
Closing
Room 7
No sessions on Friday, Aug. 30, 2024.