<?xml version='1.0' encoding='utf-8' ?>
<iCalendar xmlns:pentabarf='http://pentabarf.org' xmlns:xCal='urn:ietf:params:xml:ns:xcal'>
    <vcalendar>
        <version>2.0</version>
        <prodid>-//Pentabarf//Schedule//EN</prodid>
        <x-wr-caldesc></x-wr-caldesc>
        <x-wr-calname></x-wr-calname>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Z98TJA@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Z98TJA</pentabarf:event-slug>
            <pentabarf:title>Introduction to Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T090000</dtstart>
            <dtend>20240826T103000</dtend>
            <duration>01:30</duration>
            <summary>Introduction to Python</summary>
            <description>This tutorial will provide an introduction to Python intended for beginners.

It will notably introduce the following aspects:

- built-in types
- control flow (e.g. conditions, loops)
- built-in functions
- basic Python classes
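
For illustration, the aspects listed above in a few lines (names here are invented for the example):

```python
# A tiny illustration of the listed aspects (names invented for this
# example): built-in types, control flow, built-in functions, and a class.
numbers = [3, 1, 4, 1, 5]                    # a built-in list
total = 0
for n in numbers:                            # control flow: a loop
    if n % 2 == 1:                           # control flow: a condition
        total += n
print(total, len(numbers), sorted(numbers))  # built-in functions

class Greeter:                               # a basic Python class
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello, {self.name}!"

print(Greeter("EuroSciPy").greet())
```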

We introduce the Python language here.
You can find the material for the tutorial here: https://github.com/mrastgoo/python-introduction-jupyterlite
You can either clone the repository and run the tutorial on your laptop, or use the following link to run it in JupyterLite: https://mrastgoo.github.io/python-introduction-jupyterlite/</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/Z98TJA/</url>
            <location>Room 6</location>
            
            <attendee>Mojdeh Rastgoo</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UDVD77@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UDVD77</pentabarf:event-slug>
            <pentabarf:title>Introduction to NumPy</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T110000</dtstart>
            <dtend>20240826T123000</dtend>
            <duration>01:30</duration>
            <summary>Introduction to NumPy</summary>
            <description>This is a hands-on workshop, please bring a laptop. 
You can find the installation instructions for the tutorial here: https://github.com/SdgJlbl/numpy-introduction-tutorial#installation-instructions.
A back-up online solution will be available if you are not able to install everything locally.

Target audience: beginners in the Python scientific ecosystem; some basic knowledge of Python and its tooling is a plus.
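
As a small taste of the agenda below, an illustrative sketch of array creation, indexing, broadcasting, and masking:

```python
import numpy as np

# Illustrative sketch of the agenda topics below.
a = np.arange(12).reshape(3, 4)   # creating and manipulating arrays
first_row = a[0]                  # basic indexing
col_means = a.mean(axis=0)        # a vectorized operation
centered = a - col_means          # broadcasting (3, 4) against (4,)
evens = a[a % 2 == 0]             # filtering with a boolean mask
print(centered.shape, evens)
```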

Agenda:
- What is NumPy and when to use it? - 10 min
- Workshop set-up - 5 min
- Creating and manipulating NumPy arrays - 10 min
- Basic indexing - 10 min
- Shape and broadcasting - 20 min
- Filtering and masking - 15 min
- Vectorized operations - 15 min
- Wrap-up and key take-away - 5 min</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/UDVD77/</url>
            <location>Room 6</location>
            
            <attendee>Sarah Diot-Girard</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8NL9R3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8NL9R3</pentabarf:event-slug>
            <pentabarf:title>Introduction to matplotlib for Data Visualization with Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T140000</dtstart>
            <dtend>20240826T153000</dtend>
            <duration>01:30</duration>
            <summary>Introduction to matplotlib for Data Visualization with Python</summary>
<description>_matplotlib_ is one of the most widely used and powerful visualization libraries for Python, offering users a huge feature set and great configuration flexibility. However, these traits also introduce a good amount of complexity that can be hard to tackle alone. This tutorial aims to help beginners take the initial steps and speed up learning data visualization with _matplotlib_.

We will follow this rough outline:

Part 1 // The Basics or How to Talk to _matplotlib_ (45 min)

- The matplotlib interfaces: _plt_ vs _fig, ax_
- Visualizing and styling two dimensional data: point and line plots, making titles, labelling axes, choosing colors, ...
- Hands-on examples
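
For illustration, the two interfaces side by side (a minimal sketch, not tutorial material):

```python
import matplotlib
matplotlib.use("Agg")             # non-interactive backend for this sketch
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# The implicit plt interface:
plt.plot(x, y)
plt.title("plt interface")

# The explicit fig, ax interface:
fig, ax = plt.subplots()
ax.plot(x, y, label="y = x**2")
ax.set_title("fig, ax interface")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("example.png")
```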

Part 2 // Carrying On the Conversation: More Dimensions and Figures (45 min)

- Custom styling with _rcParams_
- Creating multiple figures and sharing axes
- Other dimensions: working with three-dimensional and polar data
- Hands-on examples</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/8NL9R3/</url>
            <location>Room 6</location>
            
            <attendee>Umut Nefta Kanilmaz</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZVBAKK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZVBAKK</pentabarf:event-slug>
            <pentabarf:title>Image analysis in Python with scikit-image</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T160000</dtstart>
            <dtend>20240826T173000</dtend>
            <duration>01:30</duration>
            <summary>Image analysis in Python with scikit-image</summary>
            <description>This tutorial is aimed at folks who have some experience in scientific computing with Python, but are new to image analysis. We will introduce the fundamentals of working with images in scientific Python. At every step, we will visualize and understand our work using Matplotlib. The tutorial will be split into three parts, of about 30 minutes each:

- **Images are just NumPy arrays.** In this section we will cover the basics: how to think of images not as things we can see but as numbers we can analyze.
- **Changing the structure of images with image filtering.** In this section we will define *filtering*, a fundamental operation on signals (1D), images (2D), and higher-dimensional images (3D+). We will use filtering to find various structures in images, such as *blobs* and *edges*.
- **Finding regions in images and measuring their properties.** In this section we will define *image segmentation* — splitting up images into regions. We will show how segmentation is commonly represented in the scientific Python ecosystem, some basic and advanced methods to do it, and use it to make object measurements.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/ZVBAKK/</url>
            <location>Room 6</location>
            
            <attendee>Deleted User</attendee>
            
            <attendee>Marianne Corvellec</attendee>
            
            <attendee>Stéfan van der Walt</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WZQXUY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WZQXUY</pentabarf:event-slug>
            <pentabarf:title>What is the magic of magic methods in the Python language?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T090000</dtstart>
            <dtend>20240826T103000</dtend>
            <duration>01:30</duration>
            <summary>What is the magic of magic methods in the Python language?</summary>
<description>Python allows you to equip the classes you create with special methods, also known as magic methods or dunder methods. You can recognize a special method by its name, which begins and ends with a double underscore. But the magic does not lie in the name: these methods have a special meaning for Python.

Python calls magic methods in response to fundamental operations, such as creating class instances, indexing sequences, comparing objects, managing attribute access, and more, so knowing how to create them is fundamental for any Pythonista.

During this tutorial, you’ll:

* Find out what Python magic methods are,
* Understand the magic behind Python magic methods, 
* Customize various behaviors of classes using magic methods.
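
As a small illustrative sketch (the class is invented for the example), customizing operators with magic methods might look like:

```python
# An invented example: customizing class behaviour with magic methods.
class Vector:
    def __init__(self, x, y):      # called when an instance is created
        self.x, self.y = x, y

    def __repr__(self):            # called by repr() and print()
        return f"Vector({self.x}, {self.y})"

    def __add__(self, other):      # called by the + operator
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # called by the == operator
        return (self.x, self.y) == (other.x, other.y)

print(Vector(1, 2) + Vector(3, 4))
```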

To fully benefit from this tutorial, you should be familiar with object-oriented programming in Python.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/WZQXUY/</url>
            <location>Room 5</location>
            
            <attendee>Paweł Żal</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BCAUKU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BCAUKU</pentabarf:event-slug>
            <pentabarf:title>Decorators - A Deep Dive</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T110000</dtstart>
            <dtend>20240826T123000</dtend>
            <duration>01:30</duration>
            <summary>Decorators - A Deep Dive</summary>
<description>Python offers decorators to implement re-usable code for cross-cutting tasks.
They support the separation of cross-cutting concerns such as logging, caching,
or permission checking.
This can improve code modularity and maintainability.

This tutorial is an in-depth introduction to decorators.
It covers the usage of decorators and how to implement simple and more advanced
decorators.
Use cases demonstrate how to work with decorators.
In addition to showing how functions can use closures to create decorators,
the tutorial introduces callable class instances as an alternative.
Class decorators can solve problems that used to be tasks for metaclasses.
The tutorial provides use cases for class decorators.

While the focus is on best practices and practical applications, the tutorial
also provides deeper insight into how Python works behind the scenes.
After the tutorial, participants will feel comfortable with functions that take
functions and return new functions.

## Audience

This tutorial is for intermediate Python programmers who want to dive deeper.
Solid working knowledge of functions and classes basics is required.
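
To give a flavour of the closure pattern covered in the outline below (an illustrative sketch, not the tutorial's own material):

```python
import functools
import time

# A closure-based decorator: timed() takes a function and returns a
# new function that wraps it.
def timed(func):
    @functools.wraps(func)          # preserve the wrapped function's name
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__} took {time.perf_counter() - start:.6f}s")
        return result
    return wrapper

@timed
def add(a, b):
    return a + b

print(add(2, 3))
```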

## Outline

* Examples of using decorators
  * from the standard library
  * from third-party packages
* Closures for decorators
* Write a simple decorator
* Best Practice
* Use case: Caching
* Use case: Logging
* Parameterizing decorators
* Chaining decorators
* Callable instances instead of functions
* Use case: Argument Checking
* Use case: Registration
* Class decorators
* Wrap-up and questions</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/BCAUKU/</url>
            <location>Room 5</location>
            
            <attendee>Mike Müller</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UNYV7V@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UNYV7V</pentabarf:event-slug>
            <pentabarf:title>Probabilistic classification and cost-sensitive learning with scikit-learn</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T140000</dtstart>
            <dtend>20240826T153000</dtend>
            <duration>01:30</duration>
            <summary>Probabilistic classification and cost-sensitive learning with scikit-learn</summary>
            <description>Detailed outline of the tutorial:

- Introduction
    - Evaluating ML-based predictions with:
        - ranking metrics,
        - probabilistic metrics,
        - decision metrics.
    - Proper scoring losses and their decomposition into:
        - calibration loss,
        - grouping loss,
        - irreducible loss.
- Part I: Probabilistic classification
    - The calibration curve
    - Possible causes of miscalibration
        - Model misspecification
        - Overfitting and bad level of regularization
    - Possible ways to improve calibration
        - Non-linear feature engineering to avoid misspecification
        - Post-hoc calibration with Isotonic regression
        - Tuning parameters and early stopping with a proper-scoring rule
- Part II: Optimal decision making under uncertainty
    - Defining custom business cost functions
    - Individual-specific cost functions
    - Setting the Elkan-optimal threshold with `FixedThresholdClassifier`
    - Cost-sensitive learning for arbitrary cost functions with `TunedThresholdClassifierCV`
    - Predict-time decision threshold optimization.
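
For illustration, the calibration-curve idea from Part I in a minimal sketch (synthetic data, not the tutorial notebooks):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data; the tutorial notebooks use their own material.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=5)
print(np.round(frac_pos, 2), np.round(mean_pred, 2))
```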

This tutorial will be delivered as a set of publicly available Jupyter notebooks under an open source license.

We will mostly use components of the latest version of the scikit-learn library + a few custom extensions.

The tutorial material is available at the following URL: https://github.com/probabl-ai/calibration-cost-sensitive-learning</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/UNYV7V/</url>
            <location>Room 5</location>
            
            <attendee>Guillaume Lemaitre</attendee>
            
            <attendee>Olivier Grisel</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>89KK7L@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-89KK7L</pentabarf:event-slug>
            <pentabarf:title>Using the Array API to write code that runs with Numpy, Cupy and PyTorch</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240826T160000</dtstart>
            <dtend>20240826T173000</dtend>
            <duration>01:30</duration>
            <summary>Using the Array API to write code that runs with Numpy, Cupy and PyTorch</summary>
            <description>There are many Python libraries to choose from for numerical computing, data science, machine learning and deep learning. A
downside of this diversity is that the API of each of these array libraries is subtly different. This makes it hard to
write code that works with more than one array type. As a result taking advantage of modern hardware, like a GPU, is hard
because you need to handle the differences between Numpy and an array library that supports GPUs.

The Array API standard aims to solve this problem by providing a common API that all compatible libraries support. This means
that you can write code that works no matter what array library is used. And, because there are array libraries with GPU
support, it means you can write Python code that works on CPUs and GPUs!
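
As a simplified illustration of the idea (here the array namespace is passed in explicitly so the sketch runs on any NumPy version; libraries implementing the standard can hand you the namespace from the array itself):

```python
import numpy as np

# Array-agnostic code: xp is whichever array namespace is in use
# (numpy, cupy, torch, ...), passed explicitly in this sketch.
def softmax(x, xp):
    e = xp.exp(x - xp.max(x))
    return e / xp.sum(e)

print(softmax(np.asarray([1.0, 2.0, 3.0]), np))
```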

This workshop will be hands-on! This means you need to bring a laptop that has at least Numpy and PyTorch installed on it.
This will let you experience that your code works with either of these libraries. To see the effect of using a GPU, you will also
need to either have a laptop that has one or use a service like Google Colab.

After a brief introduction and demo we will tackle one or two applications that can be implemented in Numpy and then modified
to use the Array API.

By the end of this workshop you will know how to take Numpy code and modify it so that it is compatible with the Array API.
You will be well equipped to modify existing libraries in the PyData ecosystem or write your own applications.

**Material: https://github.com/betatim/sound-array-api-tutorial**</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/89KK7L/</url>
            <location>Room 5</location>
            
            <attendee>Tim Head</attendee>
            
            <attendee>Sebastian Berg</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MPMRUZ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MPMRUZ</pentabarf:event-slug>
            <pentabarf:title>Introduction to Polars: Fast and Readable Data Analysis</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T090000</dtstart>
            <dtend>20240827T103000</dtend>
            <duration>01:30</duration>
            <summary>Introduction to Polars: Fast and Readable Data Analysis</summary>
            <description>Polars is a new, lightning-fast library for analyzing structured data. The library focuses on processing speed and a consistent and intuitive API. Its syntax supports transformations like selection, filtering, and aggregation with dedicated and powerful expressions. Polars does lazy evaluation out-of-the-box with an advanced query planner.

In this tutorial, you&#x27;ll learn how you can manipulate your data with Polars. You&#x27;ll start by reading existing data into a Polars DataFrame and learn how to use _tidy_ principles to organize your analysis workflow. After learning the basics of Polars, you&#x27;ll start exploring Polars&#x27; lazy interface, which is where the library really shines.

With the lazy API, queries are only executed when the results are needed. This can improve performance significantly, as Polars can take advantage of several different optimizations. Throughout the tutorial, you&#x27;ll gain experience working lazily. You&#x27;ll learn how to inspect the optimized query plan and how to play to the library&#x27;s strengths.

This tutorial is for anyone curious about Polars. You don&#x27;t need previous experience with other libraries like pandas, but if you have used pandas earlier, you&#x27;ll learn how Polars is different and how the libraries can play nicely together.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/MPMRUZ/</url>
            <location>Room 6</location>
            
            <attendee>Geir Arne Hjelle</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>XZVGDB@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-XZVGDB</pentabarf:event-slug>
            <pentabarf:title>Using Wikipedia as a language corpus for NLP</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T110000</dtstart>
            <dtend>20240827T123000</dtend>
            <duration>01:30</duration>
            <summary>Using Wikipedia as a language corpus for NLP</summary>
            <description>In this tutorial you will learn where to find the Wikipedia dumps, how to use Python’s built-in XML parser together with a MediaWiki syntax parser (mwparserfromhell) to extract raw text from Wikipedia articles.

We will also discuss the difference between streaming and in-memory parsers, and why the former are better for parsing huge amounts of data.

We will discuss the typical NLP pipeline, and as an example of additional steps needed in inflected languages, we will use a morphological analyser to lemmatise words sourced from the Polish-language Wikipedia to calculate their frequencies.
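
In miniature, the frequency-counting step might look like this (the lemma table here is invented; in the tutorial it comes from a morphological analyser):

```python
from collections import Counter

# Invented lemma table; in practice a morphological analyser provides it.
lemmas = {"cats": "cat", "jumped": "jump", "jumps": "jump"}
text = "cats jumped and the cat jumps"
counts = Counter(lemmas.get(word, word) for word in text.split())
print(counts.most_common(2))
```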

As an example application, we will compare such statistics with a Polish language corpus available in Python’s NLTK library (“pl196x” module, with the IPI PAN corpus of the Polish language of the 1960s) and show lexical differences between the two corpora.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/XZVGDB/</url>
            <location>Room 6</location>
            
            <attendee>Jakub B. Jagiełło</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WVZPXM@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WVZPXM</pentabarf:event-slug>
            <pentabarf:title>Introduction to Machine Learning with scikit-learn and Pandas</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T140000</dtstart>
            <dtend>20240827T153000</dtend>
            <duration>01:30</duration>
            <summary>Introduction to Machine Learning with scikit-learn and Pandas</summary>
<description>With Machine Learning becoming a topic of high interest in the scientific community, many different programming languages and environments have been used over the years for Machine Learning research and system development. Python is known as an easy-to-learn yet powerful programming language and has become a popular choice among professionals and amateurs. This tutorial will provide instructions on the usage of two popular Python libraries, scikit-learn and Pandas, in Machine Learning modeling.
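
In miniature, such a workflow might look like this (invented toy data, not tutorial material):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Invented toy data: prepare it in Pandas, model it in scikit-learn.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "passed": [0, 0, 0, 1, 0, 1, 1, 1],
})
X, y = df[["hours"]], df["passed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))
```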
The tutorial includes:
- data preparation for ML modeling
- introduction of basic ML models
- implementation of a basic ML model</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/WVZPXM/</url>
            <location>Room 6</location>
            
            <attendee>Justyna Szydłowska-Samsel</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UF7LM8@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UF7LM8</pentabarf:event-slug>
            <pentabarf:title>A Hitchhiker&#x27;s Guide to Contributing to Open Source</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T160000</dtstart>
            <dtend>20240827T173000</dtend>
            <duration>01:30</duration>
            <summary>A Hitchhiker&#x27;s Guide to Contributing to Open Source</summary>
            <description>In this tutorial, we will introduce the process of contributing to open source projects. The workshop will begin by covering basic Git commands, and we will demonstrate how to interact with GitHub, fork and clone repositories.

Moreover, we will discuss how to implement functions in a Pythonic way, how to document and test them. We will also cover topics such as online documentation and Continuous Integration (CI) systems.

For each part, there will be associated exercises that will ultimately lead participants to contribute to a repository we have set up for the workshop. Participants can choose what to work on from a list of issues. These issues will include implementing functions, testing current functions, documenting current functions, and correcting typos in online documentation material. 

Finally, we will discuss and demonstrate, through the online repository we will have set up, how to interact with the open-source community and where to find style and testing guidelines. To familiarize participants with the process of reviewing and commenting, we will comment on and provide feedback to the pull requests opened by the participants during the workshop, as well as in the following days.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/UF7LM8/</url>
            <location>Room 6</location>
            
            <attendee>Sebastian Berg</attendee>
            
            <attendee>Nikoleta E. Glynatsi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BYESWT@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BYESWT</pentabarf:event-slug>
            <pentabarf:title>Building robust workflows with strong provenance</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T090000</dtstart>
            <dtend>20240827T103000</dtend>
            <duration>01:30</duration>
            <summary>Building robust workflows with strong provenance</summary>
            <description>Have you ever built a computational script for running calculations and lost track of the data you produced? Have you submitted your script to a high-performance cluster (HPC) and your job failed so you needed to restart the whole workflow? Did you want to streamline the production and access of computational experiment results? By writing your workflow in AiiDA, intermediate and final results are stored in a structured manner in a database. In addition, you can restart from the last checkpoint and reuse results from duplicated calculations via caching. As such, AiiDA not only helps you with your personal data management, but also enables easy sharing with other collaborators.

This is a hands-on session, structured in the following way:

Part 1: Introduction to AiiDA - what problems can it help you to solve (20 mins)
- Provenance, a robust solution for process management and data traceability
- Scalability, interoperability, and high-throughput performance

Part 2: How to quickly create a workflow from a set of executables (40 mins)
- Quickly set up a running instance
- Concatenating several scripts to one workflow
- Parsing output files to filter out meaningful results from outputs

Part 3: How to create more complex workflows (30 mins)
- Implementing concurrent jobs in graph-like dependencies
- Generating a workflow on the fly from input
- Querying results from the AiiDA database

By the end of the tutorial, you will have learned how to use AiiDA to quickly create workflows that leverage its restart and caching capabilities. You will also learn how to implement workflows with graph-like dependencies to run their calculations concurrently, and how to access and share their results. You can follow this tutorial using the development environment provided at https://nanohub.org/tools/aiida. Because nanohub changes the path when making the environment publicly available, you need to run the following command in one of the Jupyter cells before running notebooks 2 and 3:

```
!echo &quot;export PATH=$PATH:$(realpath ../../data/euro-scipy-2024/diag-wf):$(realpath ../../data/euro-scipy-2024/diag-wf/bin/default)&quot; &gt;&gt; ~/.bash_profile
```

The support thread for the tutorial on Discourse can be found at the following link:
https://aiida.discourse.group/t/euroscipy-2024-support/456</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/BYESWT/</url>
            <location>Room 5</location>
            
            <attendee>Alexander Goscinski</attendee>
            
            <attendee>Julian Geiger</attendee>
            
            <attendee>Ali Khosravi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>7SKUEN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-7SKUEN</pentabarf:event-slug>
            <pentabarf:title>Combining Python and Rust to create Polars Plugins</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T110000</dtstart>
            <dtend>20240827T123000</dtend>
            <duration>01:30</duration>
            <summary>Combining Python and Rust to create Polars Plugins</summary>
            <description>Have you ever had the experience of needing to write a really custom function? Did you end up using a custom Python lambda function and waiting endlessly whilst your code executed?

Learn how to put an end to that!

This tutorial is aimed at advanced dataframe users who want to go beyond what Polars offers them. The structure will be:
- 5 minutes motivation: example of a custom function which is painfully slow
- 30 minutes: the bare minimum Rust you need to know in order to write a Polars plugin
- 20 minutes: let&#x27;s get something running! Starting from a cookiecutter template, let&#x27;s glue pieces together and get a simple &quot;pig-latinnifier&quot; running
- 25 minutes: customising the basic &quot;pig-latinnifier&quot; to implement that same custom function as a plugin
- 5 minutes: let&#x27;s glue things together, run the plugin, and observe how much faster it is!
- 5 minutes: assorted requests / Q&amp;A

This may look ambitious - however, I have taught Polars Plugins professionally and have given talks about the topic before, so I&#x27;m confident that it is doable.

By the end of the session, attendees will know how to write their own Polars Plugin. This talk is aimed at data practitioners who have experience with Python and data analysis (however, no prior Rust experience is required!).

If you want to follow the tutorial on your own laptop, then you will need to come prepared with the following installed:
- Rust (see https://rustup.rs/)
- an IDE, ideally with the Rust Analyzer installed
- a Python 3.9+ virtual environment, in which you should install Polars and Maturin

If you can follow the instructions at https://github.com/MarcoGorelli/cookiecutter-polars-plugins, you&#x27;ll be off to a flying start!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/7SKUEN/</url>
            <location>Room 5</location>
            
            <attendee>Marco Gorelli</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8WL8GX@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8WL8GX</pentabarf:event-slug>
            <pentabarf:title>Multi-dimensional arrays with Scipp</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T140000</dtstart>
            <dtend>20240827T153000</dtend>
            <duration>1.03000</duration>
            <summary>Multi-dimensional arrays with Scipp</summary>
            <description>In this tutorial, we will cover key features of Scipp, a Python library with a C++ core.

During the tutorial we will go through multiple notebooks with hands-on exercises.

Part A (30 mins)
- Introduction to general concepts in scipp
- Basic data structures in scipp

Part B (30 mins)
- Binning data and computing on top of it.
- Tips and tricks for data analysis on multi-dimensional arrays.

Part C (30 mins)
- Visualizing scipp data with plopp.
- Interop with the wider scientific python ecosystem.
- File I/O.

By the end of the tutorial we hope that participants will be comfortable with using the scipp API to model and analyze their data.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/8WL8GX/</url>
            <location>Room 5</location>
            
            <attendee>Mridul Seth</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DF3VHU@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DF3VHU</pentabarf:event-slug>
            <pentabarf:title>sktime - python toolbox for time series – introduction and new features 2024: foundation models, deep learning backends, probabilistic models, hierarchical demand forecasting, marketplace features</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240827T160000</dtstart>
            <dtend>20240827T173000</dtend>
            <duration>1.03000</duration>
            <summary>sktime - python toolbox for time series – introduction and new features 2024: foundation models, deep learning backends, probabilistic models, hierarchical demand forecasting, marketplace features</summary>
            <description>The tutorial gives an up-to-date introduction to sktime&#x27;s core features, with a focus on forecasting, model building, hierarchical and global data, and marketplace features.

It showcases a selection of new and exciting 2024 features:
- Integrations for foundation models, pre-trained or fine-tuned deep learning models, and a Hugging Face connector
- Global forecasting interfaces, and building parallelizable pipelines for hierarchical data sets with level-individual models and AutoML
- Probabilistic models, distribution prediction, and reduction to tabular probabilistic regression
- New developer marketplace patterns for developing and registering API-compatible estimators with the sktime estimator search and discoverability tools

sktime is developed by an open community, with the aim of ecosystem integration in a neutral, charitable space. We welcome contributions and seek to provide opportunities for anyone worldwide.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://pretalx.com/euroscipy-2024/talk/DF3VHU/</url>
            <location>Room 5</location>
            
            <attendee>Franz Kiraly</attendee>
            
            <attendee>Felipe Angelim</attendee>
            
            <attendee>Muhammad Armaghan Shakir</attendee>
            
            <attendee>Benedikt Heidrich</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JFATCJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JFATCJ</pentabarf:event-slug>
            <pentabarf:title>10 Years of Open Source: Navigating the Next AI Revolution</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T090000</dtstart>
            <dtend>20240828T100000</dtend>
            <duration>1.00000</duration>
            <summary>10 Years of Open Source: Navigating the Next AI Revolution</summary>
            <description>A lot has been happening in the field of AI and Natural Language Processing: there&#x27;s endless excitement about new technologies, sobering post-hype hangovers, and uncertainty about where the field is heading next. In this talk, I&#x27;ll share the most important lessons we&#x27;ve learned in 10 years of working on open-source software, the core philosophies that helped us adapt to an ever-changing AI landscape, and why open source and interoperability still win over black-box, proprietary APIs.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/euroscipy-2024/talk/JFATCJ/</url>
            <location>Room 7</location>
            
            <attendee>Ines Montani</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PFVX9L@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PFVX9L</pentabarf:event-slug>
            <pentabarf:title>Federated Learning: Where we are and where we need to be</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T103000</dtstart>
            <dtend>20240828T110000</dtend>
            <duration>0.03000</duration>
            <summary>Federated Learning: Where we are and where we need to be</summary>
            <description>This talk will introduce:

- Aspects of federated learning that are important for real world use cases
- Federated learning libraries available as open source
- An evaluation of federated learning open-source libraries
- A gap analysis of potential problems when leveraging open-source for real world use cases
- Suggestions for navigating this gap and building supporting libraries or new open-source solutions to address the discovered problems</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/PFVX9L/</url>
            <location>Room 7</location>
            
            <attendee>Katharine Jarmul</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UXHSQC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UXHSQC</pentabarf:event-slug>
            <pentabarf:title>Helmholtz Blablador and the LLM models&#x27; ecosystem</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T110500</dtstart>
            <dtend>20240828T113500</dtend>
            <duration>0.03000</duration>
            <summary>Helmholtz Blablador and the LLM models&#x27; ecosystem</summary>
            <description>In the ever-evolving world of machine learning, the Helmholtz Foundation&#x27;s Blablador stands out as an open LLM inference server/service for the academic community. This talk discusses Blablador and its role in hosting open-source LLM models, models developed by the academic community, and those developed in-house at the Juelich Supercomputing Centre (JSC).

Blablador not only supports a wide range of models but also provides a robust platform for researchers and developers to experiment, collaborate, and innovate. As a result, Blablador has significantly contributed to the growth and advancement of the LLM models&#x27; ecosystem.

This talk will look at the architecture and functionality of Blablador, its integration with the JSC, and its impact on the LLM models&#x27; ecosystem. We explore Blablador&#x27;s role in fostering collaboration and innovation in the machine learning community.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/UXHSQC/</url>
            <location>Room 7</location>
            
            <attendee>Alexandre Strube</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CBSXQN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CBSXQN</pentabarf:event-slug>
            <pentabarf:title>Data augmentation with Scikit-LLM</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T114000</dtstart>
            <dtend>20240828T120000</dtend>
            <duration>0.02000</duration>
            <summary>Data augmentation with Scikit-LLM</summary>
            <description>Scikit-learn is one of the most well-known and widely used open-source Python libraries for machine learning, thanks to its wide range of models and friendly API: with a single library, you can tackle tasks from regression to classification and from clustering to dimensionality reduction. Scikit-LLM is a Python library that brings large language models into the scikit-learn framework.
It is a tool for performing natural language processing (NLP) tasks entirely within the scikit-learn pipeline.
The features provided by Scikit-LLM are:
- Zero-Shot Text Classification
- Few-Shot Text Classification
- Dynamic Few-Shot Text Classification
- Multi-Label Zero-Shot Text Classification
- Text Vectorization
- Text Translation
- Text Summarization
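As a toy illustration of the interface involved (a hypothetical stand-in, not the Scikit-LLM API), its classifiers follow the familiar scikit-learn fit/predict pattern, with the actual prediction delegated to an LLM:

```python
class ToyZeroShotClassifier:
    # Hypothetical stand-in: Scikit-LLM classifiers expose the same
    # scikit-learn fit/predict interface, but call an LLM to label texts.
    def fit(self, X=None, y=None):
        # Zero-shot: no training examples are needed, only candidate labels.
        self.labels_ = y
        return self

    def predict(self, X):
        # Toy keyword rule standing in for the LLM call.
        return ["flood" if "rain" in text.lower() else "other" for text in X]

clf = ToyZeroShotClassifier().fit(y=["flood", "other"])
print(clf.predict(["Heavy rain flooded the streets", "A sunny afternoon"]))
```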
A use case will be presented: data augmentation for flood events from the US storm events database, using zero-shot text classification and embedding techniques.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/CBSXQN/</url>
            <location>Room 7</location>
            
            <attendee>Claudio Giorgio Giancaterino</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MFF7GE@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MFF7GE</pentabarf:event-slug>
            <pentabarf:title>Skrub: prepping tables for machine learning</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T132000</dtstart>
            <dtend>20240828T135000</dtend>
            <duration>0.03000</duration>
            <summary>Skrub: prepping tables for machine learning</summary>
            <description>When it comes to designing machine learning predictive models, it is reported that data scientists spend over 80% of their time preparing the data to input to the machine learning algorithm.

Currently, no fully automated solution exists for this problem. However, the `skrub` Python library aims to alleviate some of the daily tasks of data scientists and offers an integration with the `scikit-learn` machine learning library.

In this talk, we provide an overview of the features available in `skrub`.

First, we focus on the preprocessing stage closest to the data sources. While predictive models usually expect a single design matrix and a target vector (or matrix), in practice the data often come from several different tables. The entries to be merged may also differ slightly, making them difficult to join. We will present the `skrub` joiners that handle such use cases and are fully compatible with `scikit-learn` and its pipeline.
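The idea behind matching slightly different entries can be sketched in a few lines of pure Python (a toy illustration using the standard library&#x27;s difflib, not the actual skrub API):

```python
import difflib

# Toy tables whose keys do not match exactly ("germnay" is misspelled).
left = {"Germany": 83.2, "France": 67.8}
right = {"germnay": "Berlin", "france": "Paris"}

def fuzzy_join(left, right):
    # For each key on the left, find its closest counterpart on the right
    # and merge the two values when the match is good enough.
    joined = {}
    for key, value in left.items():
        match = difflib.get_close_matches(key.lower(), list(right), n=1, cutoff=0.6)
        if match:
            joined[key] = (value, right[match[0]])
    return joined

print(fuzzy_join(left, right))
```

skrub generalizes this idea to dataframes, exposing the matching as a transformer that slots into a `scikit-learn` pipeline.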

Then, another issue widely tackled by data scientists is dealing with heterogeneous data types (e.g., dates, categorical, numerical). We will present the `TableVectorizer`, a preprocessor that automatically handles different types of encoding and transformation, reducing the amount of boilerplate code to write when designing predictive models with `scikit-learn`. Like the joiner, this transformer is fully compatible with `scikit-learn`.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/MFF7GE/</url>
            <location>Room 7</location>
            
            <attendee>Guillaume Lemaitre</attendee>
            
            <attendee>Vincent Maladiere</attendee>
            
            <attendee>Jérôme Dockès</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8NJGVH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8NJGVH</pentabarf:event-slug>
            <pentabarf:title>From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T135500</dtstart>
            <dtend>20240828T141500</dtend>
            <duration>0.02000</duration>
            <summary>From data analysis in Jupyter Notebooks to production applications: AI infrastructure at reasonable scale</summary>
            <description>While there is certainly no shortage of tutorials on how to build AI applications in a Jupyter notebook, it can be challenging to move from proof-of-concepts to reliable and reproducible data analyses used for data-driven decisions, or production-grade applications. The presentation discusses architectural decisions in a Python-based environment to bridge this gap at typical scales in academia and industry. Splitting the system into smaller composable building blocks provides reproducibility, more rapid development, and more efficient use of available resources, and has enabled MDPI to leverage AI at multiple stages of the business process. The concepts presented in the talk apply to a wide range of applications.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/8NJGVH/</url>
            <location>Room 7</location>
            
            <attendee>Frank Sauerburger</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HCMV78@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HCMV78</pentabarf:event-slug>
            <pentabarf:title>A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T142500</dtstart>
            <dtend>20240828T144500</dtend>
            <duration>0.02000</duration>
            <summary>A Qdrant and Specter2 framework for tracking resubmissions of rejected manuscripts in academia</summary>
            <description>Understanding what happens to rejected manuscripts is crucial in academic publishing. We developed a system to track whether rejected manuscripts are later published in competing journals using advanced machine-learning techniques. By extracting rejected manuscript embeddings with the Specter2 model and storing them in a vector database, we compare these with published articles, focusing on title and abstract similarities. Author similarity and other checks ensure accurate identification despite author name variations.
Our system generates two key scores: manuscript similarity and author similarity. A machine learning approach classifies papers as the same or different, with thresholds fine-tuned through manual labelling and scatter plot analysis. 
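At its core, the manuscript-similarity score compares embedding vectors, for example by cosine similarity (a minimal sketch; the production system delegates this search to Qdrant):

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity between two embedding vectors: 1.0 means
    # identical direction, 0.0 means orthogonal (unrelated).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity([0.3, 0.8, 0.5], [0.31, 0.79, 0.52]), 4))
```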
This approach combines AI, data science and analytics, providing valuable insights into resubmission patterns and enhancing our understanding of academic publishing dynamics.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/HCMV78/</url>
            <location>Room 7</location>
            
            <attendee>Daniele Raimondi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CETWRS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CETWRS</pentabarf:event-slug>
            <pentabarf:title>From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T103000</dtstart>
            <dtend>20240828T110000</dtend>
            <duration>0.03000</duration>
            <summary>From stringly typed to strongly typed: Insights from re-designing a library to get the most out of type hints</summary>
            <description>Most libraries in the scientific Python ecosystem are &quot;stringly typed&quot;. For example, many packages allow users to switch between multiple algorithms or methods by passing a string with the method name, and advanced configuration often relies on dictionaries of keyword arguments. &quot;Stringly typed&quot; libraries are easy to use for beginners and convenient to implement for library authors. However, they miss out on several benefits of static typing:

- Push errors left: Instead of exceptions at runtime, many errors can be discovered by an IDE, type checker, or other automated tools. 
- Autocomplete: An IDE shows all available arguments and options and their types. This serves as quick documentation and saves typing (and typos).
- Robustness: Static analysis provides an additional safety layer on top of unit tests. 

To achieve these benefits, it is not enough to annotate existing code with type hints. Instead, the code has to be redesigned from the ground up with static typing in mind.
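The contrast can be sketched as follows (hypothetical function and option names for illustration, not optimagic&#x27;s actual API):

```python
from dataclasses import dataclass
from typing import Literal

# Stringly typed: a typo like "bgfs" or a misspelled option key
# only surfaces as an exception at runtime.
def minimize_stringly(algorithm: str, options: dict) -> str:
    return algorithm

# Strongly typed: a type checker rejects invalid algorithm names,
# and the IDE autocompletes the option fields.
@dataclass
class LBFGSOptions:
    max_iterations: int = 1000
    convergence_tol: float = 1e-8

def minimize_strongly(algorithm: Literal["lbfgs", "neldermead"],
                      options: LBFGSOptions) -> str:
    return f"{algorithm} (tol={options.convergence_tol})"

print(minimize_strongly("lbfgs", LBFGSOptions(convergence_tol=1e-6)))
```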

This talk shares insights from redesigning &quot;optimagic&quot; (a Python package for numerical optimization) to get the most out of type hints. While preserving backward compatibility, we establish new recommended ways of using the library. Users who adopt the new workflow benefit immediately from better autocomplete, better static analysis, and fewer runtime errors. Most importantly, we do not compromise on simplicity for beginner users. While the talk focuses on benefits for users, maintainers and library authors also benefit via improved maintainability and extensibility. 

The lessons learned from the redesign are not specific to the case study. They apply to any package in the scientific ecosystem that lets users select between multiple algorithms or methods, all of which have different optional arguments. Prime examples are plotting libraries, ODE solvers, numerical integration packages, and many more. Moreover, the lessons are relevant for anyone adopting type hints in large existing codebases.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/CETWRS/</url>
            <location>Room 6</location>
            
            <attendee>Janos Gabler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QLVBYY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QLVBYY</pentabarf:event-slug>
            <pentabarf:title>Understanding NetworkX&#x27;s API Dispatching with a parallel backend</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T110500</dtstart>
            <dtend>20240828T113500</dtend>
            <duration>0.03000</duration>
            <summary>Understanding NetworkX&#x27;s API Dispatching with a parallel backend</summary>
            <description>## Ideal flow

In the first few minutes, we will familiarize ourselves with the NetworkX library and graph analysis in general. Subsequently, we will delve into the growing demand for faster graph algorithms, driven by numerous PRs proposing parallel implementations of various algorithms in NetworkX. We will then move on to a quick demo of the performance limitations of existing NetworkX algorithms such as `betweenness_centrality` and `square_clustering` when applied to larger SNAP graph datasets, underscoring the critical need for more efficient implementations. Balancing this need with the core values of NetworkX (remaining free from external dependencies and upholding its pure-Python nature) led to the development of a new backend dispatching system in NetworkX.

Next, we will understand what `entry_points` are and how NetworkX utilizes them to discover the backend packages and then redirect the NetworkX function calls to the backend implementations according to the specified backend. I will then delve into the details of the features a backend developer can utilize, like how they can use NetworkX&#x27;s existing testing suite to test their backend just by setting the `NETWORKX_TEST_BACKEND` environment variable. Using the nx-parallel backend as an example, I will explain these implementation details and features. Finally, I will provide a brief demo on building your own backend.

In the last few minutes, we will get to know the nx-parallel backend that runs NetworkX&#x27;s single-core graph algorithms on multiple CPU cores using [joblib](https://joblib.readthedocs.io/en/stable/generated/joblib.Parallel.html). We will discuss its implementation details and observe the parallel processes running concurrently and their CPU core usage distribution in the Activity Monitor. Next, we will explore the concept of chunking, and learn how and when it helps in parallel computing, particularly in the context of graph algorithms. Many algorithms in nx-parallel are generator functions, so we’ll also go over how chunking is done for generator functions. In the end, we’ll engage in a quick demo comparing nx-parallel and NetworkX using a large graph dataset with custom chunking enabled.
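The chunking idea can be sketched as follows (a toy version; the sketch only shows the splitting step, not the parallel dispatch itself):

```python
def split_into_chunks(nodes, n_chunks):
    # Divide the nodes into n_chunks roughly equal lists, so that each
    # CPU core can process one chunk independently.
    nodes = list(nodes)
    base, remainder = divmod(len(nodes), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + int(i >= n_chunks - remainder)
        chunks.append(nodes[start:start + size])
        start += size
    return chunks

print(split_into_chunks(range(10), 3))
```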
 
We will end with a summary of other NetworkX backends and some future to-dos for NetworkX’s backend dispatching and the nx-parallel backend, and finally conclude with an interactive Q&amp;A.

Refer to the linked poster below for more!

Thank you :)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/QLVBYY/</url>
            <location>Room 6</location>
            
            <attendee>Erik Welch</attendee>
            
            <attendee>Aditi Juneja</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>SPUZPK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-SPUZPK</pentabarf:event-slug>
            <pentabarf:title>Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T114000</dtstart>
            <dtend>20240828T120000</dtend>
            <duration>0.02000</duration>
            <summary>Enhancing Bayesian Optimization with Ensemble Models for Categorical Domains</summary>
            <description>Bayesian optimization (BO) is a powerful method for optimizing black-box, costly-to-evaluate functions, with applications across various fields. These include hyperparameter tuning for complex machine learning models, designing better-tasting beverages, geological carbon sequestration, and developing new chemical products.

BO algorithms rely on two key components: a probabilistic model and an acquisition function. The probabilistic model predicts the target variable&#x27;s distribution at each point in the predictor space, while the acquisition function scores these distributions to guide the selection of the next evaluation points, aiming for efficient optimization.
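As an illustration of the second component, the widely used expected-improvement acquisition function can be written with the standard library alone, assuming a Gaussian predictive distribution (a minimal sketch for maximization):

```python
import math

def expected_improvement(mean, std, best_so_far):
    # Expected amount by which a Gaussian prediction N(mean, std**2)
    # improves on the best objective value observed so far.
    if std == 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best_so_far) * cdf + std * pdf

# When predicted means tie, the more uncertain point scores higher.
print(expected_improvement(1.0, 0.5, 1.0), expected_improvement(1.0, 2.0, 1.0))
```

The point of the talk is that the model supplying mean and std here need not be a GP: tree-based ensembles can provide the same predictive distributions.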

The Gaussian Process (GP) is BO’s most popular probabilistic model. However, GPs have limitations. They struggle with functions defined on categorical or mixed domains, making them less effective when numerous categorical inputs are involved. Optimizing with GPs requires a careful choice of likelihood, kernel functions, and priors, posing a risk of mismodeling. Additionally, GPs can’t natively describe functions with conditional spaces, their training time increases polynomially with the number of training samples, and the popular implementation of BO with GP (BoTorch) does not support discontinuous polytope inference.

An intelligent choice of probabilistic model can address these limitations. In this talk, we benchmark tree-based ensemble probabilistic models against GPs on several corner cases. We explore constrained optimization on mixed domains and functions with conditional spaces. We compare training and inference times, scaling with dimensionality, and optimization performance. We consider existing solutions, such as GPyOpt with sklearn&#x27;s RandomForestRegressor, and our implementation integrating BO with the BoTorch library using probabilistic models from the XGBoostLSS framework.

Through practical examples, we will demonstrate the effectiveness of tree-based probabilistic models for BO and showcase how our approach can unlock new possibilities for optimization in real-world applications.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/SPUZPK/</url>
            <location>Room 6</location>
            
            <attendee>Ilya Komarov</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BXNEY8@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BXNEY8</pentabarf:event-slug>
            <pentabarf:title>The joys and pains of reproducing research: An experiment in bioimaging data analysis</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T132000</dtstart>
            <dtend>20240828T135000</dtend>
            <duration>0.03000</duration>
            <summary>The joys and pains of reproducing research: An experiment in bioimaging data analysis</summary>
            <description>In this presentation, I would like to share my personal experience trying to reproduce a bioimaging data analysis workflow. The starting point was my fascination with the live imaging of developing mouse embryos in the accompanying videos of a 2018 research paper [1]. These incredible images show the organism&#x27;s individual cells (nuclei being marked with a fluorescent protein) dividing and migrating over time, in three spatial dimensions.

The authors of this research developed not only imaging but also analysis tools, in order to achieve accurate cell segmentation, cell tracking, detection of cell divisions, registration across multiple embryos, and more. As a maintainer of scikit-image, and just for starters, I was curious to try out some of our classical (as opposed to machine-learning) segmentation workflows on these 3D biomedical images.

After all, the paper is open access, it comes with supplementary materials which point to data and software repositories... and its authors are kind and helpful! The latter point proved to be extremely important, especially when it came to getting and using the data. I would like to commend these researchers, who are so conscientious and generous in their sharing.

In April 2023, the scikit-image team selected two Outreachy interns [2] to work on two different projects, including one on narrative documentation to expand our gallery with biomedical examples. The additional workforce was welcome, considering the challenges we encountered at every step along the way. The datasets are available in the Image Data Resource (IDR) [3], which is fairly standard practice in the field, but it turns out that downloading data from the IDR is not trivial [4, 5, 6 and references therein].

For long-term cell tracking, the authors of [1] developed a framework named TGMM (Tracking with Gaussian Mixture Models). If you pass it a single frame, it computes a cell segmentation for this frame. Admittedly, the question of comparing this segmentation result with another one, say, obtained in pure Scientific Python, goes beyond that of reproducibility. We present a segmentation workflow based on SciPy and scikit-image, and compare its results with the TGMM&#x27;s (although this would rather fall under the &quot;Scientific Applications&quot; track).

The datasets are available in KLB format [7], which is not much in use anymore. Nowadays, a similar study would probably publish its data in the Zarr format, which is popular in bioimaging [8]. This would make it much easier to load the data into various analysis tools. For example, the `dask.array.from_zarr` function [9] might be convenient for working in Python. But is it fair to ask that the data be available in Zarr, just because the ecosystem has changed since 2018? Who would take care of converting (large) published datasets, assuming there is an actual demand for re-using them?

To read a dataset in Python, I used the `pyklb` package [10], which provides a Python wrapper for the KLB file format but is not maintained anymore. Unsurprisingly, I had to use a custom-made environment just to be able to install `pyklb`. The documentation of all the tweaking that ensued lives in GitHub issues [11] for now. I will compile it, along with the rest of my logbook, and share it in a dedicated repository [12], so the reproducibility loop is complete.

Getting the TGMM software to run on my PC proved painful but ultimately possible (which made it almost joyful)! Research-style documentation tends to be scattered and out-of-date (e.g., [13]), which is completely understandable: In research groups, people and funding cycles come and go, so who would really be able to maintain software published (shared) as part of a reproducible study? For example, will these tiny pull requests [14] ever be seen, let alone merged?

Should there be some kind of community responsibility here? Shifting the perspective, should we consider that research software is significantly different from &#x27;regular&#x27; software, in the sense that it goes through an indefinite code freeze from the moment it is &#x27;released&#x27;? It seems fair (and it is sufficient for reproducibility) that, for a given study, research artifacts would be published as a snapshot only. Note that the TGMM software provides Docker images [15].

This presentation definitely brings more questions than it offers answers. We look forward to hearing what the audience could share in terms of ideas, resources, and experiences. We would love to know of other attempts at reproducing published open research. We suspect there is a special scenario in which the person reproducing the work is the future self of the original author!


[1] https://doi.org/10.1016/j.cell.2018.09.031
[2] https://www.outreachy.org/ (accessed 2024-05-25)
[3] &quot;The Image Data Resource (IDR) is a public repository of image datasets from published scientific studies, where the community can submit, search and access high-quality bio-image data.&quot; https://idr.openmicroscopy.org (accessed 2024-05-25) 
[4] https://idr.openmicroscopy.org/about/download.html (accessed 2024-05-25)
[5] https://github.com/IDR/idr.openmicroscopy.org/pull/193 (accessed 2024-05-25)
[6] https://forum.image.sc/t/permission-denied-when-trying-to-download-idr-openmicroscopy-at-fasp/88907 (accessed 2024-05-25)
[7] KLB 2.0 block-based lossless image compression file format https://bitbucket.org/fernandoamat/keller-lab-block-filetype (accessed 2024-05-25)
[8] https://gerbi-gmb.de/2023/10/02/next-generation-file-formats-for-bioimaging/ (accessed 2024-05-25)
[9] https://docs.dask.org/en/latest/generated/dask.array.from_zarr.html (accessed 2024-05-25)
[10] https://github.com/bhoeckendorf/pyklb (accessed 2024-05-25)
[11] https://github.com/bhoeckendorf/pyklb/issues/11 (accessed 2024-05-25)
[12] https://github.com/mkcor/repro-tgmm (accessed 2024-05-25)
[13] https://bitbucket.org/fernandoamat/tgmm-paper/src/master/doc/ (accessed 2024-05-25)
[14] https://bitbucket.org/fernandoamat/tgmm-paper/pull-requests/ (accessed 2024-05-25)
[15] https://bitbucket.org/fernandoamat/tgmm-paper/src/master/doc/new/docs/user-guide/docker.md (accessed 2024-05-25)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/BXNEY8/</url>
            <location>Room 6</location>
            
            <attendee>Marianne Corvellec</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UGJ3HQ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UGJ3HQ</pentabarf:event-slug>
            <pentabarf:title>Mostly Harmless Fixed Effects Regression in Python with PyFixest</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T135500</dtstart>
            <dtend>20240828T142500</dtend>
            <duration>0.03000</duration>
            <summary>Mostly Harmless Fixed Effects Regression in Python with PyFixest</summary>
            <description>This session introduces PyFixest, an open source Python library inspired by the &quot;fixest&quot; R package. PyFixest implements fast routines for the estimation of regression models with high-dimensional fixed effects, including OLS, IV, and Poisson regression. The library also provides tools for robust inference, including heteroscedasticity-robust and cluster-robust standard errors, as well as the wild cluster bootstrap and randomization inference. Additionally, PyFixest implements several routines for difference-in-differences estimation with staggered treatment adoption.

PyFixest aims to faithfully replicate the core design principles of &quot;fixest&quot;, offering post-estimation inference adjustments, user-friendly syntax for multiple estimations, and efficient post-processing capabilities. By making efficient use of JIT compilation, it is also one of the fastest solutions for regressions with high-dimensional fixed effects.
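The core idea behind such routines, absorbing a fixed effect via the within transformation before running OLS, can be sketched in plain NumPy (an illustration of the concept only, not PyFixest&#x27;s actual implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_groups = 1000, 50
group = rng.integers(0, n_groups, size=n)  # one fixed-effect dimension
alpha = rng.normal(size=n_groups)          # unobserved group effects
x = rng.normal(size=n)
y = 2.0 * x + alpha[group] + rng.normal(scale=0.1, size=n)

def demean(v, group, n_groups):
    """Subtract each observation's group mean (the within transform)."""
    sums = np.bincount(group, weights=v, minlength=n_groups)
    counts = np.bincount(group, minlength=n_groups)
    return v - (sums / counts)[group]

# OLS on demeaned data recovers the slope without estimating 50 dummies.
x_d = demean(x, group, n_groups)
y_d = demean(y, group, n_groups)
beta = (x_d @ y_d) / (x_d @ x_d)
print(round(beta, 2))  # close to 2.0
```

With several fixed-effect dimensions, packages such as fixest and PyFixest iterate this demeaning (alternating projections) and add robust inference on top.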

The presentation will argue why there is a need for another regression package in Python, cover PyFixest&#x27;s functionality and design philosophy, and discuss future development prospects.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/UGJ3HQ/</url>
            <location>Room 6</location>
            
            <attendee>Alexander Fischer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NH7LGF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NH7LGF</pentabarf:event-slug>
            <pentabarf:title>Conformal Prediction with MAPIE: A Journey into Reliable Uncertainty Quantification</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T142500</dtstart>
            <dtend>20240828T144500</dtend>
            <duration>0.02000</duration>
            <summary>Conformal Prediction with MAPIE: A Journey into Reliable Uncertainty Quantification</summary>
            <description>-Advantages and fundamental concepts of Conformal Prediction
Uncertainty is an inherent aspect of real-world data, and accurate quantification is vital for making informed decisions. Conformal Prediction offers a principled approach to estimate the uncertainty associated with predictions, providing users with more reliable and actionable insights. 

-Types of conformal predictors
Not all conformal predictors are created equal. I&#x27;ll give an introduction to the different types of conformal predictors.
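As a minimal illustration of the split-conformal recipe that underlies such predictors (plain NumPy, not MAPIE&#x27;s API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D regression data: y = 3x + noise.
x = rng.uniform(0, 1, size=2000)
y = 3 * x + rng.normal(scale=0.3, size=2000)
x_fit, y_fit = x[:1000], y[:1000]          # fit the point predictor
x_cal, y_cal = x[1000:1500], y[1000:1500]  # calibrate the intervals
x_test, y_test = x[1500:], y[1500:]

# Any point predictor works; here a least-squares line.
slope, intercept = np.polyfit(x_fit, y_fit, deg=1)

def predict(v):
    return slope * v + intercept

# Conformity scores on the calibration set: absolute residuals.
scores = np.abs(y_cal - predict(x_cal))

# Calibrated quantile with the finite-sample correction.
alpha = 0.1  # target 90% coverage
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Prediction intervals on the test set; coverage holds by construction.
lower, upper = predict(x_test) - q, predict(x_test) + q
coverage = np.mean(np.logical_and(y_test >= lower, upper >= y_test))
print(round(coverage, 2))  # close to 0.90
```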

-MAPIE python library
I&#x27;ll present MAPIE (Model Agnostic Prediction Interval Estimator), a Python library that simplifies the implementation of Conformal Prediction. 

-Practical example on tabular data
To bring theory into practice, I&#x27;ll walk through a use case using tabular data.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/NH7LGF/</url>
            <location>Room 6</location>
            
            <attendee>Claudio Giorgio Giancaterino</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QMV8P3@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QMV8P3</pentabarf:event-slug>
            <pentabarf:title>OpenGL is dying, let&#x27;s talk about WebGPU</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T103000</dtstart>
            <dtend>20240828T111500</dtend>
            <duration>0.04500</duration>
            <summary>OpenGL is dying, let&#x27;s talk about WebGPU</summary>
            <description>I&#x27;ll talk about the problems that plague OpenGL, and how modern APIs like Vulkan, Metal, and DirectX 12 solve these. Since these APIs are still very low-level, we need abstractions to provide a friendlier API. The most promising abstraction is WebGPU, which also comes with a corresponding C API (webgpu.h). The wgpu-py library provides a Python wrapper.

This talk is for anyone who wants to learn more about the current state of GPU APIs and/or is interested in wgpu / WebGPU. I&#x27;ll briefly discuss the GPU programming model, but a basic understanding of GPUs is assumed.

Apart from obvious applications in graphics and visualization, there are also opportunities to use wgpu for compute tasks, i.e. as an alternative to CUDA.

I&#x27;ll keep the talk relatively short, so we have plenty of time for questions and discussions.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Maintainer track</category>
            <url>https://pretalx.com/euroscipy-2024/talk/QMV8P3/</url>
            <location>Room 5</location>
            
            <attendee>Almar Klein</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>SYXTMZ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-SYXTMZ</pentabarf:event-slug>
            <pentabarf:title>Scientific Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T111500</dtstart>
            <dtend>20240828T120000</dtend>
            <duration>0.04500</duration>
            <summary>Scientific Python</summary>
            <description>The scientific Python ecosystem comprises foundational libraries like NumPy and SciPy, technique-specific libraries like scikit-learn, NetworkX, and scikit-image, and domain-specific libraries such as PyHEP and AstroPy. The Scientific Python project is an effort to better coordinate and support the community of scientific Python ecosystem developers.

In this interactive talk, we give project updates, and invite the community to become more involved in our joint efforts to improve the ecosystem for its developers.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Maintainer track</category>
            <url>https://pretalx.com/euroscipy-2024/talk/SYXTMZ/</url>
            <location>Room 5</location>
            
            <attendee>Jarrod Millman</attendee>
            
            <attendee>Stéfan van der Walt</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LMSJ8Z@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LMSJ8Z</pentabarf:event-slug>
            <pentabarf:title>[CHANGE OF PROGRAM] Informal discussions about switching build backends</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240828T135500</dtstart>
            <dtend>20240828T144500</dtend>
            <duration>0.05000</duration>
            <summary>[CHANGE OF PROGRAM] Informal discussions about switching build backends</summary>
            <description>Goals:

- Share tips, tricks and best practices for configuring the build backend of a Python package with compiled (Cython/C/C++/Rust/Fortran) code
- Identify shared needs between packages, and discuss gaps in current build backends, documentation, or shared infrastructure
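For instance, a minimal `pyproject.toml` for a package with Cython code might look like the sketch below (shown with `meson-python` as one possible backend; `scikit-build-core` or `setuptools` would be configured along the same lines, and the package name is a placeholder):

```toml
[build-system]
requires = ["meson-python", "cython"]
build-backend = "mesonpy"

[project]
name = "mypackage"  # placeholder
version = "0.1.0"
dependencies = ["numpy"]
```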

Topics:

- Goals to aim for in your build config (and how to achieve them):
  - Faster builds and relevant tooling like profiling
  - Build logs that actually help when diagnosing issues
  - How to debug build failures effectively
  - How to check for and visualize build dependencies
  - Ensuring builds are reproducible
  - Approaches to reducing binary size
  - CI config ideas to guard against regressions
- Recent build-related developments &amp; a post-distutils world
- What are the most pressing pain points for maintainers?</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Maintainer track</category>
            <url>https://pretalx.com/euroscipy-2024/talk/LMSJ8Z/</url>
            <location>Room 5</location>
            
            <attendee>Ralf Gommers</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9EUT78@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9EUT78</pentabarf:event-slug>
            <pentabarf:title>Just contribute?!</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T090000</dtstart>
            <dtend>20240829T100000</dtend>
            <duration>1.00000</duration>
            <summary>Just contribute?!</summary>
            <description>Open source software is here for everyone - but how are we making sure that everyone has equal access?
In this keynote I will discuss how to lower barriers of entry for new contributors - and the many facets to this: documentation, community, guidelines, and tools.
I will share my personal motivations for contributing to open-source software and my journey over the past five years and all of its learnings.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Keynote</category>
            <url>https://pretalx.com/euroscipy-2024/talk/9EUT78/</url>
            <location>Room 7</location>
            
            <attendee>Wolf Vollprecht</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>NGECXK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-NGECXK</pentabarf:event-slug>
            <pentabarf:title>Optimagic: Can we unify Python&#x27;s numerical optimization ecosystem?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T103000</dtstart>
            <dtend>20240829T110000</dtend>
            <duration>0.03000</duration>
            <summary>Optimagic: Can we unify Python&#x27;s numerical optimization ecosystem?</summary>
            <description>Numerical optimization is a large field with applications in engineering, statistics, data science, and many other disciplines. The fundamental goal is always the same: Find a set of parameters that makes a number large or small (potentially fulfilling some constraints). Unfortunately, no single algorithm exists that can solve all optimization problems. Therefore, doing optimization in practice usually involves a lot of trial and error until one finds an optimizer that works well for specific problem characteristics.
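To make the trial and error concrete: even within a single package such as SciPy, the same problem can be handed to different algorithms with one keyword, and comparing them is part of the workflow (a toy sketch on the Rosenbrock function):

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])  # standard tricky starting point

# Hand the same problem to two different algorithms.
for method in ("Nelder-Mead", "L-BFGS-B"):
    res = minimize(rosen, x0, method=method)
    print(method, res.success, np.round(res.x, 3))
# Both should find the minimum near (1, 1) here,
# but on harder problems the outcomes often differ.
```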

The good news is that many high-quality optimization algorithms are implemented in Python or have Python bindings. The bad news is that they are scattered across many different packages. Switching between packages is expensive, as each package has its own way of specifying problems and calling optimizers.

Other languages are ahead of Python in this respect. For example, `Optimization.jl` provides a unified interface to more than 100 optimization algorithms and is widely accepted as a standard interface for optimization in Julia.

In this talk, we take stock of the existing optimization ecosystem in Python and analyze pain points and reasons why no single package has emerged as a standard so far. We use these findings to derive desirable features a Python optimization package would need to unify the ecosystem. 

We then present optimagic, a NumFOCUS-affiliated project with the goal of unifying the Python optimization ecosystem. Optimagic provides a common interface to optimization algorithms from SciPy, NLopt, pygmo, and many other libraries. The `minimize` function feels familiar to users of `scipy.optimize` who are looking for a more extensive set of supported optimizers. Advanced users can use optional arguments to configure every aspect of the optimization, create a persistent log file, turn local optimizers global with a multistart framework, and more.

Finally, we discuss an ambitious roadmap for improvements, new features, and planned community activities for optimagic.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/NGECXK/</url>
            <location>Room 7</location>
            
            <attendee>Janos Gabler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>XBXX89@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-XBXX89</pentabarf:event-slug>
            <pentabarf:title>forecasting foundation models: evaluation and integration with sktime – challenges and outcomes</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T110000</dtstart>
            <dtend>20240829T113000</dtend>
            <duration>0.03000</duration>
            <summary>forecasting foundation models: evaluation and integration with sktime – challenges and outcomes</summary>
            <description>During the talk, we will expand on a number of challenges we believe end users will typically face, and how we approached them:

-	API fragmentation of different foundation models. Different models use vastly different interfaces - even if weights are available on model sharing platforms, they do not come with consistent specifications. This is a substantial, underestimated challenge, and providing consistent APIs is difficult (but sktime helps!)
-	Customer lock-in dynamics. Many providers would like to tie you to their model and upsell. This disincentivizes interoperability and comparability – between competitor solutions, but also with classical models, baselines, or second-to-latest generation models. But comparability and interoperability are in the interest of the end user - and can be addressed with prudent architectural decisions.
-	Handling of nested software backend layers. Using a foundation model in a consistent API requires one to juggle multiple layers – model framework, deep learning backend, data backend, model marketplace, fine-tuning functionality. It is even more difficult to design a coherent software API for your production system. Learnings from sktime integrations are presented.
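The first point can be sketched with a simple adapter pattern: wildly different native model APIs hidden behind one `fit`/`predict` contract (hypothetical toy classes; sktime&#x27;s actual interface is richer):

```python
import statistics

class Forecaster:
    """Unified interface, regardless of the wrapped model's native API."""
    def fit(self, y):
        raise NotImplementedError
    def predict(self, horizon):
        raise NotImplementedError

class NaiveLast(Forecaster):
    """Toy model: repeat the last observed value."""
    def fit(self, y):
        self._last = y[-1]
        return self
    def predict(self, horizon):
        return [self._last] * horizon

class HistoricalMean(Forecaster):
    """Toy model: predict the historical mean."""
    def fit(self, y):
        self._mean = statistics.mean(y)
        return self
    def predict(self, horizon):
        return [self._mean] * horizon

# A shared contract makes models interchangeable and hence comparable.
y = [1.0, 2.0, 3.0, 4.0]
for model in (NaiveLast(), HistoricalMean()):
    print(model.fit(y).predict(2))
# [4.0, 4.0]
# [2.5, 2.5]
```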

Finally, we will report results of tests and evaluations, including aspects of software integration, performance, and trustworthiness – exclusive at EuroSciPy 2024.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/XBXX89/</url>
            <location>Room 7</location>
            
            <attendee>Franz Kiraly</attendee>
            
            <attendee>Benedikt Heidrich</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GQN8AF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GQN8AF</pentabarf:event-slug>
            <pentabarf:title>The Mission Support System and its use in planning an aircraft campaign</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T113000</dtstart>
            <dtend>20240829T115000</dtend>
            <duration>0.02000</duration>
            <summary>The Mission Support System and its use in planning an aircraft campaign</summary>
            <description>The presentation shows why we need software and data to cooperatively manage an aircraft for our missions. We have to determine, for a group of instruments, the optimal flight path through the upper troposphere / lower stratosphere.

After a short introduction to the complexity of aircraft campaigns, I want to show the components of the software package:
- our documentation
- a data retrieval tool chain for 4-D model forecast data
- a modified WMS server which, based on the data and the mission aims, adds a Side View and a Linear View to the common Top View
- a UI which interacts with this kind of server and makes it possible to work collaboratively on a flight track
- as a summary, how the software was used in a recent campaign</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/GQN8AF/</url>
            <location>Room 7</location>
            
            <attendee>Reimar Bauer</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>GKYTSY@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-GKYTSY</pentabarf:event-slug>
            <pentabarf:title>wgpu and pygfx: next-generation graphics for Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T132000</dtstart>
            <dtend>20240829T135000</dtend>
            <duration>0.03000</duration>
            <summary>wgpu and pygfx: next-generation graphics for Python</summary>
            <description>### Intro
This talk introduces a new render engine for Python, called pygfx (pronounced &quot;py-graphics&quot;), as well as our Python wrapper for wgpu.

### Purpose
The purpose of pygfx is to bring powerful and reliable visualization to the Python world. Since pygfx is built on wgpu, it has superior performance and reliability compared to OpenGL-based solutions. It is also designed to be versatile: with its modular architecture, one can assemble graphical scenes for diverse applications, ranging from scientific visualization to video games.

### What you can expect
In this talk we will touch on a few technical details related to GPU programming, but we will explain them in the context of how they affect the graphics that we need in the scientific Python community. We&#x27;ll discuss how wgpu has enabled us to introduce features in pygfx that we could only dream of previously. And we will show how wgpu and pygfx fit into the scientific ecosystem.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/GKYTSY/</url>
            <location>Room 7</location>
            
            <attendee>Almar Klein</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3RENPJ@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3RENPJ</pentabarf:event-slug>
            <pentabarf:title>fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T135500</dtstart>
            <dtend>20240829T142500</dtend>
            <duration>0.03000</duration>
            <summary>fastplotlib: A high-level library for ultra fast visualization of large datasets using modern graphics APIs</summary>
            <description>Over the past decade, advanced analysis pipelines have been developed for the analysis of large datasets. However, fast visualization and live interactivity during data collection remain challenging. While current tools within the Python plotting ecosystem allow for interactive data visualization, they either fail to leverage modern GPUs efficiently, lack intuitive APIs for rapid prototyping, or require users to write their own shaders. Additionally, other popular plotting libraries, such as bokeh and matplotlib, are not geared towards fast interactive visualization with millions of objects. Given these challenges with current visualization tools, the need for a modern GPU-driven interactive plotting library exists. In this presentation, we will go through the technical details, as well as a brief demo of how fastplotlib makes fast interactive visualization of complex datasets possible. We will demonstrate the broad applicability of fastplotlib as a fast, general-purpose plotting library.
Fastplotlib is built on top of pygfx, a cutting-edge Python rendering engine that utilizes WGPU, which can efficiently leverage modern GPU and CPU hardware. WGPU is the successor to OpenGL and features low overhead per draw per object, allowing for speed even when rendering millions of objects. Pygfx is also non-blocking, which allows for interactivity and modification of already drawn objects. Fastplotlib utilizes the pygfx rendering library for fast visualization with an expressive API for scientific visualization. The benefits of fastplotlib are that it reduces boilerplate code, which allows users to focus on their data without having to manage the underlying rendering process. Additionally, fastplotlib allows for animations as well as high-level interactivity among plots, which can be combined with lazy loading and lazy compute of very large datasets that are hundreds of gigabytes or terabytes in size. Furthermore, fastplotlib can be used in Jupyter notebooks, allowing it to be used on cloud computing and other remote infrastructures for streaming visualizations of extremely large datasets. In total, these unique features and the underlying architecture create a plotting library that is fast, easy to use, and multifaceted.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/3RENPJ/</url>
            <location>Room 7</location>
            
            <attendee>Kushal Kolar</attendee>
            
            <attendee>Caitlin Lewis</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3HCQFS@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3HCQFS</pentabarf:event-slug>
            <pentabarf:title>napari: multi-dimensional image visualization, annotation, and analysis in Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T143000</dtstart>
            <dtend>20240829T150000</dtend>
            <duration>0.03000</duration>
            <summary>napari: multi-dimensional image visualization, annotation, and analysis in Python</summary>
            <description>Napari is an interactive n-dimensional image viewer for Python. In contrast to matplotlib, it can quickly visualize large data even when the data has more than 2 dimensions. Data can also be larger than RAM, as napari can natively perform chunked loading when the array is chunked.

The napari canvas can be 2D or 3D. When you give napari an array with more dimensions than the canvas, it will automatically create sliders for those additional dimensions, allowing you to rapidly explore your data to the full extent, rather than a few sampled slices.
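The idea behind those sliders can be illustrated without a GUI: each slider fixes an index in one extra dimension, and the canvas shows the remaining 2D slice (a sketch of the concept, not napari&#x27;s internals):

```python
import numpy as np

def displayed_slice(data, slider_positions, canvas_ndim=2):
    """Select the slice shown on the canvas.

    One slider per dimension beyond the canvas; the trailing
    canvas_ndim dimensions are displayed in full.
    """
    assert len(slider_positions) == data.ndim - canvas_ndim
    return data[tuple(slider_positions)]

# A 4D stack (time=5, z=4, y=32, x=32) needs two sliders for a 2D canvas.
stack = np.zeros((5, 4, 32, 32))
stack[2, 1] = 7.0  # mark one (time, z) plane
view = displayed_slice(stack, (2, 1))
print(view.shape, view.max())  # (32, 32) 7.0
```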

Image analysis and visualization involves more than images though: feature detection algorithms result in points, segmentation results in label images, annotation results in shapes such as rectangles or polygons, and more. Napari provides layers that can be displayed on top of each other or side by side. Layers can also be controlled programmatically, which allows users to create workflows compatible with other scientific Python libraries to gain a rapid understanding of the performance of their algorithms, identifying strengths and pinpointing areas for improvement.

Sometimes, image analysis algorithms get you this far, but not quite far enough. In such cases, it’s useful to manually curate their output, and then continue with downstream steps of an analysis. Napari provides editing tools for its layer types, allowing one for example to add missing points to the output of a peak detection algorithm, remove incorrect ones, paint over incorrect parts of a segmentation, or draw polygons around missed objects of interest. The resulting data points are saved in standard Scientific Python data structures, such as NumPy or Zarr arrays.

This design makes it easy to seamlessly weave together image exploration, image computation, processing, and analysis, and data annotation, curation, and editing.

Napari also provides a plugin interface, allowing developers to extend napari’s capabilities, providing users with novel ways to interact with their data. Because napari provides both a library accessible within Python, IPython, and Jupyter, and a standalone executable script, we have even found that napari plugins can be an effective way to help collaborators run Python image analysis workflows without having to launch Python themselves. Because most new algorithms are being developed in Python, it is straightforward to try out new algorithms and assess how they perform using napari, or even write a plugin that allows others to try them easily as well, even if they have no coding experience.

Napari is still being improved. We are actively working on async slicing and rendering, multicanvas views, layer groups, and more. Development is guided by the user community, and we actively encourage contributions by organizing weekly community meetings, napari code cafés, and paired coding sessions.

In this talk we will cover the key features of napari, including its layer-based approach to handling image data and its interactive annotation and segmentation tools, and showcase its real-time performance capabilities. Additionally, we will demonstrate how napari can be leveraged in various scientific workflows, highlighting use cases in biology, neuroscience, and medical imaging. Attendees will gain insights into how napari can enhance their data visualization tasks and streamline analysis pipelines.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/3HCQFS/</url>
            <location>Room 7</location>
            
            <attendee>Grzegorz Bokota</attendee>
            
            <attendee>Wouter-Michiel Vierdag</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DLYRXH@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DLYRXH</pentabarf:event-slug>
            <pentabarf:title>Free-threaded (aka nogil) CPython in the Scientific Python ecosystem: status and road ahead</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T153000</dtstart>
            <dtend>20240829T160000</dtend>
            <duration>0.03000</duration>
            <summary>Free-threaded (aka nogil) CPython in the Scientific Python ecosystem: status and road ahead</summary>
            <description>CPython 3.13 will be released in October 2024 and has been in beta since May 2024. One of its most awaited features is the ability to disable the GIL (Global Interpreter Lock) through a compile-time flag.

In this talk we will explain the relevance of free-threaded CPython for the Scientific Python ecosystem, what already works, some of the caveats, and how to try it out on your favourite use case. 

In particular we will discuss:
- the historic effort in the scikit-learn project to add Continuous Integration for the `nogil` fork of CPython 3.9, and the kind of issues that were surfaced
- the ongoing effort in the Scientific Python ecosystem (NumPy, SciPy, scikit-learn, etc.) to test free-threaded CPython 3.13 and fix issues along the way
- how a typical scikit-learn grid-search use case can benefit from free-threaded CPython
- how to try out free-threaded CPython on your favourite use case
- possible future developments</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/DLYRXH/</url>
            <location>Room 7</location>
            
            <attendee>Loïc Estève</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JWAMDE@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JWAMDE</pentabarf:event-slug>
            <pentabarf:title>A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T160000</dtstart>
            <dtend>20240829T163000</dtend>
            <duration>0.03000</duration>
            <summary>A Comparative Study of Open Source Computer Vision Models for Application on Small Data: The Case of CFRP Tape Laying</summary>
            <description>The world of open-source computer vision has never been so exciting—and so challenging. With so many options available, what&#x27;s the best way to solve your real-world problem? The questions are always the same: Do I have enough data? Which model should I choose? How can I fine-tune and optimize the hyperparameters?

In collaboration with the German Aerospace Center, we investigated these questions to develop a model for quality assurance of CFRP tape laying, using only a small real dataset fresh from production. We are very pleased to present a machine learning setup that can empirically answer these questions. Not only for us, but also for you—our setup can easily be transferred to your application!

This talk provides you with a blueprint for your own projects, focusing on a setup that allows you to improve your models in a controlled manner and compare results effectively:

- We begin by examining the problem through our specific use case of CFRP tape laying and breaking it down into generic solution steps.

- These solution steps are translated into a machine learning pipeline using DVC (Data Version Control). This approach saves computation time on steps where neither the data nor the source code has changed and helps to keep track of your progress and performance over time using Git.

- We will explore various current model architectures available in the Hugging Face Model Hub and demonstrate how you can fine-tune them on your data using Python packages such as transformers and ray. On the topic of hyperparameter search, we will discuss the available algorithms and the most promising parameters.

- Finally, we will review our results, specifically how well different architectures of open-source models perform on our small dataset. We will explore the question of how different model architectures compare and whether the largest model always gives the best results.
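
To illustrate the pipeline idea from the steps above, a `dvc.yaml` might look like the following sketch (stage names, paths, and parameters are placeholders, not our actual setup):

```yaml
# dvc.yaml -- hypothetical stage layout; names and paths are placeholders
stages:
  preprocess:
    cmd: python src/preprocess.py data/raw data/processed
    deps:
      - src/preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/train.py data/processed models/model.pt
    deps:
      - src/train.py
      - data/processed
    params:
      - train.learning_rate
    outs:
      - models/model.pt
    metrics:
      - metrics.json:
          cache: false
```

Because each stage declares its dependencies and outputs, DVC re-runs a stage only when one of its declared inputs has changed.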

If you want to level up your MLOps game and gain practical knowledge of the latest computer vision models and practices, this talk is a must for you. Don&#x27;t miss the opportunity, and look forward to your next computer vision projects!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/JWAMDE/</url>
            <location>Room 7</location>
            
            <attendee>Thomas Fraunholz</attendee>
            
            <attendee>Tim Köhler</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>E8HD9K@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-E8HD9K</pentabarf:event-slug>
            <pentabarf:title>The Parallel Universe in Python - A Time Travel to Python 3.13 and beyond</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T103000</dtstart>
            <dtend>20240829T110000</dtend>
            <duration>0.03000</duration>
            <summary>The Parallel Universe in Python - A Time Travel to Python 3.13 and beyond</summary>
            <description>Parallel computing is essential for many performance-critical applications. Python provides many solutions to this problem. New versions of Python will support sub-interpreters and a currently experimental free-threaded build without the Global Interpreter Lock (GIL).

This talk starts with a short overview of the topic, clarifying terms such as parallel, concurrent, and distributed computing, as well as CPU-bound, memory-bound, and IO-bound problems. The presentation explains how Python and its standard library support parallel programming tasks. In addition, many Python libraries provide very useful approaches and tools for parallel computing. An overview of important libraries provides guidance on which library can be used for which type of parallel problem.
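
To make the IO-bound case concrete, here is a small standard-library sketch (the sleep stands in for network or disk waits): IO-bound work releases the GIL while waiting, so plain threads already overlap it.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Simulated IO-bound work: sleeping releases the GIL, just like waiting
# on a socket or a file would.
def fetch(delay):
    time.sleep(delay)
    return delay

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, [0.1, 0.1, 0.1, 0.1]))
elapsed = time.perf_counter() - start

# The four 0.1 s waits overlap, so the total is close to 0.1 s, not 0.4 s.
print(results, round(elapsed, 2))
```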

How do Python&#x27;s new features, such as sub-interpreters and free-threading without the Global Interpreter Lock (GIL), impact parallel programming in Python? This talk addresses this question by providing examples where these features might help to make programs simpler and/or faster.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/E8HD9K/</url>
            <location>Room 6</location>
            
            <attendee>Mike Müller</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3K8ZXN@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3K8ZXN</pentabarf:event-slug>
            <pentabarf:title>LPython: Novel, Fast, Retargetable Python Compiler</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T110000</dtstart>
            <dtend>20240829T113000</dtend>
            <duration>0.03000</duration>
            <summary>LPython: Novel, Fast, Retargetable Python Compiler</summary>
            <description>In this talk, we will delve into LPython, an open-source, LLVM-based Python compiler that translates type-annotated Python code into optimized machine code. It offers rapid Ahead-Of-Time (AOT) compilation to binaries, with an option for Just-In-Time (JIT) compilation and smooth interoperability with CPython. We&#x27;ll also provide benchmarks, including a comparison with Numba for JIT compilation and Clang++ and g++ for AOT compilation.

We will examine the unique qualities LPython has to offer and learn the &quot;why&quot; behind the compiler. Its main focus is speed, and our benchmarks will validate its competitiveness with the current cutting-edge technology.

Here&#x27;s the talk outline:

* **Introduction** (2 mins): A warm welcome and a brief presenter introduction. An overview of the talk and learning expectations for attendees.
* **What is LPython?** (2 mins): Explaining the need for yet another Python compiler. How it operates at each stage and introduction to the LCompilers family.
* **Phases of compilation, Intermediate representation (ASR), Optimizations** (5 mins): A look at the internals, from Python code to parsers, Abstract Syntax Tree (AST), Abstract Semantic Representation (ASR), and backends (LLVM, C, C++, WASM, etc). This section also gives a short overview of how low- and high-level machine-independent optimizations are performed.
* **Just-In-Time and Ahead-Of-Time compilation** (2 mins): Distinguishing between these two types of compilation in LPython; JIT compiles code during runtime while AOT produces binary output.
* **Interoperability with CPython** (2 mins): Discuss how LPython enables seamless integration with CPython libraries. Demonstrate the `@pythoncall` and `@lpython` decorators, and their practical uses (e.g., Matplotlib for graphs).
* **Online Demo** (2 mins): Live demo of code compilation using LPython in a browser. Displaying AST, ASR, C, and WAT (WebAssembly Text Format) output tabs on the website.
* **Speed and Performance benchmarks** (4 mins): Presenting benchmarks against Numba, Clang++, g++, and Python itself, demonstrating LPython’s ability to run on Linux, Mac, Windows, and WebAssembly.
* **Accelerated code sample** (1 min): Showcasing how to speed up a simple Python method developed for tomography.
* **User-friendly developer experience** (2 mins): Displaying compiler diagnostic messages (errors, warnings, etc) and highlighting the user-friendly developer experience.
* **Conclusion and takeaways** (2 mins): Recapping the main points covered in the talk. Encouraging attendees to incorporate LPython into their projects and providing practical advice on integration. Upcoming features will also be discussed.
* **Q&amp;A session** (5 mins): Responding to questions and engaging in dialogue.
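
As a minimal sketch of the kind of type-annotated code such a compiler consumes (LPython&#x27;s dedicated integer types like `i32` from the `lpython` module are replaced with plain `int` annotations here, so the example also runs under CPython):

```python
# Fully annotated, statically typed Python: the annotations let an
# ahead-of-time compiler such as LPython generate machine code.
def fib(n: int):
    # Base cases handled without recursion.
    if n in (0, 1):
        return n
    a: int = 0
    b: int = 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

print(fib(10))
```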

**Prerequisites**: 

Intermediate experience in Python would be helpful. An interest in learning the basics of Python compilers.

**Content URLs**:

LPython:  https://lpython.org/
LPython blog:  https://lpython.org/blog/2023/07/lpython-novel-fast-retargetable-python-compiler/
LPython GitHub:  https://github.com/lcompilers/lpython/
Slides/GitHub repo: to be added later.

**Speaker Info**:

Naman Gera currently works as a Research Software Engineer at the UK’s national synchrotron science facility, [Diamond Light Source](https://www.diamond.ac.uk/Home/About.html), based in Harwell Science and Innovation Campus, UK. He is presenting this work done with his past employer, GSI Technology, being a part of the compiler team led by [Dr. Ondrej Certik](https://ondrejcertik.com/).

When AFK, Naman likes being in nature, climbing mountains, and solo travelling to many countries.

**Speaker URLs**:

GitHub:   https://github.com/namannimmo10
LinkedIn:   https://www.linkedin.com/in/namannimmo/
Email:  namangera15@gmail.com</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/3K8ZXN/</url>
            <location>Room 6</location>
            
            <attendee>Naman Gera</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BYTCSC@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BYTCSC</pentabarf:event-slug>
            <pentabarf:title>The Array API Standard in SciPy</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T113000</dtstart>
            <dtend>20240829T115000</dtend>
            <duration>0.02000</duration>
            <summary>The Array API Standard in SciPy</summary>
            <description>SciPy has had &quot;support for distributed and GPU arrays&quot; on its roadmap for over five years now. For a library built around NumPy arrays, that is easier said than done. There are other array libraries, such as CuPy, PyTorch, JAX, and Dask, which can help address these user wishes.

Supporting multiple array libraries is not simple, since their APIs differ. Lengthy if-else statements in every function won&#x27;t cut it - what we want is to be able to write &#x27;array-agnostic&#x27; code which will work with multiple array libraries, without having to special-case on the input array type.

The Python array API standard aims to standardise functionality that exists in most array libraries. It specifies an API which &#x27;array-consumer&#x27; libraries can use to write array-agnostic code. In this talk, I give a brief introduction to the standard, before explaining how we are implementing support for it in SciPy, and the progress which we have made so far.
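
A minimal sketch of such array-agnostic code, shown here with NumPy standing in for the array namespace (in real code the namespace would be discovered from the input array, e.g. with the `array-api-compat` helper package, rather than passed explicitly):

```python
import numpy as np

# An array-agnostic function uses the array namespace `xp` instead of
# hard-coding numpy; with the array API standard, `xp` could equally be
# cupy, torch, or jax.numpy.
def normalize(x, xp):
    mean = xp.mean(x)
    std = xp.std(x)
    return (x - mean) / std

x = np.array([1.0, 2.0, 3.0, 4.0])
z = normalize(x, xp=np)
print(float(z.mean()), float(z.std()))
```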

Rough talk outline:
- 5 mins - what is the array API standard, and why should you care?
- 7 mins - what work is needed for a consumer library to adopt the array API standard, and what does that look like in SciPy? What tools are available to help? 
- 3 mins - current progress in SciPy &amp; looking to the future.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/BYTCSC/</url>
            <location>Room 6</location>
            
            <attendee>Lucas Colley</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>89BJ9Q@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-89BJ9Q</pentabarf:event-slug>
            <pentabarf:title>Accelerating Python on HPC with Dask</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T132000</dtstart>
            <dtend>20240829T135000</dtend>
            <duration>0.03000</duration>
            <summary>Accelerating Python on HPC with Dask</summary>
            <description>Dask is a popular Python framework for scaling your workloads, whether you want to leverage all of the cores on your laptop and stream large datasets through memory, or scale your workload out to thousands of cores on large compute clusters. Dask allows you to distribute code using familiar APIs such as pandas, NumPy and scikit-learn or write your own distributed code with powerful parallel task-based programming primitives.
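
To illustrate the task-based programming model, here is a toy, self-contained sketch of Dask&#x27;s internal representation: a task graph is a plain dict mapping keys to data or to tuples of a callable and its arguments. The mini-scheduler below is for illustration only; Dask&#x27;s real schedulers add parallelism, distribution, and much more.

```python
from operator import add, mul

# A Dask-style task graph: values are either data or a tuple of a
# callable plus arguments, where string arguments may name other keys.
graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),
    "result": (mul, "sum", 10),
}

def get(graph, key, cache=None):
    # Recursively evaluate a key, memoizing results so each task runs once.
    cache = {} if cache is None else cache
    if key in cache:
        return cache[key]
    node = graph[key]
    if isinstance(node, tuple):
        func = node[0]
        args = [get(graph, a, cache) if a in graph else a for a in node[1:]]
        value = func(*args)
    else:
        value = node
    cache[key] = value
    return value

print(get(graph, "result"))
```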

We will start by exploring the concept of adaptive clusters, which allow for dynamic scaling of resources based on the workload&#x27;s demands. Adaptive clusters automatically submit and manage many jobs to an HPC queue, ensuring efficient resource utilisation and cost-effectiveness. This method is particularly useful for workloads with varying computational requirements, as it adjusts the number of active workers in real-time.

Next, we will dive into using runners that leverage parallel execution environments such as MPI or job schedulers like SLURM to bootstrap Dask clusters within a single large job allocation. Submitting a single job offers some benefits (aside from the fact that HPC administrators often prefer this approach), including better node locality, as the scheduler places processes on nodes that are physically closer together. This results in more efficient communication and reduced latency. Additionally, launching all workers simultaneously ensures balanced data distribution across the cluster.

The session will then shift focus to the accelerated side of Dask, demonstrating how to harness the power of GPUs to significantly boost computation speed. We will introduce Dask CUDA, part of RAPIDS, a suite of open-source libraries designed to execute end-to-end data science and analytics pipelines entirely on GPUs. By integrating Dask CUDA, users can achieve unprecedented levels of performance, particularly for data-intensive tasks such as machine learning and data preprocessing.

We will also explore the advantages of using UCX (Unified Communication X) to enhance Dask&#x27;s performance on HPC systems with advanced networking technologies. UCX provides a high-performance communication layer that supports various network transports, including Infiniband and NVLink. By leveraging these accelerated networking options, users can achieve lower latency and higher bandwidth, resulting in faster data transfers between Dask workers and more efficient parallel computations.

Outline:
- Overview of Dask
  - Scaling out Pandas and NumPy
  - Custom parallel code
  - Workflow engines
  - Machine learning and AI applications
- Deploying Dask on HPC
  - Adaptive clusters
  - Fixed size runners
- Accelerating Dask on HPC
  - RAPIDS and Dask CUDA
  - UCX</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/89BJ9Q/</url>
            <location>Room 6</location>
            
            <attendee>Jacob Tomlinson</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>U3EMKF@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-U3EMKF</pentabarf:event-slug>
            <pentabarf:title>Regularizing Python using Structured Control Flow</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T135500</dtstart>
            <dtend>20240829T142500</dtend>
            <duration>0.03000</duration>
            <summary>Regularizing Python using Structured Control Flow</summary>
            <description>The Control Flow Graph (CFG) is a well-known and established concept in
computer science and used as part of the compilation or interpretation step of
all modern programming languages. Unfortunately, CFGs are not always the ideal
representation for compilers to work with because they often result in
arbitrarily structured graphs (“spaghetti”). This in turn can lead to compiler
optimization steps not being leveraged and as a result a compiler may be unable
to generate an optimal low-level representation of the program. A recent
enhancement is the concept of the Structured Control Flow Graph (SCFG). In this
extension to CFGs all blocks are part of special regions within the SCFG. The
three possible region shapes are: linear, branch and loop. A linear region is
simply a linear sequence of instructions. A branch region is a shape where the
control flow is split symmetrically and joined again. Finally, a loop region is
a subgraph with a single backedge from the region&#x27;s exiting latch. These shapes
effectively describe all possible control flow patterns of a computer program
and the resulting structure offers significantly more chances for a compiler to
apply transformations and optimizations, which in turn may lead to significant
performance improvements.

The Python package being presented is capable of constructing a CFG from Python
source code input in the form of an Abstract Syntax Tree (AST). Furthermore, the
package can then apply two algorithmic steps known as Loop Restructuring (LR)
and Branch Restructuring (BR) (as described in [2]) which convert the
constructed CFG into an SCFG. Lastly, the package is able to synthesize a
regularized Python program from the SCFG representation which is behaviourally
equivalent to the original but is potentially easier for a compiler to work
with. Essentially the package implements a program transformation at the source
code level, where both input and output are runnable Python programs.
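
As a small illustration of the very first step (a simplification, not the package&#x27;s actual API), the standard-library `ast` module can parse source code and expose the nodes that give rise to branch and loop regions:

```python
import ast

# Parse a function and identify the AST nodes that introduce control
# flow; these are exactly the constructs that become branch and loop
# regions when the CFG is restructured into an SCFG.
source = """
def f(n):
    total = 0
    for i in range(n):
        if i % 2 == 0:
            total += i
    return total
"""

tree = ast.parse(source)
branches = sum(isinstance(node, ast.If) for node in ast.walk(tree))
loops = sum(isinstance(node, (ast.For, ast.While)) for node in ast.walk(tree))
print(branches, loops)
```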

Going beyond SCFGs, a novel Intermediate Representation (IR), the
Regionalized Value State Dependence Graph (RVSDG), has been proposed [3].
Compared to CFGs and SCFGs, which are control-flow centric IRs, RVSDGs are
data-flow centric IRs.  This means that a number of common compiler transforms
can be performed when the program has been converted to the RVSDG
representation without having to reconstruct invariant properties post
transformation. Also, this representation unlocks a number of compiler
transforms, as data-flow through the program is explicitly available. Transforms
will be algorithmically simpler, more elegant and computationally less
expensive. Importantly, the construction of the SCFG representation for the
input program is a necessary first step to constructing the full
RVSDG and has significant merit in its own right.

Source on GitHub: https://github.com/numba/numba-rvsdg
Package on PyPI: https://pypi.org/project/numba-rvsdg/

References:

[1] Siu Kwan Lam, Antoine Pitrou and Stanley Seibert. Numba: A LLVM-based Python
JIT Compiler. Proc. Second Workshop on the LLVM Compiler Infrastructure in HPC,
pp. 1-6, ACM. 2015

[2] Helge Bahmann, Nico Reissmann, Magnus Jahre, and Jan Christian Meyer.
Perfect reconstructability of control flow from demand dependence graphs. ACM
Transactions on Architecture and Code Optimization, 11(4):66:1–66:25, 2015.

[3] Nico Reissmann, Jan Christian Meyer, Helge Bahmann, and Magnus Själander.
RVSDG: An Intermediate Representation for Optimizing Compilers. Association for
Computing Machinery (ACM) 19(6):1-28, 2020</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/U3EMKF/</url>
            <location>Room 6</location>
            
            <attendee>Emergency Self-Constuct</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>JXB79J@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-JXB79J</pentabarf:event-slug>
            <pentabarf:title>Building optimized packages for conda-forge and PyPI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T143000</dtstart>
            <dtend>20240829T145000</dtend>
            <duration>0.02000</duration>
            <summary>Building optimized packages for conda-forge and PyPI</summary>
            <description>This talk has two aims: introducing a new tool – rattler-build – and showing how rattler-build is used to build highly optimized packages for the widely used conda-forge distribution (which ships thousands of scientific Python packages).

rattler-build has been developed from scratch to replace `conda-build`. It&#x27;s a package build tool written in Rust, on top of the `rattler` libraries. The recipe format has undergone a meticulous standardization process through a number of Conda Enhancement Proposals. The result is a refined format that removes a lot of the warts of conda-build:

- proper YAML - no semantic comments or arbitrary Jinja allowed
- integrates nicely with VSCode and other editors because we have a proper JSON schema
- strictly defined behavior for multi-output recipes 
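
As an illustration, a minimal recipe in the new YAML format might look like the following sketch (package name, URL, hash, and dependencies are placeholders):

```yaml
# recipe.yaml -- hypothetical minimal recipe; all values are placeholders
package:
  name: mypackage
  version: "1.0.0"

source:
  url: https://example.com/mypackage-1.0.0.tar.gz
  sha256: "0000000000000000000000000000000000000000000000000000000000000000"

build:
  number: 0
  script: python -m pip install . -vv

requirements:
  host:
    - python
    - pip
  run:
    - python

tests:
  - python:
      imports:
        - mypackage
```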

The result is that rattler-build is dramatically faster than conda-build at evaluating recipes. `rattler-build` also aims for full reproducibility, producing bit-by-bit identical packages.

In `conda-forge`, there has been recent innovation to ship more optimized packages. In the past, packages that wanted to make use of SIMD features in the CPU had to implement &quot;dynamic dispatching&quot; at runtime - at the cost of a larger package size.
For some time now, `conda-forge` has defined multiple &quot;cpu-levels&quot;, such as `sse`, `avx2`, `avx512`, and `ARM Neon`. On the client side, the maximum supported CPU level is detected and the best available package is then installed. This opens the door to highly optimized packages on `conda-forge` that support the latest CPU features.

We will show how to use this in practice with rattler-build. 

For GPUs, `conda-forge` has supported different `CUDA` levels for a long time, and we&#x27;ll look at how that is used as well.

Lastly, we also take a look at PyPI. There are ongoing discussions on how to improve support for wheels with CUDA support. We will discuss how the (pre-)PEP works and possible synergies between `rattler-build` and `cibuildwheel`.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (15 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/JXB79J/</url>
            <location>Room 6</location>
            
            <attendee>Wolf Vollprecht</attendee>
            
            <attendee>Bas Zalmstra</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>893KBK@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-893KBK</pentabarf:event-slug>
            <pentabarf:title>Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T153000</dtstart>
            <dtend>20240829T160000</dtend>
            <duration>0.03000</duration>
            <summary>Simulated data is all you need: Bayesian parameter inference for scientific simulators with SBI</summary>
            <description>A central challenge with using simulators is identifying parameters that accurately reproduce observed data. When working with complex, possibly stochastic and black-box simulators, parameter inference often amounts to hand tuning, grid-searches or optimization for single best-fitting parameters. These approaches can be inefficient and brittle when working with high-dimensional and noisy simulators. 

The Bayesian parameter inference approach overcomes these challenges. It infers a conditional distribution over the simulator parameters given the observed data and a specified parameter prior encoding domain knowledge. The inferred posterior distribution identifies all suitable parameters while quantifying uncertainties and potential parameter correlations. However, classical approximate inference algorithms such as Markov Chain Monte Carlo (MCMC) or variational inference usually cannot be applied to scientific simulators because they require access to the likelihood of the simulator.

The core idea of simulation-based inference (SBI) is to enable Bayesian parameter inference for scientific simulators by requiring only simulated data. To that end, modern SBI methods use neural network-based conditional density estimation (e.g., normalizing flows, diffusion models) to learn to approximate the posterior solely from data. After training, the estimator can be applied to observed data to obtain an approximation of the desired posterior distribution.
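
The simulate-then-infer idea can be illustrated with the much simpler (and far less efficient) rejection-ABC scheme, using only the standard library. Neural SBI methods replace the wasteful accept/reject step below with a learned conditional density estimator; the simulator and prior here are toy assumptions, not from the `sbi` library.

```python
import math
import random

random.seed(0)

# A stochastic black-box simulator whose likelihood we pretend is
# unavailable: one noisy observation given a parameter theta.
def simulate(theta):
    return theta + random.gauss(0.0, 1.0)

observed = 2.0

# Rejection ABC: sample parameters from the prior, run the simulator,
# and keep the parameters whose simulated data lands close to the
# observation. The kept samples approximate the posterior.
posterior = []
for _ in range(20000):
    theta = random.uniform(-5.0, 5.0)           # sample from the prior
    x = simulate(theta)
    if math.isclose(x, observed, abs_tol=0.1):  # keep close simulations
        posterior.append(theta)

mean = sum(posterior) / len(posterior)
print(len(posterior), round(mean, 1))
```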

Many new SBI methods have been developed in recent years, enabling applications to a wide range of simulators. However, bringing SBI into application can be challenging for practitioners because of a lack of reliable and approachable software tools. The [`sbi` library](https://sbi-dev.github.io/sbi/) aims to close this applicability gap: it implements state-of-the-art SBI algorithms, provides methods for the entire SBI workflow including accuracy checks and plotting, and comes with comprehensive documentation and detailed tutorials.

`sbi` is a community project with more than 50 contributors from across Europe. It originated as a research code base at the [University of Tübingen](https://www.mackelab.org/) and is now maintained by researchers from Tübingen and the [appliedAI TransferLab](https://transferlab.ai/). Contributions are very welcome and hackathons and workshops are organized regularly.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/893KBK/</url>
            <location>Room 6</location>
            
            <attendee>Jan Boelts (Teusen)</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HKVEXW@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HKVEXW</pentabarf:event-slug>
            <pentabarf:title>Reproducible workflows with AiiDA - The power and challenges of full data provenance</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T160000</dtstart>
            <dtend>20240829T163000</dtend>
            <duration>0.03000</duration>
            <summary>Reproducible workflows with AiiDA - The power and challenges of full data provenance</summary>
<description>AiiDA is a robust open-source Python package that helps researchers automate, manage, persist, share, and reproduce complex workflows. A defining feature of AiiDA is the automatic recording of the calculations&#x27; history, or “provenance”, including all relevant data inputs and outputs. This makes it possible to design detailed interfaces for processes and workflows, and to use advanced queries to find relevant results or share data. As a result, AiiDA is particularly suitable for building a sustainable computational infrastructure for running high-throughput workflows, and it facilitates sharing data and provenance in a FAIR way for publication.

On the other hand, writing workflows while keeping in mind the requirements of tracking the full provenance can be cumbersome for new users. Until very recently, running a new external (i.e. non-Python) code required developing a dedicated plugin, and connecting processes into a workflow relied on advanced Python concepts. For high-throughput performance on HPC systems, AiiDA inherently depended on services that are not always trivial to install, such as a PostgreSQL database and a RabbitMQ message broker. In the past year, several changes have been made to improve its usability, with a particular focus on getting new users up and running as quickly as possible.

In this talk, we’ll start with a brief overview of AiiDA’s philosophy and core features as a workflow manager and discuss solutions to the challenges described above. The new plugin package `aiida-shell` makes it easy to run any shell executable, without the need to develop a custom plugin, while preserving basic provenance. Support for SQLite databases and an optional RabbitMQ allow users who don’t need high throughput or scalability to run AiiDA without installing and configuring these services. Furthermore, a new WorkGraph feature provides a powerful framework for designing flexible node-based workflows with basic Python knowledge. This flexibility is essential for users to quickly piece together a workflow for their scientific use case. The WorkGraph also allows managing and visualizing workflows in web browsers and Jupyter notebooks.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk (25 mins + Q&amp;A)</category>
            <url>https://pretalx.com/euroscipy-2024/talk/HKVEXW/</url>
            <location>Room 6</location>
            
            <attendee>Marnik Bercx</attendee>
            
            <attendee>Xing Wang</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CDB9NG@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CDB9NG</pentabarf:event-slug>
            <pentabarf:title>NumPy&#x27;s new DType API and 2.0 transition</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T110000</dtstart>
            <dtend>20240829T114500</dtend>
            <duration>0.04500</duration>
            <summary>NumPy&#x27;s new DType API and 2.0 transition</summary>
<description>One of the new features of NumPy 2.0 is the new variable-length StringDType. This DType was written using the new DType C API, which had previously been available only experimentally.
In this talk I will introduce the new concepts for creating user DTypes.
What is the C API to construct such a new DType and what are the most important methods that need to be implemented?
The StringDType and the similar dtype experiments in https://github.com/numpy/numpy-user-dtypes will serve as examples.

In the second part, I will recap how long it took maintainers to release downstream libraries compatible with NumPy 2.
Unfortunately, the availability of downstream libraries may not be a good indicator of end-user difficulties, which are harder to predict.
Undoubtedly, this transition is still ongoing in some parts of the ecosystem and this session is a good opportunity to discuss it.

How hard was it to adapt to the Python API changes, the promotion changes due to NEP 50, and the requirement to recompile with NumPy 2?
Discussing these will help us make future decisions about similar breaking changes.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Maintainer track</category>
            <url>https://pretalx.com/euroscipy-2024/talk/CDB9NG/</url>
            <location>Room 5</location>
            
            <attendee>Sebastian Berg</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8MXPRW@@pretalx.com</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8MXPRW</pentabarf:event-slug>
            <pentabarf:title>Dispatching, Backend Selection, and Compatibility APIs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20240829T132000</dtstart>
            <dtend>20240829T150000</dtend>
            <duration>1.04000</duration>
            <summary>Dispatching, Backend Selection, and Compatibility APIs</summary>
            <description>## Dispatching and Backend Selection Discussion

In the first part, we would like to briefly review the successful [`NetworkX` backend selection](https://networkx.org/documentation/stable/reference/backends.html) and work towards a possible future dispatching project under the Scientific Python umbrella, [`spatch`](https://github.com/scientific-python/spatch).

Many projects implement multiple dispatch based on types. Other projects have experimented with backend selection that goes beyond type dispatching and allows swapping in a different algorithm.
`NetworkX` has both, and has also added features such as including installed backends in its documentation.
An older experiment with a dispatching system was `uarray`.
Projects such as scikit-learn currently focus on a hybrid approach: dispatching via the array API where possible, otherwise using a backend selection system.
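As a simplified stdlib analogue of the first approach (type-based dispatch only; NetworkX's backend system additionally allows selecting alternative implementations for the same type):

```python
from functools import singledispatch

@singledispatch
def describe(obj):
    """Fallback implementation; dispatch is on the type of the first argument."""
    return f"object: {obj!r}"

@describe.register
def _(obj: list):
    return f"list of {len(obj)} items"

@describe.register
def _(obj: dict):
    return f"dict with keys {sorted(obj)}"

print(describe([1, 2, 3]))          # list of 3 items
print(describe({"b": 1, "a": 2}))   # dict with keys ['a', 'b']
```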

Dispatching and backend selection form a complex design space with various possible implementations. We would like to discuss requirements, specifically with a project like `scikit-image` in mind, and discuss how these requirements can be met.
This session is meant to be an open discussion to push forward a new `spatch` library: an example implementation of such a dispatching system.

## Array API adoption progress and discussion

In the second part, we will discuss the adoption of the [Array API](https://github.com/data-apis/array-api/) in libraries such as SciPy and scikit-learn. How did support develop over the past year, and what issues remain?

## Dataframe compatibility layers

Finally, similar to the Array API, solutions such as [narwhals](https://github.com/narwhals-dev/narwhals) provide a compatibility layer for libraries working with dataframes.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Maintainer track</category>
            <url>https://pretalx.com/euroscipy-2024/talk/8MXPRW/</url>
            <location>Room 5</location>
            
            <attendee>Guillaume Lemaitre</attendee>
            
            <attendee>Joris Van den Bossche</attendee>
            
            <attendee>Tim Head</attendee>
            
            <attendee>Erik Welch</attendee>
            
            <attendee>Marco Gorelli</attendee>
            
            <attendee>Sebastian Berg</attendee>
            
            <attendee>Aditi Juneja</attendee>
            
            <attendee>Stéfan van der Walt</attendee>
            
        </vevent>
        
    </vcalendar>
</iCalendar>
