08:00
30min
Registration
Aula
08:30
90min
Getting started with JupyterLab
Mike Müller

JupyterLab is very widely used in the Python scientific community. Most, if not all, of the other tutorials will use Jupyter as a tool. Therefore, a solid understanding of the basics is very helpful for the rest of the conference as well as for your later daily work.
This tutorial provides an overview of important basic Jupyter features.

Scientific Applications
HS 120
08:30
90min
Network Analysis Made Simple (and fast!)
Mridul Seth

Through the use of NetworkX's API, tutorial participants will learn about the basics of graph theory and its use in applied network science. Starting with a computationally oriented definition of a graph and its associated methods, we will build out into progressively more advanced concepts (path and structure finding). We will also discuss new advances to speed up NetworkX code by dispatching to alternate computation backends like GraphBLAS. This will be a hands-on tutorial, so stretch your muscles and get ready to go through the exercises!
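
As a flavour of the workflow, a minimal sketch (the backend keyword assumes an optional dispatch backend such as graphblas-algorithms is installed):

    import networkx as nx

    # A classic small social network used in many tutorials
    G = nx.karate_club_graph()
    print(G.number_of_nodes(), G.number_of_edges())

    # Path finding and a structural measure
    print(nx.shortest_path(G, source=0, target=33))
    print(nx.betweenness_centrality(G))

    # With a dispatch backend installed, the same call can be routed
    # to a faster implementation, e.g.:
    # nx.betweenness_centrality(G, backend="graphblas")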

Data Science and Visualisation
Aula
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:30
90min
Introduction to Geospatial Machine Learning with SRAI
Piotr Szymański, Szymon Woźniak, Piotr Gramacki, Kamil Raczycki, Kacper Leśniara

This tutorial offers a thorough introduction to the srai library for Geospatial Artificial Intelligence. Participants will learn how to use this library for geospatial tasks like downloading and processing OpenStreetMap data, extracting features from GTFS data, dividing an area into smaller regions, and representing regions in a vector space using various spatial features. Additionally, participants will learn to pre-train embedding models and train predictive models for downstream tasks.
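
As an illustration only, a hedged sketch of the regionalization and loading steps described above; the class and module names (H3Regionalizer, OSMOnlineLoader) follow the srai documentation but should be verified against the installed version:

    # Sketch based on the srai docs; verify names against your srai version.
    import geopandas as gpd
    from srai.regionalizers import H3Regionalizer   # assumed module path
    from srai.loaders import OSMOnlineLoader        # assumed module path

    area = gpd.read_file("city_boundary.geojson")   # any polygon layer
    regions = H3Regionalizer(resolution=8).transform(area)      # H3 cells
    features = OSMOnlineLoader().load(area, {"amenity": True})  # OSM data
    print(regions.head(), features.head())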

Scientific Applications
Aula
10:30
90min
Introduction to Python for scientific programming
Milton Gomez

This tutorial will provide an introduction to Python intended for beginners.

It will notably introduce the following aspects (a short sketch follows the list):

  • built-in types
  • control flow (e.g. conditions, loops)
  • built-in functions
  • basic Python classes
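
A minimal sketch touching each of the listed topics:

    # Built-in types
    numbers = [1, 2, 3.5]              # a list holding ints and a float
    name = "EuroSciPy"                 # a string

    # Control flow
    for n in numbers:
        if n > 2:
            print(f"{n} is larger than 2")

    # Built-in functions
    print(len(numbers), max(numbers), sorted(numbers))

    # A basic class
    class Greeter:
        def __init__(self, who):
            self.who = who

        def greet(self):
            return f"Hello, {self.who}!"

    print(Greeter(name).greet())
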
Scientific Applications
HS 120
12:00
90min
Lunch
Aula
12:00
90min
Lunch
HS 120
13:30
90min
Developing pandas extensions in Rust
Marc Garcia

pandas is a batteries-included dataframe library, implementing hundreds of generic operations for tabular data, such as math or string operations, aggregations, and window functions. In some cases, domain-specific code may benefit from user-defined functions (UDFs) that implement some particular logic. These functions can sometimes be implemented using more basic pandas vectorized operations, and they will be reasonably fast; in others, a Python function working with the individual values needs to be implemented, and those will execute orders of magnitude slower than their equivalent vectorized versions. In this tutorial we will see how to implement functions in Rust that can be used with dataframe values at the individual level, but run at the speed of vectorized code, and in some cases faster.
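
To make the performance gap concrete, here is the pure-pandas side of the comparison, a vectorized expression versus a per-value Python UDF (the Rust implementation itself is the subject of the tutorial and is not shown here):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": np.random.default_rng(0).normal(size=1_000_000)})

    # Vectorized: whole-column operations running at C speed
    fast = df["x"].abs() * 2 + 1

    # Per-value Python UDF: same result, orders of magnitude slower
    slow = df["x"].map(lambda v: abs(v) * 2 + 1)

    assert np.allclose(fast, slow)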

High Performance Computing
Aula
13:30
90min
Introduction to NumPy
Geir Arne Hjelle

NumPy is one of the foundational packages for doing data science with Python. It enables numerical computing by providing powerful N-dimensional arrays and a suite of numerical computing tools. In this tutorial, you'll be introduced to NumPy arrays and learn how to create and manipulate them. Then, you'll see some of the tools that NumPy provides, including random number generators and linear algebra routines.
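
A short sketch of the features mentioned above:

    import numpy as np

    # Create and manipulate an N-dimensional array
    a = np.arange(12).reshape(3, 4)
    print(a.T @ a)                       # matrix multiplication

    # Random number generation
    rng = np.random.default_rng(seed=42)
    samples = rng.normal(loc=0.0, scale=1.0, size=(3, 4))

    # Linear algebra routines
    print(np.linalg.norm(samples, axis=1))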

Scientific Applications
HS 120
15:00
30min
Break
Aula
15:00
30min
Break
HS 120
15:30
90min
Introduction to Data Analysis Using Pandas
Stefanie Molin

Working with data can be challenging: it often doesn’t come in the best format for analysis, and understanding it well enough to extract insights requires both time and the skills to filter, aggregate, reshape, and visualize it. This session will equip you with the knowledge you need to effectively use pandas – a powerful library for data analysis in Python – to make this process easier.
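
For example, the filter/aggregate/reshape workflow the session covers looks roughly like this:

    import pandas as pd

    df = pd.DataFrame({
        "city": ["Basel", "Basel", "Zurich", "Zurich"],
        "year": [2022, 2023, 2022, 2023],
        "visitors": [120, 150, 200, 210],
    })

    recent = df[df["year"] >= 2023]                  # filter
    totals = df.groupby("city")["visitors"].sum()    # aggregate
    wide = df.pivot(index="city", columns="year",    # reshape
                    values="visitors")
    print(recent, totals, wide, sep="\n")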

Data Science and Visualisation
HS 120
15:30
90min
Predictive survival analysis with scikit-learn, scikit-survival and lifelines
Olivier Grisel, Vincent Maladiere

This tutorial will introduce how to train machine learning models for time-to-event prediction tasks (health care, predictive maintenance, marketing, insurance...) without introducing a bias from censored training (and evaluation) data.
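
As a flavour of the libraries involved, a minimal Kaplan-Meier fit with lifelines on synthetic durations and censoring indicators:

    import numpy as np
    from lifelines import KaplanMeierFitter

    rng = np.random.default_rng(0)
    durations = rng.exponential(scale=10.0, size=200)  # time to event
    observed = rng.integers(0, 2, size=200)            # 0 = censored

    kmf = KaplanMeierFitter()
    kmf.fit(durations, event_observed=observed)
    print(kmf.median_survival_time_)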

Machine and Deep Learning
Aula
08:30
90min
Ibis: A fast, flexible, and portable tool for data analytics.
Phillip Cloud, Gil Forsyth

Ibis provides a common dataframe-like interface to many popular databases and analytics tools (BigQuery, Snowflake, Spark, DuckDB, …). This lets users analyze data using the same consistent API, regardless of which backend they’re using, and without ever having to learn SQL. No more painful rewrites of pandas code to something else when you run into performance issues; write your code once using Ibis and run it on any supported backend. In this tutorial users will get experience writing queries using Ibis on a number of local and remote database engines.
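
A small sketch of the Ibis workflow (recent Ibis versions; DuckDB is the default local backend):

    import ibis

    t = ibis.memtable({"species": ["a", "a", "b"], "mass": [1.0, 2.0, 3.0]})
    expr = t.group_by("species").aggregate(mean_mass=t.mass.mean())

    # Executes locally on DuckDB here; the same expression could run
    # unchanged against Snowflake, Spark, BigQuery, ...
    print(expr.execute())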

Data Science and Visualisation
Aula
08:30
90min
Introduction to matplotlib for visualization in Python
Tim Hoffmann

This tutorial explains the fundamental ideas and concepts of matplotlib. It's suited for complete beginners to get started as well as existing users who want to improve their plotting abilities and learn about best practices.
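
The core concepts (a Figure containing Axes, manipulated through the explicit object-oriented interface) in a few lines:

    import matplotlib.pyplot as plt
    import numpy as np

    x = np.linspace(0, 2 * np.pi, 100)

    fig, ax = plt.subplots()            # a Figure with a single Axes
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.set_xlabel("x")
    ax.set_ylabel("amplitude")
    ax.legend()
    plt.show()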

Data Science and Visualisation
HS 120
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:30
90min
Introduction to scikit-learn
Stefanie Sabine Senger

Update: Here I provide a prepared Jupyter notebook for you to fill with code during the tutorial: https://github.com/StefanieSenger/Talks/blob/main/2023_EuroSciPy/2023_EuroSciPy_Intro_to_scikit-learn_fillout-notebook.ipynb. Please download it and have it at hand when the tutorial starts. You can still download it during the introduction part of the tutorial.

This tutorial will provide a beginner introduction to scikit-learn, a Python package for machine learning. We will talk about what machine learning is and how scikit-learn can implement it. In the practical part, we will learn how to create a predictive modelling pipeline and how to fine-tune its hyperparameters to improve the model's score.
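
A compact version of the pipeline-plus-tuning pattern the tutorial builds up to:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    pipe = make_pipeline(StandardScaler(), SVC())
    search = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10]}, cv=5)
    search.fit(X_train, y_train)
    print(search.best_params_, search.score(X_test, y_test))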

Machine and Deep Learning
HS 120
10:30
90min
PPML: Machine Learning on data you cannot see
Valerio Maggio

Privacy guarantees are the most crucial requirement when it comes to analysing sensitive data. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, machine learning models can be exploited to leak sensitive data when they are attacked and no counter-measures are applied. Privacy-preserving machine learning (PPML) methods hold the promise to overcome all these issues, making it possible to train machine learning models with full privacy guarantees. In this tutorial we will explore several methods for privacy-preserving data analysis, and how these techniques can be used to safely train ML models without actually seeing the data.

Machine and Deep Learning
Aula
12:00
90min
Lunch
Aula
12:00
90min
Lunch
HS 120
13:30
90min
Generating DataFrames for your tests - using Pandas strategies in Hypothesis
Cheuk Ting Ho

Do you test your data pipeline? Do you use Hypothesis? In this workshop, we will use Hypothesis - a property-based testing framework - to generate pandas DataFrames for your tests, without involving any real data.
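
A minimal sketch using Hypothesis' pandas extra to generate DataFrames for a property-based test:

    from hypothesis import given
    from hypothesis.extra.pandas import column, data_frames

    @given(data_frames(columns=[
        column("price", dtype=float),
        column("quantity", dtype=int),
    ]))
    def test_revenue_column(df):
        # The pipeline step under test must cope with any generated frame,
        # including empty frames, NaNs, and extreme values
        df["revenue"] = df["price"] * df["quantity"]
        assert len(df["revenue"]) == len(df)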

Machine and Deep Learning
Aula
13:30
90min
Introduction to numerical optimization
Tim Mensinger, Janos Gabler, Tobias Raabe

In this hands-on tutorial, participants will delve into numerical optimization fundamentals and engage with the optimization libraries scipy.optimize and estimagic. estimagic provides a unified interface to many popular libraries such as nlopt or pygmo, along with additional diagnostic tools and convenience features. Throughout the tutorial, participants will get the opportunity to solve problems, enabling the immediate application of acquired knowledge. Topics covered include core optimization concepts, running an optimization with scipy.optimize and estimagic, diagnostic tools, algorithm selection, and advanced features of estimagic, such as bounds, constraints, and global optimization.
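
A taste of the two interfaces; the scipy.optimize call is standard, while the estimagic call follows its documented criterion/params pattern and should be treated as a sketch:

    import numpy as np
    from scipy.optimize import minimize

    def sphere(x):
        return np.sum(x ** 2)

    res = minimize(sphere, x0=np.arange(5.0), method="L-BFGS-B")
    print(res.x)

    # The rough estimagic equivalent (verify against the installed version):
    # import estimagic as em
    # res = em.minimize(criterion=sphere, params=np.arange(5.0),
    #                   algorithm="scipy_lbfgsb")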

High Performance Computing
HS 120
15:00
30min
Break
Aula
15:00
30min
Break
HS 120
15:30
90min
From Complex Scientific Notebook to User-Friendly Web Application
Aleksandra Plonska, Piotr Płoński

Learn how to show your work with the MERCURY framework. This open-source tool perfectly complements your computational notebook (e.g., written in Jupyter Notebook). Without knowledge of frontend technologies, you can present your results as a web app (with interactive widgets), dashboard, or report. Learn how to improve your notebook and make your work understandable for non-technical colleagues. Python only!

Data Science and Visualisation
Aula
15:30
90min
Image processing with scikit-image
Guillaume Lemaitre, Joan Massich

This tutorial explores scikit-image, the NumPy-native image processing library in the scientific Python ecosystem, for visual data analysis and manipulation.
Designed for beginners and advanced users alike, it builds image analysis skills and offers insights into the scikit-image documentation.

It covers basic concepts like image histograms, contrast, filtering, segmentation, and descriptors through practical exercises.
The tutorial concludes with advanced performance optimization techniques.

Familiarity with numpy arrays is essential, as they are the underlying data representation.
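
A few of the basic operations mentioned above, on a bundled sample image:

    from skimage import data, exposure, filters

    camera = data.camera()                           # sample grayscale image

    hist, bin_centers = exposure.histogram(camera)   # image histogram
    edges = filters.sobel(camera)                    # filtering
    threshold = filters.threshold_otsu(camera)       # simple segmentation
    mask = camera > threshold
    print(mask.mean())                               # foreground fraction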

Scientific Applications
HS 120
08:30
30min
Registration
Aula
09:00
60min
Integrating Ethics in ML: From Philosophical Foundations to Practical Implementations
Giada Pistilli

In the rapidly evolving landscape of Machine Learning (ML), significant advancements like Large Language Models (LLMs) are gaining critical importance in both industrial and academic spheres. However, the rush towards deploying advanced models harbors inherent ethical tensions and potential adverse societal impacts. The keynote will start with a brief introduction to the principles of ethics, viewed through the lens of philosophy, emphasizing how these fundamental concepts find application within ML. Grounding our discussion in tangible realities, we will delve into pertinent case studies, including the BigScience open science initiative, elucidating the practical application of ethical considerations. Additionally, the keynote will touch upon findings from my recent research, which investigates the synergy between ethical charters, legal tools, and technical documentation in the context of ML development and deployment.

Aula
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:30
30min
Anomaly Detection in Time Series: Techniques, Tools and Tricks
Vadim Nelidov

From sensor data to epidemic outbreaks, particle dynamics to environmental monitoring, much of the most crucial real-world data is temporal in nature. Fundamental challenges facing data specialists dealing with time series include not only predicting future values, but also determining when these values are alarming. Standard anomaly detection algorithms and common rule-based heuristics often fall short in addressing this problem effectively. In this talk, we will closely examine this domain, exploring its unique characteristics and challenges. You will learn to apply some of the most promising techniques for detecting time series anomalies, as well as relevant scientific Python tools that can help you with it.
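
As one simple example of the rule-based baselines the talk moves beyond, a rolling z-score detector in pandas (window and threshold are illustrative):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    ts = pd.Series(rng.normal(size=500))
    ts.iloc[250] += 8                        # inject an anomaly

    window = 50
    zscore = (ts - ts.rolling(window).mean()) / ts.rolling(window).std()
    anomalies = ts[zscore.abs() > 4]         # points far from recent behaviour
    print(anomalies)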

Data Science and Visualisation
HS 120
10:30
90min
Contributor, Developer and Volunteer Experience: Navigating Challenges Beyond Code
Giada Pistilli, Cheuk Ting Ho, Maren Westermann, Stefania Delprete

Let's Talk Inclusivity and Mental Health.

What's beyond the lines of code? Let's explore the spectrum of experiences, from contributors to volunteers, developers to conference attendees.

Join us to share your insights, experiences, and solutions for a more supportive and inclusive scientific Python ecosystem. Let's empower one another and shape a community that thrives on empathy, understanding, and collaboration.

Community, Education, and Outreach
HS 119 - Maintainer track
10:30
30min
Ibis: Because SQL is everywhere but you don't want to use it
Phillip Cloud, Gil Forsyth

We love to use Python in our day jobs, but that enterprise database you run your ETL job against may have other ideas. It probably speaks SQL, because SQL is ubiquitous, it’s been around for a while, it’s standardized, and it’s concise.
But is it really standardized? And is it always concise? No!

Do we still need to use it? Probably!

What’s a data-person to do? String-templated SQL?
print(f"That way lies {{ m̴͕̰̻̏́ͅa̸̟̜͉͑d̵̨̫̑n̵̖̲̒͑̾e̸̘̼̭͌s̵͇̖̜̽s̸̢̲̖͗͌̏̊͜ }}")

Instead, come and learn about Ibis! It offers a dataframe-like interface to construct concise and composable queries and then executes them against a wide variety of backends (Postgres, DuckDB, Spark, Snowflake, BigQuery, you name it).

Data Science and Visualisation
Aula
11:05
30min
Get the best from your scikit-learn classifier: trusted probabilities and optimal binary decisions
Guillaume Lemaitre

When operating a classifier in a production setting (i.e. the predictive phase), practitioners are interested in two potentially different outputs: a "hard" decision used to drive a business decision and/or a "soft" decision giving a confidence score linked to each potential decision (usually related to class probabilities).

Scikit-learn does not provide any flexibility to go from "soft" to "hard" predictions: it uses a cut-off point at a confidence score of 0.5 (or 0 when using decision_function) to get class labels. However, optimizing a classifier to get a confidence score close to the true probabilities (i.e. a calibrated classifier) does not guarantee accurate "hard" predictions using this heuristic. Conversely, training a classifier for optimum "hard" prediction accuracy (with the cut-off constraint at 0.5) does not guarantee a calibrated classifier.

In this talk, we will present a new scikit-learn meta-estimator allowing us to get the best of the two worlds: a calibrated classifier providing optimum "hard" predictions. This meta-estimator will land in a future version of scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/26120.

We will provide some insights regarding how to obtain accurate probabilities and predictions, and also illustrate how to use this model in practice on different use cases: cost-sensitive problems and imbalanced classification problems.
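
To make the default behaviour concrete: for a binary classifier, predict simply thresholds the confidence score at 0.5, which is exactly what the proposed meta-estimator makes tunable (a sketch of the status quo, not of the new API):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(weights=[0.9, 0.1], random_state=0)
    clf = LogisticRegression().fit(X, y)

    proba = clf.predict_proba(X)[:, 1]
    default_hard = clf.predict(X)              # implicit 0.5 cut-off
    assert np.array_equal(default_hard, (proba >= 0.5).astype(int))

    tuned_hard = (proba >= 0.25).astype(int)   # a business-chosen cut-off
    print(default_hard.sum(), tuned_hard.sum())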

Machine and Deep Learning
HS 120
11:05
30min
Pandas 2.0 and beyond
Joris Van den Bossche, Richard Shadrach

Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.

Data Science and Visualisation
Aula
11:40
20min
DataFrame-agnostic code: are we there yet?
Marco Gorelli

Have you ever wanted to write a DataFrame-agnostic function, which should perform the same operation regardless of whether the input is pandas / polars / something else? Did you get stuck with special-casing to handle all the different APIs? All is good, the DataFrame Standard is here to help!

Data Science and Visualisation
Aula
11:40
20min
GPT-generated text detection: problems and solutions in scientific publishing
Dr. Milos Cuculovic, Andrea Guzzo

Since its release, ChatGPT has been widely adopted as "the" text generation tool across all industries and businesses. This also includes the domain of scientific research, where we observe more and more scientific papers partially or even fully generated by AI. The same also applies to the peer-review reports created while reviewing a paper.

What are the guidelines in the scientific research world? What is now the meaning of the written word and how do we build a model that can identify whether a text is AI-generated? What are the potential solutions to solve this important issue?

Within this talk, we discuss how to detect AI-generated text and how to create a scalable architecture integrating this tool.

Machine and Deep Learning
HS 120
12:00
90min
Lunch
Aula
12:00
90min
Lunch
HS 120
13:30
45min
Sparse Data in the Scientific Python Ecosystem: Current Needs, Recent Work, and Future Improvements
Julien Jerphanion

This maintainer track session aims to lead discussions about the current needs for sparse data in the scientific Python ecosystem. It will present achievements and the continuation of the work initiated at the first Scientific Python Developer Summit, which took place from 22 to 28 May 2023.

High Performance Computing
HS 119 - Maintainer track
13:30
30min
Timing and Benchmarking Scientific Python
Kai Striega

Scientific code is often complex, resource-intensive, and sensitive to performance issues, making accurate timing and benchmarking critical for optimising performance and ensuring reproducibility. However, benchmarking scientific code presents several challenges, including variability in input data, hardware and software dependencies, and optimisation trade-offs. In this talk, I discuss the importance of timing and benchmarking for scientific code and outline strategies for addressing these challenges. Specifically, I emphasise the need for representative input data, controlled benchmarking environments, appropriate metrics, and careful documentation of the benchmarking process. By following these strategies, developers can effectively optimise code performance, select efficient algorithms and data structures, and ensure the reliability and reproducibility of scientific computations.
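
A minimal example of the careful-measurement advice, using the standard library's timeit with repeats:

    import timeit

    setup = "import numpy as np; x = np.random.default_rng(0).normal(size=10_000)"
    stmt = "np.sort(x)"

    # repeat() exposes run-to-run variability; report the minimum or the
    # full distribution rather than a single noisy measurement
    times = timeit.repeat(stmt, setup=setup, repeat=5, number=1_000)
    print(min(times), times)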

Data Science and Visualisation
Aula
13:30
30min
Why I Follow CI/CD Principles When Writing Code: Building Robust and Reproducible Applications
Artem Kislovskiy

This talk will discuss the importance of Continuous Integration and Continuous Delivery (CI/CD) principles in the development of scientific applications, with a focus on creating robust and reproducible code that can withstand rigorous testing and scrutiny. The presentation will cover best practices for project structure and code organization, as well as strategies for ensuring reproducibility, collaboration, and managing dependencies. By implementing CI/CD principles in scientific application development processes, researchers can improve efficiency, reliability, and maintainability, ultimately accelerating research.

Scientific Applications
HS 120
14:05
30min
Accelerating your Python code - a systematic overview
Tim Hoffmann

Python is slow. We feel the performance limitations when doing computationally intensive work. There are many libraries and methods to accelerate your computations, but which way to go? This talk serves as a navigation guide through the world of speeding up Python. At the end, you should have a high-level understanding of performance aspects and know which way to go when you want to speed up your code next time.
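
As a flavour of one of the many routes the talk surveys, JIT compilation with Numba:

    import numpy as np
    from numba import njit

    @njit
    def loop_sum(x):
        total = 0.0
        for v in x:            # compiled to machine code on first call
            total += v
        return total

    x = np.random.default_rng(0).normal(size=1_000_000)
    print(loop_sum(x))         # subsequent calls run at near-C speed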

High Performance Computing
Aula
14:05
30min
Solara: A Pure Python, React-style Framework for Scaling Your Data Apps
Maarten Breddels

Solara is a pure Python web framework designed to scale complex applications. Leveraging a React-like API, Solara offers the scalability, component-based coding, and simple state management that have made React a standard for large web applications. Solara uses a pure Python implementation of React, Reacton, to create ipywidgets-based applications that work both in the Jupyter Notebook environment and as standalone web apps with frameworks like FastAPI. This talk will explore the design principles of Solara, illustrate its potential with case studies and live examples, and provide resources for attendees to incorporate Solara into their own projects. Whether you're a researcher developing interactive visualizations or a data scientist building complex web applications, Solara provides a Python-centric solution for scaling your projects effectively.

Data Science and Visualisation
HS 120
14:15
45min
What-not to expect from NumPy 2.0
Sebastian Berg

NumPy is planning a 2.0 release early next year, replacing the 1.x releases. While we hope that the release will not be disruptive to most users, we do plan some larger changes that may affect many. These changes include modifications to the Python and C APIs, for example making the NumPy promotion rules more consistent around scalar values.

Scientific Applications
HS 119 - Maintainer track
14:40
20min
Chalk’it: an open-source framework for rapid web applications
Mongi BEN GAID

Chalk'it is an open-source framework that transforms Python scripts into distributable web app dashboards. It utilizes drag-and-drop widgets to establish an interface linked to a dataflow connecting Python code and various data sources. Chalk'it supports multiple Python graphics libraries, including Plotly, Matplotlib and Folium for interactive mapping and visualization. The framework operates entirely in web browsers using Pyodide. In our presentation, we will showcase Chalk'it, emphasizing its primary features, software architecture, and key applications, with a special focus on geospatial data visualization.

Data Science and Visualisation
HS 120
14:40
20min
Estimagic: A library that enables scientists and engineers to solve challenging numerical optimization problems
Janos Gabler

estimagic is a Python package for nonlinear optimization with or without constraints. It is particularly suited to solving difficult nonlinear estimation problems. On top of that, it provides functionality to perform statistical inference on estimated parameters.

In this presentation, we give a tour through estimagic's most notable features and explain its position in the ecosystem of Python libraries for numerical optimization.

Community, Education, and Outreach
Aula
15:00
30min
Break
Aula
15:00
30min
Break
HS 120
15:30
60min
Posters Spotlight + Lightning talks Day 1
Aula
16:30
90min
Poster session
Aula
08:30
30min
Welcome coffee
Aula
09:00
60min
Keynote on polars
Ritchie Vink

Polars is a "relatively" new, fast dataframe implementation that redefines what DataFrames are able to do on a single machine, both in regard to performance and dataset size.
In this talk, we will dive into polars and see what makes it so efficient. It will touch on technologies like Arrow, Rust, parallelism, data structures, query optimization and more.
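
A small example of the lazy, query-optimized API the talk digs into (naming per polars >= 0.19, where groupby became group_by):

    import polars as pl

    lf = pl.DataFrame({"species": ["a", "a", "b"],
                       "mass": [1.0, 2.0, 3.0]}).lazy()

    # Nothing is computed yet; the plan is optimized before execution
    result = (
        lf.filter(pl.col("mass") > 0.5)
          .group_by("species")
          .agg(pl.col("mass").mean().alias("mean_mass"))
          .collect()           # parallel execution happens here
    )
    print(result)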

Aula
10:00
30min
Break
Aula
10:00
30min
Break
HS 120
10:30
20min
Build Drug Discovery web applications with PyScript, Ketcher and rdkit
Nikita Churikov

So you don't know JavaScript but know how to use Python? Do you want to build an app where you can draw molecules for some application like property prediction? Then come to this talk, where I'll show you how to use Ketcher, EPAM's tool for small-molecule drawing, together with PyScript and rdkit for your next drug discovery app.

Scientific Applications
HS 120
10:30
20min
From Implementation to Ecosystem: The Journey of Zarr
Jonathan Striebel

Zarr is an API and cloud-optimized data storage format for large, N-dimensional, typed arrays, based on an open-source technical specification. In the last 4 years it grew from a Python implementation to a large ecosystem. In this talk, we want to share how this transformation happened and our lessons learned from this journey. Today, Zarr is driven by an active community, defined by an extensible specification, has implementations in C++, C, Java, Javascript, Julia, and Python, and is used across domains such as Geospatial, Bio-imaging, Genomics and other Data Science domains.
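
The core idea in a few lines: chunked, compressed N-dimensional arrays behind a NumPy-like interface:

    import numpy as np
    import zarr

    # Create a chunked, compressed on-disk array
    z = zarr.open("example.zarr", mode="w", shape=(10_000, 10_000),
                  chunks=(1_000, 1_000), dtype="f4")
    z[0, :] = np.arange(10_000)      # only the touched chunks are written
    print(z.info)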

Community, Education, and Outreach
Aula
10:30
90min
Interoperability in the Scientific Python Ecosystem
Tim Head, Mridul Seth, Olivier Grisel, Franck Charras, Sebastian Berg, Joris Van den Bossche

This slot will cover efforts regarding interoperability in the scientific Python ecosystem. Topics:

  • Using the Array API for array-producing and array-consuming libraries
  • DataFrame interchange and namespace APIs
  • Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem
  • Entry Points: Enabling backends and plugins for your libraries

Using the Array API for array-producing and array-consuming libraries

Already using the Array API or wondering if you should in a project you maintain? Join this maintainer track session to share your experience and exchange knowledge and tips around building array libraries that implement the standard or libraries that consume arrays.
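
The consuming-library pattern under discussion, sketched with the array-api-compat helper:

    import numpy as np
    from array_api_compat import array_namespace

    def softmax(x):
        # Works for NumPy, CuPy, PyTorch, ... via the Array API standard
        xp = array_namespace(x)
        e = xp.exp(x - xp.max(x, axis=-1, keepdims=True))
        return e / xp.sum(e, axis=-1, keepdims=True)

    print(softmax(np.array([[1.0, 2.0, 3.0]])))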

DataFrame-agnostic code using the DataFrame API standard

The DataFrame Standard provides you with a minimal, strict, and predictable API, to write code that will work regardless of whether the caller uses pandas, polars, or some other library.

DataFrame Interchange protocol and Apache Arrow

The DataFrame interchange protocol and Arrow C Data interface are two ways to interchange data between dataframe libraries. What are the challenges and requirements that maintainers encounter when integrating this into consuming libraries?

Entry Points: Enabling backends and plugins for your libraries

In this talk, we will discuss how NetworkX used entry points to enable more efficient computation backends to plug into NetworkX.

Scientific Applications
HS 119 - Maintainer track
10:55
20min
Building diverse open source communities - learnings from PyLadies Berlin’s monthly open source hack nights
Maren Westermann

Today, state-of-the-art scientific research as well as industrial software development strongly depend on open source libraries. The demographic of the contributors to these libraries is predominantly white and male. In order to increase the participation of groups who have been historically underrepresented in this domain, PyLadies Berlin, a volunteer-run community group focused on helping marginalised people professionally establish themselves in tech, has been running monthly hands-on open source hack nights for more than a year. After some initial challenges the initiative yielded encouraging results. This talk summarises the learnings and teaches how they can be applied in the wider open source community.

Community, Education, and Outreach
Aula
10:55
20min
Where is the flock? The use of graph neural networks for bird identification with meteorological radar.
Olga Lyashevska, Abel Soares Siqueira

In this project we generate tools to identify birds within the spatial extent of a meteorological radar. Using the opportunities created by modern dual-polarization radars, we build graph neural networks to identify bird flocks. For this, the original point cloud data is converted to multiple undirected graphs following a set of predefined rules, which are then used as input to a graph convolutional neural network (Kipf and Welling, 2017, https://doi.org/10.48550/arXiv.1609.02907). Each node has a set of features such as range, x, y, z coordinates and several radar-specific parameters, e.g. differential reflectivity and phase shift, which are used to build the model and conduct graph-level classification. This tool will alleviate the problem of manual identification and labelling, which is tedious and time-intensive. Going forward we also focus on using the temporal information in the radar data. Repeated radar measurements enable us to track these movements across space and time. This makes it possible for regional movement studies to bridge the methodological gap between fine-scale, individual-based tracking studies and continental-scale monitoring of bird migration. In particular, it enables novel studies of the roles of habitat, topography and environmental stressors on movements that are not feasible with current methodology. Ultimately, we want to apply the methodology to data from continental radar networks to study movement across scales.

Machine and Deep Learning
HS 120
11:20
20min
Exploring Geospatial data for Machine Learning using Google Earth Engine: An introduction
Duarte O.Carmo

Have you ever wondered what type of data you can get about a certain location on the globe? What if I told you that you can access an enormous amount of information while sitting right there at your laptop? In this talk, I'll show you how to use Google Earth Engine to enrich your dataset. Whether you're exploring or planning your next ML project, geospatial data can provide you with a lot of information you did not know you had access to. Let me show you how!
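
A hedged sketch of the kind of query involved (the dataset ID is illustrative, and ee.Authenticate() is a one-time, browser-based login):

    import ee

    ee.Authenticate()                            # one-time login
    ee.Initialize()

    point = ee.Geometry.Point([8.55, 47.37])     # roughly Zurich
    collection = (
        ee.ImageCollection("COPERNICUS/S2_SR")   # illustrative dataset ID
          .filterBounds(point)
          .filterDate("2023-01-01", "2023-06-30")
    )
    print(collection.size().getInfo())           # number of matching images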

Scientific Applications
Aula
11:45
20min
Content-based recommendation system for the examples in sphinx-gallery
Arturo Amor

The gallery of your project might group the examples by module, by use case, or some other logic. But as examples grow in complexity, they may be relevant for several groups. In this talk we discuss some possible solutions and their drawbacks to motivate the introduction of a new feature to sphinx-gallery: a content-based recommendation system.

Community, Education, and Outreach
HS 120
11:45
20min
Deploying multi-GPU workloads on Kubernetes in Python
Jacob Tomlinson

By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. In this talk, we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.
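
A sketch of the pattern with the dask-kubernetes operator (the cluster name and image tag are assumptions; the Dask operator must already be installed in the cluster):

    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    cluster = KubeCluster(name="rapids-demo",                 # assumed name
                          image="rapidsai/rapidsai:latest",   # assumed image
                          n_workers=4)
    client = Client(cluster)
    print(client.dashboard_link)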

High Performance Computing
Aula
12:05
85min
Lunch
Aula
12:05
85min
Lunch
HS 120
13:30
30min
(in)Complete introduction to AI Safety
Michele "Ubik" De Simoni

AI is poised to be "Our final invention," either the key to a never-ending utopia or a direct road to dystopia (or apocalypse). Even without the eschatological framing, it's still a revolutionary technology increasingly embedded in every aspect of our life, from smartphones to smart cities, from autonomous agents to autonomous weapons. In the face of acceleration, there can be no delay: if we want AI to shape a better tomorrow, we must discuss safety today.

Machine and Deep Learning
Aula
13:30
30min
My foray from Scientific Python into the Pyodide / WebAssembly universe
Loïc Estève

Pyodide is a Python distribution for the browser and Node.js based on WebAssembly / Emscripten.
Pyodide supports most commonly used scientific Python packages, like numpy, scipy, scikit-learn and matplotlib, and there is growing interest in using it to improve package documentation through interactivity.

In this talk we will describe the work we have done in the past nine months to improve the state of Pyodide in a scientific Python context, namely:
- running the scikit-learn and scipy test suites with Node.js to get a view of what currently works, what does not, and what can hopefully be fixed one day
- packaging OpenBLAS in Pyodide and using it for the Pyodide scipy package to improve its stability, maintainability and performance
- adding JupyterLite functionality to sphinx-gallery, which is used for the example galleries of popular scientific Python packages like scikit-learn, matplotlib, scikit-image, etc.
- enabling the sphinx-gallery JupyterLite functionality for the scikit-learn example gallery

We will also mention some of the Pyodide sharp bits and conclude with some of the ideas we have to use it even more widely.

Data Science and Visualisation
HS 120
14:05
30min
Let’s exploit pickle, and `skops` to the rescue!
Adrin Jalali

Pickle files can be evil: simply loading them can run arbitrary code on your system. This talk presents why that is, how it can be exploited, and how skops is tackling the issue for scikit-learn/statistical ML models. We go through some lower-level pickle-related machinery, and go into detail about how the new format works.
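
The core of the pickle problem fits in a few lines, shown alongside the skops loading pattern (skops.io API as documented; the trusted semantics may differ between versions):

    import pickle

    class Evil:
        def __reduce__(self):
            import os
            # Unpickling this object executes an arbitrary command
            return (os.system, ("echo you have been pwned",))

    payload = pickle.dumps(Evil())
    pickle.loads(payload)   # runs the command: never unpickle untrusted data

    # skops stores sklearn models in a restricted format instead:
    # from skops.io import dump, load
    # dump(model, "model.skops")
    # model = load("model.skops", trusted=trusted_types)  # explicit trust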

Machine and Deep Learning
Aula
14:05
30min
MyST & Thebe: Community-driven tools for awesome open science communication with Jupyter[lite] backed computation
Steve Purves, Rowan Cockett

Imagine a world where there are tools allowing any researcher to easily produce high-quality scientific websites, where it's trivial to include rich interactive figures that connect to Jupyter servers or run in-browser with WASM & pyodide, all from a local folder of markdown files and Jupyter notebooks.

We introduce MyST Markdown (https://mystmd.org/), a set of open-source, community-driven tools designed for open scientific communication.

It's a powerful authoring framework that supports blogs, online books, scientific papers, preprints, reports and journal articles. It includes thebe, a minimal connector library for Jupyter, and thebe-lite, which bundles a JupyterLite server with pyodide into any web page for in-browser Python. It also provides publication-ready TeX and PDF generation from the same content base, minimising the rework of publishing to the web and to traditional services.

Community, Education, and Outreach
HS 120
14:40
20min
Python versioning in a changing world
Wolf Vollprecht

Python versioning is a critical aspect of maintaining a consistent ecosystem of packages, yet it can be challenging to get right. In this talk, we will explore the difficulties of Python versioning, including the need for upper bounds, and discuss mitigation strategies such as lockfiles in the Python packaging ecosystem (pip, poetry, and conda / mamba). We will also highlight a new community effort to analyze Python libraries dynamically and statically to detect the symbols (or libraries) they are using. By analyzing symbol usage, we can predict when package combinations will start breaking with each other, achieving a high rate of correct predictions. Our goal is to gather more community inputs to create a robust compatibility matrix. Additionally, we are doing similar work in C/C++ using libabigail to address ABI problems.

Scientific Applications
Aula
14:40
20min
Transformations in Three Dimensions
Alexander Fabisch

Rigid transformations in 3D are complicated due to the multitude of different conventions and because they often form complex graphs that are difficult to manage. In this talk I will give a brief introduction to the topic and present the library pytransform3d as a set of tools that can help you tame the complexity. Throughout the talk I will use examples from robotics (imitation learning, collision detection, state estimation, kinematics) to motivate the discussed features, even though the presented solutions are useful beyond robotics.
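
A sketch of the graph-management tooling mentioned (API per the pytransform3d documentation; treat as a sketch):

    import numpy as np
    from pytransform3d.transformations import transform_from
    from pytransform3d.transform_manager import TransformManager

    # A transform is a 4x4 homogeneous matrix from rotation + translation
    cam2robot = transform_from(R=np.eye(3), p=np.array([0.1, 0.2, 0.3]))

    tm = TransformManager()
    tm.add_transform("camera", "robot", cam2robot)
    # The manager chains and inverts transforms through the graph as needed
    print(tm.get_transform("robot", "camera"))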

Scientific Applications
HS 120
15:00
30min
Break
Aula
15:00
30min
Break
HS 120
15:30
30min
Exploring GPU-powered backends for scikit-learn
Franck Charras, Olivier Grisel

Could scikit-learn's future be GPU-powered? This talk will discuss the performance improvements that GPU computing could bring to existing scikit-learn algorithms, and will describe a plugin-based design that is being considered to open up scikit-learn compatibility to faster compute backends, with special concern for user-friendliness, ease of installation, and interoperability.

High Performance Computing
Aula
15:30
30min
Incident management using Hawkes processes and other Tech AIOps projects in ING
Arkadiusz Trawiński, Joost Göbbels

In this talk, we will discuss incident management using Hawkes processes within an IT infrastructure. We show how a model previously applied for earthquake predictions can help answer the question ‘what caused what’ in a major European bank.

Data Science and Visualisation
HS 120
15:30
45min
The Graphic Server Protocol, a joint effort to facilitate the interoperability of Python scientific visualization libraries
Nicolas Rougier

The Graphic Server Protocol is a proposal to mutualize efforts across scientific visualization libraries, languages and platforms by providing a unified intermediate-level protocol to render graphical primitives independently of the specifics of the high-level visualization interfaces.

Data Science and Visualisation
HS 119 - Maintainer track
16:05
30min
Scaling pandas to any size with PySpark
Hyukjin Kwon, Allan Folting

This talk discusses using the pandas API on Apache Spark to handle big data, and the introduction of Pandas Function APIs. Presented by an Apache Spark committer and a product manager, it offers technical and managerial insights.
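
The pandas API on Spark in brief: familiar pandas syntax, distributed execution:

    import pyspark.pandas as ps

    # Looks like pandas, runs on Spark
    psdf = ps.DataFrame({"city": ["Basel", "Zurich"], "visitors": [120, 200]})
    print(psdf["visitors"].mean())
    print(psdf.to_pandas())          # collect to a local pandas DataFrame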

High Performance Computing
Aula
16:05
30min
The Helmholtz Analytics Toolkit (Heat) and its role in the landscape of massively-parallel scientific Python
Fabian Hoppe

Handling and analyzing massive data sets is highly important for the vast majority of research communities, but it is also challenging, especially for those communities without a background in high-performance computing (HPC). The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python, targeting usage by non-experts in HPC.

In this presentation, we will provide an overview of Heat's current features and capabilities and discuss its role in the ecosystem of distributed array computing and machine learning in Python.
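
A flavour of the NumPy-like, memory-distributed API (the split argument controls the axis along which data is distributed across MPI processes; treat as a sketch):

    import heat as ht

    # The array is partitioned along axis 0 across the MPI processes
    x = ht.random.randn(10_000, 4, split=0)
    print(x.mean(axis=0))            # reduction computed across processes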

High Performance Computing
HS 120
16:15
20min
Model Documentation: The Keystone towards Inclusivity and Accessibility
Ezi Ozoani

The use of AI documentation such as repository cards (model and dataset cards) as a means of transparently discussing ethical and inclusivity-related problems that could be found within the outputs and/or during the creation of AI artefacts, with the aim of inclusivity, fairness and accountability, has increasingly become part of the ML discourse. Limitations- and risk-centred documentation approaches have become more standard and anticipated with the launch of new developments, e.g. the ChatGPT/GPT-4 system card and other LLM model cards.

This talk highlights the inclusive approaches that the broader open source community could explore when thinking about their aims when creating documentation.

Community, Education, and Outreach
HS 119 - Maintainer track
16:40
60min
Sprints Orientation + Lightning Talks Day 2
Aula
17:40
20min
Closing
Aula
No sessions on Friday, Aug. 18, 2023.