“Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications” Andrii Gakhov · Talk (long) (30 minutes)

We interact with an ever-increasing amount of data, but classical data structures and algorithms no longer fit our requirements. This talk presents probabilistic algorithms and data structures and describes their main areas of application.
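
As an illustration of the kind of structure this abstract refers to, here is a minimal Bloom filter sketch using only the standard library. It is not taken from the talk; the class name, the bit-array size, and the salted SHA-256 hashing scheme are assumptions chosen for readability rather than efficiency.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per "bit" keeps the sketch simple (not memory-optimal).
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several positions by salting one hash function (an assumption
        # made for this sketch; real implementations often use faster hashes).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["numpy", "scipy", "pandas"]:
    bf.add(word)

print("numpy" in bf)    # added items are always reported as present
print("fortran" in bf)  # usually absent, but a false positive is possible
```

The trade-off is that memory use is fixed up front, at the cost of an error rate that grows as more items are added.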


“A practical guide towards algorithmic bias and explainability in machine learning” Alejandro Saucedo · Talk (long) (30 minutes)

Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example. We'll be tasked to automate the loan approval process for a company


“CFFI, Ctypes, Cython, Cppyy: how to run C code from Python” Matti Picus · Tutorial (90 minutes)

Python is flexible; C and C++ are fast. How can we use them together? There are many ways to call C code from Python. We will learn about the major ones and find out when you would prefer one over another.


“Inside NumPy: preparing for the next decade” Matti Picus · Talk (long) (30 minutes)

Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know?


“VeloxChem: Python meets quantum chemistry and HPC” Olav Vahtras · Talk (long) (30 minutes)

A new and efficient Python/C++ modular library for real and complex response functions at the level of Kohn-Sham density functional theory.


“Tracking migration flows with geolocated Twitter data” Antònia Tugores · Talk (long) (30 minutes)

Detect migration flows worldwide using geolocated Twitter data: routes, settlement areas, mobility to more than one country, spatial integration in cities, etc.


“scikit-fdiff, a new tool for PDE solving” Nicolas Cellier · Poster (90 minutes)

Scikit-fdiff (formerly Triflow) was developed to facilitate building mathematical models. It was made to quickly build and try many asymptotic falling-film models with different coupled phenomena (energy and mass transfer).


“emzed: a Python based framework for analysis of mass-spectrometry data” Uwe Schmitt · Talk (long) (30 minutes)

This talk is about emzed, a Python library that helps biologists with little programming knowledge implement ad hoc analyses as well as workflows for mass-spectrometry data.


“Visual Diagnostics at Scale” Dr. Rebecca Bilbro · Talk (long) (30 minutes)

Machine learning is a search for the best combination of features, model, and hyperparameters. But as data grow, so does the search space! Fortunately, visual diagnostics can focus our search and allow us to steer modeling purposefully, and at scale.


“Parallelizing Python applications with PyCOMPSs” Javier Conejero · Tutorial (90 minutes)

PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies tasks' data-dependencies, schedules and executes them in distributed environments.


“QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science” Nathan Shammah, Alexander Pitchford · Talk (long) (30 minutes)

In this talk you will learn how QuTiP, the quantum toolbox in Python (http://qutip.org), has grown from a library into an ecosystem. QuTiP is used in education to teach quantum physics, and in research and industry for quantum computing simulation.


“Matrix calculus with SymPy” Francesco Bonazzi · Talk (long) (30 minutes)

In this talk we explore a recent addition to SymPy that makes it possible to find closed-form solutions to matrix derivatives. As a consequence, generating efficient code for optimization problems is now much easier.


“Deep Learning for Understanding Human Multi-modal Behavior” Ricardo Manhães Savii · Talk (15 minutes)

Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.


“Recent advances in python parallel computing” Pierre Glaser · Talk (long) (30 minutes)

Modern hardware is multi-core, so it is crucial for Python to provide efficient parallelism. This talk presents the current state of, and recent advances in, Python parallelism, to help practitioners and developers make better decisions on this matter.


“Hands-on TensorFlow 2.0” Josh Gordon · Tutorial (90 minutes)

A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level, with code examples for Deep Dream, Style Transfer, and Image Colorization.


“Performing Quantum Measurements in QuTiP” Simon Cross · Tutorial (90 minutes)

Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you!

No previous quantum mechanics experience required!


“Building data pipelines in Python: Airflow vs scripts soup” Dr. Tania Allard · Tutorial (90 minutes)

In this workshop, you will learn how to migrate from ‘scripts soups’ (a set of scripts that should be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow.
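
The core idea behind replacing an ordered pile of scripts with a pipeline is to declare dependencies explicitly and let a scheduler derive the order. The sketch below shows that idea with the standard-library `graphlib` module (Python 3.9+). It is not Airflow's API, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline steps, each mapped to the steps it depends on.
dependencies = {
    "download": set(),
    "clean": {"download"},
    "features": {"clean"},
    "train": {"features"},
    "report": {"train", "clean"},
}


def run_pipeline(deps):
    """Visit every step after its dependencies and return the order used."""
    order = []
    for step in TopologicalSorter(deps).static_order():
        # A real pipeline would call the function for this step here.
        order.append(step)
    return order


order = run_pipeline(dependencies)
print(order)
```

Because the order is computed from the declared dependencies, adding a step means editing one dictionary entry rather than renumbering scripts.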


“Getting Started with JupyterLab” Mike Müller · Tutorial (90 minutes)

JupyterLab is used for essentially all other tutorials at EuroSciPy. This tutorial gives an overview of the basic functionality and shows how to use some of the many tools it provides to simplify your Python programming workflow.


“The Magic of Neural Embeddings with TensorFlow 2” Oliver Zeigermann · Talk (long) (30 minutes)

Neural embeddings are a powerful tool for turning categorical values into numerical ones. Given reasonable training data, the semantics present in the categories can be preserved in the numerical representation.


“Make your Python code fly at transonic speeds!” Pierre Augier · Talk (15 minutes)

Transonic is a new pure Python package to easily accelerate modern Python-NumPy code with different accelerators (such as Cython, Pythran, Numba, and CuPy).


“Distributed GPU Computing with Dask” Peter Andreas Entschev · Talk (long) (30 minutes)

Dask has evolved over the last year to leverage multi-GPU computing alongside its existing CPU support. We present how this is possible with the use of NumPy-like libraries and how to get started writing distributed GPU software.


“Reproducible Data Science in Python” Chandrasekhar Ramakrishnan, Rok Roškar · Tutorial (90 minutes)

In this tutorial, we will take a detailed look at the concept of reproducibility, survey the landscape of existing solutions, and, using one solution in particular, Renku, we will do some hands-on work.


“Introduction to scikit-learn: from model fitting to model interpretation” Guillaume Lemaitre, Olivier Grisel · Tutorial (90 minutes)

We will present scikit-learn by focusing on the available tools used to train a machine-learning model. Then, we will focus on the challenge linked to model interpretation and the available tools to understand these models.


“Caterva: A Compressed And Multidimensional Container For Big Data” Francesc Alted · Talk (long) (30 minutes)

Caterva is a library on top of the Blosc2 compressor that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.


“High performance machine learning with dislib” Javier Álvarez · Talk (15 minutes)

This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.


“From Modeler to Programmer” Dr. Mike Müller · Poster (90 minutes)

The modeling system ueflow allows for customizable, dynamic boundary conditions.
The modeler can write Python plugins to implement the behavior of these boundary conditions.


“Understanding Numba” Valentin Haenel · Talk (long) (30 minutes)

In this talk I will take you on a whirlwind tour of Numba, and you will be equipped with a mental model of how Numba works and what it is good at. At the end, you will be able to decide if Numba could be useful for you.


“Best Coding Practices in Jupyterlab” Alexander CS Hendorf · Talk (15 minutes)

Jupyter notebooks are often a mess. The code works for one notebook but is hard to maintain or reuse. In this talk I will present some best practices to make code more readable, easier to maintain, and reusable.


“pystencils: Speeding up stencil computations on CPUs and GPUs” Martin Bauer · Talk (15 minutes)

pystencils speeds up stencil computations on numpy arrays using a sympy-based high-level description that is compiled into optimized C code.


“The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges” Guillaume Lemaitre, Joris Van den Bossche · Talk (15 minutes)

The RAMP (Rapid Analytics and Model Prototyping) framework provides a platform to organize reproducible and transparent data challenges. We will present the different framework bricks.


“Introduction to geospatial data analysis with GeoPandas and the PyData stack” Joris Van den Bossche · Tutorial (90 minutes)

This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your GIS workflow and fit nicely in the traditional PyData stack.


“Scientific DevOps: Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers” Nicholas Del Grosso · Talk (long) (30 minutes)

A review of DevOps tools as applied to data analysis pipelines, including workflow managers, software containers, testing frameworks, and online repositories for performing reproducible science that scales.


“Enhancing & re-designing the QGIS user interface – a deep dive” Sebastian M. Ernst · Talk (15 minutes)

How can one of the largest code bases in open source Geographical Information Science – QGIS – be enhanced and re-designed? Through the powers of Python plugins. This talk demonstrates concepts on how to make QGIS more user-friendly.


“Modern Data Science: A new approach to DataFrames and pipelines” Jovan Veljanoski, Maarten Breddels · Talk (long) (30 minutes)

We will demonstrate how to explore and analyse massive datasets (>150GB) on a laptop with the Vaex library in Python. Using computational graphs, efficient algorithms and storage (Apache Arrow / hdf5) Vaex can easily handle up to a billion rows.


“MNE-Python, a toolkit for neurophysiological data” Joan Massich · Poster (90 minutes)

A summary of the MNE-Python changes introduced during the last two releases, with highlights of future directions.


“Apache Arrow: a cross-language development platform for in-memory data” Joris Van den Bossche · Talk (long) (30 minutes)

Apache Arrow defines a columnar, in-memory data format standard and communication protocols. It provides a cross-language development platform that already has several applications in the PyData ecosystem.


“kESI - a kernel-based method for reconstruction of sources of brain electric activity in realistic brain geometries” Jakub M. Dzik, Marta Kowalska · Poster (90 minutes)

kESI is a new Python package for kernel-based reconstruction of brain electric activity from recorded electric field potentials using realistic assumptions about brain geometry and conductivity.


“ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks” Laura Mendoza, Didier VEZINET · Talk (long) (30 minutes)

We present an open-source parallelized and cythonized python library, ToFu, for modeling tomography diagnostics on nuclear fusion reactors. Its functionalities (with realistic examples), its architecture and its design will be shown.


“Really reproducible behavioural paper” Jakub M. Dzik · Poster (90 minutes)

A heavily XKCD-themed poster about writing a really reproducible behavioural paper in a Python environment.
The poster is also available online.


“kCSD - a Python package for reconstruction of brain activity” Marta Kowalska, Jakub M. Dzik · Tutorial (90 minutes)

kCSD is a Python package for localization of sources of brain electric activity based on recorded electric potentials.


“High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research.” Mikołaj Rybiński · Talk (15 minutes)

The library leverages Python's richness to provide a uniform, user-friendly API for a zoo of industrial communication protocols used to control high voltage engineering devices, together with abstractions and implementations for such devices.


“How to process hyperspectral data from a prototype imager using Python” Matti Eskelinen · Talk (15 minutes)

We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python utilising Xarray for metadata preservation from start to finish.


“Speed up your python code” Jérémie du Boisberranger · Tutorial (90 minutes)

In this tutorial we will see how to profile and speed up Python code, from a pure Python implementation to an optimized Cython code.
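
Profiling usually comes before any optimization. As a minimal sketch of that first step (not the tutorial's own material), the standard-library `cProfile` and `pstats` modules can show where time is spent. The function `slow_sum_of_squares` is a hypothetical target written for this example.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive pure-Python loop, used as the profiling target."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(200_000)
profiler.disable()

# Collect the report in a string, most expensive calls (cumulative) first.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Once the report identifies a hot function, that function becomes the candidate for rewriting in Cython or another accelerator.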


“vtext: fast text processing in Python using Rust” Roman Yurchak · Talk (15 minutes)

In this talk, we present some of the benefits of writing extensions for Python in Rust. We then illustrate this approach with the vtext project, which aims to be a high-performance library for text processing.


“Create CUDA kernels from Python using Numba and CuPy.” Valentin Haenel · Tutorial (90 minutes)

We'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library.


“Histogram-based Gradient Boosting in scikit-learn 0.21” Olivier Grisel · Talk (long) (30 minutes)

We will present some recently introduced features of the scikit-learn machine learning library, with particular emphasis on the new implementation of gradient boosted trees.


“Controlling a confounding effect in predictive analysis.” Darya Chyzhyk · Talk (15 minutes)

Confounding effects are often present in observational data: the effect or association studied is observed jointly with other effects that are not desired.


“PyFETI - An easy and massively Dual Domain Decomposition Solver for Python” Guilherme Jenovencio · Talk (15 minutes)

PyFETI is a Python implementation of Finite-Element-Tearing-Interconnecting methods. The library provides a massive linear solver using the Domain Decomposition method, where problems are solved locally by a direct solver and iteratively at the interface.


“Lessons learned from comparing Numba-CUDA and C-CUDA” Lena Oden · Talk (15 minutes)

We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. By analyzing the GPU assembly code, we learned the reasons for the differences. This helped us optimize both our codes written in Numba-CUDA and Numba itself.


“Driving a 30m Radio Telescope with Python” Francesco Pierfederici · Talk (long) (30 minutes)

The IRAM 30m radio telescope is one of the best in the world. The telescope control software, monitoring, data archiving, and some of the data processing code are written in Python. We will describe how and why Python is used at the telescope.


“TelApy a Python module to compute free surface flows and sediments transport in geosciences” yoann audouin · Talk (15 minutes)

TelApy is a Python module for computing free surface flows and sediment transport in geosciences. We show examples of how it interoperates with other Python libraries for uncertainty quantification, optimization, and reduced order modeling.


“Deep Diving into GANs: From Theory to Production with TensorFlow 2.0” Michele "Ubik" De Simoni, Paolo Galeone, Federico Di Mattia, Emanuele Ghelfi · Tutorial (90 minutes)

GANs are one of the hottest topics in the ML arena; however, they present a challenge for researchers and engineers alike. This workshop will guide you through both the theory and the code needed to build a GAN and put it into production.


“Sufficiently Advanced Testing with Hypothesis” Zac Hatfield-Dodds · Talk (long) (30 minutes)

Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes.
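
To make the idea of property-based testing concrete, here is a miniature sketch that uses only the standard-library `random` module. It deliberately does not use Hypothesis, which generates and shrinks inputs far more cleverly. The run-length encoding functions are hypothetical examples, not code from the talk.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(encoded):
    """Invert run_length_encode."""
    return "".join(char * count for char, count in encoded)


def check_round_trip(trials=500, seed=0):
    """Property: decoding an encoded string gives back the original string."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials


checked = check_round_trip()
print(f"property held for {checked} random inputs")
```

Instead of listing individual input/output pairs, the test states a rule that must hold for every input and lets the machine search for a counterexample.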


“Sufficiently Advanced Testing with Hypothesis” Zac Hatfield-Dodds · Tutorial (90 minutes)

Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes.


“PSYDAC: a parallel finite element solver with automatic code generation” Yaman Güçlü · Talk (15 minutes)

PSYDAC takes input from SymPDE (a SymPy extension for partial differential equations), applies a finite-element discretization, generates MPI-parallel code, and accelerates it with Numba, Pythran, or Pyccel. We present design, usage and performance.


“What about tests in Machine Learning projects?” Sarah Diot-Girard · Talk (long) (30 minutes)

Good practice says you must write tests! But testing machine learning projects can be really complicated, and writing tests often seems inefficient. Which kinds of tests should be written? How should they be written? What are the benefits?


“Dashboarding with Jupyter notebooks, voila and widgets” Maarten Breddels, Martin Renou · Talk (long) (30 minutes)

Turn your Jupyter notebook into a beautiful modern React or Vue based dashboard using voila and Jupyter widgets.


“Constrained Data Synthesis” Nick Radcliffe · Talk (long) (30 minutes)

We introduce a method for creating synthetic data "to order" based on learned (or provided) constraints and data classifications. This includes "good" and "bad" data.


“Effectively using matplotlib” Tim Hoffmann · Tutorial (90 minutes)

It can sometimes be difficult and frustrating to know how to achieve a desired plot. Have you had this experience as well? Then this tutorial is for you. It will make you more effective and help you generate better-looking plots.


“Can we make Python fast without sacrificing readability? numba for Astrodynamics” Juan Luis Cano Rodríguez · Talk (15 minutes)

There are several solutions to make Python faster, and choosing one is not easy: we would want it to be fast without sacrificing its readability and high-level nature. We tried to do it for an Astrodynamics library using numba. How did it turn out?


“PyPy meets SciPy” Ronan Lamy · Talk (long) (30 minutes)

PyPy, the fast and compliant alternative implementation of Python, is now compatible with the SciPy ecosystem. We'll explore how scientific programmers can use it.


“PhonoLAMMPS: Phonopy with LAMMPS made easy” Abel Carreras · Poster (90 minutes)

PhonoLAMMPS is a Phonopy interface to LAMMPS that calculates interatomic force constants and other phonon properties from a usual LAMMPS input file.


“A Tour of the Data Visualization Ecosystem of Python” Giovanni De Gasperis · Tutorial (90 minutes)

The tutorial will be a tour of the getting-started how-tos of the major Python data visualization libraries, such as Yt-Project, Seaborn, Altair, and Plotly.


“Never get in a battle of bits without ammunition” Valerio Maggio · Tutorial (90 minutes)

The numpy package plays a central role in the Python scientific ecosystem. This is mainly because numpy code has been designed with high performance in mind. This tutorial will introduce the main features of numpy in 90 minutes.


“High quality video experience using deep neural networks” Marco Bertini, Tiberio Uricchio · Talk (long) (30 minutes)

Video compression algorithms used to stream videos are lossy, and when compression rates increase they result in strong degradation of visual quality. We show how deep neural networks can eliminate compression artefacts and restore lost details.


“Modin: Scaling the Capabilities of the Data Scientist, not the machine” Devin Petersohn · Talk (15 minutes)

Modern data systems tend to heavily focus on optimizing for the system’s time. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system.


“Introduction to SciPy” Gert-Ludwig Ingold · Tutorial (90 minutes)

SciPy is a comprehensive library for scientific computing and one of the central components of the scientific Python ecosystem. As most of its functionality naturally involves NumPy arrays, SciPy works hand in hand with NumPy.


“HPC and Python: Intel’s work in enabling the scientific computing community” David Liu · Keynote (45 minutes)

High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development. However, one of the fundamental links in performance is the relationship between h


“From Galaxies to Brains! - Image processing with Python” Samuel FARRENS · Keynote (45 minutes)

From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to.


“In the Shadow of the Black Hole” Sara Issaoun · Keynote (45 minutes)

I will walk through the entire Event Horizon Telescope experiment and the global effort that led to the first-ever direct image of a black hole revealed to the world on April 10th of this year.


“Introduction to pandas” Marc Garcia · Tutorial (90 minutes)

This tutorial is an introduction to pandas for people new to it. We will cover how to open datasets, perform some analysis, apply some transformations, and visualize the data.


“Astronomical Image Processing” Samuel FARRENS · Tutorial (90 minutes)

This tutorial will introduce the concept of sparsity and demonstrate how it can be used to remove noise from signals. These concepts will then be expanded to demonstrate how noise can be removed from astronomical images in particular.
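
One common way sparsity is used for denoising is soft thresholding: when a signal has only a few large coefficients, shrinking everything toward zero removes most of the noise while keeping the large entries. The sketch below is an assumption about the general technique, not the tutorial's material. The signal, noise level, and threshold are all made up for illustration.

```python
import random


def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`; small values become 0."""
    shrunk = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if v > 0 else -magnitude)
    return shrunk


rng = random.Random(42)
clean = [0.0] * 50
clean[10], clean[30] = 5.0, -4.0  # a sparse signal: two non-zero samples
noisy = [c + rng.gauss(0.0, 0.2) for c in clean]

# Threshold picked by hand at four times the (assumed known) noise level.
denoised = soft_threshold(noisy, threshold=0.8)
nonzero = [i for i, v in enumerate(denoised) if v != 0.0]
print(nonzero)
```

For images, the same shrinkage is typically applied to coefficients in a transform domain (for example wavelets) where the image is sparse, rather than to the pixels directly.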


“Data sciences in a polyglot world with xtensor and xframe” Sylvain Corlay, Wolf Vollprecht · Talk (long) (30 minutes)

The main scientific computing programming languages have different models of the main data structures of data science, such as dataframes and n-d arrays. In this talk, we present our approach to reconciling the data science tooling in this polyglot world.


“How a voice assistant works” Miren Urteaga Aldalur · Talk (long) (30 minutes)

This talk will focus on the technologies needed to build a voice assistant. It will center on Samsung’s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) on a variety of Samsung mobile phones.


“3D image processing with scikit-image” Alexandre de Siqueira · Tutorial (90 minutes)

This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image.


“Debugging in JupyterLab” Jeremy Tuloup · Talk (15 minutes)

Debugging Jupyter Notebooks has been one of the most requested features. In this presentation we give an overview of the current state and tools for debugging in Jupyter, and offer a glimpse of what is coming next.


“Deep Learning without a PhD” Paige Bailey · Talk (long) (30 minutes)

In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning required, and no need to instal