Welcome to our schedule sneak peek!

We have prepared a list of exciting talks so you can get a feel for our conference. Please keep in mind that this is not our full schedule; we will follow up with the complete schedule in due course. Stay tuned!

“Apache Arrow: a cross-language development platform for in-memory data”
Joris Van den Bossche; Talk (long)

Apache Arrow defines a standard for columnar, in-memory data and the communication protocols around it, providing a cross-language development platform that already has several applications in the PyData ecosystem.


“A practical guide towards algorithmic bias and explainability in machine learning”
Alejandro Saucedo; Talk (long)

Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example: automating the loan approval process for a company.


“Best Coding Practices in Jupyterlab”
Alexander CS Hendorf; Talk

Jupyter notebooks are often a mess: the code works in a single notebook but is hard to maintain or reuse. In this talk I will present some best practices to make code more readable, easier to maintain, and reusable.
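
One habit in that spirit (a hypothetical illustration, not taken from the talk) is to move top-level cell code into small, documented functions that can be imported and tested outside the notebook:

```python
# Common notebook smell: transformation logic living as top-level cell
# code with hidden state. Better: extract it into a small, documented,
# testable function that other notebooks and scripts can import.
def normalize(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if lo == hi:  # avoid division by zero for constant input
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```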


“Building data pipelines in Python: Airflow vs scripts soup”
Dr. Tania Allard; Tutorial

In this workshop, you will learn how to migrate from ‘scripts soups’ (a set of scripts that must be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow.


“Caterva: A Compressed And Multidimensional Container For Big Data”
Francesc Alted; Talk (long)

Caterva is a library on top of the Blosc2 compressor that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.


“CFFI, Ctypes, Cython, Cppyy: how to run C code from Python”
Matti Picus; Tutorial

Python is flexible; C and C++ are fast. How can you use them together? There are many ways to call C code from Python. We will learn about the major ones and find out when you would prefer one over the others.
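
As a taste of the simplest of these approaches, here is a minimal sketch (my example, not the tutorial's) using the standard-library `ctypes` module to call `sqrt` from the C math library; note that library lookup is platform-dependent:

```python
import ctypes
import ctypes.util

# Locate the C math library (name varies by platform: libm.so.6,
# libm.dylib, ...). Fall back to the symbols of the main program,
# which on most Unix systems already link against libm.
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature of double sqrt(double) so ctypes
# converts arguments and the return value correctly.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))
```

Tools like CFFI, Cython and cppyy automate away exactly this kind of manual signature bookkeeping.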


“Constrained Data Synthesis”
Nick Radcliffe; Talk (long)

We introduce a method for creating synthetic data "to order" based on learned (or provided) constraints and data classifications. This includes "good" and "bad" data.


“Controlling a confounding effect in predictive analysis.”
Darya Chyzhyk; Talk

Confounding effects are often present in observational data: the effect or association studied is observed jointly with other effects that are not desired.


“Create CUDA kernels from Python using Numba and CuPy.”
Valentin Haenel; Tutorial

We'll explain how to do GPU-accelerated numerical computing from Python, using the Numba Python compiler in combination with the CuPy GPU array library.


“Dashboarding with Jupyter notebooks, voila and widgets”
Maarten Breddels; Talk (long)

Turn your Jupyter notebook into a beautiful modern React or Vue based dashboard using voila and Jupyter widgets.


“Deep Diving into GANs: From Theory to Production with TensorFlow 2.0”
Paolo Galeone, Michele "Ubik" De Simoni; Tutorial

GANs are one of the hottest topics in the ML arena; however, they present a challenge for researchers and engineers alike. This workshop will guide you through both the theory and the code needed to build a GAN and put it into production.


“Deep Learning for Understanding Human Multi-modal Behavior”
Ricardo Manhães Savii; Talk (long)

Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.


“Distributed GPU Computing with Dask”
Peter Andreas Entschev; Talk (long)

Dask has evolved over the last year to leverage multi-GPU computing alongside its existing CPU support. We present how this is possible with the use of NumPy-like libraries and how to get started writing distributed GPU software.


“Driving a 30m Radio Telescope with Python”
Francesco Pierfederici; Talk (long)

The IRAM 30m radio telescope is one of the best in the world. The telescope control software, the monitoring and data archiving systems, as well as some of the data processing code, are written in Python. We will describe how and why Python is used at the telescope.


“Effectively using matplotlib”
Tim Hoffmann; Tutorial

It can sometimes be difficult and frustrating to work out how to achieve a desired plot. Have you had this experience as well? Then this tutorial is for you: it will make you more effective and help you generate better-looking plots.
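
A recurring theme in working effectively with matplotlib is preferring the explicit object-oriented API over the implicit pyplot state machine. A minimal sketch (my example, not the presenter's):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe for scripts
import matplotlib.pyplot as plt

# Object-oriented style: hold explicit Figure and Axes objects
# instead of relying on pyplot's hidden "current axes" state.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4], marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A minimal, explicit plot")
ax.legend()
fig.savefig("example.png", dpi=150)
```

With the Axes in hand, every later customization (labels, limits, styling) targets that object unambiguously, which matters as soon as a figure has more than one subplot.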


“emzed: a Python based framework for analysis of mass-spectrometry data”
Uwe Schmitt; Talk (long)

This talk is about emzed, a Python library that supports biologists with little programming knowledge in implementing ad-hoc analyses as well as workflows for mass-spectrometry data.


“Enhancing & re-designing the QGIS user interface – a deep dive”
Sebastian Ernst; Talk (long)

How can one of the largest code bases in open source Geographical Information Science – QGIS – be enhanced and re-designed? Through the power of Python plugins. This talk demonstrates concepts for making QGIS more user-friendly.


“Environmental Research and Citizen Science using fractaL”
Saulo Jacques; Talk

A tool for ecological research, environmental education and digital literacy. The aim of _fractaL_ is to bring ecological studies to a broad audience with an intuitive approach, through synesthetic methods associated with a robust scientific base.


“Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications”
Andrii Gakhov; Talk (long)

We interact with an increasing amount of data, but classical data structures and algorithms no longer fit our requirements. This talk presents probabilistic algorithms and data structures and describes the main areas of their application.
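
One classic example of such a structure is the Bloom filter, which answers set-membership queries in constant space at the cost of occasional false positives (but never false negatives). A minimal pure-Python sketch (illustrative, not from the talk):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: membership tests with tunable false-positive rate."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a plain int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("apache-arrow")
print("apache-arrow" in bf)  # True: no false negatives
print("not-added" in bf)     # False (with overwhelming probability)
```

The trade-off is characteristic of the whole family: a small, fixed memory footprint in exchange for probabilistic answers.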


“From Modeler to Programmer”
Dr. Mike Müller; Poster

The modeling system ueflow allows for customizable, dynamic boundary conditions. The modeler can write Python plugins to implement the behavior of these boundary conditions.


“Get Started with Variational Inference using Python”
Suriyadeepan Ramamoorthy; Talk (long)

The objective is to help the audience understand the mechanics of Variational Inference by implementing it in Python. We will build inference algorithms including Coordinate Ascent VI, Black Box VI and Automatic Differentiation VI using PyTorch.
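
As background (standard material, not taken from the talk): all of these algorithms maximize the evidence lower bound (ELBO), which follows from decomposing the log marginal likelihood,

```latex
\log p(x) \;=\; \underbrace{\mathbb{E}_{q(z)}\big[\log p(x,z) - \log q(z)\big]}_{\mathrm{ELBO}(q)}
\;+\; \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) \;\ge\; \mathrm{ELBO}(q).
```

Since the KL divergence is non-negative, maximizing the ELBO over the variational family q simultaneously tightens the bound and pushes q toward the true posterior; CAVI, Black Box VI and ADVI differ mainly in how they carry out this maximization.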


“Getting Started with JupyterLab”
Mike Müller; Tutorial

JupyterLab is used for essentially all other tutorials at EuroSciPy. This tutorial gives an overview of the basic functionality and shows how to use some of the many tools it provides to simplify your Python programming workflow.


“High performance machine learning with dislib”
Javier Álvarez; Talk

This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.


“High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research.”
Mikołaj Rybiński; Talk

The library leverages Python's richness to provide a uniform, user-friendly API for a zoo of industrial communication protocols used to control high voltage engineering devices, together with abstractions and implementations for such devices.


“Histogram-based Gradient Boosting in scikit-learn 0.21”
Olivier Grisel; Talk (long)

In this talk we will highlight some recently introduced features of the scikit-learn machine learning library, with particular emphasis on the new implementation of gradient-boosted trees.


“How to process hyperspectral data from a prototype imager using Python”
Matti Eskelinen; Talk (long)

We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python utilising Xarray for metadata preservation from start to finish.


“Inside NumPy: preparing for the next decade”
Matti Picus; Talk (long)

Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know?


“Introduction to geospatial data analysis with GeoPandas and the PyData stack”
Joris Van den Bossche; Tutorial

This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your GIS workflow and fit nicely in the traditional PyData stack.


“Introduction to scikit-learn: from model fitting to model interpretation”
Olivier Grisel, Guillaume Lemaitre; Tutorial

We will present scikit-learn by focusing on the available tools used to train a machine-learning model. Then, we will focus on the challenges of model interpretation and the available tools for understanding these models.
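
The heart of scikit-learn is its uniform estimator interface: every model is trained with `fit` and queried with `predict`. A tiny sketch on hypothetical toy data (my example, not the tutorial's):

```python
from sklearn.tree import DecisionTreeClassifier

# Trivially separable toy data, just to show the API shape.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)                            # train
print(model.predict([[0.5], [2.5]]))       # [0 1]
print(model.feature_importances_)          # basis for interpretation
```

Because every estimator shares this interface, swapping models or plugging them into pipelines and interpretation tools requires almost no code changes.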


“Introduction to TensorFlow 2.0”
Brad Miro; Talk (long)

Learn about the updates being made to TensorFlow in its 2.0 version. We’ll give an overview of what’s available in the new version as well as do a deep dive into an example using its central high-level API, Keras.


“kCSD - a Python package for reconstruction of brain activity”
Jakub M. Dzik, Marta Kowalska; Tutorial

_kCSD_ is a Python package for localization of sources of brain electric activity based on recorded electric potentials.


“Lessons learned from comparing Numba-CUDA and C-CUDA”
Lena Oden; Talk

We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. By analyzing the GPU assembly code, we learned the reasons for the differences. This helped us optimize our code written in Numba-CUDA, and Numba itself.


“Matrix calculus with SymPy”
Francesco Bonazzi; Talk (long)

In this talk we explore a recent addition to SymPy that makes it possible to find closed-form solutions to matrix derivatives. As a consequence, generating efficient code for optimization problems is now much easier.
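
The new machinery works on symbolic matrix expressions; as a version-independent sketch of the underlying calculus, here is the classic quadratic-form identity checked with explicit symbolic entries (my example, not the speaker's):

```python
import sympy as sp

# Symbolic 2-vector x and 2x2 matrix A with scalar entries.
x1, x2 = sp.symbols("x1 x2")
x = sp.Matrix([x1, x2])
A = sp.Matrix(2, 2, list(sp.symbols("a11 a12 a21 a22")))

# Quadratic form f(x) = x^T A x; matrix calculus says its
# gradient is (A + A^T) x.
f = (x.T * A * x)[0, 0]
grad = sp.Matrix([f]).jacobian(x).T

expected = (A + A.T) * x
assert (grad - expected).expand() == sp.zeros(2, 1)
print(grad)
```

The SymPy addition the talk covers derives such results directly on `MatrixSymbol` expressions, without expanding into scalar entries.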


“MNE-Python, a toolkit for neurophysiological data”
Joan Massich; Poster

A summary of the MNE-Python changes introduced during the two last releases and highlights for future directions.


“Modern Data Science: A new approach to DataFrames and pipelines”
Maarten Breddels, Jovan Veljanoski; Talk (long)

We will demonstrate how to explore and analyse massive datasets (>150GB) on a laptop with the Vaex library in Python. Using computational graphs, efficient algorithms and storage (Apache Arrow / HDF5), Vaex can easily handle up to a billion rows.


“Parallelizing Python applications with PyCOMPSs”
Javier Conejero; Tutorial

PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies the tasks' data dependencies, then schedules and executes them in distributed environments.
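
To illustrate just the annotation style, here is a sketch with a trivial stand-in decorator; the real decorator lives in PyCOMPSs itself and performs asynchronous scheduling, which the stand-in deliberately does not:

```python
# Stand-in for PyCOMPSs' task decorator (illustration only): it runs
# functions synchronously, whereas PyCOMPSs would turn each call into
# an asynchronous task and track its data dependencies.
def task(func):
    return func

@task
def square(x):
    return x * x

@task
def total(values):
    return sum(values)

# Under PyCOMPSs, the squares would run in parallel across the cluster,
# and total() would wait on them automatically (squares -> total).
squares = [square(i) for i in range(5)]
print(total(squares))  # 0 + 1 + 4 + 9 + 16 = 30
```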


“Performing Quantum Measurements in QuTiP”
Simon Cross; Tutorial

Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you! No previous quantum mechanics experience required!
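
The core idea behind such measurements can be sketched without QuTiP at all: the Born rule says measurement probabilities are the squared magnitudes of the state's amplitudes. A pure-NumPy sketch (not QuTiP's API):

```python
import numpy as np

# A qubit in the |+> state: (|0> + |1>) / sqrt(2),
# represented as a 2-component complex vector.
plus = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)

# Born rule: P(outcome) = |amplitude|^2.
probs = np.abs(plus) ** 2
print(probs)  # each outcome has probability 0.5

# Simulate repeated measurements in the computational basis.
rng = np.random.default_rng(seed=42)
outcomes = rng.choice([0, 1], size=1000, p=probs)
print(outcomes.mean())  # close to 0.5
```

QuTiP wraps states, operators and measurement in dedicated objects, which is what makes entanglement and teleportation experiments practical.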


“PhonoLAMMPS: Phonopy with LAMMPS made easy”
Abel Carreras; Poster

PhonoLAMMPS is an interface between Phonopy and LAMMPS that makes it possible to calculate interatomic force constants and other phonon properties from a standard LAMMPS input file.


“PSYDAC: a parallel finite element solver with automatic code generation”
Yaman Güçlü; Talk

PSYDAC takes input from SymPDE (a SymPy extension for partial differential equations), applies a finite-element discretization, generates MPI-parallel code, and accelerates it with Numba, Pythran, or Pyccel. We present design, usage and performance.


“PyFETI - An easy and massively Dual Domain Decomposition Solver for Python”
Guilherme Jenovencio; Talk

PyFETI is a Python implementation of Finite Element Tearing and Interconnecting (FETI) methods. The library provides a massively parallel linear solver based on domain decomposition, where subdomain problems are solved locally with a direct solver and interface problems are solved iteratively.


“PyPy meets SciPy”
Ronan Lamy; Talk (long)

PyPy, the fast and compliant alternative implementation of Python, is now compatible with the SciPy ecosystem. We'll explore how scientific programmers can use it.


“*pystencils*: Speeding up stencil computations on CPUs and GPUs”
Martin Bauer; Talk

[pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) speeds up stencil computations on NumPy arrays using a SymPy-based high-level description that is compiled into optimized C code.


“PyTorch is not only for deep learning!”
Alexey Sizanov; Talk (long)

PyTorch is one of the two major frameworks for deep learning. But in fact, it can be extremely useful for a wide range of problems that do not involve neural nets. We show how we use it in close conjunction with SciPy/NumPy in our daily work.


“QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science”
Alexander Pitchford, Nathan Shammah; Talk (long)

In this talk you will learn how QuTiP, the quantum toolbox in Python (http://qutip.org), has grown from a library into an *ecosystem*. QuTiP is used in education to teach quantum physics, and in research and industry for quantum computing simulation.


“Recent advances in python parallel computing”
Pierre Glaser; Talk (long)

*Modern hardware is multi-core*. It is crucial for Python to provide efficient parallelism. This talk surveys the current state of, and recent advances in, Python parallelism, to help practitioners and developers make better decisions on this matter.
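
The standard-library entry point for such parallelism is `concurrent.futures`; a minimal sketch (my example, not the speaker's) of fanning independent work out to a pool:

```python
import math
from concurrent.futures import ThreadPoolExecutor

def work(n):
    """A stand-in computation; native code (NumPy, Cython, ...)
    can release the GIL inside calls like this, enabling real
    thread-level parallelism."""
    return math.factorial(n) % 97

# Submit independent work items to a pool and gather results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

print(results)
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` sidesteps the GIL entirely at the cost of pickling inputs and outputs, which is one of the trade-offs this kind of talk weighs.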


“Reproducible Data Science in Python”
Rok Roškar, Chandrasekhar Ramakrishnan; Tutorial

In this tutorial, we will take a detailed look at the concept of _reproducibility_, survey the landscape of existing solutions, and, using one solution in particular, [Renku](https://renkulab.io), we will do some hands-on work.


“Scientific DevOps: Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers”
Nicholas Del Grosso; Talk (long)

A review of DevOps tools as applied to data analysis pipelines, including workflow managers, software containers, testing frameworks, and online repositories for performing reproducible science that scales.


“scikit-fdiff, a new tool for PDE solving”
Nicolas Cellier; Poster

Scikit-fdiff (formerly Triflow) has been developed to facilitate the building of mathematical models. It makes it possible to quickly build and test many asymptotic falling-film models with different coupled phenomena (energy and mass transfer).


“Speed up your python code”
Jérémie du Boisberranger; Tutorial

In this tutorial we will see how to profile and speed up Python code, from a pure Python implementation to an optimized Cython code.
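
The first step of that workflow, measuring before optimizing, needs only the standard library. A minimal sketch (my example, not the tutorial's) combining `cProfile` and `timeit`:

```python
import cProfile
import io
import pstats
import timeit

def slow_sum(n):
    """Pure-Python loop: a typical optimization candidate."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Step 1: profile to find where time is actually spent.
profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()
pstats.Stats(profiler, stream=io.StringIO()).sort_stats("cumulative")

# Step 2: time candidate implementations against each other.
loop = timeit.timeit(lambda: slow_sum(100_000), number=10)
gen = timeit.timeit(lambda: sum(i * i for i in range(100_000)), number=10)
print(f"loop: {loop:.3f}s  generator: {gen:.3f}s")
```

Only once the hot spot is confirmed does it pay to reach for Cython, which is where the tutorial takes the optimization further.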


“Sufficiently Advanced Testing with Hypothesis”
Zac Hatfield-Dodds; Tutorial

Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes.
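
The core idea, stating a property that must hold for *all* inputs rather than hand-picking examples, can be sketched by hand with random data (illustration only; Hypothesis generates the cases for you, shrinks failures to minimal examples, and remembers them between runs):

```python
import random

def run_length_encode(s):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)

# The property: decoding an encoding returns the original input,
# checked here on many random strings instead of a few fixed cases.
rng = random.Random(0)
for _ in range(200):
    s = "".join(rng.choice("ab") for _ in range(rng.randrange(10)))
    assert run_length_decode(run_length_encode(s)) == s
```

A round-trip property like this needs no knowledge of the code's internals, which is exactly what makes the approach work for "black box" research codes.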


“Sufficiently Advanced Testing with Hypothesis”
Zac Hatfield-Dodds; Talk (long)

Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes.


“TelApy a Python module to compute free surface flows and sediments transport in geosciences”
yoann audouin; Talk

TelApy is a Python module to compute free-surface flows and sediment transport in geosciences. We present examples of how it inter-operates with other Python libraries for uncertainty quantification, optimization, and reduced-order modeling.


“The Magic of Neural Embeddings with TensorFlow 2”
Oliver Zeigermann; Talk (long)

Neural embeddings are a powerful tool for turning categorical values into numerical ones. Given reasonable training data, semantics present in the categories can be preserved in the numerical representation.


“The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges”
Joris Van den Bossche, Guillaume Lemaitre; Talk

The RAMP (Rapid Analytics and Model Prototyping) framework provides a platform to organize reproducible and transparent data challenges. We will present the framework's different building blocks.


“ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks”
Didier VEZINET, Laura Mendoza; Talk (long)

We present ToFu, an open-source, parallelized and cythonized Python library for modeling tomography diagnostics on nuclear fusion reactors. We will show its functionality (with realistic examples), its architecture, and its design.


“Understanding Numba”
Valentin Haenel; Talk (long)

In this talk I will take you on a whirlwind tour of Numba, and you will be equipped with a mental model of how Numba works and what it is good at. By the end, you will be able to decide whether Numba could be useful for you.


“VeloxChem: Python meets quantum chemistry and HPC”
Olav Vahtras; Talk (long)

A new and efficient Python/C++ modular library for real and complex response functions at the level of Kohn-Sham density functional theory.


“Visual Diagnostics at Scale”
Dr. Rebecca Bilbro; Talk (long)

Machine learning is a search for the best combination of features, model, and hyperparameters. But as data grow, so does the search space! Fortunately, visual diagnostics can focus our search and allow us to steer modeling purposefully, and at scale.


“vtext: fast text processing in Python using Rust”
Roman Yurchak; Talk

In this talk, we present some of the benefits of writing extensions for Python in Rust. We then illustrate this approach with the [vtext](https://github.com/rth/vtext) project, which aims to be a high-performance library for text processing.


“What about tests in Machine Learning projects?”
Sarah Diot-Girard, Stephanie Bracaloni; Talk (long)

Good practice says you must write tests! But testing machine learning projects can be really complicated, and test writing often seems inefficient. Which kinds of tests should be written? How do you write them? What are the benefits?