This tutorial will show how to analyze three-dimensional stacked and volumetric images in Python, mainly using scikit-image.
JupyterLab is used for essentially all other tutorials at EuroSciPy. This tutorial gives an overview of its basic functionality and shows how to use some of the many tools it provides to simplify your Python programming workflow.
A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level, with code examples for Deep Dream, Style Transfer, and Image Colorization.
GANs are one of the hottest topics in the ML arena; however, they present a challenge for researchers and engineers alike. This workshop will guide you through both the theory and the code needed to build a GAN and put it into production.
The numpy package plays a central role in the Python scientific ecosystem, mainly because numpy code has been designed with high performance in mind. This tutorial will introduce the main features of numpy in 90 minutes.
In this tutorial, we will take a detailed look at the concept of reproducibility, survey the landscape of existing solutions, and, using one solution in particular, Renku, we will do some hands-on work.
In this workshop, you will learn how to migrate from ‘script soups’ (a set of scripts that should be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow.
We'll explain how to do GPU-accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library.
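For a flavor of what the combination looks like, here is a minimal sketch (it assumes a CUDA-capable GPU with the numba and cupy packages installed; the kernel itself is illustrative) of a Numba kernel launched directly on a CuPy array:

    import cupy as cp
    from numba import cuda

    @cuda.jit
    def add_one(arr):
        i = cuda.grid(1)          # global thread index
        if i < arr.size:
            arr[i] += 1.0

    a = cp.zeros(1024, dtype=cp.float64)    # array allocated on the GPU by CuPy
    add_one[(a.size + 255) // 256, 256](a)  # Numba kernel modifies it in place
    print(a[:4])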
This tutorial is an introduction to pandas for people new to it. We will cover how to open datasets, perform some analysis, apply transformations, and visualize the data.
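A taste of that workflow (the file and column names below are made up for illustration):

    import pandas as pd

    df = pd.read_csv("measurements.csv")              # open a dataset
    df["temp_f"] = df["temp_c"] * 9 / 5 + 32          # apply a transformation
    summary = df.groupby("station")["temp_f"].mean()  # perform some analysis
    summary.plot(kind="bar")                          # visualize the data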
Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you!
No previous quantum mechanics experience required!
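For a flavor of the approach, here is a minimal NumPy sketch of a single qubit put into superposition and measured (one possible way to do it; the tutorial's own implementation may differ):

    import numpy as np

    zero = np.array([1.0, 0.0])                    # the |0> basis state
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
    psi = H @ zero                                 # equal superposition of |0> and |1>
    probs = np.abs(psi) ** 2                       # Born rule: outcome probabilities
    outcome = np.random.choice([0, 1], p=probs)    # simulated measurement, 50/50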
In this tutorial we will see how to profile and speed up Python code, going from a pure Python implementation to optimized Cython code.
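Profiling comes first; a minimal example using the standard library's cProfile (the function being profiled is just a placeholder):

    import cProfile
    import pstats

    def slow_sum(n):
        total = 0
        for i in range(n):
            total += i * i
        return total

    cProfile.run("slow_sum(10_000_000)", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(5)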
The tutorial will be a tour of the getting-started how-tos of the major Python data visualization libraries, such as Yt-Project, Seaborn, Altair, and Plotly.
This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your GIS workflow and fit nicely in the traditional PyData stack.
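As an illustration, a few lines using the small demo dataset that ships with GeoPandas:

    import geopandas as gpd

    world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
    africa = world[world["continent"] == "Africa"]   # ordinary pandas-style filtering
    africa.plot(column="pop_est", legend=True)       # choropleth of population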
Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes.
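A minimal property-based test with Hypothesis (the property itself is a toy example, not from the talk):

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        once = sorted(xs)
        assert sorted(once) == once  # must hold for every generated list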
This tutorial will introduce the concept of sparsity and demonstrate how it can be used to remove noise from signals. These concepts will then be expanded to demonstrate how noise can be removed from astronomical images in particular.
It can sometimes be difficult and frustrating to work out how to achieve a desired plot. Have you had this experience as well? Then this tutorial is for you. It will make you more effective and help you generate better-looking plots.
SciPy is a comprehensive library for scientific computing and one of the central components of the scientific Python ecosystem. As most of its functionality naturally involves NumPy arrays, SciPy works hand in hand with NumPy.
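A small example of that interplay (the objective function is illustrative):

    import numpy as np
    from scipy import optimize

    # SciPy routines consume and produce NumPy arrays
    result = optimize.minimize(lambda x: np.sum((x - 3.0) ** 2), x0=np.zeros(2))
    print(result.x)  # approximately [3., 3.]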
Python is flexible; C and C++ are fast. How can you use them together? There are many ways to call C code from Python; we will learn about the major ones and find out when you would prefer one over another.
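One of the approaches compared is ctypes from the standard library; a minimal sketch (the library name assumes Linux):

    import ctypes

    libm = ctypes.CDLL("libm.so.6")        # load the C math library
    libm.cos.restype = ctypes.c_double     # declare the C signature
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))                   # 1.0, computed by C code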
We will present scikit-learn by focusing on the available tools used to train a machine-learning model. Then, we will focus on the challenge linked to model interpretation and the available tools to understand these models.
PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies the tasks' data dependencies, then schedules and executes the tasks in distributed environments.
kCSD is a Python package for localization of sources of brain electric activity based on recorded electric potentials.
Scikit-fdiff (formerly Triflow) has been developed to facilitate the building of mathematical models. It makes it quick to build and try many asymptotic falling-film models with different coupled phenomena (energy and mass transfer).
From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to.
Dask has evolved over the last year to leverage multi-GPU computing alongside its existing CPU support. We present how this is possible with the use of NumPy-like libraries and how to get started writing distributed GPU software.
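A minimal sketch of the pattern (assuming a GPU plus the cupy and dask packages): a Dask array whose chunks are CuPy arrays, so each chunk is processed on the GPU:

    import cupy as cp
    import dask.array as da

    x = da.from_array(cp.random.random((20000, 20000)),
                      chunks=(2000, 2000), asarray=False)
    print(float((x - x.mean()).std().compute()))  # reduced chunk-by-chunk on the GPU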
This talk will focus on the technologies needed to build a voice assistant. Its center point will be Samsung’s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) on a variety of Samsung mobile phones.
PhonoLAMMPS is a Phonopy interface to LAMMPS that makes it possible to calculate interatomic force constants and other phonon properties from a usual LAMMPS input file.
We will demonstrate how to explore and analyse massive datasets (>150GB) on a laptop with the Vaex library in Python. Using computational graphs, efficient algorithms and storage (Apache Arrow / hdf5) Vaex can easily handle up to a billion rows.
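A short sketch of the style of work (the file and column names are hypothetical; Vaex memory-maps the file instead of loading it into RAM):

    import vaex

    df = vaex.open("big_table.hdf5")
    df["r"] = (df.x**2 + df.y**2)**0.5         # lazy virtual column, no copy made
    print(df.mean(df.r, binby=df.x, shape=8))  # out-of-core binned statistic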
In this talk you will learn how QuTiP, the quantum toolbox in Python (http://qutip.org), has grown from a library into an ecosystem. QuTiP is used in education to teach quantum physics, and in research and industry for quantum computing simulation.
Good practice says you must write tests! But testing Machine Learning projects can be really complicated, and test writing often seems inefficient. Which kinds of tests should be written? How do you write them? What are the benefits?
A heavily XKCD-themed poster about writing a really reproducible behavioural paper in a Python environment.
The poster is also available online.
Apache Arrow, defining a columnar, in-memory data format standard and communication protocols, provides a cross-language development platform that already has several applications in the PyData ecosystem.
We introduce a method for creating synthetic data "to order" based on learned (or provided) constraints and data classifications. This includes "good" and "bad" data.
A review of DevOps tools as applied to data analysis pipelines, including workflow managers, software containers, testing frameworks, and online repositories for performing reproducible science that scales.
kESI is a new Python package for kernel-based reconstruction of brain electric activity from recorded electric field potentials using realistic assumptions about brain geometry and conductivity.
Caterva is a library on top of the Blosc2 compressor that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.
Turn your Jupyter notebook into a beautiful modern React- or Vue-based dashboard using voila and Jupyter widgets.
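A minimal widget cell to get a feel for it; after saving the notebook, running "voila notebook.ipynb" serves it as a standalone dashboard (the widget logic below is illustrative):

    import ipywidgets as widgets

    slider = widgets.FloatSlider(description="amplitude", min=0.0, max=2.0)
    label = widgets.Label()

    def on_change(change):
        label.value = f"amplitude = {change['new']:.2f}"

    slider.observe(on_change, names="value")
    widgets.VBox([slider, label])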
We present ToFu, an open-source parallelized and cythonized Python library for modeling tomography diagnostics on nuclear fusion reactors. Its functionality (with realistic examples), architecture, and design will be shown.
Debugging Jupyter Notebooks has been one of the most requested features. In this presentation we give an overview of the current state and tools for debugging in Jupyter, and offer a glimpse of what is coming next.
Transonic is a new pure-Python package to easily accelerate modern Python-NumPy code with different accelerators (such as Cython, Pythran, Numba, and CuPy).
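A minimal sketch of the decorator style (this assumes the transonic package with at least one backend, e.g. Pythran, installed; the function is illustrative, and without a backend it simply falls back to plain Python):

    import numpy as np
    from transonic import boost

    @boost
    def mean_of_squares(a: "float64[:]"):
        return (a * a).mean()

    print(mean_of_squares(np.arange(10.0)))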
Modern data systems tend to heavily focus on optimizing for the system’s time. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system.
The modeling system ueflow allows for customizable, dynamic boundary conditions.
The modeler can write Python plugins to implement the behavior of these boundary conditions.
Jupyter notebooks are often a mess. The code produced works for one notebook, but it is hard to maintain or to re-use. In this talk I will present some best practices to make code more readable, easier to maintain, and re-usable.
Confounding effects are often present in observational data: the effect or association studied is observed jointly with other effects that are not desired.
PyFETI is a Python implementation of Finite Element Tearing and Interconnecting (FETI) methods. The library provides a massively parallel linear solver using the domain decomposition method, where problems are solved locally by a direct solver and at the interfaces iteratively.
The library leverages Python's richness to provide a uniform, user-friendly API for a zoo of industrial communication protocols used to control high-voltage engineering devices, together with abstractions and implementations for such devices.
We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. By analyzing the GPU assembly code, we learned the reasons for the differences. This helped us optimize our codes written in Numba-CUDA, and Numba itself.
The RAMP (Rapid Analytics and Model Prototyping) framework provides a platform to organize reproducible and transparent data challenges. We will present the different framework bricks.
A summary of the MNE-Python changes introduced during the last two releases and highlights of future directions.
High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development. However, one of the fundamental links in performance is the relationship between hardware and software.
Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know?
Machine learning is a search for the best combination of features, model, and hyperparameters. But as data grow, so does the search space! Fortunately, visual diagnostics can focus our search and allow us to steer modeling purposefully, and at scale.
In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning is required, and there is no need to install anything.
We interact with an increasing amount of data, but classical data structures and algorithms can't fit our requirements anymore. This talk presents probabilistic algorithms and data structures and describes the main areas of their applications.
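A classic example is the Bloom filter, sketched here in plain Python to show the idea (illustrative, not production-grade):

    import hashlib

    class BloomFilter:
        def __init__(self, size=1024, n_hashes=3):
            self.size, self.n_hashes = size, n_hashes
            self.bits = bytearray(size)

        def _positions(self, item):
            for i in range(self.n_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                yield int.from_bytes(digest[:8], "big") % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = 1

        def __contains__(self, item):
            # May give false positives, never false negatives
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("spam")
    print("spam" in bf, "ham" in bf)  # True, (almost certainly) False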
In this presentation we will cover some recently introduced features of the scikit-learn machine learning library, with a particular emphasis on the new implementation of Gradient Boosted Trees.
The IRAM 30m radio telescope is one of the best in the world. The telescope control software, monitoring, data archiving as well as some of the data processing code is written in Python. We will describe how and why Python is used at the telescope.
Modern hardware is multi-core. It is crucial for Python to provide efficient parallelism. This talk presents the current state of, and recent advances in, Python parallelism, in order to help practitioners and developers make better decisions on this matter.
Neural embeddings are a powerful tool for turning categorical values into numerical ones. Given reasonable training data, the semantics present in the categories can be preserved in the numerical representation.
The main scientific computing languages model the core data structures of data science, such as dataframes and n-d arrays, differently. In this talk, we present our approach to reconciling data science tooling in this polyglot world.
Video compression algorithms used to stream videos are lossy, and when compression rates increase they result in strong degradation of visual quality. We show how deep neural networks can eliminate compression artefacts and restore lost details.
In this talk we explore a recent addition to SymPy which makes it possible to find closed-form expressions for matrix derivatives. As a consequence, generating efficient code for optimization problems is now much easier.
I will walk through the entire Event Horizon Telescope experiment and the global effort that led to the first-ever direct image of a black hole revealed to the world on April 10th of this year.
Undesired bias in machine learning has become a worrying topic due to numerous high-profile incidents. In this talk we demystify machine learning bias through a hands-on example: we'll be tasked with automating the loan approval process for a company.
In this talk I will take you on a whirlwind tour of Numba, and you will be equipped with a mental model of how Numba works and what it is good at. At the end, you will be able to decide whether Numba could be useful for you.
A new and efficient Python/C++ modular library for real and complex response functions at the level of Kohn-Sham density functional theory.
PyPy, the fast and compliant alternative implementation of Python, is now compatible with the SciPy ecosystem. We'll explore how scientific programmers can use it.
Detect migration flows worldwide using geolocated Twitter data: routes, settlement areas, mobility to more than one country, spatial integration in cities, etc.
This talk is about emzed, a Python library to support biologists with little programming knowledge to implement ad-hoc analyses as well as workflows for mass-spectrometry data.
Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.
This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.
In this talk, we present some of the benefits of writing extensions for Python in Rust. We then illustrate this approach on the vtext project, that aims to be a high-performance library for text processing.
There are several solutions to make Python faster, and choosing one is not easy: we would want it to be fast without sacrificing its readability and high-level nature. We tried to do it for an Astrodynamics library using numba. How did it turn out?
We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python utilising Xarray for metadata preservation from start to finish.
pystencils speeds up stencil computations on NumPy arrays using a SymPy-based high-level description that is compiled into optimized C code.
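A minimal sketch in the documented style (the Jacobi-like update is illustrative):

    import numpy as np
    import pystencils as ps

    src, dst = ps.fields("src, dst: float64[2D]")
    update = ps.Assignment(dst[0, 0],
                           (src[1, 0] + src[-1, 0] + src[0, 1] + src[0, -1]) / 4)
    kernel = ps.create_kernel(update).compile()  # generates and compiles C code

    a, b = np.zeros((32, 32)), np.zeros((32, 32))
    kernel(src=a, dst=b)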
How can one of the largest code bases in open-source Geographical Information Science – QGIS – be enhanced and re-designed? Through the power of Python plugins. This talk demonstrates concepts for making QGIS more user-friendly.
PSYDAC takes input from SymPDE (a SymPy extension for partial differential equations), applies a finite-element discretization, generates MPI-parallel code, and accelerates it with Numba, Pythran, or Pyccel. We present design, usage and performance.
We present TelApy, a Python module to compute free-surface flows and sediment transport in the geosciences, with examples of how it inter-operates with other Python libraries for Uncertainty Quantification, Optimization, and Reduced Order Modeling.