2.0 -//Pentabarf//Schedule//EN

PUBLISH QPKHMG@@pretalx.com

-QPKHMG

Getting Started with JupyterLab en

20190902T090000 20190902T103000 1.03000

Getting Started with JupyterLab

This tutorial is hands-on. It is designed for participants who haven't used JupyterLab yet or have only minimal experience with it. Participants will work along with the trainer and learn how a Jupyter Notebook works by using some basic features. Some of the topics are: * Client-server concept * How cells work * Basic markdown * Magic commands overview * Some magic commands in more detail * Debugging basics * Basic timing and profiling * Extensions * History of variables * Saving to files * and more There will be room for questions during the tutorial as well as a dedicated FAQ session at the end. After this tutorial participants should be able to comfortably follow the other tutorials that are delivered with a Jupyter Notebook. # Requirements and set up instructions Training will be done with Python 3.7 and the latest JupyterLab version. * Install Anaconda alternatively * Install Miniconda and `conda install jupyterlab` alternatively * Create a new conda environment: + `conda create -n jupyterlabtutorial python=3.7 jupyterlab` and activate it with + `conda activate jupyterlabtutorial` PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/QPKHMG/ Track 2 (Baroja) Mike Müller PUBLISH KRNP7Y@@pretalx.com

-KRNP7Y

Never get in a battle of bits without ammunition en

20190902T110000 20190902T123000 1.03000

Never get in a battle of bits without ammunition

# Outline **Part 1** Numpy Basics - Introduction to NumPy Arrays - numpy internals schematics - Reshaping and Resizing - Numerical Data Types - Record Array **Part 2** Indexing and Slicing - Indexing numpy arrays - fancy indexing - array masking - Slicing & Stacking - Vectorization & BroadCasting **Part 3** "Advanced" NumPy - Serialisation & I/O - `.mat` files - Array and Matrix - Matlab compatibility - Memmap - Bits of Data Science with NumPy - NumPy beyond classic `numpy` ### Python version The minimum recommended version of Python to use for this tutorial is **Python 3.5**, although Python 2.7 should be fine, as well as previous versions of Python 3. Py3.5+ is recommended due to a reference to the `@` operator in the linear algebra notebook. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/KRNP7Y/ Track 2 (Baroja) Valerio Maggio PUBLISH G7CTX8@@pretalx.com

-G7CTX8

Introduction to pandas en

20190902T140000 20190902T153000 1.03000

Introduction to pandas

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. This tutorial will use a couple of example data sets to show what pandas can do and to give an idea of how to work with data using pandas. It is recommended to bring your own laptop with the latest version of Anaconda, pandas, Jupyter, and the repository of the tutorial cloned. See the exact instructions here: https://github.com/datapythonista/pandas-tutorials PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/G7CTX8/ Track 2 (Baroja) Marc Garcia PUBLISH A8KBUB@@pretalx.com

-A8KBUB

Hands-on TensorFlow 2.0 en

20190902T090000 20190902T103000 1.03000

Hands-on TensorFlow 2.0

A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level. In this 90-minute tutorial, we will briefly introduce TensorFlow 2.0, then dive into writing code. We will complete four short exercises on Deep Dream, Style Transfer, Image colorization, and GANs (if time allows). This tutorial is intermediate level, for folks with prior Deep Learning experience. You will need a laptop with an internet connection; there is nothing to install in advance. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/A8KBUB/ Track 3 (Oteiza) Josh Gordon PUBLISH Q79NND@@pretalx.com

-Q79NND

Deep Diving into GANs: From Theory to Production with TensorFlow 2.0 en

20190902T110000 20190902T123000 1.03000

Deep Diving into GANs: From Theory to Production with TensorFlow 2.0

GANs are the new hottest topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production. The workshop aims at providing a complete understanding of both the theory and the practical know-how to code and deploy this family of models in production. By the end of it, the attendees should be able to apply the concepts learned to other models without any issues. We will be showcasing all the shiny new APIs introduced by TensorFlow 2.0 by showing how to build a GAN from scratch and how to "productionize" it by leveraging the AshPy Python package, which makes it easy to design, prototype, train and export Machine Learning models defined in TensorFlow 2.0. ------ The workshop is composed of - Theoretical introduction - GANs from Scratch in TensorFlow 2.0 - High-performance input data pipeline with TensorFlow Datasets - Introduction to the AshPy API - Implementing, training, and visualizing DCGAN using AshPy - Serving TF2 Models with Google Cloud Functions The materials of the workshop will be openly provided via GitHub (https://github.com/zurutech/gans-from-theory-to-production) prior to the event and will be run on Colab leveraging the free GPU. **Note**: the workshop requires Python 3.7 to run, therefore the Colab support is still uncertain. The attendees are encouraged to bring their own devices with Python 3.7 installed and ready to use. ## Requirements and set up instructions Two options available: 1. (recommended). Use Google Colab & Binder. Every notebook has a button to launch the correct tool. Just use it. 2. 
Local setup: follow the instructions in the README https://github.com/zurutech/gans-from-theory-to-production PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/Q79NND/ Track 3 (Oteiza) Michele "Ubik" De Simoni Paolo Galeone Federico Di Mattia Emanuele Ghelfi PUBLISH L8LMQR@@pretalx.com

-L8LMQR

Create CUDA kernels from Python using Numba and CuPy. en

20190902T140000 20190902T153000 1.03000

Create CUDA kernels from Python using Numba and CuPy.

### Abstract We'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library. Numba is an open source compiler that can translate Python functions for execution on the GPU without requiring users to write any C or C++ code. Numba's just-in-time compilation ability makes it easy to interactively experiment with GPU computing in the Jupyter notebook. Combining Numba with CuPy, a nearly complete implementation of the NumPy API for CUDA, creates a high productivity GPU development environment. Learn the basics of using Numba with CuPy, techniques for automatically parallelizing custom Python functions on arrays, and how to create and launch CUDA kernels entirely from Python. Access to appropriate hardware will be provided in the form of access to GPU based cloud resources. ### Libraries * https://numba.pydata.org/ * https://cupy.chainer.org/ ### Requirements and set up instructions * Cloud based access to GPUs will be provided, please bring a laptop with an operating system and a browser. Chrome is usually fine. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/L8LMQR/ Track 3 (Oteiza) Valentin Haenel PUBLISH MNAGWC@@pretalx.com

-MNAGWC

Speed up your python code en

20190902T160000 20190902T173000 1.03000

Speed up your python code

Through a simple example we will see how to optimize Python code. First we will introduce a few tools to profile and visualize the performance of our code, such as Perf and SnakeViz. Then we will incrementally optimize our code using Cython, a lower-level compiled language designed to make a bridge between C and Python. As an alternative, we will also use Numba, a Python just-in-time compiler. Finally, we will see how to parallelize our code to speed it up further. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/MNAGWC/ Track 3 (Oteiza) Jérémie du Boisberranger PUBLISH DU9CAN@@pretalx.com
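As a rough illustration of the profiling step described in the abstract above, the sketch below uses the standard-library `cProfile` and `pstats` modules, whose output is the kind of data SnakeViz can visualize. The function names `slow_sum_of_squares` and `profile_call` are hypothetical and are not taken from the tutorial materials.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop used as the code under study (hypothetical example)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the five most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    value, report = profile_call(slow_sum_of_squares, 100_000)
    print(value)
    print(report)
```

Once a report like this points at the hot loop, that loop is the natural candidate for the Cython or Numba rewrite the abstract mentions.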

-DU9CAN

3D image processing with scikit-image en

20190902T090000 20190902T103000 1.03000

3D image processing with scikit-image

This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image. We start the tutorial checking a brief overview of scikit-image and how it relates to packages in the scientific Python ecosystem, such as NumPy, SciPy and matplotlib. Then, we discuss how to process two and three dimensional data through several steps: first, we will pre-process the data using filtering, binarization and segmentation techniques. After that, we cover how to inspect, count and measure attributes of objects and regions of interest in the data. At the end, we present the visualization of large 3D data. Real-world examples are given from domains such as materials science and biology. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/DU9CAN/ Track4 (Chillida) Alexandre de Siqueira PUBLISH TQH9FG@@pretalx.com

-TQH9FG

Reproducible Data Science in Python en

20190902T110000 20190902T123000 1.03000

Reproducible Data Science in Python

The expectation of reproducibility in scientific work has been established for several hundred years, and, increasingly, communities and funding sources are actually demanding it. Within the Python ecosystem, there are now a variety of tools available to support reproducible data science, but choosing and using one is not always straightforward. One source of confusion is simply the number of available options. Beyond that, the term "reproducibility" can mean multiple things, making it difficult to compare tools. In this tutorial, we will examine _reproducibility_ from the perspective of the philosophy of science. That will give us the concepts and vocabulary necessary to precisely understand and discuss different definitions of the term and allow us to identify the technologies that provide the building blocks for reproducible data science. We will briefly survey the landscape of existing solutions and then spend the remaining time looking at one solution in particular, Renku, which we will use to work end-to-end through a reproducible data-science scenario. * 0:00 - 0:35 Introduction & Background * 0:00 - 0:15 Reproducibility, a philosophy of science perspective * Overview of reproducibility issues in different domains of science (Nature 2016 survey results) * Definition of different degrees of reproducibility: _Reproducibility_, _replicability_, and _repeatability_ * Examine the function of reproducibility in the scientific process * 0:15 - 0:25 Building blocks for reproducibility: clean code, workflow automation, version control, containerization, provenance tracking * 0:25 - 0:35 Survey of the Tool Landscape: Binderhub, Pachyderm, Beaker, Gigantum, Whole Tale, SingularityHub, DVC, Stencila, dotscience, amie, CodeOcean, Renku * 0:35 - 1:30 Hands-on session with Renku where we will develop a typical data-science use-case, focusing on the building blocks of reproducibility along the way. 
## Requirements and set up instructions We will run the tutorial on https://renkulab.io so please register and create an account following [these instructions](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/master/README-renkulab.md). To follow along with the slides, go [here](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/euroscipy2019/presentation/index.ipynb) PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/TQH9FG/ Track4 (Chillida) Chandrasekhar Ramakrishnan Rok Roškar PUBLISH 3MG8K3@@pretalx.com

-3MG8K3

Building data pipelines in Python: Airflow vs scripts soup en

20190902T140000 20190902T153000 1.03000

Building data pipelines in Python: Airflow vs scripts soup

## Introduction (5 minutes) Format: presentation Go over the agenda List the relevant resources Make sure everyone has followed the installation instructions ## Intro to data pipelines Format: presentation Go over the components of traditional data science pipelines Presentation of the scripts soup antipattern ## Creating a script soup Format: hands-on The attendees will perform an ETL task on some data using a set of independent scripts. In this exercise, I will provide and explain the code and explain what we are trying to achieve with this pseudo-pipeline. The attendees will have a chance to try and reproduce it themselves. ## Introduction to Airflow and DAGs Format: presentation Introduce the concept of DAGs (directed acyclic graphs) Present and introduce the components of Airflow Airflow documentation ## Set up a local instance of Airflow Format: hands-on The attendees will create a local instance of Airflow and explore the sample DAGs provided. They will be introduced to the scheduling capabilities of the tool and track the status of the pipelines using the web GUI. ## ETL task on Airflow Format: hands-on I will provide hints on how to transform the scripts soup into Airflow DAGs. For this, I will use the pseudo code and other pedagogical approaches inspired by the Software Carpentry lessons to direct the attendees to the deployment of their first DAG in Airflow. ## Wrap up and questions Format: Q&A ## Setup <https://opendata-airflow-tutorial.readthedocs.io/en/latest/setup.html> PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/3MG8K3/ Track4 (Chillida) Dr. Tania Allard PUBLISH J3HEDH@@pretalx.com

-J3HEDH

Performing Quantum Measurements in QuTiP en

20190902T160000 20190902T173000 1.03000

Performing Quantum Measurements in QuTiP

Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you! No previous quantum mechanics experience required. It will be helpful to be comfortable with Python and only a little scared of matrix multiplication. The goal of the workshop is for each participant to: * Understand what a qubit is * Be able to create a 1-qubit state * Be able to measure a 1-qubit state * Be able to create a 2-qubit state * Be able to create an entangled 2-qubit state * Be able to measure part of an entangled state * Be able to teleport a qubit using an entangled state To each of these please add "in Python with QuTiP" and "with a good understanding of what they're doing". The target audience is people who are: * interested in quantum mechanics but are not experts * comfortable with Python basics * only a little scared of matrix multiplication (have learnt it at some point, even if they don't remember it well now) PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/J3HEDH/ Track4 (Chillida) Simon Cross PUBLISH RHUPZ3@@pretalx.com
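To make the first goals in the list above concrete, here is a minimal sketch of creating and measuring a 1-qubit state. It deliberately uses plain Python lists of complex amplitudes rather than QuTiP, so none of the names below (`normalize`, `probabilities`, `measure`) are QuTiP functions; they are assumptions made only for illustration.

```python
import math
import random


def normalize(state):
    """Scale two complex amplitudes so the outcome probabilities sum to 1."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in state))
    return [a / norm for a in state]


def probabilities(state):
    """Born rule: the probability of each outcome is the squared magnitude of its amplitude."""
    return [abs(a) ** 2 for a in state]


def measure(state, rng=random):
    """Measure in the computational basis; return (outcome, collapsed state)."""
    p0 = probabilities(state)[0]
    outcome = 0 if rng.random() < p0 else 1
    collapsed = [1 + 0j, 0j] if outcome == 0 else [0j, 1 + 0j]
    return outcome, collapsed


if __name__ == "__main__":
    plus = normalize([1, 1])  # equal superposition of |0> and |1>
    print(probabilities(plus))
    print(measure(plus))
```

A state with all its weight on one basis vector always yields that outcome, while the equal superposition yields 0 or 1 with probability one half each.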

-RHUPZ3

A Tour of the Data Visualization Ecosystem of Python en

20190903T090000 20190903T103000 1.03000

A Tour of the Data Visualization Ecosystem of Python

Python and its ecosystem are used nowadays in many scientific contexts as an advanced data visualization tool. There is a wide variety of visualization libraries. The tutorial will focus primarily on: * [Yt](https://yt-project.org) * [Seaborn](https://seaborn.pydata.org) * [Altair](https://altair-viz.github.io) * [Plotly](https://plot.ly) For each one, the tutorial will show how to use it in Jupyter, explore the getting-started examples, and let the audience propose data sets to visualize. At the end of the tutorial, the participants will fill in a pros/cons table with an online voting mechanism. If time allows, a short overview of other libraries may be included. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/RHUPZ3/ Track 2 (Baroja) Giovanni De Gasperis PUBLISH WSNPK7@@pretalx.com

-WSNPK7

Introduction to SciPy en

20190903T110000 20190903T123000 1.03000

Introduction to SciPy

SciPy covers a broad variety of typical numerical tasks encountered in scientific computing ranging from the statistical analysis of data, curve fitting, and fast Fourier transform to numerical integration and special functions to name just a few topics. To avoid reinventing the wheel, it is always a good idea to check whether a desired functionality is already provided by SciPy. In the main part of the tutorial, we will demonstrate how some real-world data taken with a smartphone can be analyzed by means of SciPy. #### Installation instructions The tutorial requires the following packages on top of a Python 3 installation: * numpy * scipy * matplotlib * jupyter Any recent version of the [Anaconda distribution](https://anaconda.org) should allow you to run the Jupyter notebooks used in this tutorial (see below) just fine. If you do not have the Anaconda distribution installed and are not short of disk space and want to do scientific work with Python, seriously consider installing it. It is free and pretty straightforward to install. Alternatively, you can install miniconda and build a specific environment `euroscipy-scipy-tutorial` for the tutorial by running ``` conda env create -f environment.yml ``` with the `environment.yml` file provided in the [repository of this tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial). For more detailed instructions on how to create a conda environment, see the [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). Note that you need to activate the environment by means of ``` conda activate euroscipy-scipy-tutorial ``` Finally, if nothing else works, the notebooks can also be run on [binder](https://mybinder.org/v2/gh/gertingold/euroscipy-scipy-tutorial/master?filepath=notebooks) (provided wifi is available during the tutorial session). 
#### Get the tutorial notebooks Unless you are using binder, you will need the notebooks of the tutorial to actively follow along. You can either clone the repository [gertingold/euroscipy-scipy-tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial) or go to https://github.com/gertingold/euroscipy-scipy-tutorial/archive/master.zip to download a zipped version of the repository. All files needed during the tutorial are located in the directory `notebooks`. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/WSNPK7/ Track 2 (Baroja) Gert-Ludwig Ingold PUBLISH XXJGGG@@pretalx.com

-XXJGGG

Introduction to scikit-learn: from model fitting to model interpretation en

20190903T140000 20190903T153000 1.03000

Introduction to scikit-learn: from model fitting to model interpretation

Our introduction to scikit-learn will be subdivided into 2 parts. We will give a general introduction to scikit-learn presenting basic concepts around cross-validation, pipeline estimators, and hyperparameter search. Then, we will focus on model interpretation, presenting the challenges and the available tools to understand a trained machine-learning model: partial dependence plots, feature importance, LIME, Shapley values, etc. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/XXJGGG/ Track 2 (Baroja) Guillaume Lemaitre Olivier Grisel PUBLISH ZHQALW@@pretalx.com

-ZHQALW

Sufficiently Advanced Testing with Hypothesis en

20190903T090000 20190903T103000 1.03000

Sufficiently Advanced Testing with Hypothesis

Hypothesis is a testing package that will search for counterexamples to your assertions – so you can write tests that provide a high-level description of your code or system, and let the computer attempt a Popperian falsification. If it fails, your code is (probably) OK… and if it succeeds you have a minimal input to debug. Come along and learn the principles of property-based testing, how to use Hypothesis, and how to use it to check scientific code – whether highly-polished or quick-and-dirty! You can even use it to test 'black boxes', such as simulations, where we have no way of independently verifying that some input leads to the right output! Intrigued? Come and learn about the power of embedding assertions in your code, and metamorphic relations in your tests! PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/ZHQALW/ Track 3 (Oteiza) Zac Hatfield-Dodds PUBLISH M3RZXE@@pretalx.com
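The idea in the abstract above (state a property, let the computer hunt for a counterexample) can be sketched without Hypothesis itself. The helper below is a hand-rolled random search using only the standard library; it is not the Hypothesis API, performs no shrinking to a minimal input, and all names in it (`find_counterexample`, `random_int_list`, and the three properties) are hypothetical.

```python
import random


def find_counterexample(prop, generate, trials=200, seed=0):
    """Try random inputs; return the first one that falsifies prop, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_int_list(rng):
    """Generate a short list of small integers (small range makes duplicates likely)."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 10))]


def sort_is_idempotent(xs):
    """Property: sorting an already sorted list changes nothing."""
    return sorted(sorted(xs)) == sorted(xs)


def reversing_input_keeps_sorted_output(xs):
    """Metamorphic relation: reordering the input must not change the sorted output."""
    return sorted(xs[::-1]) == sorted(xs)


def all_elements_are_unique(xs):
    """A deliberately false claim, so the search has something to falsify."""
    return len(set(xs)) == len(xs)


if __name__ == "__main__":
    print(find_counterexample(sort_is_idempotent, random_int_list))
    print(find_counterexample(all_elements_are_unique, random_int_list))
```

The metamorphic relation needs no independent oracle for the "right" output, which is what makes the approach usable on black-box code such as simulations.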

-M3RZXE

Effectively using matplotlib en

20190903T110000 20190903T123000 1.03000

Effectively using matplotlib

Matplotlib is one of the most-used and powerful visualization libraries for Python. Nevertheless, there has been and still is some confusion on how to use it properly. This has a number of reasons ranging from an evolution of the API and lack of good documentation to the complexity that comes with the large feature set and flexibility. But these issues can be overcome. This tutorial will explain the main concepts and intended usage patterns of matplotlib. Knowing these lets you effectively use high-level functions for most of the cases. But you will be able to go into the details if you need to fine-tune certain aspects of the plot. We'll also touch on some ways of working from the past that are discouraged nowadays (you should know what not to do, even though it is still found in lots of examples on the web) and we may get a glimpse into the future. Tim Hoffmann joined the matplotlib core development team almost two years ago with the mission to make matplotlib easier to use. *Requirements and set up instructions:* Jupyter plus any recent (>=3.0) matplotlib version will do. To be on the safe side, you may set up a new conda environment using `conda create -n using-mpl "matplotlib>=3" jupyterlab pandas ipympl` (the quotes keep the shell from treating `>` as a redirect). Link to tutorial notebook will be posted here soon. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/M3RZXE/ Track 3 (Oteiza) Tim Hoffmann PUBLISH NQMWSX@@pretalx.com

-NQMWSX

CFFI, Ctypes, Cython, Cppyy: how to run C code from Python en

20190903T140000 20190903T153000 1.03000

CFFI, Ctypes, Cython, Cppyy: how to run C code from Python

Using the Jupyter notebook and a compiler, we will start with a pure Python implementation of a Mandelbrot image. Then we will write the computationally heavy part of the code in C, and learn how to call it from Ctypes (part of the Python standard library), CFFI (a newer and better Ctypes alternative), Cython (a compiler from Python to C), and CPPYY (like Ctypes and CFFI, but for C++). Along the way we will stop to reflect on the advantages and disadvantages of each technique in terms of speed of development, runtime overhead, maintainability, and readability. The participants will come away with an understanding of the tools, their strengths and weaknesses, and how to use them. Please be sure you have a computer with Anaconda Python installed and a compiler (for Windows users, Visual Studio 2019 is recommended; others should have a functioning gcc or clang). You should also download the [git repo](https://github.com/mattip/c_from_python) and be sure you can run the first few cells that involve compilation (before the `ctypes` discussion). Also please be sure to preinstall [`cppyy`](https://pypi.org/project/cppyy/). PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/NQMWSX/ Track 3 (Oteiza) Matti Picus PUBLISH HVEBGU@@pretalx.com
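For orientation, a pure Python Mandelbrot computation of the kind the abstract above starts from might look like the sketch below. It is not the code from the tutorial repository; the function names, the default ranges, and the escape radius of 2 are assumptions chosen for illustration. The inner loop in `escape_time` is the computationally heavy part that one would move to C.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter


def mandelbrot_grid(width, height, max_iter=50,
                    x_range=(-2.0, 1.0), y_range=(-1.0, 1.0)):
    """Return height rows of width escape times, sampled at pixel centers."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    dx = (x_max - x_min) / width
    dy = (y_max - y_min) / height
    rows = []
    for row in range(height):
        y = y_min + (row + 0.5) * dy
        rows.append([escape_time(complex(x_min + (col + 0.5) * dx, y), max_iter)
                     for col in range(width)])
    return rows


def render_ascii(grid, max_iter=50):
    """Mark points that never escaped with '#', everything else with '.'."""
    return "\n".join("".join("#" if n == max_iter else "." for n in row)
                     for row in grid)


if __name__ == "__main__":
    print(render_ascii(mandelbrot_grid(60, 24)))
```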

-HVEBGU

kCSD - a Python package for reconstruction of brain activity en

20190903T160000 20190903T173000 1.03000

kCSD - a Python package for reconstruction of brain activity

Electric potential measured in the brain is generated by transmembrane ionic currents of neural cells. Due to the long range of the electric field, simultaneously recorded extracellular potentials (EEG, local field potential (LFP)) at different places are typically strongly correlated, which complicates their analysis. It is thus useful to reconstruct their current sources, which in practice means solving the Poisson equation. The first method for estimation of _Current Source Density_ (CSD) from measured potentials was proposed in the early 1950s (1). Despite some developments, a number of limitations were present until recently; in particular, most previous methods required recordings with regular grids of electrodes and overfitted to noise. The _kernel Current Source Density_ method (kCSD) developed in 2012 (2) uses kernel methods to estimate the potential and CSD in the whole space from an arbitrary distribution of electrodes, using regularization to minimize the influence of noise on reconstruction. In this tutorial we will demonstrate the kCSD-python package (3), which allows reconstruction of CSD in different dimensions. After this tutorial you will be able to: * estimate the distribution of current sources based on the exact values of the electric field potentials, * deal with measurement noise, * diagnose the quality of the obtained reconstruction. # Requirements: * Python 2.7/3.4+ environment (Anaconda with Jupyter Notebook recommended), * numpy, scipy, matplotlib packages installed, * kcsd package installed or possibility to download it from GitHub (4) (network connection etc.). # Authors * Chaitanya Chintaluri, * Marta Kowalska, * Michał Czerwiński, * Władysław Średniawa, * Joanna Jędrzejewska-Szmek, * Daniel K. Wójcik # Bibliography 1. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166. 2. Potworowski, J., Jakuczun, W., Łęski, S. & Wójcik, D. 
(2012) _Kernel current source density method_. Neural Comput 24(2), 541-575. 3. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python> # Acknowledgement Project funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) and OPUS (2015/17/B/ST7/04123) grants. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/HVEBGU/ Track 3 (Oteiza) Marta Kowalska Jakub M. Dzik PUBLISH YKPNEE@@pretalx.com

-YKPNEE

Introduction to geospatial data analysis with GeoPandas and the PyData stack en

20190903T090000 20190903T103000 1.03000

Introduction to geospatial data analysis with GeoPandas and the PyData stack

This tutorial is an introduction to geospatial data analysis in Python, with a focus on tabular vector data using GeoPandas. The content focuses on introducing the participants to the different libraries to work with geospatial data and will cover munging geo-data and exploring relations over space. This includes importing data in different formats (e.g. shapefile, GeoJSON), visualizing, combining and tidying them up for analysis, and will use libraries such as pandas, geopandas, shapely, pyproj, matplotlib, cartopy, ... The tutorial will cover the following topics, each of them using Jupyter notebooks and hands-on exercises with real-world data: 1. Introduction to vector data and GeoPandas 2. Visualizing geospatial data 3. Spatial relationships and operations 4. Spatial joins and overlays Materials of previous versions of this tutorial: https://github.com/jorisvandenbossche/geopandas-tutorial PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/YKPNEE/ Track4 (Chillida) Joris Van den Bossche PUBLISH SMLGVL@@pretalx.com

-SMLGVL

Astronomical Image Processing en

20190903T110000 20190903T123000 1.03000

Astronomical Image Processing

### Programme - The tutorial will begin with a short introduction to the basic premise of sparsity and highlight some problems in astronomical image processing that can be solved using this methodology. (~15-20min; slides) - Tutees will then follow a hands-on demonstration of how the concept of sparsity can be used to denoise signals. (~30-35min; interactive jupyter notebook with exercises) - Finally the tutees will learn how to denoise an astronomical image and use their newfound skills to recover a nice picture of Saturn. (~35-40min; interactive jupyter notebook with an exercise) ### Requirements - The tutorial contents are available on [GitHub](https://github.com/sfarrens/euroscipy). - Provided tutees have a stable internet connection, the entire tutorial can be run online using [Binder](https://mybinder.org/v2/gh/sfarrens/euroscipy/master). - However, to be safe, tutees should download and install the tutorial materials beforehand. PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/SMLGVL/ Track4 (Chillida) Samuel FARRENS PUBLISH CQCKY9@@pretalx.com
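One common way sparsity is used for denoising is soft thresholding: if a signal has only a few large coefficients, shrinking every coefficient toward zero removes most of the small, noise-dominated ones. The sketch below assumes this technique and applies it directly to a list of values; the abstract does not say which transform or thresholding rule the tutorial notebooks use, and `soft_threshold` is a hypothetical name.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by threshold; values within the threshold become 0."""
    result = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            result.append(0.0)  # treated as noise
        else:
            result.append(magnitude if v > 0 else -magnitude)  # keep the sign, reduce the size
    return result


if __name__ == "__main__":
    noisy = [3.0, -0.2, 0.1, -2.0, 0.05]  # two real spikes plus small noise
    print(soft_threshold(noisy, 0.5))
```

In practice the thresholding is usually applied to coefficients in a domain where the signal is sparse (for example a wavelet transform), followed by the inverse transform.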

-CQCKY9

Parallelizing Python applications with PyCOMPSs en

20190903T140000 20190903T153000 1.03000

Parallelizing Python applications with PyCOMPSs

## PyCOMPSs! COMPSs is a **task-based programming model that aims to ease the development of parallel applications and their execution in distributed computing environments**, which provides a binding for Python (aka **PyCOMPSs**). It is based on sequential programming, which helps application developers with parallelization and distribution efforts (e.g. thread/process creation, synchronization, data movements, etc.). Application developers simply need to identify which methods will be considered tasks, and the runtime exploits the inherent parallelism of the application at execution time by detecting the task calls and the data dependencies among them. To this end, the runtime is able to spawn the tasks asynchronously on the available resources and orchestrate their data transfers, guaranteeing the validity of the execution. PyCOMPSs relies on the usage of decorators for task selection and a tiny API for synchronization. Moreover, it also integrates with Jupyter notebooks, and provides a wide range of supported features, such as task constraint definition, multiple implementations (so that the runtime can choose the most appropriate considering the available resources), and binary tasks (e.g. binary, MPI and OmpSs) among others. In addition, the PyCOMPSs runtime enables applications to run on top of different infrastructures (such as multi-core machines, clusters, grids, clouds or containers) without modifying a single line of the application. It also provides fault-tolerance mechanisms and a live monitoring tool, it is able to generate post-mortem performance traces using Extrae that can be later analyzed with Paraver, and it is extensible through pluggable connectors (e.g. clouds and schedulers). This rich set of features enables the quick and easy parallelization of Python code, its execution in distributed environments and performance analysis, with current success in scientific fields like numeric algorithms, AI, life and earth sciences. 
The main objective of this tutorial is to teach **how to program and decorate Python applications using PyCOMPSs** in order to enable them **to run in parallel**. In more detail, the tutorial objectives are: * To give an overview of the PyCOMPSs task-based programming model syntax. * To demonstrate how to use PyCOMPSs to parallelize and run applications in distributed platforms. * To illustrate how sample benchmarks from linear algebra and big data can benefit from PyCOMPSs as a programming model, as well as real use cases from AI, life and earth sciences. * To give practical insight into how to use the PyCOMPSs programming model with the Jupyter notebook. * To give an overview of the PyCOMPSs runtime and how it interacts with clusters, clusters of docker containers and clouds. **The attendees will benefit by learning how to parallelize their Python application with PyCOMPSs with a simple interface, run them in distributed parallel platforms, the integration with Jupyter notebooks, and how to analyze the execution behaviour.** #### Requirements and setup instructions This tutorial can be followed using a virtual machine or using a docker container. Attendees can choose the best option considering their system. - Using Virtual Appliance: - Install VirtualBox - Download the COMPSs 2.5 VM image from http://compss.bsc.es (Downloads section) - Import the VM image - Start the VM image (user: compss password: compss19) - Update the tutorial apps folder: rm -rf tutorial_apps && git clone https://github.com/bsc-wdc/tutorial_apps.git - Using Docker: - Install docker - git clone https://github.com/bsc-wdc/tutorial_apps.git - docker pull compss/compss-tutorial:patc2019 - docker run --name mycompss -p 8888:8888 -p 8080:8080 -v /path/to/tutorial_apps:/home/tutorial_apps -itd compss/compss-tutorial:patc2019 PUBLIC CONFIRMED Tutorial https://pretalx.com/euroscipy-2019/talk/CQCKY9/ Track4 (Chillida) Javier Conejero PUBLISH H8VPAY@@pretalx.com
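The PyCOMPSs abstract above describes a programming style built on a task decorator plus a small synchronization API. The sketch below imitates that style with the standard-library `concurrent.futures` module so that it runs anywhere. It is a stand-in, not PyCOMPSs: the names `task` and `wait_on` are hypothetical, it runs in local threads only, and unlike the runtime described above it does not detect data dependencies between tasks.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor

# Local thread pool standing in for the distributed runtime (assumption for this sketch).
_executor = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Mark func as a task: calling it schedules the work and returns a Future."""
    @functools.wraps(func)
    def submit(*args, **kwargs):
        return _executor.submit(func, *args, **kwargs)
    return submit


def wait_on(value):
    """Synchronize: block until a Future is resolved; pass plain values through."""
    return value.result() if isinstance(value, Future) else value


@task
def square(x):
    return x * x


def sum_of_squares(numbers):
    """Sequential-looking code: the calls are spawned first, then synchronized."""
    pending = [square(n) for n in numbers]
    return sum(wait_on(p) for p in pending)


if __name__ == "__main__":
    print(sum_of_squares([1, 2, 3]))
```

The calling code stays sequential in appearance; only the decorator and the explicit synchronization point reveal that work may run concurrently.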

-H8VPAY

From Galaxies to Brains! - Image processing with Python en

20190904T101500 20190904T110000 0.04500

From Galaxies to Brains! - Image processing with Python

From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to. In other words, cleaner and higher resolution images will provide us with more detailed and accurate information. Obtaining the necessary image quality, however, is extremely difficult, particularly as we push instruments to their limits and have to deal with larger and larger amounts of data. In this talk I will introduce some of the current challenges in the realms of astrophysical and biomedical imaging. I will then present some interesting new ideas for tackling these problems and how Python facilitates their implementation. PUBLIC CONFIRMED Keynote https://pretalx.com/euroscipy-2019/talk/H8VPAY/ Track 1 (Mitxelena) Samuel FARRENS PUBLISH 9DPFGM@@pretalx.com

-9DPFGM

Distributed GPU Computing with Dask en

20190904T113000 20190904T120000 0.03000

Distributed GPU Computing with Dask

The need for speed remains important for scientific computing. Historically, computers were limited to a few dozen processors, but with modern GPUs we can have thousands, or even millions, of cores running in parallel on distributed systems. However, developing software for distributed GPU systems can be difficult, both because writing GPU code can be challenging for non-experts and because distributed systems are inherently complex. We can address these challenges by using GPU-enabled libraries that mimic parts of the SciPy ecosystem, such as CuPy, RAPIDS, and Numba, to abstract GPU programming complexity, combined with Dask to abstract distributed computing complexity. We will talk about how Dask has come a long way in supporting distributed GPU-enabled systems by leveraging community standards and protocols, reusing open source libraries for GPU computing, and keeping it simple and complication-free to build highly configurable accelerated distributed software. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/9DPFGM/ Track 1 (Mitxelena) Peter Andreas Entschev PUBLISH YRJNR8@@pretalx.com

-YRJNR8

Modern Data Science: A new approach to DataFrames and pipelines en

20190904T120000 20190904T123000 0.03000

Modern Data Science: A new approach to DataFrames and pipelines

Working with datasets comprising millions or billions of samples is an increasingly common task, one that is typically tackled with distributed computing. Nodes in high-performance computing clusters have enough RAM to run intensive and well-tested data analysis workflows. More often than not, however, this is preceded by the scientific process of cleaning, filtering, grouping, and other transformations of the data, through continuous visualizations and correlation analysis. In today’s work environments, many data scientists prefer to do this on their laptops or workstations, so as to use their time more effectively and not rely on a spotty internet connection to access their remote data and computation resources. Modern laptops have sufficiently fast SSD storage, but upgrading RAM is expensive or impossible. Combining computational graphs, which are common in neural network libraries, with delayed (a.k.a. lazy) evaluation in a DataFrame library enables efficient memory and CPU usage. Together with memory-mapped storage (Apache Arrow, HDF5) and out-of-core algorithms, we can process considerably larger data sets with fewer resources. As an added bonus, the computational graphs ‘remember’ all operations applied to a DataFrame, meaning that data processing pipelines can be generated automatically. In this talk, we will demonstrate Vaex, an open-source DataFrame library that embodies these concepts. Using data from the New York City YellowCab taxi service comprising 1.1 billion samples and taking up over 170 GB on disk, we will showcase how one can conduct an exploratory data analysis, complete with filtering, grouping, calculations of statistics and interactive visualisations, on a single laptop in real time. Finally, we will show an example of how one can automatically build a machine learning pipeline as a by-product of the exploratory data analysis using the computational graphs in Vaex.
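The combination of a computational graph, lazy evaluation and out-of-core processing described above can be sketched in a few lines of plain Python. This is an illustration only and not the Vaex API: `Expr`, `column`, `constant`, `read_chunks` and `mean` are hypothetical names, and a generator of row dictionaries stands in for memory-mapped storage.

```python
import operator
from itertools import islice


class Expr:
    """One node of a computational graph; creating it computes nothing."""

    def __init__(self, label, evaluate):
        self.label = label        # readable record of the operations applied
        self.evaluate = evaluate  # maps one chunk (dict of lists) to a list

    def _combine(self, other, fn, symbol):
        other = other if isinstance(other, Expr) else constant(other)
        return Expr(
            f"({self.label} {symbol} {other.label})",
            lambda chunk: [
                fn(a, b)
                for a, b in zip(self.evaluate(chunk), other.evaluate(chunk))
            ],
        )

    def __add__(self, other):
        return self._combine(other, operator.add, "+")

    def __mul__(self, other):
        return self._combine(other, operator.mul, "*")

    def __truediv__(self, other):
        return self._combine(other, operator.truediv, "/")


def column(name):
    """Leaf node that reads a named column from the current chunk."""
    return Expr(name, lambda chunk: chunk[name])


def constant(value):
    """Leaf node that repeats a scalar for every row of the current chunk."""
    return Expr(
        repr(value),
        lambda chunk: [value] * len(next(iter(chunk.values()))),
    )


def read_chunks(rows, size):
    """Yield column-oriented chunks so only `size` rows are held at once."""
    rows = iter(rows)
    while True:
        block = list(islice(rows, size))
        if not block:
            return
        yield {name: [row[name] for row in block] for name in block[0]}


def mean(expr, rows, chunk_size=2):
    """Out-of-core mean: the expression is evaluated one chunk at a time."""
    total, count = 0.0, 0
    for chunk in read_chunks(rows, chunk_size):
        values = expr.evaluate(chunk)
        total += sum(values)
        count += len(values)
    return total / count


# A generator stands in for a data set that is too large to load at once.
trips = (
    {"fare": fare, "miles": miles}
    for fare, miles in [(10.0, 2.0), (9.0, 3.0), (20.0, 4.0), (6.0, 1.0)]
)

fare_per_mile = column("fare") / column("miles")  # lazy: nothing computed yet
print(fare_per_mile.label)                        # (fare / miles)
average = mean(fare_per_mile, trips)              # evaluation happens here
print(average)                                    # 4.75
```

Because every expression keeps a label of the operations that produced it, the same graph that drives the computation can also be replayed later on new data, which is the property the abstract refers to when it mentions automatically generated pipelines.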
PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/YRJNR8/ Track 1 (Mitxelena) Jovan Veljanoski Maarten Breddels PUBLISH KZGLXR@@pretalx.com

-KZGLXR

Apache Arrow: a cross-language development platform for in-memory data en

20190904T144500 20190904T151500 0.03000

Apache Arrow: a cross-language development platform for in-memory data

This talk discusses the Apache Arrow project and how it already interacts with the Python ecosystem. The Apache Arrow project specifies a standardized language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware. On top of that standard, it provides computational libraries and zero-copy streaming messaging and interprocess communication protocols, and as such, it provides a cross-language development platform for in-memory data. It has support for many languages, including C, C++, Java, JavaScript, MATLAB, Python, R, Rust, ... The Apache Arrow project, although still in active development, already has several applications in the Python ecosystem. For example, it provides the IO functionality for pandas to read the Parquet format (a columnar, binary file format used a lot in the Hadoop ecosystem). Thanks to the standard memory format, it can help improve interoperability between systems, and this is already seen in practice for the Spark / Python interface, where it increases the performance of PySpark. Further, it has the potential to provide a more performant string data type and nested data types (like dicts or lists) for pandas DataFrames, which is already being experimented with in the fletcher package (using the pandas ExtensionArray interface). PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/KZGLXR/ Track 1 (Mitxelena) Joris Van den Bossche PUBLISH BLPA7N@@pretalx.com

-BLPA7N

Caterva: A Compressed And Multidimensional Container For Big Data en

20190904T151500 20190904T154500 0.03000

Caterva: A Compressed And Multidimensional Container For Big Data

# Caterva: A Compressed And Multidimensional Container For Big Data [Caterva](https://github.com/Blosc/Caterva) is a C library on top of [C-Blosc2](https://github.com/Blosc/c-blosc2) that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk. While there are several existing solutions for this scenario (HDF5 is one of the best known), Caterva brings novel features that, when taken together, set it apart from them: * __Leverage important features of C-Blosc2__. C-Blosc2 is the next generation of the well-known, high performance C-Blosc compression library (see below for a more in-depth description). * __Fast and seamless interface with the compression engine__. While in other solutions compression seems an afterthought and can imply several copies of buffers internally, the interface between Caterva and C-Blosc2 (its internal compression engine) is meant to be as direct as possible, minimizing copies and hence increasing performance. * __Both in-memory and on-disk paradigms are supported the same way__. This allows for using the same API for data that can be either in-memory or on-disk. * __Support for a plain buffer data layout__. This allows for essentially no-copy data sharing among existing libraries (NumPy), so that existing functionality can be used directly with Caterva without losing performance. Along with these features, there is an important 'mis-feature': Caterva is __type-less__. Lacking the notion of data type means that Caterva containers are not meant to be used in computations directly, but rather in combination with other higher-level libraries. While this can be seen as a drawback, it actually favors simplicity and leaves it up to users to add the types they are most interested in, which is far more flexible than type-aware libraries (HDF5, NumPy and many others).
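To make the type-less idea concrete, here is a small standard-library sketch of a chunked, compressed container that only knows an item size in bytes and leaves the interpretation of those bytes to the caller. It is not Caterva or cat4py code: the `TypelessContainer` class is hypothetical, it is one-dimensional for brevity, and `zlib` stands in for the C-Blosc2 compression engine.

```python
import zlib
from array import array


class TypelessContainer:
    """Compressed chunks of fixed-size items; the container has no data type."""

    def __init__(self, raw, itemsize, chunk_items=4):
        if len(raw) % itemsize:
            raise ValueError("buffer length must be a multiple of itemsize")
        self.itemsize = itemsize
        self.chunk_items = chunk_items
        self.nitems = len(raw) // itemsize
        step = itemsize * chunk_items
        # Each chunk is compressed independently so slices can skip the rest.
        self._chunks = [
            zlib.compress(raw[i:i + step]) for i in range(0, len(raw), step)
        ]

    def get_slice(self, start, stop):
        """Return the raw bytes of items [start, stop)."""
        if not 0 <= start <= stop <= self.nitems:
            raise IndexError("slice out of range")
        if start == stop:
            return b""
        first = start // self.chunk_items
        last = (stop - 1) // self.chunk_items
        # Decompress only the chunks that overlap the requested range.
        raw = b"".join(
            zlib.decompress(chunk) for chunk in self._chunks[first:last + 1]
        )
        offset = (start - first * self.chunk_items) * self.itemsize
        return raw[offset:offset + (stop - start) * self.itemsize]


# The caller, not the container, decides that the bytes are 8-byte floats.
values = array("d", [0.5 * i for i in range(10)])
container = TypelessContainer(values.tobytes(), itemsize=values.itemsize)

window = array("d")
window.frombytes(container.get_slice(3, 7))
print(window.tolist())  # [1.5, 2.0, 2.5, 3.0]
```

The same stored bytes could equally be reinterpreted by a higher-level library as another 8-byte type, which is the flexibility the type-less design aims for.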
During our talk, we will describe all these Caterva features by using [cat4py](https://github.com/Blosc/cat4py), a Python wrapper for Caterva. The points to be discussed include: * Introduction to the main features of Caterva. * Description of the basic data container and its usage. * Short discussion of different use cases: * Create and fill high dimensional arrays. * Get multi-dimensional slices out of the arrays. * How different compression codecs and filters in the pipeline affect store/retrieval performance. We have been using Caterva in one of our internal projects for several months now, and we are pretty happy with the flexibility and ease of use that it brings to us. This is why we decided to open-source it in the hope that it would benefit others, but also that others may help us in developing it further ;-) ## About C-Blosc and C-Blosc2 [C-Blosc](https://github.com/Blosc/c-blosc) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor (that we are aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations. [C-Blosc2](https://github.com/Blosc/c-blosc2) is the new major version of C-Blosc, with a revamped API and support for new compressors and new filters (data transformations), including filter pipelining, that is, the capability to apply different filters during the compression pipeline, allowing for more adaptability to the data to be compressed. Dictionaries are also introduced, allowing better handling of redundancies among independent blocks and generally increasing compression ratio and performance. Last but not least, there are new data containers that are meant to overcome the 32-bit limitation of the original C-Blosc.
Furthermore, the new data containers are available in various formats, including in-memory and on-disk implementations. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/BLPA7N/ Track 1 (Mitxelena) Francesc Alted PUBLISH H3DRAV@@pretalx.com

-H3DRAV

Modin: Scaling the Capabilities of the Data Scientist, not the machine en

20190904T154500 20190904T160000 0.01500

Modin: Scaling the Capabilities of the Data Scientist, not the machine

Modern data systems tend to heavily focus on optimizing for the system’s time. Some of these optimizations, however, are counterproductive to the end user’s workflow and thought process. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system. Modin is a project at UC Berkeley's RISELab designed to optimize for the data scientist’s time. Often when building a data system, the system designers will follow a set of “best practices” in order to optimize performance. These “best practices” often require data scientists to understand and personally optimize concepts and system components that are not central to extracting value from their data. The fundamental goal of data science is to extract value from data. Despite this, data systems are being built with user requirements such as: (1) knowledge of partitioning, (2) understanding laziness and what triggers computation, (3) an entirely new API, and (4) where their code is running (e.g. locally, on-prem cluster, cloud). This overhead is passed to the data scientist, even though there is no overlap between these new requirements and the fundamental goal of their profession. In this talk, we will discuss how we think about the problem of large scale data science and optimizing for the human system. We will discuss the system design of Modin, which enables pluggable backends, runtimes, and APIs. The system is designed to solve the needs of the data science community regardless of an individual user’s environment. Currently, Modin supports the pandas API, and a proof of concept for SQL has been implemented. Modin is completely open-source and can be found on GitHub: https://github.com/modin-project/modin. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/H3DRAV/ Track 1 (Mitxelena) Devin Petersohn Devin Petersohn PUBLISH XBGYZB@@pretalx.com

-XBGYZB

Best Coding Practices in Jupyterlab en

20190904T163000 20190904T164500 0.01500

Best Coding Practices in Jupyterlab

Jupyter notebooks are often a mess. The code produced works for one notebook, but it is hard to maintain or to re-use. In this talk I will present some best practices to make code more readable, easier to maintain and re-usable. This will include: - versioning best practices - how to use submodules - coding methods to avoid (e.g. closures) PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/XBGYZB/ Track 1 (Mitxelena) Alexander CS Hendorf PUBLISH UHMWGH@@pretalx.com

-UHMWGH

Lessons learned from comparing Numba-CUDA and C-CUDA en

20190904T164500 20190904T170000 0.01500

Lessons learned from comparing Numba-CUDA and C-CUDA

Numba allows the development of GPU code in Python style. When a Python script using Numba is executed, the code is compiled just-in-time (JIT) using the LLVM framework. Using Python for GPU programming can considerably simplify the development of parallel applications compared to C and C-CUDA. Python, however, has to live with the prejudice of low performance, especially in High-Performance Computing. We wanted to get to the bottom of whether this is really true and where these differences come from. For this reason, we first analyzed the performance of typical micro benchmarks used in HPC. By analyzing the assembly code, we learned a lot about the differences between the code produced by C-CUDA and Numba-CUDA. Some of these insights have helped us to improve the performance of our application, and also of Numba-CUDA. With a few tricks it is possible to achieve very good performance with our Numba code, which is very close to, or sometimes even better than, the C-CUDA versions. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/UHMWGH/ Track 1 (Mitxelena) Lena Oden PUBLISH YU8EML@@pretalx.com

-YU8EML

How a voice assistant works en

20190904T113000 20190904T120000 0.03000

How a voice assistant works

This talk will focus on the technologies needed to build a voice assistant. It will center on Samsung’s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) on a variety of Samsung mobile phones. First, an overview of the required infrastructure and the challenges regarding user education will be presented. Then, the talk will offer an overview of the technologies needed in a voice assistant: 1. Automatic Speech Recognition: how a sound wave is transcribed into words 2. Natural Language Understanding: extraction of meaning from a sentence 3. Natural Language Generation: response generation 4. Text To Speech: speech synthesis During the talk the new Bixby IDE will also be presented, with which any developer can create a “voice capsule” that processes natural language to send/retrieve information from their API. Bixby Developers site: https://bixbydevelopers.com/ PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/YU8EML/ Track 2 (Baroja) Miren Urteaga Aldalur PUBLISH JJCQQJ@@pretalx.com

-JJCQQJ

QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science en

20190904T120000 20190904T123000 0.03000

QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science

QuTiP is emerging as a library at the center of a lively ecosystem. In this talk you will learn about the ongoing projects built around it, from providing the framework to simulate quantum machine learning for quantum computers to the development of efficient numerical solvers tackling dynamical problems that are inherently hard to simulate classically. It can be noted that [Astropy](https://www.astropy.org/affiliated/index.html) is a community effort to develop a common core package for Astronomy in Python and "foster an ecosystem of interoperable astronomy packages"; it seems an interesting model for the quantum tech landscape. Qiskit built its own ecosystem of sub-libraries for quantum computing. The physics library for quantum tech is http://qutip.org . About the idea of QuTiP as a super-library, here are some details: - `krotov`, a very recent package for optimal control built on top of QuTiP (https://arxiv.org/abs/1902.11284), https://github.com/qucontrol/krotov - `piqs`, the permutational invariant quantum solver, now a QuTiP module (see also https://arxiv.org/abs/1805.05129 ); - `matsubara`, a plugin to study the ultrastrong coupling regime with structured baths, http://matsubara.readthedocs.io/ - `QNET`, a computer algebra package for quantum mechanics and photonic quantum networks, which actually calls QuTiP as a plugin, mainly developed at Stanford in Mabuchi Lab https://github.com/mabuchilab/QNET - `qptomographer`, https://qptomographer.readthedocs.io/en/latest/install, a library to derive error bars for experiments in quantum computing and quantum information processing.
- `tiqs`, a library to study open quantum systems on extended lattices exploiting the symmetries of such systems, https://github.com/fminga/tiqs - other upcoming integrations related to pulse control, such as `qupulse`, https://github.com/qutech/qupulse/wiki/Architecture-Proposal This talk will be of interest to the curious coder and researcher, analyzing how QuTiP's impact in the research community has fostered a [*lingua franca* for quantum tech research](https://twitter.com/goerz/status/1118739088595652611). We will also draw comparisons with other larger ecosystems in Python-based scientific projects, such as astropy and scikit-learn. # More about QuTiP - QuTiP is open-source software to study quantum physics. It develops both an intuitive playground to understand quantum mechanics and cutting-edge tools to investigate it. - QuTiP provides the most comprehensive toolbox to characterize noise and dissipation (realistic processes) affecting quantum systems, as well as tools not only to monitor but also to minimize their impact (quantum optimal control, description of decoherence-free spaces). - For this reason QuTiP, software born out of the quantum optics community, has become increasingly relevant for the quantum computing community, as current quantum computing devices are noisy (NISQ definition by Preskill). - `pypinfo` data shows that QuTiP is popular in countries that are strong in quantum tech and quantum computing research, e.g., the Netherlands in the top five, as well as countries that benefit from the use of open source software (OSS) for university coursework, e.g., India. - In the past three years, there has been an evolution in the quantum tech community, which has embraced OSS. - OSS libraries are used as a means to grow the user base, as well as in a more structural way for quantum computers, as they provide cloud access to quantum devices, e.g., IBM Q.
- QuTiP is the only major library that has continued to thrive in this ecosystem, competing with other library packages that are funded by corporations or VC-backed startups. - Since the tools of QuTiP provide a common ground to study quantum mechanics, it is important that this independent project is provided with the necessary support to thrive. - As access to quantum computers becomes more and more widespread, also for use by data scientists, and QuTiP's popularity grows even more in undergraduate and graduate courses, becoming the de-facto standard OSS to study quantum optical systems, it is imperative that the QuTiP library makes a quality jump to provide a comprehensive introduction to its tools for a much broader community of users. - QuTiP website: http://www.qutip.org/ - GitHub repository: https://github.com/qutip - GitHub repository (QuTiP code): https://github.com/qutip/qutip - GitHub repository (QuTiP documentation): https://github.com/qutip/qutip-doc - GitHub repository (QuTiP tutorials): https://github.com/qutip/qutip-notebooks - Latest version of the documentation: http://qutip.org/docs/latest/index.html - Historical archive of released documentation: http://qutip.org/documentation.html ## QuTiP core development team QuTiP core development team: (Alex Pitchford, alex.pitchford@gmail.com). Additional mentors will be the project's core contributors Nathan Shammah (nathan.shammah@gmail.com), Shahnawaz Ahmed (shahnawaz.ahmed95@gmail.com) and Eric Giguere (eric.giguere@usherbrooke.ca). QuTiP is a project started by Robert J. Johansson and Paul Nation. Other core developers have been Arne Grimso, Chris Granade and over 44 other contributors. ## References [1] J. R. Johansson, P. D. Nation, and F. Nori: “QuTiP: An open-source Python framework for the dynamics of open quantum systems.”, Comp. Phys. Comm. 183, 1760–1772 (2012) [2] J. Robert Johansson, Paul D. 
Nation, and Franco Nori: “QuTiP 2: A Python framework for the dynamics of open quantum systems.”, Comp. Phys. Comm. 184, 1234 (2013) [3] J. Preskill, "Quantum Computing in the NISQ era and beyond." Quantum **2**, 79 (2018) [4] Mark Fingerhuth, Tomáš Babej, and Peter Wittek, Open source software in quantum computing, PLoS ONE 13 (12): e0208561 (2018). [5] N. Shammah, S. Ahmed, N. Lambert, S. De Liberato, and F. Nori, "Open quantum systems with local and collective incoherent processes: Efficient numerical simulation using permutational invariance " Phys. Rev. A 98, 063815 (2018). Code at [http://piqs.readthedocs.io](http://piqs.readthedocs.io) [6] N. Lambert, S. Ahmed, M. Cirio, and F. Nori, "Virtual excitations in the ultra-strongly-coupled spin-boson model: physical results from unphysical modes", arXiv preprint arXiv:1903.05892. Also [http://matsubara.readthedocs.io](http://matsubara.readthedocs.io) **Other relevant material**: - Slides on QuTiP and the quantum-tech open source ecosystem (Nathan Shammah @ Berkeley Lab, 2019). [PDF](https://conferences.lbl.gov/event/195/session/6/contribution/13/material/slides/0.pdf) - ["The rise of open source in quantum physics research"](http://blogs.nature.com/onyourwavelength/2019/01/09/the-rise-of-open-source-in-quantum-physics-research/), Nathan Shammah and Shahnawaz Ahmed, Nature's physics blog, January 9, 2019. - "Bit to QuBit: Data in the age of quantum computers", Shahnawaz Ahmed, PyData 2018, Warsaw, Poland, 2019. [YouTube video](https://www.youtube.com/watch?v=6GAXJhL1mSs). PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/JJCQQJ/ Track 2 (Baroja) Nathan Shammah Alexander Pitchford PUBLISH PDYER8@@pretalx.com

-PDYER8

Constrained Data Synthesis en

20190904T144500 20190904T151500 0.03000

Constrained Data Synthesis

Synthetic data is useful in many contexts, including * providing "safe", non-private alternatives to data containing personally identifiable information * software and pipeline testing * software and service development * enhancing datasets for machine learning. Synthetic data is often created on a bespoke basis, and since the advent of generative adversarial networks (GANs) there has been considerable interest in, and experimentation with, using them as the basis for creating synthetic data. We have taken a different approach. We have worked for some years on developing methods for automatically finding constraints that characterise data, and which can be used for testing data validity (so-called "test-driven data analysis", TDDA). Such constraints form (by design) a useful characterisation of the data from which they were generated. As a result, methods that generate datasets that match the constraints necessarily construct datasets that match many of the original characteristics of the data from which the constraints were extracted. An important aspect of datasets is the relationship between "good" (~ valid) and "bad" (~ invalid) data, both of which are typically present. Systems for creating useful, realistic synthetic data generally need to be able to synthesize both kinds, in realistic mixtures. This talk will discuss data synthesis from constraints, describing what has been achieved so far (which includes synthesizing good and bad data) and future research directions. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/PDYER8/ Track 2 (Baroja) Nick Radcliffe PUBLISH PEJPDG@@pretalx.com

-PEJPDG

ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks en

20190904T151500 20190904T154500 0.03000

ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks

Nuclear fusion comes with great promises of almost limitless energy with little risk and waste. But it also comes with significant scientific and technological complexities. Decades of effort may find an echo in ITER, an international tokamak being built to address this challenge. A tokamak is a particular kind of advanced experimental nuclear fusion reactor. It is a torus-shaped vacuum vessel in which a hydrogen plasma of very low density is heated up to temperatures (10-100 million degrees Celsius) that allow nuclear fusion reactions to occur. The torus-shaped plasma radiates light, which is measured in various wavelength domains by dedicated sets of detectors (called diagnostics), like 2D cameras observing visible light, 1D arrangements of diodes sensitive to X-rays, ultra-violet spectrometers... Due to the torus shape, the plasma is axisymmetric, and like in medical imaging, tomography methods can be used to diagnose the light radiated in a plasma cross-section. For all diagnostics, one can seek to solve the direct or the inverse problem. The direct problem consists in computing the measurements from a known plasma light emissivity, provided by a plasma simulation for example. The inverse problem consists in computing the plasma light emissivity from experimental measurements. The algorithms involved in solving both the direct and inverse problem are very similar, no matter the wavelength domain. Like many, the fusion community tends to suffer from a lack of reproducibility of the results it publishes. This problem is particularly acute in the case of tomography diagnostics since the inverse problem is ill-posed and the uniqueness of the solution is not guaranteed. There are also many possible simplifying hypotheses that may, or may not, be relevant for each diagnostic. 
In this regard, the historical practices of the community display a large variety of single-user black-box codes, each typically designed by a student, and often forgotten or left as is until a new student is hired and starts all over again. In this context, a machine-independent, open-source and documented Python library, ToFu, was started to provide the fusion community with a common and free reference tool. We thus aim at improving reproducibility by providing a known and transparent tool, able to efficiently solve both the direct and inverse problem for tomography diagnostics. It can use very simple hypotheses or very complete diagnostic descriptions alike, one of the ideas being that it should allow users to perform accurate calculations easily, sparing them the need for simplifying hypotheses that are not always valid. A version zero of tofu, fully operational but not user-friendly enough, was first developed between 2014 and 2016, when it was used for published results. Building on this first proof of principle, a significant effort was initiated in 2017 to completely re-write the code with a stronger emphasis on Python community standards (PEP8), version control (GitHub), performance (cython), packaging (pip and conda), continuous integration (nosetests and travis), modularity (architecture refurbishing), user-friendliness (renamings, utility tools) and object-oriented coding (class inheritance). This effort is still ongoing and is scheduled to go on for the next 2.5 years. However, the first milestones have been reached, and we would like to present the first re-written modules to the Python community, for publicity, advice, feedback, mutually enriching exchanges and more generally because we feel tofu is part of the large open-source Python scientific community. The code is composed of several modules: a geometry module, a data visualization module, a meshing module, and an inversion module. 
We will present the geometry module (containing ray-tracing tools, spatial integration algorithms...) and the data module (making use of matplotlib for pre-defined interactive figures). Using profiling tools, the numerical core of the geometry module was recently optimized and parallelized in `Cython`, making the code more than ten thousand times faster than the previous version on some test cases. Memory usage has also been reduced by half on the largest test cases. See [ToFu](https://github.com/ToFuProject/tofu) PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/PEJPDG/ Track 2 (Baroja) Laura Mendoza Didier VEZINET PUBLISH HGNPFF@@pretalx.com

-HGNPFF

Debugging in JupyterLab en

20190904T154500 20190904T160000 0.01500

Debugging in JupyterLab

Layout: ##### 1. Current tools for debugging Jupyter Notebooks - print statements - ipdb - PixieDebugger (IBM) - Visual Studio Code cell debugging ##### 2. Native debugging support for Jupyter Kernels - Jupyter protocol extension - Debug Adapter Protocol in xeus-python ##### 3. Debugger extension for JupyterLab - An IDE-like debugging experience in JupyterLab - Active development, current prototype - Demo PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/HGNPFF/ Track 2 (Baroja) Jeremy Tuloup PUBLISH XZLXZM@@pretalx.com

-XZLXZM

Controlling a confounding effect in predictive analysis. en

20190904T163000 20190904T164500 0.01500

Controlling a confounding effect in predictive analysis.

For instance, when predicting the salary to offer given the descriptions of professional experience, the risk is to capture indirectly a gender bias present in the distribution of salaries. Another example is found in biomedical applications, where for an automated radiology diagnostic system to be useful, it should use more than socio-demographic information to build its prediction. Here I will talk about confounds in predictive models. I will review classic deconfounding techniques developed in a well-established statistical literature, and how they can be adapted to predictive modeling settings. Departing from deconfounding, I will introduce a non-parametric approach, which we named “confound-isolating cross-validation”, that adapts cross-validation experiments to measure the performance of a model independently of the confounding effect. The examples mentioned in this work relate to common issues in neuroimage analysis, although the approach is not limited to neuroscience and can be useful in other domains. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/XZLXZM/ Track 2 (Baroja) Darya Chyzhyk PUBLISH DVDLRG@@pretalx.com

-DVDLRG

The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges en

20190904T164500 20190904T170000 0.01500

The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges

We will give an overview of the RAMP framework, which provides a platform to organize reproducible and transparent data challenges. RAMP workflow is a Python package used to define and formalize the data science problem to be solved. It can be used as a standalone package and allows a user to prototype different solutions. In addition to RAMP workflow, a set of packages has been developed for sharing and collaborating on the solutions that users develop. RAMP database provides a database structure to store the solutions of different users and the performance of these solutions. RAMP engine is the package that runs the user solutions (possibly on the cloud) and populates the database. Finally, RAMP frontend is the web frontend where users can upload their solutions and which shows the leaderboard of the challenge. The project is open-source and can be deployed on any local server. The framework has been used at the Paris-Saclay Center for Data Science for setting up and solving about twenty scientific problems, for organizing collaborative data challenges, for organizing scientific sub-communities around these events, and for training novice data scientists. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/DVDLRG/ Track 2 (Baroja) Guillaume Lemaitre Joris Van den Bossche PUBLISH SZ8S8G@@pretalx.com

-SZ8S8G

Sufficiently Advanced Testing with Hypothesis en

20190904T113000 20190904T120000 0.03000

Sufficiently Advanced Testing with Hypothesis

Code is now a critical part of almost all research, whether for communication or for data collection and analysis. Unfortunately, producing reliably error-free code remains an open problem in science to an even greater extent than other applications. Soergel (2014) estimates that "any reported scientific result could very well be wrong if data have passed through a computer, and that these errors may remain largely undetected." - though some software errors are much more dramatic, as with the crash of the Mars Climate Orbiter. What can we do to reduce the rate of errors in our own code? There is no silver bullet, but a more efficient way to create tests would certainly help... The answer is to have a computer write your tests for you! Using Hypothesis, you describe valid inputs - from 'an integer' to 'dataframes like this', as complex and precise as needed - and write a test which should always pass... then Hypothesis searches for the smallest inputs that cause an error. This approach is called property-based testing, and it regularly catches errors that evaded every human review and hand-written test case (even in NumPy). Even better, it rewards well-designed software - but can also do a quick check of a script in just a few lines of code. We'll cover the theory of property-based testing, a worked example, and then jump into a whirlwind tour of the Hypothesis API: how to use, define, compose, and infer strategies for input; properties and testing tactics for your code; and how to debug your tests if everything seems to go wrong. By the end of this talk, you'll be ready to find real bugs with Hypothesis in anything from data pipelines to the core scientific Python libraries. Be the change you want to see in your team's code - or test someone else's and help push the world into a new age of reliable research software! PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/SZ8S8G/ Track 3 (Oteiza) Zac Hatfield-Dodds PUBLISH YHCP9C@@pretalx.com
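The idea of property-based testing described above can be sketched with the standard library alone. This is not the Hypothesis API: `find_counterexample`, `round_trip` and the run-length codec are hypothetical names chosen for illustration, and the helper only reports the shortest failing input it happened to generate, whereas Hypothesis actively shrinks failures.

```python
import random


def run_length_encode(text):
    """Encode a string as a list of (character, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(ch * count for ch, count in pairs)


def find_counterexample(prop, trials=500, seed=0):
    """Try random short strings; return the shortest one that breaks prop, or None."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        text = "".join(rng.choice("ab") for _ in range(rng.randint(0, 8)))
        if not prop(text):
            failures.append(text)
    return min(failures, key=len) if failures else None


def round_trip(text):
    # Property: decoding an encoded string gives back the original.
    return run_length_decode(run_length_encode(text)) == text


print(find_counterexample(round_trip))  # None: no failing input was found
```

The test author states a property that should hold for every valid input and leaves the choice of concrete inputs to the machine.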

-YHCP9C

What about tests in Machine Learning projects? en

20190904T120000 20190904T123000 0.03000

What about tests in Machine Learning projects?

Once your machine learning POC seems promising and your development environment is set up, the next step is to refactor your code and write TESTS. We know that a lot of people think tests are too complicated and boring to write and they are not very useful. Some manual checks can address the need. It is not totally false. Tests can be really boring and time consuming to write when you don't have the right tools, the right APIs, the right environments or the right code structure. But it is always a bad idea to ignore tests or to perform them manually. If you want to be involved in your project life cycle, if you want to bring it from POC to production you need to care about tests. After some years tackling production bugs, you can't feel safe delivering without tests as you can't start driving until your seat belt is fastened. There is more than one way to test. Tests can be split on several levels (unit, component, functional, performances, etc...) to be able to quickly identify the faulty code/data/parameter. Tests must also be automated in a Continuous Integration and run at least on each experiment before merging it in the baseline pipeline as it is done in software engineering (the CI is triggered on each feature branch). This talk is about how to easily write tests and testable code, how to avoid most common traps and what are the benefits of tests on unrealistic data in your Machine Learning project. (Tests on real data are also really important but they are not the main purpose of this talk.) Slides are here: sdg.jlbl.net/slides/tests_for_datascientist/presentation.html PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/YHCP9C/ Track 3 (Oteiza) Sarah Diot-Girard PUBLISH QVCFGE@@pretalx.com
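As a minimal sketch of a unit test on deliberately unrealistic data, consider a hypothetical preprocessing helper `standardize` (the name and behaviour are assumptions for illustration, not taken from the talk). The tests are plain functions that a runner such as pytest could collect.

```python
def standardize(values):
    """Scale values to zero mean and unit variance (hypothetical preprocessing step)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        # Constant feature: return zeros instead of dividing by zero.
        return [0.0] * n
    std = variance ** 0.5
    return [(v - mean) / std for v in values]


def test_standardize_constant_feature():
    # Unrealistic on purpose: a feature that never varies.
    assert standardize([3.0, 3.0, 3.0]) == [0.0, 0.0, 0.0]


def test_standardize_simple_case():
    # Two points around a mean of 2 with a standard deviation of 1.
    assert standardize([1.0, 3.0]) == [-1.0, 1.0]


if __name__ == "__main__":
    test_standardize_constant_feature()
    test_standardize_simple_case()
    print("all tests passed")
```

Tiny hand-made inputs like these make the expected output obvious, so a failure points at the code rather than at the data.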

-QVCFGE

Scientific DevOps: Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers en

20190904T144500 20190904T151500 0.03000

Scientific DevOps: Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers

Open source and open science come together when the software is accessible, transparent, and owned by all. For data analysis pipelines that grow in complexity beyond a single Jupyter notebook, this can become a challenge as the number of steps and software dependencies increase. In this talk, Nicholas Del Grosso will review a variety of tools for packaging and managing a data analysis pipeline, showing how they fit together and benefit the development, testing, deployment, and publication processes and the scientific community. In particular, this talk will cover: - **Workflow managers** (e.g. Snakemake, PyDoit, Luigi) to combine complex pipelines into single applications. - **Container Solutions** (e.g. Docker and Singularity) to package and deploy the software on others' computers, including high-performance computing clusters. - **The Scientific Filesystem** to build explorable and multi-purpose applications. - **Testing Frameworks** (e.g. PyTest, Hypothesis) to declare and confirm the assumptions and functionality of the analysis pipeline. - **Ease-of-Use Utilities** to share the pipeline online and make it accessible to non-programmers. By writing software that stays manageable, reproducible, and deployable continuously throughout the development cycle, we can better fulfill the goals of open science and good scientific practice in a digital era. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/QVCFGE/ Track 3 (Oteiza) Nicholas Del Grosso PUBLISH UMWUTW@@pretalx.com

-UMWUTW

Dashboarding with Jupyter notebooks, voila and widgets en

20190904T151500 20190904T154500 0.03000

Dashboarding with Jupyter notebooks, voila and widgets

Sharing the result of a Jupyter notebook is currently not an easy path. With voila we are changing this. Voila is a small but important ingredient in the Jupyter ecosystem. Voila can execute notebooks, keeping the kernel connected but does not allow for arbitrary code execution, making it safe to share your notebooks with others. With new libraries built on top of Jupyter widgets/ipywidgets (ipymaterialui and ipyvuetify) we allow beautiful modern React and Vue components to enter the Jupyter notebook. Using voila we can integrate the ipywidgets seamlessly into modern React and Vue pages, to build modern dashboards directly from a Jupyter notebook. I will give a live example on how to transform a Jupyter notebook into a fully functional single page application with a modern (Material Design) look. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/UMWUTW/ Track 3 (Oteiza) Maarten Breddels Martin Renou PUBLISH PFH3QK@@pretalx.com

-PFH3QK

Make your Python code fly at transonic speeds! en

20190904T154500 20190904T160000 0.01500

Make your Python code fly at transonic speeds!

Slides available at https://tiny.cc/euroscipy2019-transonic [Transonic](http://transonic.readthedocs.io/) is a pure Python package (requiring Python >= 3.6) to easily accelerate modern Python-NumPy code with different accelerators (like Cython, [Pythran](https://github.com/serge-sans-paille/pythran), Numba, CuPy, etc.) opportunistically (i.e. if/when they are available). We will first present the context of the creation of this package, i.e. Python's High Performance Computing (HPC) landscape. We will show how Transonic can be used to write elegant and very efficient HPC codes with Python, with examples taken from real-life research simulation codes ([fluidfft](https://fluidfft.readthedocs.io) and [fluidsim](https://fluidsim.readthedocs.io)). We will discuss the advantages of using Transonic instead of writing big Cython extensions or using Numba or Pythran directly. A strategy to quickly develop a very efficient scientific application/library with Python and Transonic could be: 1. Use modern Python coding, standard NumPy/SciPy for the computations and all the cool libraries you want. 2. Profile your applications on real cases, detect the bottlenecks and apply standard optimizations with NumPy. 3. Add a few lines of Transonic to compile the hot spots. We won't forget to also discuss some limitations of Transonic, and more generally of Python and its numerical ecosystem for High Performance Computing. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/PFH3QK/ Track 3 (Oteiza) Pierre Augier PUBLISH JSCWY7@@pretalx.com
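The "opportunistic" idea, using an accelerator only if it is available, can be illustrated without Transonic itself. The sketch below is an assumption-laden stand-in, not Transonic's API: it tries to import Numba's `njit` decorator and falls back to a no-op decorator when Numba is not installed. `sum_of_squares` is a hypothetical hot spot.

```python
try:
    # Use Numba's JIT compiler when it is installed.
    from numba import njit
except ImportError:
    def njit(func):
        """Fallback: leave the function as plain Python."""
        return func


@njit
def sum_of_squares(n):
    # Hot loop kept simple so a compiler can handle it.
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```

The calling code is identical in both cases; only the speed differs depending on what is installed.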

-JSCWY7

PyFETI - An easy and massively Dual Domain Decomposition Solver for Python en

20190904T163000 20190904T164500 0.01500

PyFETI - An easy and massively Dual Domain Decomposition Solver for Python

PyFETI is a Python implementation of Finite-Element-Tearing-Interconnecting (FETI) methods. The library provides a massive linear solver that uses domain decomposition techniques. FETI methods rely on the solution of a linear system, based on two linear solver strategies, direct and iterative. A big problem is decomposed into subdomains, generating an additional set of constraints at the interface among subdomains. The local problem solution is formulated based on a new interface force that must connect the subdomains. Therefore, given an interface force, the local problems are solved with a direct solver, e.g. SuperLU, and the update of the interface force is performed by a Preconditioned Conjugate Projected Gradient method. The library has been tested for large linear elastic problems at the IT4I supercomputer center. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/JSCWY7/ Track 3 (Oteiza) Guilherme Jenovencio PUBLISH JHGWWN@@pretalx.com

-JHGWWN

High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research. en

20190904T164500 20190904T170000 0.01500

High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research.

At the heart of ETH High Voltage Lab's (HVL) research are industrial devices put together into code-automated experiments. It's a zoo of industrial communication protocols one needs to handle when controlling these devices. HVL decided to switch from MATLAB to Python as a programming and analysis tool. The Python community provides solutions to the majority of technicalities involved in handling the multitude of industrial communication protocols used to control high voltage research experiment devices. Moreover, Python seems to be a more future-proof choice, meeting industry demand for a more cost-effective and collaborative solution. The HVL Common Code Basis library (`hvl_ccb`) provides a uniform user-friendly object-oriented API as well as implementations for multiple high voltage engineering devices and their respective communication protocols. The library leverages Python's open source community implementations of specific communication protocols, but also relies heavily on some of the language's newer features such as typing hints, dataclasses or enums. Python typing hints are used not only for their static type checking and autocompletion support from IDEs, but also for dynamic type checking of the communication protocols' and devices' configurations. The configurations themselves are a customized implementation of Python 3.7's dataclasses. Configuration properties rely heavily on Python (advanced) enumerations. Currently, the library supports serial port, VISA over TCP, Modbus TCP, LabJack LJM and OPC UA communication protocols. These protocols are used within code abstractions of devices such as the MBW973 SF6 Analyzer / dew point mirror, LabJack (T7-PRO) device, Schneider Electric ILS2T stepper motor drive, Elektro-Automatik PSI9000 DC power supply, Rohde & Schwarz RTO 1024 oscilloscope, or the Lab's state-of-the-art Supercube platform, which encapsulates safety components, the voltage source, as well as other auxiliary devices.
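The combination of dataclasses, enums and type hints checked at run time might look roughly like the following. This is a sketch under assumptions, not the `hvl_ccb` API: `SerialConfig`, `Parity` and all field names are hypothetical, and the check only handles plain classes as annotations.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Parity(Enum):
    """Hypothetical serial-port parity setting."""
    NONE = "N"
    EVEN = "E"
    ODD = "O"


@dataclass(frozen=True)
class SerialConfig:
    """Hypothetical configuration; field names are illustrative only."""
    port: str
    baudrate: int = 9600
    parity: Parity = Parity.NONE

    def __post_init__(self):
        # Dynamic type check: compare each value with its annotated type.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


print(SerialConfig(port="COM1", parity=Parity.EVEN))

try:
    SerialConfig(port="COM1", baudrate="fast")
except TypeError as error:
    print(error)  # baudrate must be int, got str
```

The same annotations serve static checkers and IDEs, while `__post_init__` rejects a wrong configuration at construction time rather than when a device is already in use.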
PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/JHGWWN/ Track 3 (Oteiza) Mikołaj Rybiński PUBLISH WGE8NA@@pretalx.com

-WGE8NA

scikit-fdiff, a new tool for PDE solving en

20190904T082500 20190904T095500 1.03000

scikit-fdiff, a new tool for PDE solving

Scikit-FDiff (formerly known as Triflow) is a new tool, written in pure Python, that focuses on reducing the time between the development of the mathematical model and the numerical solving. It allows an easy and automatic finite difference discretization, thanks to symbolic processing that can deal with systems of multi-dimensional partial differential equations with complex boundary conditions. Using finite differences and the method of lines, it allows the transformation of the original PDE into an ODE, providing a fast computation of the temporal evolution vector and the Jacobian matrix. The latter is pre-computed in a symbolic way and sparse by nature. It can be evaluated with as few computational resources as possible, allowing the use of implicit and explicit solvers at a reasonable cost. Classic ODE solvers have been implemented (or made available from dedicated Python libraries), including backward and forward Euler schemes, Crank-Nicolson, and explicit Runge-Kutta. More complex ones, like improved Rosenbrock-Wanner schemes up to the 6th order, are also available. The time-step is managed by a built-in error computation, which ensures the accuracy of the solution. The main goal of the software is to minimize the time spent writing numerical solvers, in order to focus on model development and data analysis. Scikit-FDiff is then able to solve toy cases in a few lines of code as well as complex models. Extra tools are available, such as data saving during the simulation, real-time plotting and post-processing. It has been validated with the shallow-water equation on dam-breaks and the steady-lake case. It has also been applied to heated falling-films, droplet spread and simple moisture flow in porous media. PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/WGE8NA/ Posters at 16:00 Nicolas Cellier PUBLISH CUXPCN@@pretalx.com
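The method of lines mentioned above can be illustrated by hand on the 1D heat equation, u_t = D u_xx, with fixed zero boundaries. The sketch below does not use Scikit-FDiff; it is a plain-Python assumption of what the approach amounts to: central differences in space turn the PDE into an ODE system, which a forward Euler scheme then integrates. All names and parameter values are illustrative.

```python
import math


def heat_rhs(u, diffusivity, dx):
    """Right-hand side of the ODE system du/dt = D * d2u/dx2 (central differences)."""
    rhs = [0.0] * len(u)  # boundary values stay fixed (Dirichlet)
    for i in range(1, len(u) - 1):
        rhs[i] = diffusivity * (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx**2
    return rhs


def forward_euler(u, diffusivity, dx, dt, steps):
    """Advance the ODE system with the explicit (forward) Euler scheme."""
    for _ in range(steps):
        rhs = heat_rhs(u, diffusivity, dx)
        u = [ui + dt * ri for ui, ri in zip(u, rhs)]
    return u


n_points = 21
dx = 1.0 / (n_points - 1)
diffusivity = 1.0
dt = 0.4 * dx**2 / diffusivity  # below the explicit stability limit dx**2 / (2 * D)
steps = 200

# Initial condition: one half sine wave, zero at both boundaries.
u0 = [math.sin(math.pi * i * dx) for i in range(n_points)]
u = forward_euler(u0, diffusivity, dx, dt, steps)

# For this initial condition the exact peak decays as exp(-pi**2 * D * t).
t_final = steps * dt
exact_peak = math.exp(-math.pi**2 * diffusivity * t_final)
print(f"numerical peak: {u[n_points // 2]:.4f}, exact peak: {exact_peak:.4f}")
```

Writing the discretization and the time stepping by hand, as here, is the kind of work such a tool aims to automate.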

-CUXPCN

PhonoLAMMPS: Phonopy with LAMMPS made easy en

20190904T114000 20190904T131000 1.03000

PhonoLAMMPS: Phonopy with LAMMPS made easy

In recent years Phonopy[1] has become very well-known software in the materials science field for calculating the phonon properties of crystals. While Phonopy provides interfaces for many popular first-principles calculation packages such as VASP, WIEN2K, SIESTA, etc., the implementation of interfaces for software based on empirical potentials is usually more challenging. This is due to the large variability of input structure and potential definitions that this kind of software requires in comparison to software based on first principles. In this poster I present PhonoLAMMPS[2], a Phonopy interface with LAMMPS[3] written in Python that makes use of the official LAMMPS Python API to calculate the interatomic 2nd order force constants from a usual LAMMPS input file. PhonoLAMMPS can be used either as a Python module with a phonopy-like interface or as a simple command-line script. [1] A. Togo and I. Tanaka, Scr. Mater., 108, 1-5 (2015) [2] https://github.com/abelcarreras/phonolammps [3] S. Plimpton, J Comp Phys., 117, 1-19 (1995) PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/CUXPCN/ Posters at 16:00 Abel Carreras PUBLISH HUPE99@@pretalx.com

-HUPE99

Really reproducible behavioural paper en

20190904T131500 20190904T144500 1.03000

Really reproducible behavioural paper

In recent years, the replication crisis in the life sciences has received significant attention. Reproducibility of behavioural experiments may be affected by many factors, such as lack of standardisation of experimental conditions or human errors. While use of standardized systems for automated phenotyping (such as _IntelliCage_) leads to interlaboratory replicability of experiments (1), manual analysis of the obtained data still remains a potential source of irreproducibility due to human errors. Luckily, a countermeasure for that issue has been known for more than twenty years: automation of data analysis with a non-interactive computer program (2). To facilitate development of Python programs for automated analysis of mice behavioural data obtained from the IntelliCage system, the _PyMICE_ library (RRID:nlx\_158570) has been developed. The title paper is the publication presenting the library to the scientific community (3). As it has been written according to the literate programming paradigm (4), all programs used for analysing the experimental data are embedded in [the source code of the paper itself](https://github.com/Neuroinflab/PyMICE_SM/), which makes the presented results highly reproducible and the methodology of analysis transparent. # Authors * Jakub M. Dzik, * Alicja Puścian, * Zofia Mijakowska, * Kasia Radwanska, * Szymon Łęski # Bibliography 1. A. Codita, A. H. Mohammed, A. Willuweit, A. Reichelt, E. Alleva, I. Branchi, F. Cirulli, G. Colacicco, V. Voikar, D. P. Wolfer, F. J. U. Buschmann, H.-P. Lipp, E. Vannoni, S. Krackow (2012) Effects of Spatial and Cognitive Enrichment on Activity Pattern and Learning Performance in Three Strains of Mice in the IntelliMaze. Behavior Genetics [doi:10.1007/s10519-011-9512-z](https://dx.doi.org/10.1007/s10519-011-9512-z) 2. J. B. Buckheit, D. L. Donoho (1995) WaveLab and Reproducible Research. Lecture Notes in Statistics. [doi:10.1007/978-1-4612-2544-7\_5](https://dx.doi.org/10.1007/978-1-4612-2544-7\_5) 3. J. M. Dzik, A.
Puścian, Z. Mijakowska, K. Radwanska, S. Łęski (2017) PyMICE: A Python library for analysis of IntelliCage data. Behavior Research Methods. [doi:10.3758/s13428-017-0907-5](https://dx.doi.org/10.3758/s13428-017-0907-5) 4. D. E. Knuth (1984) Literate Programming. The Computer Journal. [doi:10.1093/comjnl/27.2.97](https://dx.doi.org/10.1093/comjnl/27.2.97) # Acknowledgement Project funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) grant. PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/HUPE99/ Posters at 16:00 Jakub M. Dzik PUBLISH FZTEQ9@@pretalx.com

-FZTEQ9

kESI - a kernel-based method for reconstruction of sources of brain electric activity in realistic brain geometries en

20190904T145000 20190904T162000 1.03000

kESI - a kernel-based method for reconstruction of sources of brain electric activity in realistic brain geometries

Epilepsy affects around 50 million people worldwide (1). 30% of epilepsy cases are drug-resistant, and surgical removal of the neural tissue generating seizures (epileptogenic tissue) may be the only way to prevent seizures. When removing the epileptogenic tissue it is crucial to minimize the lesioned area, because removing too much of the brain may lead to serious impairment of its function. To identify the epileptogenic zone, a neurosurgeon typically implants electrodes on the cortex (ECoG) or deep in the brain (SEEG). The measured potentials are used as indicators localizing the epileptic source. We argue that reconstructed sources of this brain activity are better predictors of areas for resection. Here we present a method - kernel Electrical Source Imaging (kESI) - and its Python implementation, which allow reconstruction of current sources taking into account the actual geometry of the patient's brain and the conductivity distribution. This method extends the _kernel Current Source Density_ (kCSD) method (3, 4) to realistic geometries and complex conductivity models. In the poster we present our most recent results in the development of Python tools for reconstruction of brain activity and a progress report on kESI development. # Authors * Marta Kowalska, * Jakub M. Dzik, * Chaitanya Chintaluri, * Daniel K. Wójcik # Bibliography 1. World Health Organization, _Epilepsy_, available at: <https://www.who.int/news-room/fact-sheets/detail/epilepsy> 2. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166. 3. Potworowski, J., Jakuczun, W., Łęski, S. & Wójcik, D. (2012) _Kernel current source density method_. Neural Comput 24(2), 541-575. 4. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python> # Acknowledgement Project funded from the Polish National Science Centre's OPUS grant (2015/17/B/ST7/04123).
PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/FZTEQ9/ Posters at 16:00 Jakub M. Dzik Marta Kowalska PUBLISH CW97MN@@pretalx.com

-CW97MN

From Modeler to Programmer en

20190904T162500 20190904T175500 1.03000

From Modeler to Programmer

Boundary conditions are essential for groundwater models. The user can specify values for these boundary conditions such as a well at a certain location with a given pumping rate for a specified duration. For some special applications, however, the specified values may further depend on internal model conditions. For example, the flow rate of an infiltration well that re-infiltrates water is equal to the pumping rate of the extraction well. This can be useful for geothermal applications within groundwater bodies. The newly developed model, ueflow, allows the user to implement such a scheme by writing a plugin. In addition to just using the pumping rate as infiltration rate, the user can incorporate other constraints such as energy costs for pumping, capacities of water treatment facilities, maintenance schedules for pumps based on pumping regimes, or other technical constraints. The poster gives a short overview of ueflow, which is based on the finite volume model framework FiPy (Guyer et al. 2009). FiPy is implemented in Python and offers multiple, high-performance solvers as well as several tools for generating grids and other input data. Guyer, J. E., Wheeler, D., Warren, J. A. (2009). FiPy: Partial Differential Equations with Python. Computing in Science & Engineering 11(3) pp. 6-15 (2009), doi:10.1109/MCSE.2009.52, http://www.ctcms.nist.gov/fipy PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/CW97MN/ Posters at 16:00 Mike Müller PUBLISH LWHPHN@@pretalx.com

-LWHPHN

MNE-Python, a toolkit for neurophysiological data en

20190904T180000 20190904T193000 1.03000

MNE-Python, a toolkit for neurophysiological data

MNE-Python software is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, EEG, sEEG, ECoG, and more. It includes modules for data input/output, preprocessing, visualization, source estimation, time-frequency analysis, connectivity analysis, machine learning, and statistics. PUBLIC CONFIRMED Poster https://pretalx.com/euroscipy-2019/talk/LWHPHN/ Posters at 16:00 Joan Massich PUBLISH PRGASS@@pretalx.com

-PRGASS

HPC and Python: Intel’s work in enabling the scientific computing community en

20190905T091500 20190905T100000 0.04500

HPC and Python: Intel’s work in enabling the scientific computing community

High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development. However, one of the fundamental links in performance is the relationship between hardware and software. Intel is hard at work on the Intel® Distribution for Python*, producing optimized packages and upstreaming changes to open source that help take advantage of current and future Intel® Architecture, and hardware that is purpose built to target HPC, Machine Learning, and AI workloads. Getting the performance out of these workloads has been a challenging journey, one in which valuable lessons were learned. From Intel’s Python community contributions to the new architectures Intel created for a generation of more accessible scientific compute, Intel’s work continues on delivering more approachable HPC in Python. PUBLIC CONFIRMED Keynote https://pretalx.com/euroscipy-2019/talk/PRGASS/ Track 1 (Mitxelena) David Liu PUBLISH R3TJLP@@pretalx.com

-R3TJLP

Inside NumPy: preparing for the next decade en

20190905T103000 20190905T110000 0.03000

Inside NumPy: preparing for the next decade

Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know? We will give an overview of the actions we’ve taken, both successful and unsuccessful, to improve sustainability of the NumPy project and its community. We will draw some lessons from a first year of grant-funded activity, discuss key obstacles faced, attempt to quantify what we need to operate sustainably, and present a vision for the project and how we plan to realize it. Topics we will cover include the following: - Invigorating the community - what did we do, and are we correct in our opinion that it invigorated the community? - doing things in the open as much as possible - creating a roadmap - NumPy Enhancement Proposal process - commit rights - in-person meetings - Measuring community/project health. We will use a number of published or proposed metrics to quantify this. Which ones do we think accurately represent the state of the project? - Lessons from the first grant and introducing paid work into a previously fully volunteer-driven project. - What is the best profile for a salaried employee? - Social profile - From inside or outside? - Have we succeeded in encouraging diversity? - A vision for future sustainability - Models for obtaining and funneling funding PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/R3TJLP/ Track 1 (Mitxelena) Matti Picus PUBLISH 3LXMC8@@pretalx.com

-3LXMC8

Deep Learning without a PhD en

20190905T110000 20190905T113000 0.03000

Deep Learning without a PhD

In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning required, and no need to install anything to follow along - all examples will be run on Google Colab. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/3LXMC8/ Track 1 (Mitxelena) Paige Bailey PUBLISH QGZTDZ@@pretalx.com

-QGZTDZ

The Magic of Neural Embeddings with TensorFlow 2 en

20190905T113000 20190905T120000 0.03000

The Magic of Neural Embeddings with TensorFlow 2

Symbols, words, categories etc. need to be converted into numbers before they can be processed by neural networks or used in other ML methods like clustering or outlier detection. It is desirable to have the converted numbers represent the semantics of the encoded categories. That means numbers close to each other indicate similar semantics. In this session you will learn what you need to train a neural network for such embeddings. I will bring a complete example, including code that I will share, using the TensorFlow 2 functional API and the Colab service. I will also share some tricks for stabilizing embeddings when either the model changes or you get more training data. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/QGZTDZ/ Track 1 (Mitxelena) Oliver Zeigermann PUBLISH SKNH3X@@pretalx.com
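To make "numbers close to each other indicate similar semantics" concrete, here is a small sketch that compares embedding vectors with cosine similarity. The choice of cosine similarity is an assumption (the abstract does not name a distance measure), and the 2-D vectors are hand-picked for illustration rather than learned by a network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-picked 2-D vectors for illustration only; a trained model would learn these.
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}


def nearest(word):
    """Return the other word whose embedding is most similar to `word`."""
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


print(nearest("cat"))  # dog
```

Clustering or outlier detection on categorical data can then operate on such vectors instead of on the raw symbols.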

-SKNH3X

High quality video experience using deep neural networks en

20190905T120000 20190905T123000 0.03000

High quality video experience using deep neural networks

Video compression algorithms result in a reduction of image quality, because of their lossy approach to reduce the required bandwidth. This affects commercial streaming services such as Netflix, or Amazon Prime Video, but affects also video conferencing and video surveillance systems. In all these cases it is possible to improve the video quality, both for human view and for automatic video analysis, without changing the compression pipeline, through a post-processing that eliminates the visual artefacts created by the compression algorithms. In this presentation we show how deep convolutional neural networks implemented in Python using TensorFlow, Scikit-Learn and Scipy can be used to reduce compression artefacts and reconstruct missing high frequency details that were eliminated by the compression algorithm. In particular, we follow an approach based on Generative Adversarial Networks, that in the scientific literature have obtained extremely high quality results in image enhancement tasks. However, to obtain these results, typically, large generators are employed, resulting in high computational costs and processing time, and thus the method can be implemented using GPUs usually available only on desktop machines. In this presentation we show also an architecture that can be used to reduce the computational cost and that can be implemented also on mobile devices. A possible application is to improve video conferencing, or live streaming. Since in these cases there is no original uncompressed video stream available, we report results using no-reference video quality metric showing high naturalness and quality even for efficient networks. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/SKNH3X/ Track 1 (Mitxelena) Marco Bertini Tiberio Uricchio PUBLISH GLSVQA@@pretalx.com

-GLSVQA

In the Shadow of the Black Hole en

20190905T140000 20190905T144500 0.04500

In the Shadow of the Black Hole

The Event Horizon Telescope (EHT) is a global network of millimeter-wavelength radio telescopes that uses Very Long Baseline Interferometry (VLBI) to synthesize the resolution of a single, Earth-sized telescope. In April 2017 the EHT observed the black hole at the center of the giant galaxy M87. Turning these observations into an image required the development of new software tools across the global EHT collaboration, and relied on a wealth of open-source software made available to the broader scientific community. In this talk, I will walk through the entire EHT experiment from the individual telescopes that record the data through the calibration, imaging, and interpretation of the observations that led to the first-ever direct image of a black hole released to the world on April 10th of this year. PUBLIC CONFIRMED Keynote https://pretalx.com/euroscipy-2019/talk/GLSVQA/ Track 1 (Mitxelena) Sara Issaoun PUBLISH SKAH3U@@pretalx.com

-SKAH3U

A practical guide towards algorithmic bias and explainability in machine learning en

20190905T144500 20190905T151500 0.03000

A practical guide towards algorithmic bias and explainability in machine learning

Undesired bias in machine learning has become a worrying topic due to the numerous high profile incidents that have been covered by the media. It is certainly a challenging topic, as it could even be said that the concept of societal bias is inherently biased in itself depending on an individual’s (or group’s) perspective. In this talk we avoid re-inventing the wheel; instead we use traditional methods to simplify this issue so it can be tackled from a practical perspective. # Content In this talk we will cover the high level definitions of bias in machine learning to remove ambiguity, and we will demystify it through a hands-on example. Our objective will be to automate the loan approval process for a company using machine learning. This will allow us to go through this challenge step by step, using key tools and techniques from the latest research that will allow us to assess and mitigate undesired bias in our machine learning models. # Definitions We will begin by providing a high level definition of undesired bias as two constituent parts: “a-priori societal bias” and “a-posteriori statistical bias”. We will provide tangible examples of how undesired bias is introduced in each step. This initial section will introduce very interesting research findings in this topic. Spoiler alert: We will take a pragmatic approach, showing how any non-trivial system will always have an inherent bias, so the objective is not to remove bias, but to make sure 1) you can get as close as possible to your objectives, and 2) you can make sure your objectives are as close as possible to the “ideal solution”. # Process In this talk we introduce a pragmatic process to assess bias in machine learning models through three key steps: 1) Data analysis, 2) Inference result analysis, and 3) Production metrics analysis. For each of these three steps we will walk through a real life example. We will be tasked with the automation of a loan approval process.
We will show how some bias may affect our results in a negative way, as well as how we can use various techniques to ensure we perform a reasonable analysis. Our objective is not to show how to completely remove bias from a machine learning model, but instead to show which tools and techniques are available, as well as the key touch-points & metrics to ensure the right domain experts are involved. # Topics covered We will cover fundamental topics in data science such as feature importance analysis, class imbalance assessment, model evaluation metrics, partial dependence, feature correlation, etc. More importantly, we will cover how these fundamentals can interact at different touch-points with the right domain experts to ensure undesired bias is identified and documented. All will be covered with a hands-on example through a practical Jupyter notebook experience. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/SKAH3U/ Track 1 (Mitxelena) Alejandro Saucedo PUBLISH HBHY9Q@@pretalx.com

-HBHY9Q

Tracking migration flows with geolocated Twitter data en

20190905T151500 20190905T154500 0.03000

Tracking migration flows with geolocated Twitter data

Traditionally, migration and refugee flows information is obtained from surveys and border control operatives. Here we propose a method to detect migration flows worldwide using geolocated Twitter data. In particular and as a practical example, we focus on the current migratory crisis in Venezuela. We study if the flows calculated are quantitatively reliable when compared with official numbers at the country level. Our method is versatile and can be used to study different features of migration such as the routes, settlement areas, mobility to more than one country, spatial integration in cities, etc. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/HBHY9Q/ Track 1 (Mitxelena) Antònia Tugores PUBLISH UQHFD8@@pretalx.com

-UQHFD8

Deep Learning for Understanding Human Multi-modal Behavior en

20190905T154500 20190905T160000 0.01500

Deep Learning for Understanding Human Multi-modal Behavior

Multimedia automatic learning has drawn attention from companies and governments for a significant number of applications, including automated recommendations, classification, and understanding of the human brain. In recent years, an increasing amount of research has explored using deep neural networks for multimedia-related tasks. Some government security and surveillance applications are the automated detection of illegal and violent behaviors, child pornography and traffic infractions. Companies worldwide are looking for content-based recommendation systems that can personalize clients' consumption and interactions by understanding the human perception of memorability, interestingness, attractiveness and aesthetics. Fields such as event detection, multimedia affect and perceptual analysis are turning towards Artificial Neural Networks for these tasks. In this talk, I will present the theory behind multi-modal fusion using deep learning and some open challenges and their state-of-the-art. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/UQHFD8/ Track 1 (Mitxelena) Ricardo Manhães Savii PUBLISH VKDH9K@@pretalx.com

-VKDH9K

How to process hyperspectral data from a prototype imager using Python en

20190905T163000 20190905T164500 0.01500

How to process hyperspectral data from a prototype imager using Python

Our lab specializes in hyperspectral imaging using a spectral imager that combines tunable filters with colour sensors. Compared to simpler, more established imaging systems, this results in some unique challenges for the data processing. In particular, many of the original imaging parameters need to be preserved and joined with calibration-derived values to actually compute radiance values from the raw sensor data, since they are not automatically handled by the hardware. Handling this metadata with the resulting hyperspectral images results in combined datasets of a large 3-dimensional datacube and multiple smaller 2D and 1D arrays with linked dimensions. We have built our solution to this problem utilizing Xarray for handling the multiple arrays of data as well as the existing Dask integration for providing easy parallelization for the required preprocessing. Xarray also provides us with many other advantages, such as: * Exploration of very complex multi-dimensional datasets (especially when utilizing holoviews) * Interoperability with the scikit ecosystem * Serialization to NetCDF preserving all the data in a single file However, our extensive and somewhat non-conventional use of Xarray does also bring out its shortcomings when trying to develop such a library as ours, such as indexing issues with multiple possible overlapping coordinates and performance issues with complex datasets. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/VKDH9K/ Track 1 (Mitxelena) Matti Eskelinen PUBLISH HXH3GN@@pretalx.com

-HXH3GN

Enhancing & re-designing the QGIS user interface – a deep dive en

20190905T164500 20190905T170000 0.01500

Enhancing & re-designing the QGIS user interface – a deep dive

Having been around for two decades, QGIS clearly is an organically grown project. It has primarily been fulfilling the various special needs of its developers. From an outsider's perspective, it is an amazingly rich patchwork of features. However, some are deeply hidden in numerous layers of user interface elements, requiring intensive training to get used to. Others are only accessible through APIs, requiring not only training but also programming skills. Being confronted with QGIS as professional users on a regular basis, we thought about what would make working with QGIS more attractive. What if QGIS had a pleasant, coherent theme, including not only colors but also icons? What if QGIS had the ability to store workbench configurations? What if QGIS had dedicated interface configurations for specific workflows? What if much more of the API's functionality was accessible through the GUI in a well-organized way? How could QGIS work in a useful manner with ribbons? How could the incredible number of dialogs be tamed into tabs? We demonstrate (live) a series of user interface experiments – all of which are or will be [available online](https://github.com/qgist) as Python plugins. In this context, the current state of play with respect to Python and QGIS is explained in detail. The way QGIS is typically being distributed puts quite a few unusual limitations on Python plugin code. The case is made that some of those limitations are simply out of date and must be overcome, which may require help from the broader (scientific) Python community. We seek a conversation with the audience. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/HXH3GN/ Track 1 (Mitxelena) Sebastian M. Ernst PUBLISH D7WAFW@@pretalx.com

-D7WAFW

Visual Diagnostics at Scale en

20190905T103000 20190905T110000 0.03000

Visual Diagnostics at Scale

Even with a modestly-sized dataset, the hunt for the most effective machine learning model is *hard*. Arriving at the optimal combination of features, algorithm, and hyperparameters frequently requires significant experimentation and iteration. This leads some of us to stay inside algorithmic comfort zones, some to trail off on random walks, and others to resort to automated processes like gridsearch. But whatever path we take, we are often left in doubt about whether our final solution really is the optimal one. And as our datasets grow in size and dimension, so too does this ambiguity. Fortunately, many of us have developed strategies for steering model search. Open source libraries like [seaborn](https://seaborn.pydata.org/), [pandas](https://pandas.pydata.org/) and [yellowbrick](https://www.scikit-yb.org/en/latest/) can help make machine learning more informed with visual diagnostic tools like histograms, correlation matrices, parallel coordinates, manifold embeddings, validation and learning curves, residuals plots, and classification heatmaps. These tools enable us to tune our models with visceral cues that allow us to be more strategic in our choices. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the multi-dimensional realm in which our models operate. However, large, high-dimensional datasets can prove particularly difficult to explore. Not only do the majority of people struggle to visualize anything beyond two- or three-dimensional space, many of our favorite open source Python tools are not designed to be performant with arbitrarily big data. So how well *do* our favorite visualization techniques hold up to large, complex datasets? In this talk, we'll consider a suite of visual diagnostics — some familiar and some new — and explore their strengths and weaknesses with several publicly available datasets of varying size. 
Which suffer most from the curse of dimensionality in face of increasingly big data? What are the workarounds (e.g. sampling, brushing, filtering, etc.) and when should we use them? And most importantly, how can we continue to steer the machine learning process — not only purposefully but at scale? PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/D7WAFW/ Track 2 (Baroja) Dr. Rebecca Bilbro PUBLISH H3NTLX@@pretalx.com

-H3NTLX

Histogram-based Gradient Boosting in scikit-learn 0.21 en

20190905T110000 20190905T113000 0.03000

Histogram-based Gradient Boosting in scikit-learn 0.21

scikit-learn 0.21 was recently released and this presentation will give an overview of its main new features in general and present the new implementation of Gradient Boosted Trees. Gradient Boosted Trees (also known as Gradient Boosting Machines) are very competitive supervised machine learning models, especially on tabular data. Scikit-learn offered a traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive and was dramatically dominated by specialized state of the art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned features to evaluate the tree node split candidates. This implementation can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM. We will also introduce pygbm, a numba-based implementation of gradient boosted trees that was used as a prototype for the scikit-learn implementation, and compare the numba vs cython developer experience. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/H3NTLX/ Track 2 (Baroja) Olivier Grisel PUBLISH EQNGSQ@@pretalx.com

-EQNGSQ

Recent advances in python parallel computing en

20190905T113000 20190905T120000 0.03000

Recent advances in python parallel computing

# Parallel computing in Python: Current state and recent advances *Modern hardware is multi-core*. It is crucial for Python to provide high-performance parallelism. This talk will expose to both data-scientists and library developers the current state of affairs and the recent advances for parallel computing with Python. The goal is to help practitioners and developers to make better decisions on this matter. I will first cover how Python can interface with parallelism, from leveraging external parallelism of C-extensions –especially the BLAS family– to Python's multiprocessing and multithreading API. I will touch upon use cases, e.g. single vs. multi-machine, as well as the pros and cons of the various solutions for each use case. Most of these considerations will be backed by benchmarks from the [scikit-learn](https://scikit-learn.org/stable/) machine learning library. From these low-level interfaces emerged higher-level parallel processing libraries, such as concurrent.futures, [joblib](https://joblib.readthedocs.io/en/latest/) and [loky](https://loky.readthedocs.io/en/latest/) (used by [dask](https://dask.org/) and [scikit-learn](https://scikit-learn.org/stable/)). These libraries make it easy for Python programmers to use safe and reliable parallelism in their code. They can even work in more exotic situations, such as interactive sessions, in which Python’s native multiprocessing support tends to fail. I will describe their purpose as well as the canonical use-cases they address. The last part of this talk will focus on the most recent advances in the Python standard library, addressing one of the principal performance bottlenecks of multi-core/multi-machine processing, which is data communication. 
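As a minimal sketch of the higher-level interface mentioned above, the following uses only the standard library's `concurrent.futures`. The function names `sum_of_squares` and `run_parallel` and the workload are hypothetical and chosen for illustration; they are not taken from the talk. A thread pool is used here so the snippet runs anywhere. For pure-Python CPU-bound work, `ProcessPoolExecutor` exposes the same `map` interface and would typically be the one to reach for.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def sum_of_squares(n):
    """Toy task (hypothetical workload, for illustration only)."""
    return sum(i * i for i in range(n))


def run_parallel(executor: Executor, sizes):
    # Executor.map schedules one task per input and yields the
    # results in the same order as the inputs.
    return list(executor.map(sum_of_squares, sizes))


# The context manager waits for outstanding tasks and releases the workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_parallel(pool, [10, 100, 1000])

print(results)
```

Because `run_parallel` only depends on the abstract `Executor` interface, the same code can be handed a different executor implementation without changes.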
We will present a [new API](https://docs.python.org/3.8/library/multiprocessing.shared_memory.html) for shared-memory management between different Python processes, and performance improvements for the serialization of large Python objects ([PEP 574](https://www.python.org/dev/peps/pep-0574/), [pickle extensions](https://github.com/cloudpipe/cloudpickle)). These performance improvements will be leveraged by distributed data science frameworks such as dask, [ray](https://ray.readthedocs.io/en/latest/) and [pyspark](https://spark.apache.org/docs/latest/api/python/index.html). PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/EQNGSQ/ Track 2 (Baroja) Pierre Glaser PUBLISH QU88B8@@pretalx.com

-QU88B8

Data sciences in a polyglot world with xtensor and xframe en

20190905T120000 20190905T123000 0.03000

Data sciences in a polyglot world with xtensor and xframe

In this presentation, we demonstrate how xtensor can be used to implement numerical methods very efficiently in C++, with a high-level numpy-style API, and expose it to Python, Julia, and R for free. The resulting native extension operates in-place on Python, Julia, and R infrastructures without overhead. We then dive into the xframe package, a dataframe project for the C++ programming language, exposing an API very similar to Python's xarray. Features of xtensor and xframe will be demonstrated using the xeus-cling jupyter kernel, enabling interactive use of the C++ programming language in the notebook. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/QU88B8/ Track 2 (Baroja) Sylvain Corlay Wolf Vollprecht PUBLISH EDNVGJ@@pretalx.com

-EDNVGJ

Understanding Numba en

20190905T144500 20190905T151500 0.03000

Understanding Numba

In this talk I will take you on a whirlwind tour of Numba, the just-in-time, type-specializing, function compiler for accelerating numerically-focused Python. Numba can compile the computationally intensive functions of your numerical programs and libraries from Python/NumPy to highly optimized binary code. It does this by inferring the data types used inside these functions and uses that information to generate code that is specific to those data types and specialised for your target hardware. On top of that, it does all of this on-the-fly---or just-in-time---as your program runs. This significantly reduces the potential complexity that traditionally comes with pre-compiling and shipping numerical code for a variety of operating systems, Python versions and hardware architectures. All you need, in principle, is to `conda install numba` and decorate your compute intensive functions with `@numba.jit`! This talk will equip you with a mental model of how Numba is implemented and how it works at the algorithmic level. You will gain a deeper understanding of the types of use-cases where Numba excels and why. Also, you will understand the limitations and caveats that exist within Numba, including any potential ideas and strategies that might alleviate these. At the end of the talk you will be in a good position to decide if Numba is for you and you will have learnt about the concrete steps you need to take to include it as a dependency in your program or library. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/EDNVGJ/ Track 2 (Baroja) Valentin Haenel PUBLISH ULT3M7@@pretalx.com

-ULT3M7

PyPy meets SciPy en

20190905T151500 20190905T154500 0.03000

PyPy meets SciPy

PyPy is a fast and compliant implementation of Python. In other words, it's an interpreter for the Python language that can act as a full replacement for the reference interpreter, CPython. It's optimised to enable efficient just-in-time compilation of Python code to machine code, and has releases matching versions 2.7 and 3.6. It now also supports the main pillars of the scientific ecosystem (numpy, Cython, scipy, pandas, ...) thanks to its emulation layer for the C API of CPython. Performance is a major concern for Python programmers. When using CPython, this leads to splitting out the performance-sensitive parts of the computation and rewriting them in a faster, but less convenient, language such as C or Cython. With PyPy, there is no need to choose between clear, Pythonic code and good performance. This talk aims to convince the audience that PyPy should be part of every scientific programmer's toolbox. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/ULT3M7/ Track 2 (Baroja) Ronan Lamy PUBLISH SUFHZT@@pretalx.com

-SUFHZT

High performance machine learning with dislib en

20190905T154500 20190905T160000 0.01500

High performance machine learning with dislib

PyCOMPSs is a distributed programming model and runtime for Python. PyCOMPSs' main goal is to make distributed computing accessible to non-expert developers by providing a simple programming model, and a runtime that automates many aspects of the parallel execution. In addition to this, PyCOMPSs is infrastructure agnostic, and can run on top of a wide range of platforms, from HPC clusters to clouds, and from GPUs to FPGAs. This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs. Inspired by scikit-learn, dislib's programming interface is based on the concept of *estimators*. This provides a clean and easy-to-use API that greatly increases the productivity of building large-scale machine learning pipelines. Thanks to PyCOMPSs, dislib can run on multiple distributed platforms without changes in the source code, and can handle up to billions of input samples using thousands of CPU cores. This makes dislib a perfect tool for scientists (and other users) who are not machine learning experts, but who still want to extract useful knowledge from extremely large data sets. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/SUFHZT/ Track 2 (Baroja) Javier Álvarez PUBLISH WCXYRQ@@pretalx.com

-WCXYRQ

Can we make Python fast without sacrificing readability? numba for Astrodynamics en

20190905T163000 20190905T164500 0.01500

Can we make Python fast without sacrificing readability? numba for Astrodynamics

We are lucky there are very diverse solutions to make Python faster that have been in use for a while: from wrapping compiled languages (NumPy), to altering the Python syntax to make it more suitable to compilers (Cython), to using a subset of it which can in turn be accelerated (numba). However, each of these options has a tradeoff, and there is no silver bullet. poliastro is a library for Astrodynamics written in pure Python. All its core algorithms are accelerated with numba, which allows poliastro to be decently fast while keeping code complexity minimal and avoiding the use of other languages. However, even though numba is quite mature as a library and most of the Python syntax and NumPy functions are supported, there are still some limitations that affect its usage. In particular, we strive to offer a high-level API with support for physical units and reusable functions which can be passed as arguments, which sometimes requires using complex objects or introspective Python behavior that is not available. In this talk we will discuss the strategies and workarounds we have developed to overcome these problems, and what advanced numba features we can leverage. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/WCXYRQ/ Track 2 (Baroja) Juan Luis Cano Rodríguez PUBLISH 7A3ZQF@@pretalx.com

-7A3ZQF

PSYDAC: a parallel finite element solver with automatic code generation en

20190905T164500 20190905T170000 0.01500

PSYDAC: a parallel finite element solver with automatic code generation

PSYDAC is a Python 3 library for the solution of partial differential equations. Its current focus is on isogeometric analysis using B-spline finite elements, but extensions to other methodologies are under consideration. In order to use PSYDAC, the user defines geometry and model equations in an abstract form using SymPDE, an extension of Sympy that provides the mathematical expressions and checks their semantic validity. Once a finite element discretization has been chosen, PSYDAC maps the abstract concepts into concrete objects, the basic building blocks being MPI-distributed vectors and matrices. Python code is generated for all the computationally intensive operations (matrix and vector assembly, matrix-vector products, etc.), and it is accelerated using either Numba, Pythran, or Pyccel. We present the library design, the user interface, and the performance results. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/7A3ZQF/ Track 2 (Baroja) Yaman Güçlü PUBLISH TXQW9H@@pretalx.com

-TXQW9H

Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications en

20190905T110000 20190905T113000 0.03000

Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications

*Nowadays, research in every scientific domain, from medicine to astronomy, is impossible without processing huge amounts of data to check hypotheses, find new relations, and make discoveries. However, the traditional technologies, which include data structures and algorithms, become ineffective or require too many resources. This creates a demand for various optimization techniques, new data processing paradigms, and, finally, appropriate algorithms.* The presentation is dedicated to *probabilistic data structures*, a common name for advanced data structures based mostly on different hashing techniques. Unlike classical ones, these provide approximate answers but with reliable ways to estimate possible errors and uncertainty. They are designed for extremely low memory requirements, constant query time, and scaling, factors that are essential for data applications. It is hard to imagine a field that requires learning from data where they are not applicable. They are not necessarily new. Probably everybody knows about the Bloom filter, a data structure designed in the 70s that efficiently solves the problem of performing membership queries (a task to decide whether some element belongs to the dataset or not) in constant time without the need to store all elements. This is an example of a probabilistic data structure, but there are many more that have been designed for various tasks in many domains. In this talk, I explain **the five most important problems in data processing** that occur in different domains but **can be efficiently solved with probabilistic data structures and algorithms**. We cover *membership querying*, *counting* of unique elements, *frequency* and *rank* estimation in data streams, and *similarity*. Everybody interested in the topic is welcome to contribute to a free and open-source Python (Cython) library called [PDSA](https://github.com/gakhov/pdsa). 
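To make the membership-query idea concrete, here is a minimal pure-Python sketch of a Bloom filter. It is an illustration under stated assumptions, not the PDSA implementation or its API: the class name `BloomFilter`, its parameters, and the choice of deriving positions from a SHA-256 digest via double hashing are all hypothetical. The sizing uses the standard textbook formulas for the number of bits and hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: may report false positives, never false negatives."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # An item is "possibly present" only if all of its bits are set.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bloom = BloomFilter(capacity=1000, error_rate=0.01)
for n in range(1000):
    bloom.add(f"user-{n}")

print("user-42" in bloom)  # an inserted item is always reported as present
```

The filter stores only a fixed-size bit array rather than the items themselves, which is where the memory savings come from; the price is a tunable false-positive rate.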
PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/TXQW9H/ Track 3 (Oteiza) Andrii Gakhov PUBLISH 8K7TA9@@pretalx.com

-8K7TA9

Driving a 30m Radio Telescope with Python en

20190905T113000 20190905T120000 0.03000

Driving a 30m Radio Telescope with Python

The IRAM 30m radio telescope is one of the best in the world. It has been in operation non-stop since the mid 80s and is used to observe 24-hours a day, 365 days a year. All of the high-level telescope control software, monitoring, data archiving as well as some of the data processing software is written in Python. This choice, controversial at first, proved to be extremely successful making the IRAM 30m telescope extremely efficient. This talk will describe how Python is used at the telescope, the reasons behind these choices, lessons learned and future developments. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/8K7TA9/ Track 3 (Oteiza) Francesco Pierfederici PUBLISH F8X9BY@@pretalx.com

-F8X9BY

Matrix calculus with SymPy en

20190905T120000 20190905T123000 0.03000

Matrix calculus with SymPy

The recent popularization of libraries relying on tensor algebra operations has led to a growing need for computational tools to calculate the gradient and Hessian of tensorial expressions. The derivative of a tensor *A* by tensor *B* is the tensor containing the derivatives of every element of *A* with respect to every element of *B*. While tensor derivative operations are commonly supported by most computer algebra systems and frameworks through iterative algorithms, these derivatives can be expressed mathematically in closed-form solutions, which are computationally many orders of magnitude faster. SymPy has been recently extended in order to support the computation of symbolic matrix derivatives, and is currently the only computer algebra system endowed with this feature (lacking even in Wolfram Mathematica). Matrix calculus indeed plays a central role in optimization and machine learning, but was unfortunately often limited to pen and paper or chalk and blackboard. In this talk, we will introduce matrix expressions in SymPy, and address the three ways they can be represented: 1. explicit matrices with symbolic entries, 2. indexed symbols with proper summation convention, 3. implicit matrix expressions. We illustrate the way matrix derivatives are implemented for all three representations, with special emphasis on the third way, the fastest and most elegant. The derived expressions can then be passed to SymPy's code generation utilities and the resulting code can be compared in speed with other frameworks, such as TensorFlow. The support of matrix derivatives can turn SymPy into a simple tool to create the code for optimization algorithms or the code to train machine learning algorithms. The code generation utilities of SymPy are indeed aware of how to export matrix expressions into other programming languages and frameworks. We will give some examples using maximum likelihood estimation and the expectation-maximization algorithms. 
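As a small worked example of the difference between element-wise and closed-form derivatives described above, consider the quadratic form $f(x) = x^\top A x$. This is a standard textbook identity chosen for illustration and is not taken from the talk itself; the symbols $A$, $x$ and $f$ are assumptions.

```latex
f(x) = x^\top A x = \sum_{i}\sum_{j} x_i A_{ij} x_j
\qquad\Longrightarrow\qquad
\frac{\partial f}{\partial x_k}
  = \sum_{j} A_{kj} x_j + \sum_{i} x_i A_{ik}

\nabla_x f = A x + A^\top x = (A + A^\top)\, x
```

The first line is what an element-by-element approach computes, one entry at a time; the second line is the closed-form matrix expression that covers all entries at once.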
PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/F8X9BY/ Track 3 (Oteiza) Francesco Bonazzi PUBLISH FLM8R7@@pretalx.com

-FLM8R7

VeloxChem: Python meets quantum chemistry and HPC en

20190905T144500 20190905T151500 0.03000

VeloxChem: Python meets quantum chemistry and HPC

Zilvinas Rinkevicius, Xin Li, Olav Vahtras, Manuel Brand, Karan Ahmadzadeh, Magnus Ringholm, Nanna List, and Patrick Norman. With the ease of Python library modules, VeloxChem offers a front end to quantum chemical calculations on contemporary high-performance computing (HPC) systems and aims at harnessing the future compute power within the EuroHPC initiative. At the heart of this software lies a module for the evaluation of electron-repulsion integrals (ERIs) using the Obara-Saika recurrence scheme, where a high degree of efficiency is achieved by employing architecture-independent vectorization via OpenMP SIMD pragmas in the auto-generated C++ source code. The software is topology aware and, with a Python-controlled work and task flow, the idle time is minimized using an MPI/OpenMP partitioning of resources. In the second software layer, we have implemented a highly accurate SCF start guess based on atomic densities and a first level of iterations in a reduced version of the user-defined basis set, leading to a very smooth convergence in the subsequent standard DIIS scheme. This layer also includes vectorized and OpenMP/MPI parallelized modules for efficient generation of DFT grid points and weights as well as kernel integration. In the third software layer, we present real and complex response functions to address dispersive and absorptive molecular properties in spectroscopy. The kernel module in this layer is the iterative linear response equation solver that we have formulated and implemented for a combination of multiple optical frequencies and multiple perturbation operators. With efficient use of computer memory, we enable the simultaneous reference to, and solving of, on the order of 1,000 response equations for sizable biochemical systems without spatial symmetry, and we can thereby determine electronic response spectra in arbitrary wavelength regions, including UV/vis and X-Ray, without resolving the sometimes embedded excited states in the spectrum. 
E.g. the electronic CD spectrum (involving the Cartesian sets of electric and magnetic perturbations) over a range of some 10 eV is obtained at a computational cost comparable to that of determining the transition energy of the lowest excited state, or optimizing the electronic structure of the reference state. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/FLM8R7/ Track 3 (Oteiza) Olav Vahtras PUBLISH YJQH7M@@pretalx.com

-YJQH7M

emzed: a Python based framework for analysis of mass-spectrometry data en

20190905T151500 20190905T154500 0.03000

emzed: a Python based framework for analysis of mass-spectrometry data

Many of the existing mass spectrometry data analysis tools are desktop applications designed for specific applications without support for customization. In addition, many of the commercial solutions offer no or only limited functionality for exporting results. Moreover, the existing programming libraries in this area are scattered across different languages, mostly R, Java and Python. As a result, data analysis in this area often consists of manual import/export steps from/to various tools and self-developed scripts that prevent the reproducibility of results obtained or automated execution on high-performance infrastructures. emzed tries to avoid these problems by integrating existing libraries and tools from Python, R (and in the near future also Java) into an easy-to-use API. To support workflow development and increase confidence in end results, emzed also offers tools for interactive visualization of mass spectrometry related data structures. The presentation introduces basics and concepts of emzed, some lessons learned and current development of the next version of emzed. PUBLIC CONFIRMED Talk (long) https://pretalx.com/euroscipy-2019/talk/YJQH7M/ Track 3 (Oteiza) Uwe Schmitt PUBLISH PUCWVY@@pretalx.com

-PUCWVY

vtext: fast text processing in Python using Rust en

20190905T154500 20190905T160000 0.01500

vtext: fast text processing in Python using Rust

Scientific Python has historically relied on compiled extensions for performance critical parts of the code. In this talk, we outline how to write Rust extensions for Python using the [rust-numpy](https://github.com/rust-numpy/rust-numpy) project. Advantages and limitations of this approach as compared to Cython or wrapping Fortran, C or C++ are also discussed. In the second part, we introduce the [vtext](https://github.com/rth/vtext) project that allows fast text processing in Python using Rust. In particular, we consider the problems of text tokenization and (parallel) token counting, resulting in a sparse vector representation of documents. These can then be used as input in machine learning or information retrieval applications. We outline the approach used in vtext and compare it to existing solutions to these problems in the Python ecosystem. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/PUCWVY/ Track 3 (Oteiza) Roman Yurchak PUBLISH 7EYS3W@@pretalx.com

-7EYS3W

pystencils: Speeding up stencil computations on CPUs and GPUs en

20190905T163000 20190905T164500 0.01500

pystencils: Speeding up stencil computations on CPUs and GPUs

[Interactive Notebooks are available here](https://mybinder.org/v2/gh/mabau/pystencils/master?filepath=doc%2Fnotebooks). Many operations on structured arrays can be formulated as stencil codes, where the update of one array cell depends only on values in its local neighborhood. Stencil codes arise in many different fields, for example in image processing or in computational fluid dynamics by discretizing partial differential equations (PDEs) using finite differences or finite volume schemes. We present the [pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) package that allows for fast execution of stencil codes on numpy arrays using code generation techniques. The stencil is formulated in sympy and transformed into an intermediate representation (IR). *pystencils* comes with a set of optimizing transformations that can be applied on this IR, for example cache blocking or explicit SIMD vectorization with intrinsics. The intermediate representation is transformed into C or CUDA code and automatically loaded as a C extension module. This approach yields highly efficient implementations, outperforming current acceleration techniques like Cython or numba. Additionally, together with the [waLBerla](https://www.walberla.net/) package, the resulting stencil codes can be run on large computing clusters, using MPI parallelization. *pystencils* also comes with functions to automatically derive the sympy-based stencil representation from a continuous PDE. Symbolic, continuous differential operators are automatically discretized by finite difference schemes of arbitrary order. We show two examples of large-scale setups run with *pystencils*: a phase-field method simulating solidification of alloys and a CFD simulation based on the lattice-Boltzmann method. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/7EYS3W/ Track 3 (Oteiza) Martin Bauer PUBLISH LE7AAH@@pretalx.com

-LE7AAH

TelApy a Python module to compute free surface flows and sediments transport in geosciences en

20190905T164500 20190905T170000 0.01500

TelApy a Python module to compute free surface flows and sediments transport in geosciences

This talk focuses on applications of the TelApy module (www.opentelemac.org). TelApy provides a Python wrapper for the TELEMAC-MASCARET API (Application Programming Interface). Its goal is to give the user full control over a simulation while a case is running: for example, the user can stop the simulation at any time step, read the values of selected variables, and change them. To make this possible, a Fortran structure called an instantiation was developed together with the API. It contains a list of strings pointing to TELEMAC variables, which gives direct access to the memory holding those variables and therefore allows their values to be read and set. In addition, the main TELEMAC-MASCARET subroutines were modified so that hydraulic cases can be executed one time step at a time. Driving the TELEMAC-MASCARET SYSTEM APIs from Python is useful because Python is a portable, dynamic, extensible, free language that allows (without imposing) a modular approach and object-oriented programming. Beyond the benefits of the language itself, Python offers a large number of interoperable libraries. Linking these libraries with the TELEMAC-MASCARET SYSTEM APIs makes it possible to build increasingly efficient computing chains that can address complex problems in finer detail. The TelApy module therefore aims to enable a new way of using the TELEMAC-MASCARET system, in particular high-performance computing for uncertainty calculations, optimization, code coupling, and so on. The objective of this talk is to present examples of the TelApy module applied to Uncertainty Quantification, Optimization, and Reduced Order Models. PUBLIC CONFIRMED Talk https://pretalx.com/euroscipy-2019/talk/LE7AAH/ Track 3 (Oteiza) yoann audouin