EuroSciPy 2019

0.10 EuroSciPy 2019 euroscipy-2019 2019-09-02 2019-09-06 5 00:05 https://pretalx.com https://pretalx.com/media/euroscipy-2019/img/euroscipy_logo.png UTC Track 2 (Baroja) Getting Started with JupyterLab Tutorial 2019-09-02T09:00:00+00:00 09:00 01:30 JupyterLab is used for essentially all other tutorials at EuroSciPy. This tutorial gives an overview of the basic functionality and shows how to use some of the many tools it provides to simplify your Python programming workflow. euroscipy-2019-1396-getting-started-with-jupyterlab Mike Müller en This tutorial is hands-on. It is designed for participants who haven't used JupyterLab yet or have only minimal experience with it. Participants will work along with the trainer and learn how a Jupyter Notebook works by using some basic features. Some of the topics are: * Client-server concept * How cells work * Basic markdown * Magic commands overview * Some magic commands in more detail * Debugging basics * Basic timing and profiling * Extensions * History of variables * Saving to files * and more There will be room for questions during the tutorial as well as a dedicated FAQ session at the end. After this tutorial participants should be able to comfortably follow the other tutorials that are delivered with a Jupyter Notebook. # Requirements and set up instructions Training will be done with Python 3.7 and the latest JupyterLab version. * Install Anaconda alternatively * Install Miniconda and `conda install jupyterlab` alternatively * Create a new conda environment: + `conda create -n jupyterlabtutorial python=3.7 jupyterlab` and activate it with + `conda activate jupyterlabtutorial` false https://pretalx.com/euroscipy-2019/talk/QPKHMG/ https://pretalx.com/euroscipy-2019/talk/QPKHMG/feedback/ Track 2 (Baroja) Never get in a battle of bits without ammunition Tutorial 2019-09-02T11:00:00+00:00 11:00 01:30 The `numpy` package takes a central role in the Python scientific ecosystem. 
This is mainly because `numpy` code has been designed with high performance in mind. This tutorial will introduce the main features of numpy in `90` mins. euroscipy-2019-2502-never-get-in-a-battle-of-bits-without-ammunition Valerio Maggio en # Outline **Part 1** NumPy Basics - Introduction to NumPy Arrays - numpy internals schematics - Reshaping and Resizing - Numerical Data Types - Record Array **Part 2** Indexing and Slicing - Indexing numpy arrays - fancy indexing - array masking - Slicing & Stacking - Vectorization & Broadcasting **Part 3** "Advanced" NumPy - Serialisation & I/O - `.mat` files - Array and Matrix - Matlab compatibility - Memmap - Bits of Data Science with NumPy - NumPy beyond classic `numpy` ### Python version The minimum recommended version of Python to use for this tutorial is **Python 3.5**, although Python 2.7 should be fine, as well as previous versions of Python 3. Py3.5+ is recommended due to a reference to the `@` operator in the linear algebra notebook. false https://pretalx.com/euroscipy-2019/talk/KRNP7Y/ https://pretalx.com/euroscipy-2019/talk/KRNP7Y/feedback/ Track 2 (Baroja) Introduction to pandas Tutorial 2019-09-02T14:00:00+00:00 14:00 01:30 This tutorial is an introduction to pandas for people new to it. We will cover how to open datasets, perform some analysis, apply some transformations and visualize the data. euroscipy-2019-2641-introduction-to-pandas Marc Garcia en pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal. 
This tutorial will use a couple of example data sets to show what pandas can do and to give an idea of how to work with data using pandas. It is recommended to bring your own laptop with the latest version of Anaconda, pandas, Jupyter, and the repository of the tutorial cloned. See the exact instructions here: https://github.com/datapythonista/pandas-tutorials false https://pretalx.com/euroscipy-2019/talk/G7CTX8/ https://pretalx.com/euroscipy-2019/talk/G7CTX8/feedback/ Track 3 (Oteiza) Hands-on TensorFlow 2.0 Tutorial 2019-09-02T09:00:00+00:00 09:00 01:30 A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level, with code examples for Deep Dream, Style Transfer, and Image Colorization. euroscipy-2019-1389-hands-on-tensorflow-2-0 Josh Gordon en A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level. In this 90 minute tutorial, we will briefly introduce TensorFlow 2.0, then dive into writing code. We will complete four short exercises on Deep Dream, Style Transfer, Image Colorization, and GANs (if time allows). This tutorial is intermediate level, for folks with prior Deep Learning experience. You will need a laptop with an internet connection; there is nothing to install in advance. false https://pretalx.com/euroscipy-2019/talk/A8KBUB/ https://pretalx.com/euroscipy-2019/talk/A8KBUB/feedback/ Track 3 (Oteiza) Deep Diving into GANs: From Theory to Production with TensorFlow 2.0 Tutorial 2019-09-02T11:00:00+00:00 11:00 01:30 GANs are one of the hottest topics in the ML arena; however, they present a challenge for the researchers and the engineers alike. This workshop will guide you through both the theory and the code needed to build a GAN and put it into production. 
euroscipy-2019-1796-deep-diving-into-gans-from-theory-to-production-with-tensorflow-2-0 Michele "Ubik" De Simoni, Paolo Galeone, Federico Di Mattia, Emanuele Ghelfi en GANs are the new hottest topic in the ML arena; however, they present a challenge for the researchers and the engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production. The workshop aims at providing a complete understanding of both the theory and the practical know-how to code and deploy this family of models in production. By the end of it, the attendees should be able to apply the concepts learned to other models without any issues. We will be showcasing all the shiny new APIs introduced by TensorFlow 2.0 by showing how to build a GAN from scratch and how to "productionize" it by leveraging the AshPy Python package, which makes it easy to design, prototype, train and export Machine Learning models defined in TensorFlow 2.0. ------ The workshop is composed of - Theoretical introduction - GANs from Scratch in TensorFlow 2.0 - High-performance input data pipeline with TensorFlow Datasets - Introduction to the AshPy API - Implementing, training, and visualizing DCGAN using AshPy - Serving TF2 Models with Google Cloud Functions The materials of the workshop will be openly provided via GitHub (https://github.com/zurutech/gans-from-theory-to-production) prior to the event and will be run on Colab leveraging the free GPU. **Note**: the workshop requires Python 3.7 to run, so Colab support is still uncertain. The attendees are encouraged to bring their own devices with Python 3.7 installed and ready to use. ## Requirements and set up instructions Two options available: 1. (recommended). Use Google Colab & Binder. Every notebook has a button to launch the correct tool. Just use it. 2. 
Local setup: follow the instructions in the README https://github.com/zurutech/gans-from-theory-to-production false https://pretalx.com/euroscipy-2019/talk/Q79NND/ https://pretalx.com/euroscipy-2019/talk/Q79NND/feedback/ Track 3 (Oteiza) Create CUDA kernels from Python using Numba and CuPy. Tutorial 2019-09-02T14:00:00+00:00 14:00 01:30 We'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library. euroscipy-2019-1494-create-cuda-kernels-from-python-using-numba-and-cupy- Valentin Haenel en ### Abstract We'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library. Numba is an open source compiler that can translate Python functions for execution on the GPU without requiring users to write any C or C++ code. Numba's just-in-time compilation ability makes it easy to interactively experiment with GPU computing in the Jupyter notebook. Combining Numba with CuPy, a nearly complete implementation of the NumPy API for CUDA, creates a high productivity GPU development environment. Learn the basics of using Numba with CuPy, techniques for automatically parallelizing custom Python functions on arrays, and how to create and launch CUDA kernels entirely from Python. Access to appropriate hardware will be provided in the form of access to GPU based cloud resources. ### Libraries * https://numba.pydata.org/ * https://cupy.chainer.org/ ### Requirements and set up instructions * Cloud based access to GPUs will be provided, please bring a laptop with an operating system and a browser. Chrome is usually fine. 
false https://pretalx.com/euroscipy-2019/talk/L8LMQR/ https://pretalx.com/euroscipy-2019/talk/L8LMQR/feedback/ Track 3 (Oteiza) Speed up your python code Tutorial 2019-09-02T16:00:00+00:00 16:00 01:30 In this tutorial we will see how to profile and speed up Python code, from a pure Python implementation to an optimized Cython code. euroscipy-2019-1455-speed-up-your-python-code Jérémie du Boisberranger en Through a simple example we will see how to optimize Python code. First we will introduce a few tools to profile and visualize the performances of our code, such as Perf and SnakeViz. Then we will incrementally optimize our code using Cython, a lower level compiled language designed to make a bridge between C and Python. As an alternative, we will also use Numba, a Python just in time compiler. Finally, we will see how to parallelize our code to speed it up further. false https://pretalx.com/euroscipy-2019/talk/MNAGWC/ https://pretalx.com/euroscipy-2019/talk/MNAGWC/feedback/ Track4 (Chillida) 3D image processing with scikit-image Tutorial 2019-09-02T09:00:00+00:00 09:00 01:30 This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image. euroscipy-2019-2696-3d-image-processing-with-scikit-image Alexandre de Siqueira en This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image. We start the tutorial checking a brief overview of scikit-image and how it relates to packages in the scientific Python ecosystem, such as NumPy, SciPy and matplotlib. Then, we discuss how to process two and three dimensional data through several steps: first, we will pre-process the data using filtering, binarization and segmentation techniques. After that, we cover how to inspect, count and measure attributes of objects and regions of interest in the data. At the end, we present the visualization of large 3D data. 
Real-world examples are given from domains such as materials science and biology. false https://pretalx.com/euroscipy-2019/talk/DU9CAN/ https://pretalx.com/euroscipy-2019/talk/DU9CAN/feedback/ Track4 (Chillida) Reproducible Data Science in Python Tutorial 2019-09-02T11:00:00+00:00 11:00 01:30 In this tutorial, we will take a detailed look at the concept of _reproducibility_, survey the landscape of existing solutions, and, using one solution in particular, [Renku](https://renkulab.io), we will do some hands-on work. euroscipy-2019-1406-reproducible-data-science-in-python Chandrasekhar RamakrishnanRok Roškar en The expectation of reproducibility in scientific work has been established for several hundred years, and, increasingly, communities and funding sources are actually demanding it. Within the Python ecosystem, there are now a variety of tools available to support reproducible data science, but choosing and using one is not always straightforward. One source of confusion is simply the number of available options. Beyond that, the term "reproducibility" can mean multiple things, making it difficult to compare tools. In this tutorial, we will examine _reproducibility_ from the perspective of the philosophy of science. That will give us the concepts and vocabulary necessary to precisely understand and discuss different definitions of the term and allow us to identify the technologies that provide the building blocks for reproducible data science. We will briefly survey the landscape of existing solutions and then spend the remaining time looking at one solution in particular, Renku, which we will use to work end-to-end through a reproducible data-science scenario. 
* 0:00 - 0:35 Introduction & Background * 0:00 - 0:15 Reproducibility, a philosophy of science perspective * Overview of reproducibility issues in different domains of science (Nature 2016 survey results) * Definition of different degrees of reproducibility: _Reproducibility_, _replicability_, and _repeatability_ * Examine the function of reproducibility in the scientific process * 0:15 - 0:25 Building blocks for reproducibility: clean code, workflow automation, version control, containerization, provenance tracking * 0:25 - 0:35 Survey of the Tool Landscape: Binderhub, Pachyderm, Beaker, Gigantum, Whole Tale, SingularityHub, DVC, Stencila, dotscience, amie, CodeOcean, Renku * 0:35 - 1:30 Hands-on session with Renku where we will develop a typical data-science use-case, focusing on the building blocks of reproducibility along the way. ## Requirements and set up instructions We will run the tutorial on https://renkulab.io so please register and create an account following [these instructions](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/master/README-renkulab.md). To follow along with the slides, go [here](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/euroscipy2019/presentation/index.ipynb) false https://pretalx.com/euroscipy-2019/talk/TQH9FG/ https://pretalx.com/euroscipy-2019/talk/TQH9FG/feedback/ Track4 (Chillida) Building data pipelines in Python: Airflow vs scripts soup Tutorial 2019-09-02T14:00:00+00:00 14:00 01:30 In this workshop, you will learn how to migrate from ‘scripts soups’ (a set of scripts that should be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow. euroscipy-2019-1394-building-data-pipelines-in-python-airflow-vs-scripts-soup Dr. 
Tania Allard en ## Introduction (5 minutes) Format: presentation Go over the agenda List the relevant resources Make sure everyone has followed the installation instructions ## Intro to data pipelines Format: presentation Go over the components of traditional data science pipelines Presentation of the scripts soup antipattern ## Creating a script soup Format: hands-on The attendees will perform an ETL task on some data using a set of independent scripts. In this exercise, I will provide and explain the code and explain what we are trying to achieve with this pseudo-pipeline. The attendees will have a chance to try and reproduce it themselves. ## Introduction to Airflow and DAGs Format: presentation Introduce the concept of DAGs (directed acyclic graphs) Present and introduce the components of Airflow Airflow documentation ## Set up a local instance of Airflow Format: hands-on The attendees will create a local instance of Airflow and explore the sample DAGs provided. They will be introduced to the scheduling capabilities of the tool and track the status of the pipelines using the web GUI. ## ETL task on Airflow Format: hands-on I will provide hints on how to transform the scripts soup into Airflow DAGs. For this, I will use the pseudo code and other pedagogical approaches inspired by the software carpentry lessons to direct the attendees to the deployment of their first DAG in Airflow. ## Wrap up and questions Format: Q&A ## Setup <https://opendata-airflow-tutorial.readthedocs.io/en/latest/setup.html> false https://pretalx.com/euroscipy-2019/talk/3MG8K3/ https://pretalx.com/euroscipy-2019/talk/3MG8K3/feedback/ Track4 (Chillida) Performing Quantum Measurements in QuTiP Tutorial 2019-09-02T16:00:00+00:00 16:00 01:30 Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you! No previous quantum mechanics experience required! 
euroscipy-2019-1390-performing-quantum-measurements-in-qutip Simon Cross en Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you! No previous quantum mechanics experience required. It will be helpful to be comfortable with Python and only a little scared of matrix multiplication. The goal of the workshop is for each participant to: * Understand what a qubit is * Be able to create a 1-qubit state * Be able to measure a 1-qubit state * Be able to create a 2-qubit state * Be able to create an entangled 2-qubit state * Be able to measure part of an entangled state * Be able to teleport a qubit using an entangled state To each of these please add "in Python with QuTiP" and "with a good understanding of what they're doing". The target audience is people who are: * interested in quantum mechanics but are not experts * comfortable with Python basics * only a little scared of matrix multiplication (have learnt it at some point, even if they don't remember it well now) false https://pretalx.com/euroscipy-2019/talk/J3HEDH/ https://pretalx.com/euroscipy-2019/talk/J3HEDH/feedback/ Track 2 (Baroja) A Tour of the Data Visualization Ecosystem of Python Tutorial 2019-09-03T09:00:00+00:00 09:00 01:30 The tutorial will be a tour of the getting-started how-tos of the major Python data visualization libraries such as Yt-Project, Seaborn, Altair, Plotly. euroscipy-2019-2383-a-tour-of-the-data-visualization-ecosystem-of-python Giovanni De Gasperis en Python and its ecosystem are used nowadays in many scientific contexts as an advanced data visualization tool. There is a wide variety of visualization libraries. 
The tutorial will focus primarily on: * [Yt](https://yt-project.org) * [Seaborn](https://seaborn.pydata.org) * [Altair](https://altair-viz.github.io) * [Plotly](https://plot.ly) For each one it will be shown how to use it in Jupyter, exploring the getting started examples, and letting the audience propose data sets to visualize. At the end of the tutorial, the participants will fill in a pros/cons table with an online voting mechanism. If time allows, a short view of other libraries may be included. false https://pretalx.com/euroscipy-2019/talk/RHUPZ3/ https://pretalx.com/euroscipy-2019/talk/RHUPZ3/feedback/ Track 2 (Baroja) Introduction to SciPy Tutorial 2019-09-03T11:00:00+00:00 11:00 01:30 SciPy is a comprehensive library for scientific computing and one of the central components of the scientific Python ecosystem. As most of its functionality naturally involves NumPy arrays, SciPy works hand in hand with NumPy. euroscipy-2019-2615-introduction-to-scipy Gert-Ludwig Ingold en SciPy covers a broad variety of typical numerical tasks encountered in scientific computing ranging from the statistical analysis of data, curve fitting, and fast Fourier transform to numerical integration and special functions to name just a few topics. To avoid reinventing the wheel, it is always a good idea to check whether a desired functionality is already provided by SciPy. In the main part of the tutorial, we will demonstrate how some real-world data taken with a smartphone can be analyzed by means of SciPy. #### Installation instructions The tutorial requires the following packages on top of a Python 3 installation: * numpy * scipy * matplotlib * jupyter Any recent version of the [Anaconda distribution](https://anaconda.org) should run the Jupyter notebooks used in this tutorial (see below) just fine. If you do not have the Anaconda distribution installed and are not short of disk space and want to do scientific work with Python, seriously consider installing it. 
It is free and pretty straightforward to install. Alternatively, you can install miniconda and build a specific environment `euroscipy-scipy-tutorial` for the tutorial by running ``` conda env create -f environment.yml ``` with the `environment.yml` file provided in the [repository of this tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial). For more detailed instructions on how to create a conda environment, see the [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). Note that you need to activate the environment by means of ``` conda activate euroscipy-scipy-tutorial ``` Finally, if nothing else works, the notebooks can also be run on [binder](https://mybinder.org/v2/gh/gertingold/euroscipy-scipy-tutorial/master?filepath=notebooks) (provided wifi is available during the tutorial session). #### Get the tutorial notebooks Unless you are using binder, you will need the notebooks of the tutorial to actively follow along. You can either clone the repository [gertingold/euroscipy-scipy-tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial) or go to https://github.com/gertingold/euroscipy-scipy-tutorial/archive/master.zip to download a zipped version of the repository. All files needed during the tutorial are located in the directory `notebooks`. false https://pretalx.com/euroscipy-2019/talk/WSNPK7/ https://pretalx.com/euroscipy-2019/talk/WSNPK7/feedback/ Track 2 (Baroja) Introduction to scikit-learn: from model fitting to model interpretation Tutorial 2019-09-03T14:00:00+00:00 14:00 01:30 We will present scikit-learn by focusing on the available tools used to train a machine-learning model. Then, we will focus on the challenge linked to model interpretation and the available tools to understand these models. 
euroscipy-2019-1407-introduction-to-scikit-learn-from-model-fitting-to-model-interpretation Guillaume Lemaitre, Olivier Grisel en Our introduction to scikit-learn will be subdivided into 2 parts. We will give a general introduction to scikit-learn presenting basic concepts around cross-validation, pipeline estimator, and hyperparameter search. Then, we will focus on model interpretation presenting the challenges and the available tools to understand a trained machine-learning model: partial dependence plots, feature importance, LIME, Shapley values, etc. false https://pretalx.com/euroscipy-2019/talk/XXJGGG/ https://pretalx.com/euroscipy-2019/talk/XXJGGG/feedback/ Track 3 (Oteiza) Sufficiently Advanced Testing with Hypothesis Tutorial 2019-09-03T09:00:00+00:00 09:00 01:30 Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes. euroscipy-2019-1803-sufficiently-advanced-testing-with-hypothesis Zac Hatfield-Dodds en Hypothesis is a testing package that will search for counterexamples to your assertions – so you can write tests that provide a high-level description of your code or system, and let the computer attempt a Popperian falsification. If it fails, your code is (probably) OK… and if it succeeds you have a minimal input to debug. Come along and learn the principles of property-based testing, how to use Hypothesis, and how to use it to check scientific code – whether highly-polished or quick-and-dirty! You can even use it to test 'black boxes', such as simulations, where we have no way of independently verifying that some input leads to the right output! Intrigued? Come and learn about the power of embedding assertions in your code, and metamorphic relations in your tests! 
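As a rough illustration of the property-based idea described above, the following standard-library sketch generates random inputs and searches for a counterexample to a stated property. It does not use Hypothesis itself: the helper names `find_counterexample` and `random_int_list` are hypothetical, and unlike Hypothesis this sketch makes no attempt to shrink a failing input to a minimal case.

```python
import random


def random_int_list(rng, max_len=8):
    """Hypothetical generator: a short list of random integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]


def find_counterexample(prop, generate, trials=500, seed=0):
    """Return the first generated input for which prop is False, else None."""
    rng = random.Random(seed)  # fixed seed keeps the search reproducible
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def sort_is_idempotent(xs):
    # Property: sorting an already sorted list changes nothing.
    return sorted(sorted(xs)) == sorted(xs)


def sort_ignores_input_order(xs):
    # Metamorphic relation: reordering the input must not change the output,
    # even if we have no independent way to know the "right" output.
    return sorted(xs[::-1]) == sorted(xs)


def every_list_is_sorted(xs):
    # A deliberately false claim, so the search has something to find.
    return xs == sorted(xs)


no_failure_1 = find_counterexample(sort_is_idempotent, random_int_list)
no_failure_2 = find_counterexample(sort_ignores_input_order, random_int_list)
counterexample = find_counterexample(every_list_is_sorted, random_int_list)
print(no_failure_1, no_failure_2, counterexample)
```

The two true properties survive the falsification attempt (the search returns `None`), while the false claim is refuted by a concrete unsorted list that can then be debugged.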
false https://pretalx.com/euroscipy-2019/talk/ZHQALW/ https://pretalx.com/euroscipy-2019/talk/ZHQALW/feedback/ Track 3 (Oteiza) Effectively using matplotlib Tutorial 2019-09-03T11:00:00+00:00 11:00 01:30 It can sometimes be difficult and frustrating to know how to achieve a desired plot. – Have you had this experience as well? Then this tutorial is for you. It will make you more effective and help you generate better looking plots. euroscipy-2019-1823-effectively-using-matplotlib Tim Hoffmann en Matplotlib is one of the most-used and powerful visualization libraries for Python. Nevertheless, there has been and still is some confusion on how to use it properly. This has a number of reasons ranging from an evolution of the API and lack of good documentation to the complexity that comes with the large feature set and flexibility. But these issues can be overcome. This tutorial will explain the main concepts and intended usage patterns of matplotlib. Knowing these lets you effectively use high-level functions for most of the cases. But you will be able to go into the details if you need to fine-tune certain aspects of the plot. We'll also touch on some now-discouraged ways of working from the past (you should know what not to do - even though that's still found in lots of examples on the web) and we may get a glimpse into the future. Tim Hoffmann joined the matplotlib core development team almost two years ago with the mission to make matplotlib easier to use. *Requirements and set up instructions:* Jupyter plus any recent (>=3.0) matplotlib version will do. To be on the safe side, you may set up a new conda environment using `conda create -n using-mpl "matplotlib>=3" jupyterlab pandas ipympl`. Link to tutorial notebook will be posted here soon. 
false https://pretalx.com/euroscipy-2019/talk/M3RZXE/ https://pretalx.com/euroscipy-2019/talk/M3RZXE/feedback/ Track 3 (Oteiza) CFFI, Ctypes, Cython, Cppyy: how to run C code from Python Tutorial 2019-09-03T14:00:00+00:00 14:00 01:30 Python is flexible, C and C++ are fast. How to use them together? There are many ways to call C code from Python; we will learn about the major ones and find out when you would prefer to use one over the other. euroscipy-2019-1161-cffi-ctypes-cython-cppyy-how-to-run-c-code-from-python Matti Picus en Using the Jupyter notebook and a compiler, we will start with a pure Python implementation of a Mandelbrot image. Then we will write the computationally heavy part of the code in C, and learn how to call it from Ctypes (part of the Python standard library), CFFI (a newer and better Ctypes alternative), Cython (a compiler from Python to C), and CPPYY (like Ctypes and CFFI, but for C++). Along the way we will stop to reflect on the advantages and disadvantages of each technique in terms of speed of development, runtime overhead, maintainability, and readability. The participants will come away with an understanding of the tools, their strengths and weaknesses, and how to use them. Please be sure you have a computer with Anaconda Python installed and a compiler (for Windows users - Visual Studio 2019 is recommended. Others should have a functioning gcc or clang). You should also download the [git repo](https://github.com/mattip/c_from_python) and be sure you can run the first few cells that involve compilation (before the `ctypes` discussion). Also please be sure to preinstall [`cppyy`](https://pypi.org/project/cppyy/). 
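A pure Python starting point of the kind mentioned above might look like the following sketch. It is not taken from the tutorial repository; the function names, grid size and iteration limit are illustrative assumptions. The inner loop in `escape_time` is the computationally heavy part that one would typically move to C.

```python
def escape_time(c, max_iter=50):
    """Count iterations of z -> z*z + c until |z| exceeds 2 (capped at max_iter)."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_rows(width=40, height=20, max_iter=50):
    """Evaluate escape_time on a grid covering [-2, 1] x [-1.2, 1.2]."""
    rows = []
    for j in range(height):
        im = -1.2 + 2.4 * j / (height - 1)
        row = []
        for i in range(width):
            re = -2.0 + 3.0 * i / (width - 1)
            row.append(escape_time(complex(re, im), max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Crude text rendering: points that never escape are drawn as '#'.
    for row in mandelbrot_rows():
        print("".join("#" if n == 50 else "." for n in row))
```

Keeping the per-point computation in a single small function makes it straightforward to swap in a compiled replacement later while leaving the grid loop unchanged.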
false https://pretalx.com/euroscipy-2019/talk/NQMWSX/ https://pretalx.com/euroscipy-2019/talk/NQMWSX/feedback/ Track 3 (Oteiza) kCSD - a Python package for reconstruction of brain activity Tutorial 2019-09-03T16:00:00+00:00 16:00 01:30 _kCSD_ is a Python package for localization of sources of brain electric activity based on recorded electric potentials. euroscipy-2019-1447-kcsd-a-python-package-for-reconstruction-of-brain-activity Marta KowalskaJakub M. Dzik en Electric potential measured in the brain is generated by transmembrane ionic currents of neural cells. Due to the long range of electric field simultaneously recorded extracellular potential - EEG, local field potential (LFP) - at different places are typically strongly correlated which complicates their analysis. It is thus useful to reconstruct their current sources which in practice means solving Poisson equation. The first method for estimation of _Current Source Density_ (CSD) from measured potentials was proposed in the early 1950s (1). Despite some developments, a number of limitations were present until recently, in particular, most previous methods required recordings with regular grids of electrodes and overfitted to noise. The _kernel Current Source Density_ method (kCSD) developed in 2012 (2) uses kernel methods to estimate the potential and CSD in the whole space, from arbitrary distribution of electrodes using regularization to minimize the influence of noise on reconstruction. In this tutorial we will demonstrate kCSD-python package (3) which allows reconstruction of CSD in different dimensions. After this tutorial you will be able to: * estimate the distribution of current sources based on the exact values of the electric field potentials, * deal with measurement noise, * diagnose the quality of the obtained reconstruction. 
# Requirements: * Python 2.7/3.4+ environment (Anaconda with Jupyter Notebook recommended), * numpy, scipy, matplotlib packages installed, * kcsd package installed or possibility to download it from GitHub (4) (network connection etc.). # Authors * Chaitanya Chintaluri, * Marta Kowalska, * Michał Czerwiński, * Władysław Średniawa, * Joanna Jędrzejewska-Szmek, * Daniel K. Wójcik # Bibliography 1. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166. 2. Potworowski, J., Jakuczun, W., Łęski, S. & Wójcik, D. (2012) _Kernel current source density method_. Neural Comput 24(2), 541-575. 3. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python> # Acknowledgement Project funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) and OPUS (2015/17/B/ST7/04123) grants. false https://pretalx.com/euroscipy-2019/talk/HVEBGU/ https://pretalx.com/euroscipy-2019/talk/HVEBGU/feedback/ Track4 (Chillida) Introduction to geospatial data analysis with GeoPandas and the PyData stack Tutorial 2019-09-03T09:00:00+00:00 09:00 01:30 This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your GIS workflow and fit nicely in the traditional PyData stack. euroscipy-2019-1433-introduction-to-geospatial-data-analysis-with-geopandas-and-the-pydata-stack Joris Van den Bossche en This tutorial is an introduction to geospatial data analysis in Python, with a focus on tabular vector data using GeoPandas. The content focuses on introducing the participants to the different libraries to work with geospatial data and will cover munging geo-data and exploring relations over space. This includes importing data in different formats (e.g. 
shapefile, GeoJSON), visualizing, combining and tidying them up for analysis, and will use libraries such as pandas, geopandas, shapely, pyproj, matplotlib, cartopy, ... The tutorial will cover the following topics, each of them using Jupyter notebooks and hands-on exercises with real-world data: 1. Introduction to vector data and GeoPandas 2. Visualizing geospatial data 3. Spatial relationships and operations 4. Spatial joins and overlays Materials of previous versions of this tutorial: https://github.com/jorisvandenbossche/geopandas-tutorial false https://pretalx.com/euroscipy-2019/talk/YKPNEE/ https://pretalx.com/euroscipy-2019/talk/YKPNEE/feedback/ Track4 (Chillida) Astronomical Image Processing Tutorial 2019-09-03T11:00:00+00:00 11:00 01:30 This tutorial will introduce the concept of *sparsity* and demonstrate how it can be used to remove noise from signals. These concepts will then be expanded to demonstrate how noise can be removed from astronomical images in particular. euroscipy-2019-2667-astronomical-image-processing Samuel FARRENS en ### Programme - The tutorial will begin with short introduction to the basic premise of sparsity and highlight some problems in astronomical image processing that can be solved using this methodology. (~15-20min; slides) - Tutees will then follow a hands-on demonstration of how the concept of sparsity can be used to denoise signals. (~30-35min; interactive jupyter notebook with exercises) - Finally the tutees will learn how to denoise an astronomical image and use their newfound skills to recover a nice picture of Saturn. (~35-40min; interactive jupyter notebook with an exercise) ### Requirements - The tutorial contents are available on [GitHub](https://github.com/sfarrens/euroscipy). - Provided tutees have a stable internet connection, the entire tutorial can be run online using [Binder](https://mybinder.org/v2/gh/sfarrens/euroscipy/master). 
- However, to be safe, tutees should download and install the tutorial materials beforehand. false https://pretalx.com/euroscipy-2019/talk/SMLGVL/ https://pretalx.com/euroscipy-2019/talk/SMLGVL/feedback/ Track4 (Chillida) Parallelizing Python applications with PyCOMPSs Tutorial 2019-09-03T14:00:00+00:00 14:00 01:30 PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies the tasks' data dependencies, then schedules and executes the tasks in distributed environments. euroscipy-2019-1295-parallelizing-python-applications-with-pycompss Javier Conejero en ## PyCOMPSs! COMPSs is a **task-based programming model that aims to ease the development of parallel applications and their execution in distributed computing environments**, and it provides a binding for Python (aka **PyCOMPSs**). It is based on sequential programming, which relieves application developers of parallelization and distribution efforts (e.g. thread/process creation, synchronization, data movements, etc.). Application developers simply need to identify which methods will be considered tasks, and the runtime exploits the inherent parallelism of the application at execution time by detecting the task calls and the data dependencies among them. To this end, the runtime is able to spawn the tasks asynchronously on the available resources and orchestrate their data transfers, guaranteeing the validity of the execution. PyCOMPSs relies on the usage of decorators for task selection and a tiny API for synchronization. Moreover, it also integrates with Jupyter notebooks, and provides a wide range of supported features, such as task constraint definition, multiple implementations (so that the runtime can choose the most appropriate one considering the available resources), and binary tasks (e.g. binary, MPI and OmpSs) among others.
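The task-decorator idea described above can be sketched with the Python standard library alone. This is not the PyCOMPSs API: `task`, `wait_on` and `_pool` are hypothetical names chosen for illustration, and a local thread pool stands in for the distributed resources a real runtime would manage.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# Hypothetical stand-in for a task runtime; NOT the PyCOMPSs API.
_pool = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Mark a function as a task: calling it returns a Future immediately."""
    @wraps(func)
    def submit(*args):
        def run():
            # Data dependencies: wait for any argument that is itself a task result.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _pool.submit(run)
    return submit


def wait_on(future):
    """Synchronization point: block until the task result is available."""
    return future.result()


@task
def square(x):
    return x * x


@task
def add(a, b):
    return a + b


# Written like sequential code; the two square() calls are independent
# and may run in parallel, while add() waits for both of them.
total = add(square(3), square(4))
result = wait_on(total)
```

The program reads like sequential code, yet the dependency between `add` and the two `square` calls is discovered from the arguments at call time, which is the essence of the task-based model.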
In addition, the PyCOMPSs runtime makes it possible to run applications on top of different infrastructures (such as multi-core machines, clusters, grids, clouds or containers) without modifying a single line of the application. It also provides fault-tolerance mechanisms and a live monitoring tool, can generate post-mortem performance traces using Extrae that can later be analyzed with Paraver, and is extensible through pluggable connectors (e.g. clouds and schedulers). This rich feature set enables the quick and easy parallelization of Python code, its execution in distributed environments and its performance analysis, with current success in scientific fields like numerical algorithms, AI, life and earth sciences. The main objective of this tutorial is to teach **how to program and decorate Python applications using PyCOMPSs** in order to enable them **to run in parallel**. In more detail, the tutorial objectives are: * To give an overview of the PyCOMPSs task-based programming model syntax. * To demonstrate how to use PyCOMPSs to parallelize and run applications on distributed platforms. * To illustrate how sample benchmarks from linear algebra and big data, as well as real use cases from AI, life and earth sciences, can benefit from PyCOMPSs as a programming model. * To give practical insight into how to use the PyCOMPSs programming model with the Jupyter notebook. * To give an overview of the PyCOMPSs runtime and how it interacts with clusters, clusters of docker containers and clouds. **Attendees will learn how to parallelize their Python applications with PyCOMPSs through a simple interface, run them on distributed parallel platforms, use the integration with Jupyter notebooks, and analyze the execution behaviour.** #### Requirements and setup instructions This tutorial can be followed using a virtual machine or using a docker container. Attendees can choose the best option considering their system.
- Using Virtual Appliance: - Install VirtualBox - Download and import the COMPSs 2.5 VM image from http://compss.bsc.es (Downloads section) - Import the VM image - Start the VM image (user: compss password: compss19) - Update the tutorial apps folder: rm -rf tutorial_apps && git clone https://github.com/bsc-wdc/tutorial_apps.git - Using Docker: - Install docker - git clone https://github.com/bsc-wdc/tutorial_apps.git - docker pull compss/compss-tutorial:patc2019 - docker run --name mycompss -p 8888:8888 -p 8080:8080 -v /path/to/tutorial_apps:/home/tutorial_apps -itd compss/compss-tutorial:patc2019 false https://pretalx.com/euroscipy-2019/talk/CQCKY9/ https://pretalx.com/euroscipy-2019/talk/CQCKY9/feedback/ Track 1 (Mitxelena) From Galaxies to Brains! - Image processing with Python Keynote 2019-09-04T10:15:00+00:00 10:15 00:45 From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to. euroscipy-2019-2637-from-galaxies-to-brains-image-processing-with-python Samuel FARRENS en From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to. In other words, cleaner and higher resolution images will provide us with more detailed and accurate information. Obtaining the necessary image quality, however, is extremely difficult, particularly as we push instruments to their limits and have to deal with larger and larger amounts of data. In this talk I will introduce some of the current challenges in the realms of astrophysical and biomedical imaging. I will then present some interesting new ideas for tackling these problems and how Python facilitates their implementation. 
false https://pretalx.com/euroscipy-2019/talk/H8VPAY/ https://pretalx.com/euroscipy-2019/talk/H8VPAY/feedback/ Track 1 (Mitxelena) Distributed GPU Computing with Dask Talk (long) 2019-09-04T11:30:00+00:00 11:30 00:30 Dask has evolved over the last year to leverage multi-GPU computing alongside its existing CPU support. We present how this is possible with the use of NumPy-like libraries and how to get started writing distributed GPU software. euroscipy-2019-1405-distributed-gpu-computing-with-dask Peter Andreas Entschev en The need for speed remains important for scientific computing. Historically, computers were limited to few dozens of processors, but with modern GPUs, we can have thousands, or even millions of cores running in parallel on distributed systems. However, developing software for distributed GPU systems can be difficult, both because writing GPU code can be challenging for non-experts, and because distributed systems are inherently complex. We can work to address these challenges by using GPU-enabled libraries that mimic parts of the SciPy ecosystem, such as CuPy, RAPIDS, and Numba, abstracting GPU programming complexity, combined with Dask to abstract distributed computing complexity. We talk about how Dask has come a long way to support distributed GPU-enabled systems by leveraging community standards and protocols, reusing open source libraries for GPU computing, and keeping it simple and complication-free to build highly-configurable accelerated distributed software. false https://pretalx.com/euroscipy-2019/talk/9DPFGM/ https://pretalx.com/euroscipy-2019/talk/9DPFGM/feedback/ Track 1 (Mitxelena) Modern Data Science: A new approach to DataFrames and pipelines Talk (long) 2019-09-04T12:00:00+00:00 12:00 00:30 We will demonstrate how to explore and analyse massive datasets (>150GB) on a laptop with the Vaex library in Python. 
Using computational graphs, efficient algorithms and storage (Apache Arrow / hdf5), Vaex can easily handle up to a billion rows. euroscipy-2019-1440-modern-data-science-a-new-approach-to-dataframes-and-pipelines Jovan Veljanoski, Maarten Breddels en Working with datasets comprising millions or billions of samples is an increasingly common task, one that is typically tackled with distributed computing. Nodes in high-performance computing clusters have enough RAM to run intensive and well-tested data analysis workflows. More often than not, however, this is preceded by the scientific process of cleaning, filtering, grouping, and other transformations of the data, through continuous visualizations and correlation analysis. In today’s work environments, many data scientists prefer to do this on their laptops or workstations, so as to use their time more effectively and not rely on a spotty internet connection to access their remote data and computation resources. Modern laptops have sufficiently fast SSD storage, but upgrading RAM is expensive or impossible. Applying the combined benefits of computational graphs, which are common in neural network libraries, with delayed (a.k.a. lazy) evaluations to a DataFrame library enables efficient memory and CPU usage. Together with memory-mapped storage (Apache Arrow, hdf5) and out-of-core algorithms, we can process considerably larger data sets with fewer resources. As an added bonus, the computational graphs ‘remember’ all operations applied to a DataFrame, meaning that data processing pipelines can be generated automatically. In this talk, we will demonstrate Vaex, an open-source DataFrame library that embodies these concepts.
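The combination of lazy evaluation and a computational graph that "remembers" its operations can be illustrated with a toy expression class. This is a minimal sketch in plain Python, not the Vaex API: `Expr`, `column`, `stream` and the sample column names are hypothetical and exist only for this illustration.

```python
class Expr:
    """A node in a tiny computational graph; nothing is computed until evaluate()."""

    def __init__(self, op, *inputs):
        self.op = op          # "column", "const", "add" or "mul"
        self.inputs = inputs  # child nodes, or a column name / constant at the leaves

    def __add__(self, other):
        return Expr("add", self, _wrap(other))

    def __mul__(self, other):
        return Expr("mul", self, _wrap(other))

    def evaluate(self, row):
        """Compute the expression for a single row (a dict of column values)."""
        if self.op == "column":
            return row[self.inputs[0]]
        if self.op == "const":
            return self.inputs[0]
        left, right = (node.evaluate(row) for node in self.inputs)
        return left + right if self.op == "add" else left * right

    def describe(self):
        """Replay the recorded operations as text: the 'remembered' pipeline."""
        if self.op in ("column", "const"):
            return str(self.inputs[0])
        left, right = (node.describe() for node in self.inputs)
        symbol = "+" if self.op == "add" else "*"
        return f"({left} {symbol} {right})"


def _wrap(value):
    return value if isinstance(value, Expr) else Expr("const", value)


def column(name):
    return Expr("column", name)


def stream(rows, expr):
    """Out-of-core style evaluation: only one row is held in memory at a time."""
    for row in rows:
        yield expr.evaluate(row)


total = column("fare") * 2 + column("tip")   # builds a graph, computes nothing yet
rows = [{"fare": 10, "tip": 1}, {"fare": 20, "tip": 3}]
results = list(stream(rows, total))
pipeline = total.describe()
```

Because the graph is data, the same object serves both for deferred computation over a stream of rows and for exporting the sequence of transformations as a reusable pipeline description.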
Using data from the New York City YellowCab taxi service comprising 1.1 billion samples and taking up over 170 GB on disk, we will showcase how one can conduct an exploratory data analysis, complete with filtering, grouping, calculations of statistics and interactive visualisations on a single laptop in real time. Finally we will show an example of how one can automatically build a machine learning pipeline as a by-product of the exploratory data analysis using the computational graphs in Vaex. false https://pretalx.com/euroscipy-2019/talk/YRJNR8/ https://pretalx.com/euroscipy-2019/talk/YRJNR8/feedback/ Track 1 (Mitxelena) Apache Arrow: a cross-language development platform for in-memory data Talk (long) 2019-09-04T14:45:00+00:00 14:45 00:30 Apache Arrow, defining a columnar, in-memory data format standard and communication protocols, provides a cross-language development platform that already has several applications in the PyData ecosystem. euroscipy-2019-1443-apache-arrow-a-cross-language-development-platform-for-in-memory-data Joris Van den Bossche en This talk discusses the Apache Arrow project and how it already interacts with the Python ecosystem. The Apache Arrow project specifies a standardized language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware. On top of that standard, it provides computational libraries and zero-copy streaming messaging and interprocess communication protocols, and as such, it provides a cross-language development platform for in-memory data. It has support for many languages, including C, C++, Java, JavaScript, MATLAB, Python, R, Rust, ... The Apache Arrow project, although still in active development, already has several applications in the Python ecosystem. For example, it provides the IO functionality for pandas to read the Parquet format (a columnar, binary file format used a lot in the Hadoop ecosystem).
Thanks to the standard memory format, it can help improve interoperability between systems, and this is already seen in practice for the Spark / Python interface, by increasing the performance of PySpark. Further, it has the potential to provide a more performant string data type and nested data types (like dicts or lists) for pandas DataFrames, which is already being experimented with in the fletcher package (using the pandas ExtensionArray interface). false https://pretalx.com/euroscipy-2019/talk/KZGLXR/ https://pretalx.com/euroscipy-2019/talk/KZGLXR/feedback/ Track 1 (Mitxelena) Caterva: A Compressed And Multidimensional Container For Big Data Talk (long) 2019-09-04T15:15:00+00:00 15:15 00:30 Caterva is a library on top of the Blosc2 compressor that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk. euroscipy-2019-1412-caterva-a-compressed-and-multidimensional-container-for-big-data Francesc Alted en # Caterva: A Compressed And Multidimensional Container For Big Data [Caterva](https://github.com/Blosc/Caterva) is a C library on top of [C-Blosc2](https://github.com/Blosc/c-blosc2) that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk. While there are several existing solutions for this scenario (HDF5 is one of the best known), Caterva brings novel features that, taken together, set it apart from them: * __Leverage important features of C-Blosc2__. C-Blosc2 is the next generation of the well-known, high-performance C-Blosc compression library (see below for a more in-depth description). * __Fast and seamless interface with the compression engine__.
While in other solutions compression seems an afterthought and can imply several internal copies of buffers, the interface between Caterva and C-Blosc2 (its internal compression engine) is meant to be as direct as possible, minimizing copies and hence increasing performance. * __Both in-memory and on-disk paradigms are supported the same way__. This allows for using the same API for data that can be either in-memory or on-disk. * __Support for a plain buffer data layout__. This allows for essentially no-copy data sharing with existing libraries (such as NumPy), so that existing functionality can be used directly with Caterva without losing performance. Along with these features, there is an important 'mis-feature': Caterva is __type-less__. Lacking the notion of data type means that Caterva containers are not meant to be used in computations directly, but rather in combination with other higher-level libraries. While this can be seen as a drawback, it actually favors simplicity and leaves it up to the user to add the types they are most interested in, which is far more flexible than type-aware libraries (HDF5, NumPy and many others). During our talk, we will describe all these Caterva features by using [cat4py](https://github.com/Blosc/cat4py), a Python wrapper for Caterva. The points to be discussed include: * Introduction to the main features of Caterva. * Description of the basic data container and its usage. * Short discussion of different use cases: * Create and fill high dimensional arrays. * Get multi-dimensional slices out of the arrays. * How different compression codecs and filters in the pipeline affect store/retrieval performance. We have been using Caterva in one of our internal projects for several months now, and we are pretty happy with the flexibility and ease of use that it brings to us.
This is why we decided to open-source it in the hope that it would benefit others, but also that others may help us in developing it further ;-) ## About C-Blosc and C-Blosc2 [C-Blosc](https://github.com/Blosc/c-blosc) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that we are aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations. [C-Blosc2](https://github.com/Blosc/c-blosc2) is the new major version of C-Blosc, with a revamped API and support for new compressors and new filters (data transformations), including filter pipelining, that is, the capability to apply different filters during the compression pipeline, allowing for more adaptability to the data to be compressed. Dictionaries are also introduced, allowing better handling of redundancies among independent blocks and generally increasing compression ratio and performance. Last but not least, there are new data containers that are meant to overcome the 32-bit limitation of the original C-Blosc. Furthermore, the new data containers are available in various formats, including in-memory and on-disk implementations. false https://pretalx.com/euroscipy-2019/talk/BLPA7N/ https://pretalx.com/euroscipy-2019/talk/BLPA7N/feedback/ Track 1 (Mitxelena) Modin: Scaling the Capabilities of the Data Scientist, not the machine Talk 2019-09-04T15:45:00+00:00 15:45 00:15 Modern data systems tend to heavily focus on optimizing for the system’s time. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system. 
euroscipy-2019-2582-modin-scaling-the-capabilities-of-the-data-scientist-not-the-machine Devin Petersohn en Modern data systems tend to heavily focus on optimizing for the system’s time. Some of these optimizations, however, are counterproductive to the end user’s workflow and thought process. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system. Modin is a project at UC Berkeley's RISELab designed to optimize for the data scientist’s time. Often when building a data system, the system designers will follow a set of “best practices” in order to optimize performance. These “best practices” often require data scientists to understand and personally optimize concepts and system components that are not central to extracting value from their data. The fundamental goal of data science is to extract value from data. Despite this, data systems are being built with user requirements such as: (1) knowledge of partitioning, (2) understanding laziness and what triggers computation, (3) an entirely new API, and (4) where their code is running (e.g. locally, on-prem cluster, cloud). This overhead is passed to the data scientist, even though there is no overlap between these new requirements and the fundamental goal of their profession. In this talk, we will discuss how we think about the problem of large scale data science and optimizing for the human system. We will discuss the system design of Modin, which enables pluggable backends, runtimes, and APIs. The system is designed to solve the needs of the data science community regardless of an individual user’s environment. Currently, Modin supports the pandas API, and a proof of concept for SQL has been implemented. Modin is completely open-source and can be found on GitHub: https://github.com/modin-project/modin.
false https://pretalx.com/euroscipy-2019/talk/H3DRAV/ https://pretalx.com/euroscipy-2019/talk/H3DRAV/feedback/ Track 1 (Mitxelena) Best Coding Practices in Jupyterlab Talk 2019-09-04T16:30:00+00:00 16:30 00:15 Jupyter notebooks are often a mess. The code produced works for one notebook, but it's hard to maintain or to re-use. In this talk I will present some best practices to make code more readable, easier to maintain and re-usable. euroscipy-2019-1420-best-coding-practices-in-jupyterlab Alexander CS Hendorf en Jupyter notebooks are often a mess. The code produced works for one notebook, but it's hard to maintain or to re-use. In this talk I will present some best practices to make code more readable, easier to maintain and re-usable. This will include: - versioning best practices - how to use submodules - coding methods to avoid (e.g. closures) false https://pretalx.com/euroscipy-2019/talk/XBGYZB/ https://pretalx.com/euroscipy-2019/talk/XBGYZB/feedback/ Track 1 (Mitxelena) Lessons learned from comparing Numba-CUDA and C-CUDA Talk 2019-09-04T16:45:00+00:00 16:45 00:15 We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. By analyzing the GPU assembly code, we learned about the reasons for the differences. This helped us to optimize our codes written in Numba-CUDA and Numba itself. euroscipy-2019-1767-lessons-learned-from-comparing-numba-cuda-and-c-cuda Lena Oden en Numba allows the development of GPU code in Python style. When a Python script using Numba is executed, the code is compiled just-in-time (JIT) using the LLVM framework. Using Python for GPU programming can mean a considerable simplification in the development of parallel applications compared to C and C-CUDA. Python, however, has to live with the prejudice of low performance, especially in High Performance Computing. We wanted to get to the bottom of whether this is really true and where these differences come from.
For this reason, we first analyzed the performance of typical micro-benchmarks used in HPC. By analyzing the assembly codes, we learned a lot about the difference between codes produced by C-CUDA and Numba-CUDA. Some of these insights have helped us to improve the performance of our application - and also of Numba-CUDA. With a few tricks it is possible to achieve very good performance with our Numba codes, which comes very close to, or is sometimes even better than, that of the C-CUDA versions. false https://pretalx.com/euroscipy-2019/talk/UHMWGH/ https://pretalx.com/euroscipy-2019/talk/UHMWGH/feedback/ Track 2 (Baroja) How a voice assistant works Talk (long) 2019-09-04T11:30:00+00:00 11:30 00:30 This talk will focus on the technologies needed to build a voice assistant. It will center on Samsung’s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) in a variety of Samsung mobile phones. euroscipy-2019-2679-how-a-voice-assistant-works Miren Urteaga Aldalur en This talk will focus on the technologies needed to build a voice assistant. It will center on Samsung’s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) in a variety of Samsung mobile phones. First, an overview of the needed infrastructure and the challenges regarding user education will be presented. Then, the talk will offer an overview of the technologies needed in a voice assistant: 1. Automatic Speech Recognition: how a sound wave is transcribed into words 2. Natural Language Understanding: extraction of meaning from a sentence 3. Natural Language Generation: response generation 4. Text To Speech: speech synthesis During the talk the new Bixby IDE will also be presented, with which any developer can create a “voice capsule” that processes natural language to send/retrieve information from their API.
Bixby Developers site: https://bixbydevelopers.com/ false https://pretalx.com/euroscipy-2019/talk/YU8EML/ https://pretalx.com/euroscipy-2019/talk/YU8EML/feedback/ Track 2 (Baroja) QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science Talk (long) 2019-09-04T12:00:00+00:00 12:00 00:30 In this talk you will learn how QuTiP, the quantum toolbox in Python (http://qutip.org), has evolved from a library into an *ecosystem*. QuTiP is used in education, to teach quantum physics, and in research and industry, for quantum computing simulation. euroscipy-2019-1311-qutip-the-quantum-toolbox-in-python-as-an-ecosystem-for-quantum-physics-exploration-and-quantum-information-science Nathan Shammah, Alexander Pitchford en QuTiP is emerging as a library at the center of a lively ecosystem. In this talk you will learn about the ongoing projects that involve QuTiP, from providing the framework to simulate quantum machine learning for quantum computers to the development of efficient numerical solvers tackling dynamical problems that are inherently hard to simulate classically. It can be noted that [Astropy](https://www.astropy.org/affiliated/index.html) is a community effort to develop a common core package for Astronomy in Python and "foster an ecosystem of interoperable astronomy packages". It seems an interesting model for the quantum tech landscape. Qiskit has built its own ecosystem of sub-libraries for quantum computing. The physics library for quantum tech is http://qutip.org. About the idea of QuTiP as a super-library, here are some details: - `krotov`, a very recent package for optimal control built on top of QuTiP (https://arxiv.org/abs/1902.11284), https://github.com/qucontrol/krotov
- `piqs`, the permutational invariant quantum solver, now a QuTiP module (see also https://arxiv.org/abs/1805.05129 ); - `matsubara`, a plugin to study the ultrastrong coupling regime with structured baths, http://matsubara.readthedocs.io/ - `QNET`, a computer algebra package for quantum mechanics and photonic quantum networks, which actually calls QuTiP as a plugin, mainly developed at Stanford in Mabuchi Lab https://github.com/mabuchilab/QNET - `qptomographer`, https://qptomographer.readthedocs.io/en/latest/install, a library to derive error bars for experiments in quantum computing and quantum information processing. - `tiqs`, a library to study open quantum systems on extended lattices exploiting the symmetries of such systems, https://github.com/fminga/tiqs - other upcoming integrations relative to pulse control, such as `qupulse`, https://github.com/qutech/qupulse/wiki/Architecture-Proposal This talk will be of interest to the curious coder and researcher, analyzing how QuTiP's impact in the research community has fostered a [*lingua franca* for quantum tech research](https://twitter.com/goerz/status/1118739088595652611). We will also draw comparisons with other larger ecosystems in Python-based scientific projects, such as astropy and scikit-learn. # More about QuTiP - QuTiP is the open-source software to study quantum physics. It develops both an intuitive playground to understand quantum mechanics and cutting-edge tools to investigate it. - QuTiP provides the most comprehensive toolbox to characterize noise and dissipation –realistic processes– affecting quantum systems, as well as tools not only to monitor but also to minimize their impact (quantum optimal control, description of decoherence-free spaces). - For this reason QuTiP is a software born out of the quantum optics community and that has become increasingly relevant for the quantum computing community, as current quantum computing devices are noisy (NISQ definition by Preskill). 
- `pypinfo` data shows that QuTiP is popular in countries that are strong in quantum tech and quantum computing research, e.g. The Netherlands in the top five, as well as countries that benefit from the use of open source software (OSS) for university coursework, e.g. India. - In the past three years, there has been an evolution in the quantum tech community, which has embraced OSS. - OSS libraries are used as a means to grow the user base, as well as in a more structural way for quantum computers, as they provide cloud access to quantum devices, e.g., IBM Q. - QuTiP is the only major library that has continued to thrive in this ecosystem, competing with other library packages that are funded by corporations or VC-backed startups. - Since the tools of QuTiP provide a common ground to study quantum mechanics, it is important that this independent project is provided with the necessary support to thrive. - As access to quantum computers becomes more and more widespread, including for data scientists, and QuTiP's popularity grows even more for undergraduate and graduate courses, becoming the de-facto standard OSS to study quantum optical systems, it is imperative that the QuTiP library makes a leap in quality to provide a comprehensive introduction to its tools for a much broader community of users. - QuTiP website: http://www.qutip.org/ - GitHub repository: https://github.com/qutip - GitHub repository (QuTiP code): https://github.com/qutip/qutip - GitHub repository (QuTiP documentation): https://github.com/qutip/qutip-doc - GitHub repository (QuTiP tutorials): https://github.com/qutip/qutip-notebooks - Latest version of the documentation: http://qutip.org/docs/latest/index.html - Historical archive of released documentation: http://qutip.org/documentation.html ## QuTiP core development team QuTiP core development team: (Alex Pitchford, alex.pitchford@gmail.com).
Additional mentors will be the project's core contributors Nathan Shammah (nathan.shammah@gmail.com), Shahnawaz Ahmed (shahnawaz.ahmed95@gmail.com) and Eric Giguere (eric.giguere@usherbrooke.ca). QuTiP is a project started by Robert J. Johansson and Paul Nation. Other core developers have been Arne Grimso and Chris Granade, along with over 44 other contributors. ## References [1] J. R. Johansson, P. D. Nation, and F. Nori: “QuTiP: An open-source Python framework for the dynamics of open quantum systems.”, Comp. Phys. Comm. 183, 1760–1772 (2012) [2] J. Robert Johansson, Paul D. Nation, and Franco Nori: “QuTiP 2: A Python framework for the dynamics of open quantum systems.”, Comp. Phys. Comm. 184, 1234 (2013) [3] J. Preskill, "Quantum Computing in the NISQ era and beyond." Quantum **2**, 79 (2018) [4] Mark Fingerhuth, Tomáš Babej, and Peter Wittek, Open source software in quantum computing, PLoS ONE 13 (12): e0208561 (2018). [5] N. Shammah, S. Ahmed, N. Lambert, S. De Liberato, and F. Nori, "Open quantum systems with local and collective incoherent processes: Efficient numerical simulation using permutational invariance", Phys. Rev. A 98, 063815 (2018). Code at [http://piqs.readthedocs.io](http://piqs.readthedocs.io) [6] N. Lambert, S. Ahmed, M. Cirio, and F. Nori, "Virtual excitations in the ultra-strongly-coupled spin-boson model: physical results from unphysical modes", arXiv preprint arXiv:1903.05892. Also [http://matsubara.readthedocs.io](http://matsubara.readthedocs.io) **Other relevant material**: - Slides on QuTiP and the quantum-tech open source ecosystem (Nathan Shammah @ Berkeley Lab, 2019). [PDF](https://conferences.lbl.gov/event/195/session/6/contribution/13/material/slides/0.pdf) - ["The rise of open source in quantum physics research"](http://blogs.nature.com/onyourwavelength/2019/01/09/the-rise-of-open-source-in-quantum-physics-research/), Nathan Shammah and Shahnawaz Ahmed, Nature's physics blog, January 9, 2019.
- "Bit to QuBit: Data in the age of quantum computers", Shahnawaz Ahmed, PyData 2018, Warsaw, Poland, 2019. [YouTube video](https://www.youtube.com/watch?v=6GAXJhL1mSs). false https://pretalx.com/euroscipy-2019/talk/JJCQQJ/ https://pretalx.com/euroscipy-2019/talk/JJCQQJ/feedback/ Track 2 (Baroja) Constrained Data Synthesis Talk (long) 2019-09-04T14:45:00+00:00 14:45 00:30 We introduce a method for creating synthetic data "to order" based on learned (or provided) constraints and data classifications. This includes "good" and "bad" data. euroscipy-2019-1822-constrained-data-synthesis Nick Radcliffe en Synthetic data is useful in many contexts, including * providing "safe", non-private alternatives to data containing personally identifiable information * software and pipeline testing * software and service development * enhancing datasets for machine learning. Synthetic data is often created on a bespoke basis, and since the advent of generative adversarial networks (GANs) there has been considerable interest in, and experimentation with, using them as the basis for creating synthetic data. We have taken a different approach. We have worked for some years on developing methods for automatically finding constraints that characterise data, and which can be used for testing data validity (so-called "test-driven data analysis", TDDA). Such constraints form (by design) a useful characterisation of the data from which they were generated. As a result, methods that generate datasets that match the constraints necessarily construct datasets that match many of the original characteristics of the data from which the constraints were extracted. An important aspect of datasets is the relationship between "good" (~ valid) and "bad" (~ invalid) data, both of which are typically present. Systems for creating useful, realistic synthetic data generally need to be able to synthesize both kinds, in realistic mixtures.
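The idea of generating both "good" and "bad" records from constraints can be sketched in a few lines. This is only an illustration under stated assumptions: the constraint dictionary format, the field names and all function names below are hypothetical and are not taken from the talk or from any TDDA tooling.

```python
import random

# Hypothetical constraint format for illustration; not the format of any real tool.
constraints = {
    "age": {"type": "int", "min": 18, "max": 90},
    "country": {"type": "str", "allowed": ["ES", "FR", "DE"]},
}


def satisfies(field, value):
    """Check one value against the constraint for its field."""
    rule = constraints[field]
    if rule["type"] == "int":
        return isinstance(value, int) and rule["min"] <= value <= rule["max"]
    return value in rule["allowed"]


def good_value(field, rng):
    """Draw a value that satisfies the constraint."""
    rule = constraints[field]
    if rule["type"] == "int":
        return rng.randint(rule["min"], rule["max"])
    return rng.choice(rule["allowed"])


def bad_value(field, rng):
    """Draw a value that deliberately violates the constraint."""
    rule = constraints[field]
    if rule["type"] == "int":
        # Step outside the allowed range on one side or the other.
        return rng.choice([rule["min"] - rng.randint(1, 10),
                           rule["max"] + rng.randint(1, 10)])
    return "??"  # assumes "??" never appears in an allowed list


def synthesize(n, bad_fraction, seed=0):
    """Return n (record, is_bad) pairs with roughly bad_fraction invalid records."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        is_bad = rng.random() < bad_fraction
        record = {field: good_value(field, rng) for field in constraints}
        if is_bad:
            broken = rng.choice(list(constraints))
            record[broken] = bad_value(broken, rng)
        records.append((record, is_bad))
    return records


def is_valid(record):
    return all(satisfies(field, value) for field, value in record.items())


data = synthesize(200, bad_fraction=0.1)
```

Running the same constraints as a validator over the output (`is_valid`) recovers exactly the good/bad labels, which is what makes such synthetic data convenient for testing pipelines.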
This talk will discuss data synthesis from constraints, describing what has been achieved so far (which includes synthesizing good and bad data) and future research directions. false https://pretalx.com/euroscipy-2019/talk/PDYER8/ https://pretalx.com/euroscipy-2019/talk/PDYER8/feedback/ Track 2 (Baroja) ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks Talk (long) 2019-09-04T15:15:00+00:00 15:15 00:30 We present an open-source parallelized and cythonized python library, ToFu, for modeling tomography diagnostics on nuclear fusion reactors. Its functionalities (with realistic examples), its architecture and its design will be shown. euroscipy-2019-1445-tofu-an-open-source-python-cython-library-for-synthetic-tomography-diagnostics-on-tokamaks Laura Mendoza, Didier VEZINET en Nuclear fusion comes with great promises of almost limitless energy with little risk and waste. But it also comes with significant scientific and technological complexities. Decades of efforts may find an echo in ITER, an international tokamak being built to address this challenge. A tokamak is a particular kind of advanced experimental nuclear fusion reactor. It is a torus-shaped vacuum vessel in which a hydrogen plasma of very low density is heated up to temperatures (10-100 million degrees Celsius) allowing nuclear fusion reactions to occur. The torus-shaped plasma radiates light, which is measured in various wavelength domains by dedicated sets of detectors (called diagnostics), like 2D cameras observing visible light, 1D arrangements of diodes sensitive to X-rays, ultra-violet spectrometers... Due to the torus shape, the plasma is axisymmetric, and like in medical imaging, tomography methods can be used to diagnose the light radiated in a plasma cross-section. For all diagnostics, one can seek to solve the direct or the inverse problem.
The direct problem consists in computing the measurements from a known plasma light emissivity, provided by a plasma simulation for example. The inverse problem consists in computing the plasma light emissivity from experimental measurements. The algorithms involved in solving both the direct and inverse problem are very similar, no matter the wavelength domain. Like many others, the fusion community tends to suffer from a lack of reproducibility of the results it publishes. This problem is particularly acute in the case of tomography diagnostics since the inverse problem is ill-posed and the uniqueness of the solution is not guaranteed. There are also many possible simplifying hypotheses that may, or may not, be relevant for each diagnostic. In this regard, the community's historical practice displays a large variety of single-user black-box codes, each typically designed by a student, and often forgotten or left as is until a new student is hired and starts all over again. In this context, a machine-independent, open-source and documented python library, ToFu, was started to provide the fusion community with a common and free reference tool. We thus aim at improving reproducibility by providing a known and transparent tool, able to efficiently solve both the direct and inverse problem for tomography diagnostics. It can use very simple hypotheses or very complete diagnostics descriptions alike, one of the ideas being that it should allow users to perform accurate calculations easily, sparing them the need for simplifying hypotheses that are not always valid. A zero version of tofu, fully operational but not user-friendly enough, was first developed between 2014 and 2016 when it was used for published results.
Strong with this first proof of principle, a significant effort was initiated in 2017 to completely re-write the code with a stronger emphasis on python community standards (PEP8), version control (Github), performance (cython), packaging (pip and conda), continuous integration (nosetests and travis), modularity (architecture refurbishing), user-friendliness (renamings, utility tools) and object-oriented coding (class inheritance). This effort is still ongoing to this day and is scheduled to go on for the next 2.5 years. However, the first milestones have been reached, and we would like to present the first re-written modules to the python community, for publicity, advice, feedback, mutually enriching exchanges and more generally because we feel tofu is part of the large open-source python scientific community. The code is composed of several modules: a geometry module, a data visualization module, a meshing module, and an inversion module. We will present the geometry module (containing ray-tracing tools, spatial integration algorithms...) and the data module (making use of matplotlib for pre-defined interactive figures). Using profiling tools, the numerical core of the geometry module was optimized and parallelized recently in `Cython` making the code more than ten thousand times faster than the previous version on some test cases. Memory usage has also been reduced by half on the largest test cases. see [ToFu](https://github.com/ToFuProject/tofu) false https://pretalx.com/euroscipy-2019/talk/PEJPDG/ https://pretalx.com/euroscipy-2019/talk/PEJPDG/feedback/ Track 2 (Baroja) Debugging in JupyterLab Talk 2019-09-04T15:45:00+00:00 15:45 00:15 Debugging Jupyter Notebooks has been one of the most requested features. In this presentation we give an overview of the current state and tools for debugging in Jupyter, and offer a glimpse of what is coming next. euroscipy-2019-2748-debugging-in-jupyterlab Jeremy Tuloup en Layout: ##### 1. 
Current tools for debugging Jupyter Notebooks - print statements - ipdb - PixieDebugger (IBM) - Visual Studio Code cell debugging ##### 2. Native debugging support for Jupyter Kernels - Jupyter protocol extension - Debug Adapter Protocol in xeus-python ##### 3. Debugger extension for JupyterLab - An IDE-like debugging experience in JupyterLab - Active development, current prototype - Demo false https://pretalx.com/euroscipy-2019/talk/HGNPFF/ https://pretalx.com/euroscipy-2019/talk/HGNPFF/feedback/ Track 2 (Baroja) Controlling a confounding effect in predictive analysis. Talk 2019-09-04T16:30:00+00:00 16:30 00:15 Confounding effects are often present in observational data: the effect or association studied is observed jointly with other effects that are not desired. euroscipy-2019-1567-controlling-a-confounding-effect-in-predictive-analysis- Darya Chyzhyk en For instance, when predicting the salary to offer given the descriptions of professional experience, the risk is indirectly capturing a gender bias present in the distribution of salaries. Another example is found in biomedical applications, where for an automated radiology diagnostic system to be useful, it should use more than socio-demographic information to build its prediction. Here I will talk about confounds in predictive models. I will review classic deconfounding techniques developed in a well-established statistical literature, and how they can be adapted to predictive modeling settings. Departing from deconfounding, I will introduce a non-parametric approach, which we named “confound-isolating cross-validation”, that adapts cross-validation experiments to measure the performance of a model independently of the confounding effect. The examples mentioned in this work relate to common issues in neuroimaging analysis, although the approach is not limited to neuroscience and can be useful in other domains.
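As an illustration of the "classic deconfounding" mentioned above, here is a minimal sketch of one textbook technique: linearly regressing a confound out of a feature and keeping the residuals. It is not the confound-isolating cross-validation introduced in the talk, and the function names and toy numbers are hypothetical.

```python
from statistics import mean


def regress_out(values, confound):
    """Return residuals of `values` after a least-squares line fit on `confound`."""
    z_bar, v_bar = mean(confound), mean(values)
    var_z = sum((z - z_bar) ** 2 for z in confound)
    cov_zv = sum((z - z_bar) * (v - v_bar) for z, v in zip(confound, values))
    slope = cov_zv / var_z
    intercept = v_bar - slope * z_bar
    return [v - (intercept + slope * z) for z, v in zip(confound, values)]


def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a feature that partly tracks a confound such as age.
age = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
feature = [1.1, 1.4, 2.1, 2.4, 3.2, 3.3]

# After the fit, the residuals carry no linear trace of the confound.
cleaned = regress_out(feature, age)
```

In a predictive setting the line would typically be fitted on training data only and then applied to held-out data, so that the test set does not leak into the correction.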
false https://pretalx.com/euroscipy-2019/talk/XZLXZM/ https://pretalx.com/euroscipy-2019/talk/XZLXZM/feedback/ Track 2 (Baroja) The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges Talk 2019-09-04T16:45:00+00:00 16:45 00:15 The RAMP (Rapid Analytics and Model Prototyping) framework provides a platform to organize reproducible and transparent data challenges. We will present the different framework bricks. euroscipy-2019-1429-the-rapid-analytics-and-model-prototyping-ramp-framework-tools-for-collaborative-data-science-challenges Guillaume Lemaitre, Joris Van den Bossche en We will give an overview of the RAMP framework, which provides a platform to organize reproducible and transparent data challenges. RAMP workflow is a python package used to define and formalize the data science problem to be solved. It can be used as a standalone package and allows a user to prototype different solutions. In addition to RAMP workflow, a set of packages has been developed for sharing and collaborating on the solutions that users develop. RAMP database provides a database structure to store the solutions of different users and the performance of these solutions. RAMP engine is the package to run the user solutions (possibly on the cloud) and populate the database. Finally, RAMP frontend is the web frontend where users can upload their solutions and which shows the leaderboard of the challenge. The project is open-source and can be deployed on any local server. The framework has been used at the Paris-Saclay Center for Data Science for setting up and solving about twenty scientific problems, for organizing collaborative data challenges, for organizing scientific sub-communities around these events, and for training novice data scientists.
false https://pretalx.com/euroscipy-2019/talk/DVDLRG/ https://pretalx.com/euroscipy-2019/talk/DVDLRG/feedback/ Track 3 (Oteiza) Sufficiently Advanced Testing with Hypothesis Talk (long) 2019-09-04T11:30:00+00:00 11:30 00:30 Testing research code can be difficult, but is essential for robust results. Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex "black box" codes. euroscipy-2019-1801-sufficiently-advanced-testing-with-hypothesis Zac Hatfield-Dodds en Code is now a critical part of almost all research, whether for communication or for data collection and analysis. Unfortunately, producing reliably error-free code remains an open problem in science to an even greater extent than in other applications. Soergel (2014) estimates that "any reported scientific result could very well be wrong if data have passed through a computer, and that these errors may remain largely undetected." - though some software errors are much more dramatic, as with the crash of the Mars Climate Orbiter. What can we do to reduce the rate of errors in our own code? There is no silver bullet, but a more efficient way to create tests would certainly help... The answer is to have a computer write your tests for you! Using Hypothesis, you describe valid inputs - from 'an integer' to 'dataframes like this', as complex and precise as needed - and write a test which should always pass... then Hypothesis searches for the smallest inputs that cause an error. This approach is called property-based testing, and it regularly catches errors that evaded every human review and hand-written test case (even in Numpy). Even better, it rewards well-designed software - but can also do a quick check of a script in just a few lines of code.
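The workflow described above (describe valid inputs, state a property that should always hold, then search for a small failing input) can be mimicked with the standard library alone. The sketch below is a toy stand-in for the idea and deliberately does not use the Hypothesis API; every function name in it is hypothetical, and its shrinking step is far simpler than what a real property-based testing tool does.

```python
import random


def run_length_encode(text):
    """Encode 'aaab' as [('a', 3), ('b', 1)]."""
    encoded = []
    for char in text:
        if encoded and encoded[-1][0] == char:
            encoded[-1] = (char, encoded[-1][1] + 1)
        else:
            encoded.append((char, 1))
    return encoded


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(char * count for char, count in pairs)


def find_counterexample(prop, trials=500, seed=0):
    """Try random short strings; return a shrunk failing input, or None."""
    rng = random.Random(seed)
    failing = None
    for _ in range(trials):
        candidate = "".join(rng.choice("ab") for _ in range(rng.randint(0, 8)))
        if not prop(candidate):
            failing = candidate
            break
    if failing is None:
        return None
    # Naive shrinking: drop single characters while the property still fails.
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(failing)):
            smaller = failing[:i] + failing[i + 1:]
            if not prop(smaller):
                failing, shrunk = smaller, True
                break
    return failing


def roundtrip_holds(text):
    """Property that should always pass: decoding undoes encoding."""
    return run_length_decode(run_length_encode(text)) == text


def never_repeats(text):
    """Deliberately false property, used to show a shrunk counterexample."""
    return all(a != b for a, b in zip(text, text[1:]))
```

Calling `find_counterexample(roundtrip_holds)` finds nothing to report, while `find_counterexample(never_repeats)` reduces whatever random failure it hits to a two-character string such as `"aa"`, which is the kind of minimal report that makes a failure easy to read.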
We'll cover the theory of property-based testing, a worked example, and then jump into a whirlwind tour of the Hypothesis API: how to use, define, compose, and infer strategies for input; properties and testing tactics for your code; and how to debug your tests if everything seems to go wrong. By the end of this talk, you'll be ready to find real bugs with Hypothesis in anything from data pipelines to the core scientific Python libraries. Be the change you want to see in your team's code - or test someone else's and help push the world into a new age of reliable research software! false https://pretalx.com/euroscipy-2019/talk/SZ8S8G/ https://pretalx.com/euroscipy-2019/talk/SZ8S8G/feedback/ Track 3 (Oteiza) What about tests in Machine Learning projects? Talk (long) 2019-09-04T12:00:00+00:00 12:00 00:30 Good practice says you must write tests! But testing Machine Learning projects can be really complicated. Writing tests often seems inefficient. Which kinds of tests should be written? How should they be written? What are the benefits? euroscipy-2019-1810-what-about-tests-in-machine-learning-projects- Sarah Diot-Girard en Once your machine learning POC seems promising and your development environment is set up, the next step is to refactor your code and write TESTS. We know that a lot of people think tests are too complicated and boring to write and not very useful, and that some manual checks can address the need. This is not totally false. Tests can be really boring and time consuming to write when you don't have the right tools, the right APIs, the right environments or the right code structure. But it is always a bad idea to ignore tests or to perform them manually. If you want to be involved in your project life cycle, if you want to bring it from POC to production, you need to care about tests. After some years tackling production bugs, you can't feel safe delivering without tests, just as you wouldn't start driving until your seat belt is fastened. There is more than one way to test.
Tests can be split on several levels (unit, component, functional, performances, etc...) to be able to quickly identify the faulty code/data/parameter. Tests must also be automated in a Continuous Integration and run at least on each experiment before merging it in the baseline pipeline as it is done in software engineering (the CI is triggered on each feature branch). This talk is about how to easily write tests and testable code, how to avoid most common traps and what are the benefits of tests on unrealistic data in your Machine Learning project. (Tests on real data are also really important but they are not the main purpose of this talk.) Slides are here: sdg.jlbl.net/slides/tests_for_datascientist/presentation.html false https://pretalx.com/euroscipy-2019/talk/YHCP9C/ https://pretalx.com/euroscipy-2019/talk/YHCP9C/feedback/ Track 3 (Oteiza) Scientific DevOps: Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers Talk (long) 2019-09-04T14:45:00+00:00 14:45 00:30 A review of DevOps tools as applied to data analysis pipelines, including workflow managers, software containers, testing frameworks, and online repositories for performing reproducible science that scales. euroscipy-2019-1436-scientific-devops-designing-reproducible-data-analysis-pipelines-with-containerized-workflow-managers Nicholas Del Grosso en Open source and open science come together when the software is accessible, transparent, and owned by all. For data analysis pipelines that grow in complexity beyond a single Jupyter notebook, this can become a challenge as the number of steps and software dependencies increase. In this talk, Nicholas Del Grosso will review a variety of tools for packaging and managing a data analysis pipeline, showing how they fit together and benefit the development, testing, deployment, and publication processes and the scientific community. In particular, this talk will cover: - **Workflow managers** (e.g. 
Snakemake, PyDoit, Luigi) to combine complex pipelines into single applications. - **Container Solutions** (e.g. Docker and Singularity) to package and deploy the software on others' computers, including high-performance computing clusters. - **The Scientific Filesystem** to build explorable and multi-purpose applications. - **Testing Frameworks** (e.g. PyTest, Hypothesis) to declare and confirm the assumptions and functionality of the analysis pipeline. - **Ease-of-Use Utilities** to share the pipeline online and make it accessible to non-programmers. By writing software that stays manageable, reproducible, and deployable continuously throughout the development cycle, we can better fulfill the goals of open science and good scientific practice in a digital era. false https://pretalx.com/euroscipy-2019/talk/QVCFGE/ https://pretalx.com/euroscipy-2019/talk/QVCFGE/feedback/ Track 3 (Oteiza) Dashboarding with Jupyter notebooks, voila and widgets Talk (long) 2019-09-04T15:15:00+00:00 15:15 00:30 Turn your Jupyter notebook into a beautiful modern React or Vue based dashboard using voila and Jupyter widgets. euroscipy-2019-1813-dashboarding-with-jupyter-notebooks-voila-and-widgets Maarten Breddels, Martin Renou en Sharing the result of a Jupyter notebook is currently not easy. With voila we are changing this. Voila is a small but important ingredient in the Jupyter ecosystem. Voila can execute notebooks, keeping the kernel connected, but does not allow arbitrary code execution, making it safe to share your notebooks with others. With new libraries built on top of Jupyter widgets/ipywidgets (ipymaterialui and ipyvuetify) we allow beautiful modern React and Vue components to enter the Jupyter notebook. Using voila we can integrate the ipywidgets seamlessly into modern React and Vue pages, to build modern dashboards directly from a Jupyter notebook.
I will give a live example of how to transform a Jupyter notebook into a fully functional single page application with a modern (Material Design) look. false https://pretalx.com/euroscipy-2019/talk/UMWUTW/ https://pretalx.com/euroscipy-2019/talk/UMWUTW/feedback/ Track 3 (Oteiza) Make your Python code fly at transonic speeds! Talk 2019-09-04T15:45:00+00:00 15:45 00:15 [Transonic](http://transonic.readthedocs.io) is a new pure Python package to easily accelerate modern Python-Numpy code with different accelerators (like Cython, Pythran, Numba, Cupy, etc...). euroscipy-2019-1403-make-your-python-code-fly-at-transonic-speeds- Pierre Augier en Slides available at https://tiny.cc/euroscipy2019-transonic [Transonic](http://transonic.readthedocs.io/) is a pure Python package (requiring Python >= 3.6) to easily accelerate modern Python-Numpy code with different accelerators (like Cython, [Pythran](https://github.com/serge-sans-paille/pythran), Numba, Cupy, etc...) opportunistically (i.e. if/when they are available). We will first present the context of the creation of this package, i.e. Python's High Performance Computing (HPC) landscape. We will show how Transonic can be used to write elegant and very efficient HPC codes with Python, with examples taken from real-life research simulation codes ([fluidfft](https://fluidfft.readthedocs.io) and [fluidsim](https://fluidsim.readthedocs.io)). We will discuss the advantages of using Transonic instead of writing big Cython extensions or using Numba or Pythran directly. A strategy to quickly develop a very efficient scientific application/library with Python and Transonic could be: 1. Use modern Python coding, standard Numpy/Scipy for the computations and all the cool libraries you want. 2. Profile your applications on real cases, detect the bottlenecks and apply standard optimizations with Numpy. 3. Add a few lines of Transonic to compile the hot spots.
We won't forget to also discuss some limitations of Transonic, and more generally of Python and its numerical ecosystem for High Performance Computing. false https://pretalx.com/euroscipy-2019/talk/PFH3QK/ https://pretalx.com/euroscipy-2019/talk/PFH3QK/feedback/ Track 3 (Oteiza) PyFETI - An easy and massively Dual Domain Decomposition Solver for Python Talk 2019-09-04T16:30:00+00:00 16:30 00:15 PyFETI is a python implementation of Finite-Element-Tearing-Interconnecting Methods. The library provides a massive linear solver using the Domain Decomposition method, where problems are solved locally by a direct solver and iteratively at the interface. euroscipy-2019-1573-pyfeti-an-easy-and-massively-dual-domain-decomposition-solver-for-python Guilherme Jenovencio en PyFETI is a python implementation of Finite-Element-Tearing-Interconnecting Methods. The library provides a massive linear solver that uses Domain Decomposition Techniques. FETI methods rely on the solution of a linear system using two linear solver strategies, direct and iterative. A big problem is decomposed into subdomains, generating an additional set of constraints at the interfaces between subdomains. The local problem solution is formulated in terms of a new interface force that must connect the subdomains. Therefore, given an interface force, the local problems are solved with a direct solver, e.g. SuperLU, and the interface force is updated by a Preconditioned Conjugate Projected Gradient method. The library has been tested for large linear elastic problems at the IT4I supercomputer center. false https://pretalx.com/euroscipy-2019/talk/JSCWY7/ https://pretalx.com/euroscipy-2019/talk/JSCWY7/feedback/ Track 3 (Oteiza) High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research.
Talk 2019-09-04T16:45:00+00:00 16:45 00:15 The library leverages Python richness to provide a uniform user-friendly API for a zoo of industrial communication protocols used to control high voltage engineering devices, together with abstraction and implementations for such devices. euroscipy-2019-1450-high-voltage-lab-common-code-basis-library-a-uniform-user-friendly-object-oriented-api-for-a-high-voltage-engineering-research- Mikołaj Rybiński en At the heart of ETH High Voltage Lab's (HVL) research are industrial devices put together into code-automated experiments. It's a zoo of industrial communication protocols one needs to handle when controlling these devices. HVL decided to switch from MATLAB to Python as a programming and analysis tool. The Python community provides solutions to the majority of technicalities involved in handling the multitude of industrial communication protocols used to control high voltage research experiment devices. Moreover, Python seems to be a more future-proof choice, meeting industry demand for a more cost-effective and collaborative solution. The HVL Common Code Basis library (`hvl_ccb`) provides a uniform user-friendly object-oriented API as well as implementations for multiple high voltage engineering devices and their respective communication protocols. The library leverages Python's open source community - implementations of specific communication protocols, but also relies heavily on some of the language's newer features such as typing hints, dataclasses or enums. Python typing hints are used not only for their static type checking and autocompletion support from IDEs, but also for dynamic type checking of the communication protocols' and devices' configurations. The configurations themselves are a customized implementation of Python 3.7's dataclasses. Configuration properties rely heavily on Python (advanced) enumerations.
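To make the combination of dataclasses, enums and dynamic type checking concrete, here is a minimal standard-library sketch. It is not the `hvl_ccb` API: `SerialConfig`, `Parity` and their fields are hypothetical names chosen for illustration, and the check only handles plain class annotations.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Parity(Enum):
    """Hypothetical enumeration of serial parity settings."""
    NONE = "N"
    EVEN = "E"
    ODD = "O"


@dataclass(frozen=True)
class SerialConfig:
    """Hypothetical device configuration with run-time type checks."""
    port: str
    baudrate: int = 9600
    parity: Parity = Parity.NONE

    def __post_init__(self):
        # Compare every value with its annotated type when the object is built.
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


# A valid configuration; SerialConfig("COM1", baudrate="fast") would raise TypeError.
config = SerialConfig(port="COM1", parity=Parity.EVEN)
```

The same annotations serve static checkers and IDE autocompletion, while `__post_init__` rejects a wrongly typed value before it ever reaches a device.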
Currently, the library supports serial port, VISA over TCP, Modbus TCP, LabJack LJM and OPC UA communication protocols. These protocols are used within code abstraction of devices such as the MBW973 SF6 Analyzer / dew point mirror, LabJack (T7-PRO) device, Schneider Electric ILS2T stepper motor drive, Elektro-Automatik PSI9000 DC power supply, Rohde & Schwarz RTO 1024 oscilloscope, or the Lab's state-of-the-art Supercube platform, which encapsulates safety components, the voltage source, as well as other auxiliary devices. false https://pretalx.com/euroscipy-2019/talk/JHGWWN/ https://pretalx.com/euroscipy-2019/talk/JHGWWN/feedback/ Posters at 16:00 scikit-fdiff, a new tool for PDE solving Poster 2019-09-04T08:25:00+00:00 08:25 01:30 Scikit-fdiff (formerly Triflow) has been developed in order to facilitate building mathematical models. It was made to quickly build and try many asymptotic falling-film models with different phenomena couplings (energy and mass transfer). euroscipy-2019-1248-scikit-fdiff-a-new-tool-for-pde-solving Nicolas Cellier en Scikit-FDiff (formerly known as Triflow) is a new tool, written in pure Python, that focuses on reducing the time between the development of the mathematical model and its numerical solution. It allows an easy and automatic finite difference discretization, thanks to a symbolic processing that can deal with systems of multi-dimensional partial differential equations with complex boundary conditions. Using finite differences and the method of lines, it allows the transformation of the original PDE into an ODE, providing a fast computation of the temporal evolution vector and the Jacobian matrix. The latter is pre-computed in a symbolic way and sparse by nature. It can be evaluated with as few computational resources as possible, allowing the use of implicit and explicit solvers at a reasonable cost.
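The method of lines mentioned above can be sketched in a few lines of plain Python. This hand-written example does not use Scikit-FDiff; it discretizes the 1D heat equation u_t = u_xx in space with central finite differences and advances the resulting ODE system with forward Euler. The grid size, time step and function names are assumptions made for the illustration.

```python
import math


def laplacian_rhs(u, dx):
    """Semi-discrete right-hand side du/dt for u_t = u_xx, with u = 0 at both ends."""
    rhs = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        rhs[i] = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx**2
    return rhs


def integrate(u0, dx, dt, n_steps):
    """Advance the ODE system du/dt = rhs(u) with forward Euler."""
    u = list(u0)
    for _ in range(n_steps):
        rhs = laplacian_rhs(u, dx)
        u = [ui + dt * ri for ui, ri in zip(u, rhs)]
    return u


n_intervals = 20
dx = 1.0 / n_intervals
dt = 0.4 * dx**2  # below the explicit stability limit dx**2 / 2
x = [i * dx for i in range(n_intervals + 1)]
u0 = [math.sin(math.pi * xi) for xi in x]

# 100 steps of dt = 0.001 reach the final time t = 0.1.
u_final = integrate(u0, dx, dt, n_steps=100)

# For this initial condition the exact solution is exp(-pi**2 * t) * sin(pi * x).
exact_peak = math.exp(-math.pi**2 * 0.1)
```

The peak of `u_final` stays close to `exact_peak`. An implicit scheme would lift the time-step restriction but needs the Jacobian of the right-hand side, which is the quantity the abstract says is pre-computed symbolically.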
Classic ODE solvers have been implemented (or made available from dedicated python libraries), including backward and forward Euler schemes, Crank-Nicolson, and explicit Runge-Kutta. More complex ones, like improved Rosenbrock-Wanner schemes up to the 6th order, are also available. The time-step is managed by a built-in error computation, which ensures the accuracy of the solution. The main goal of the software is to minimize the time spent writing numerical solvers to focus on model development and data analysis. Scikit-Fdiff is then able to solve toy cases in a few lines of code as well as complex models. Extra tools are available, such as data saving during the simulation, real-time plotting and post-processing. It has been validated with the shallow-water equation on dam-breaks and the steady-lake case. It has also been applied to heated falling-films, droplet spread and simple moisture flow in porous media. false https://pretalx.com/euroscipy-2019/talk/WGE8NA/ https://pretalx.com/euroscipy-2019/talk/WGE8NA/feedback/ Posters at 16:00 PhonoLAMMPS: Phonopy with LAMMPS made easy Poster 2019-09-04T11:40:00+00:00 11:40 01:30 PhonoLAMMPS is a Phonopy interface with LAMMPS that allows calculating the interatomic force constants and other phonon properties from a usual LAMMPS input file. euroscipy-2019-2272-phonolammps-phonopy-with-lammps-made-easy Abel Carreras en In recent years Phonopy[1] has become a very well-known software package in the materials science field for calculating the phonon properties of crystals. While Phonopy provides interfaces for many popular First Principles calculation packages such as VASP, WIEN2K, SIESTA, etc., the implementation of interfaces for software based on empirical potentials is usually more challenging. This is due to the large variability of input structure and potential definitions that this kind of software requires in comparison to the ones based on First Principles.
In this poster I present PhonoLAMMPS[2], a Phonopy interface with LAMMPS[3] written in python that makes use of the LAMMPS official python API to calculate the interatomic 2nd order force constants from a usual LAMMPS input file. PhonoLAMMPS can be used either as a python module with a phonopy-like interface or as a simple command-line script. [1] A. Togo and I. Tanaka, Scr. Mater., 108, 1-5 (2015) [2] https://github.com/abelcarreras/phonolammps [3] S. Plimpton, J Comp Phys., 117, 1-19 (1995) false https://pretalx.com/euroscipy-2019/talk/CUXPCN/ https://pretalx.com/euroscipy-2019/talk/CUXPCN/feedback/ Posters at 16:00 Really reproducible behavioural paper Poster 2019-09-04T13:15:00+00:00 13:15 01:30 A heavily _XKCD_ themed poster about writing a really reproducible behavioural paper in a Python environment. [The poster is also available online.](https://tinyurl.com/y35otadt) euroscipy-2019-1446-really-reproducible-behavioural-paper Jakub M. Dzik en In recent years the replication crisis in life sciences has received significant attention. Reproducibility of behavioural experiments may be affected by many factors, such as lack of standardisation of experimental conditions or human errors. While use of standardized systems for automated phenotyping (such as _IntelliCage_) leads to interlaboratory replicability of experiments (1), manual analysis of the obtained data still remains a potential source of irreproducibility due to human errors. Luckily, a countermeasure for that issue has been known for more than twenty years: automation of data analysis with a non-interactive computer program (2). To facilitate development of Python programs for automated analysis of mice behavioural data obtained from the IntelliCage system, the _PyMICE_ library (RRID:nlx\_158570) has been developed. The title paper is the publication presenting the library to the scientific community (3).
As it has been written according to literate programming paradigm (4), all programs used for analysing the experimental data are embedded in [the source code of the paper itself](https://github.com/Neuroinflab/PyMICE_SM/) which makes the presented results highly reproducible and the methodology of analysis transparent. # Authors * Jakub M. Dzik, * Alicja Puścian, * Zofia Mijakowska, * Kasia Radwanska, * Szymon Łęski # Bibliography 1. A. Codita, A. H. Mohammed, A. Willuweit, A. Reichelt, E. Alleva, I. Branchi, F. Cirulli, G. Colacicco, V. Voikar, D. P. Wolfer, F. J. U. Buschmann, H.-P. Lipp, E. Vannoni, S. Krackow (2012) Effects of Spatial and Cognitive Enrichment on Activity Pattern and Learning Performance in Three Strains of Mice in the IntelliMaze. Behavior Genetics [doi:10.1007/s10519-011-9512-z](https://dx.doi.org/10.1007/s10519-011-9512-z) 2. J. B. Buckheit, D. L. Donoho (1995) WaveLab and Reproducible Research. Lecture Notes in Statistics. [doi:10.1007/978-1-4612-2544-7\_5](https://dx.doi.org/10.1007/978-1-4612-2544-7\_5) 3. J. M. Dzik, A. Puścian, Z. Mijakowska, K. Radwanska, S. Łęski (2017) PyMICE: A Python library for analysis of IntelliCage data. Behavior Research Methods. [doi:10.3758/s13428-017-0907-5](https://dx.doi.org/10.3758/s13428-017-0907-5) 4. D. E. Knuth (1984) Literate Programming. The Computer Journal. [doi:10.1093/comjnl/27.2.97](https://dx.doi.org/10.1093/comjnl/27.2.97) # Acknowledgement Project funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) grant. 
false https://pretalx.com/euroscipy-2019/talk/HUPE99/ https://pretalx.com/euroscipy-2019/talk/HUPE99/feedback/ Posters at 16:00 kESI - a kernel-based method for reconstruction of sources of brain electric activity in realistic brain geometries Poster 2019-09-04T14:50:00+00:00 14:50 01:30 _kESI_ is a new Python package for kernel-based reconstruction of brain electric activity from recorded electric field potentials using realistic assumptions about brain geometry and conductivity. euroscipy-2019-1444-kesi-a-kernel-based-method-for-reconstruction-of-sources-of-brain-electric-activity-in-realistic-brain-geometries Jakub M. Dzik, Marta Kowalska en Epilepsy affects around 50 million people worldwide (1). 30% of epilepsy cases are drug-resistant and surgical removal of the neural tissue generating seizures (epileptogenic) may be the only way to prevent seizures. When removing the epileptogenic tissue it is crucial to minimize the lesioned area, because removing too much of the brain may lead to serious impairment of its function. To identify the epileptogenic zone, a neurosurgeon typically implants electrodes on the cortex (ECoG) or deep in the brain (SEEG). The measured potentials are used as indicators localizing the epileptic source. We argue that reconstructed sources of this brain activity are better predictors of areas for resection. Here we present a method - kernel Electrical Source Imaging (kESI) - and its Python implementation which allow reconstruction of current sources taking into account the actual geometry of the patient's brain and the conductivity distribution. This method extends the _kernel Current Source Density_ (kCSD) method (3, 4) to realistic geometries and complex conductivity models. In the poster we present our most recent results in development of Python tools for reconstruction of brain activity and the progress report of kESI development. # Authors * Marta Kowalska, * Jakub M. Dzik, * Chaitanya Chintaluri, * Daniel K. Wójcik # Bibliography 1.
World Health Organization, _Epilepsy_, available at: <https://www.who.int/news-room/fact-sheets/detail/epilepsy> 2. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166. 3. Potworowski, J., Jakuczun, W., Łęski, S. & Wójcik, D. (2012) _Kernel current source density method_. Neural Comput 24(2), 541-575. 4. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python> # Acknowledgement Project funded from the Polish National Science Centre's OPUS grant (2015/17/B/ST7/04123). false https://pretalx.com/euroscipy-2019/talk/FZTEQ9/ https://pretalx.com/euroscipy-2019/talk/FZTEQ9/feedback/ Posters at 16:00 From Modeler to Programmer Poster 2019-09-04T16:25:00+00:00 16:25 01:30 The modeling system ueflow allows for customizable, dynamic boundary conditions. The modeler can write Python plugins to implement the behavior of these boundary conditions. euroscipy-2019-1416-from-modeler-to-programmer Mike Müller en Boundary conditions are essential for groundwater models. The user can specify values for these boundary conditions such as a well at a certain location with a given pumping rate for a specified duration. For some special applications, however, the specified values may further depend on internal model conditions. For example, the flow rate of an infiltration well that re-infiltrates water is equal to the pumping rate of the extraction well. This can be useful for geothermal applications within groundwater bodies. The newly developed model, ueflow, allows the user to implement such a scheme by writing a plugin. In addition to just using the pumping rate as infiltration rate, the user can incorporate other constraints such as energy costs for pumping, capacities of water treatment facilities, maintenance schedules for pumps based on pumping regimes, or other technical constraints.
The poster gives a short overview of ueflow that is based on the finite volume model framework FiPy (Guyer et al. 2009). FiPy is implemented in Python and offers multiple, high-performance solvers as well as several tools for generating grids and other input data. Guyer, J. E., Wheeler, D., Warren, J. A. (2009). FiPy: Partial Differential Equations with Python. Computing in Science & Engineering 11(3) pp. 6—15 (2009), doi:10.1109/MCSE.2009.52, http://www.ctcms.nist.gov/fipy false https://pretalx.com/euroscipy-2019/talk/CW97MN/ https://pretalx.com/euroscipy-2019/talk/CW97MN/feedback/ Posters at 16:00 MNE-Python, a toolkit for neurophysiological data Poster 2019-09-04T18:00:00+00:00 18:00 01:30 A summary of the MNE-Python changes introduced during the two last releases and highlights for future directions. euroscipy-2019-1441-mne-python-a-toolkit-for-neurophysiological-data Joan Massich en MNE-Python software is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, EEG, sEEG, ECoG, and more. It includes modules for data input/output, preprocessing, visualization, source estimation, time-frequency analysis, connectivity analysis, machine learning, and statistics. false https://pretalx.com/euroscipy-2019/talk/LWHPHN/ https://pretalx.com/euroscipy-2019/talk/LWHPHN/feedback/ Track 1 (Mitxelena) HPC and Python: Intel’s work in enabling the scientific computing community Keynote 2019-09-05T09:15:00+00:00 09:15 00:45 High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development. 
However, one of the fundamental links in performance is the relationship between h euroscipy-2019-2636-hpc-and-python-intel-s-work-in-enabling-the-scientific-computing-community David Liu en High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development. However, one of the fundamental links in performance is the relationship between hardware and software. Intel is hard at work on the Intel® Distribution for Python*, producing optimized packages and upstreaming changes to open source that help take advantage of current and future Intel® Architecture, and of hardware that is purpose-built to target HPC, Machine Learning, and AI workloads. Getting the performance out of these workloads has been a challenging journey, one in which valuable lessons were learned. From Intel’s Python community contributions to the new architectures Intel created for a generation of more accessible scientific compute, Intel’s work continues on delivering more approachable HPC in Python. false https://pretalx.com/euroscipy-2019/talk/PRGASS/ https://pretalx.com/euroscipy-2019/talk/PRGASS/feedback/ Track 1 (Mitxelena) Inside NumPy: preparing for the next decade Talk (long) 2019-09-05T10:30:00+00:00 10:30 00:30 Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know? euroscipy-2019-1162-inside-numpy-preparing-for-the-next-decade Matti Picus en Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know? We will give an overview of the actions we’ve taken, both successful and unsuccessful, to improve sustainability of the NumPy project and its community.
We will draw some lessons from a first year of grant-funded activity, discuss key obstacles faced, attempt to quantify what we need to operate sustainably, and present a vision for the project and how we plan to realize it. Topics we will cover include the following: - Invigorating the community - what did we do, and are we correct in our opinion that it invigorated the community? - doing things in the open as much as possible - creating a roadmap - NumPy Enhancement Proposal process - commit rights - in-person meetings - Measuring community/project health. We will use a number of published or proposed metrics to quantify this. Which ones do we think accurately represent the state of the project? - Lessons from the first grant and introducing paid work into a previously fully volunteer-driven project. - What is the best profile for a salaried employee? - Social profile - From inside or outside? - Have we succeeded in encouraging diversity? - A vision for future sustainability - Models for obtaining and funneling funding false https://pretalx.com/euroscipy-2019/talk/R3TJLP/ https://pretalx.com/euroscipy-2019/talk/R3TJLP/feedback/ Track 1 (Mitxelena) Deep Learning without a PhD Talk (long) 2019-09-05T11:00:00+00:00 11:00 00:30 In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning required, and no need to instal euroscipy-2019-2749-deep-learning-without-a-phd Paige Bailey en In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning is required, and there is no need to install anything to follow along: all examples will be run on Google Colab.
false https://pretalx.com/euroscipy-2019/talk/3LXMC8/ https://pretalx.com/euroscipy-2019/talk/3LXMC8/feedback/ Track 1 (Mitxelena) The Magic of Neural Embeddings with TensorFlow 2 Talk (long) 2019-09-05T11:30:00+00:00 11:30 00:30 Neural embeddings are a powerful tool for turning categorical values into numerical ones. Given reasonable training data, semantics present in the categories can be preserved in the numerical representation. euroscipy-2019-1400-the-magic-of-neural-embeddings-with-tensorflow-2 Oliver Zeigermann en Symbols, words, categories etc. need to be converted into numbers before they can be processed by neural networks or used in other ML methods like clustering or outlier detection. It is desirable to have the converted numbers represent the semantics of the encoded categories. That is, numbers close to each other indicate similar semantics. In this session you will learn what you need to train a neural network for such embeddings. I will bring a complete example, including code, that I will share using the TensorFlow 2 functional API and the Colab service. I will also share some tricks for stabilizing embeddings when either the model changes or you get more training data. false https://pretalx.com/euroscipy-2019/talk/QGZTDZ/ https://pretalx.com/euroscipy-2019/talk/QGZTDZ/feedback/ Track 1 (Mitxelena) High quality video experience using deep neural networks Talk (long) 2019-09-05T12:00:00+00:00 12:00 00:30 Video compression algorithms used to stream videos are lossy, and when compression rates increase they result in strong degradation of visual quality. We show how deep neural networks can eliminate compression artefacts and restore lost details. euroscipy-2019-2541-high-quality-video-experience-using-deep-neural-networks Marco Bertini, Tiberio Uricchio en Video compression algorithms result in a reduction of image quality, because of their lossy approach to reducing the required bandwidth.
This affects commercial streaming services such as Netflix or Amazon Prime Video, but also affects video conferencing and video surveillance systems. In all these cases it is possible to improve the video quality, both for human viewing and for automatic video analysis, without changing the compression pipeline, through a post-processing step that eliminates the visual artefacts created by the compression algorithms. In this presentation we show how deep convolutional neural networks implemented in Python using TensorFlow, scikit-learn and SciPy can be used to reduce compression artefacts and reconstruct missing high-frequency details that were eliminated by the compression algorithm. In particular, we follow an approach based on Generative Adversarial Networks, which in the scientific literature have obtained extremely high quality results in image enhancement tasks. However, to obtain these results, large generators are typically employed, resulting in high computational costs and processing time, and thus the method can only be implemented on GPUs, which are usually available only on desktop machines. In this presentation we also show an architecture that reduces the computational cost and can be implemented on mobile devices as well. A possible application is to improve video conferencing or live streaming. Since in these cases there is no original uncompressed video stream available, we report results using a no-reference video quality metric, showing high naturalness and quality even for efficient networks. false https://pretalx.com/euroscipy-2019/talk/SKNH3X/ https://pretalx.com/euroscipy-2019/talk/SKNH3X/feedback/ Track 1 (Mitxelena) In the Shadow of the Black Hole Keynote 2019-09-05T14:00:00+00:00 14:00 00:45 I will walk through the entire Event Horizon Telescope experiment and the global effort that led to the first-ever direct image of a black hole revealed to the world on April 10th of this year.
euroscipy-2019-2638-in-the-shadow-of-the-black-hole Sara Issaoun en The Event Horizon Telescope (EHT) is a global network of millimeter-wavelength radio telescopes that uses Very Long Baseline Interferometry (VLBI) to synthesize the resolution of a single, Earth-sized telescope. In April 2017 the EHT observed the black hole at the center of the giant galaxy M87. Turning these observations into an image required the development of new software tools across the global EHT collaboration, and relied on a wealth of open-source software made available to the broader scientific community. In this talk, I will walk through the entire EHT experiment from the individual telescopes that record the data through the calibration, imaging, and interpretation of the observations that led to the first-ever direct image of a black hole released to the world on April 10th of this year. false https://pretalx.com/euroscipy-2019/talk/GLSVQA/ https://pretalx.com/euroscipy-2019/talk/GLSVQA/feedback/ Track 1 (Mitxelena) A practical guide towards algorithmic bias and explainability in machine learning Talk (long) 2019-09-05T14:45:00+00:00 14:45 00:30 Undesired bias in machine learning has become a worrying topic due to the numerous high profile incidents. In this talk we demystify machine learning bias through a hands-on example. We'll be tasked to automate the loan approval process for a company euroscipy-2019-1132-a-practical-guide-towards-algorithmic-bias-and-explainability-in-machine-learning Alejandro Saucedo en Undesired bias in machine learning has become a worrying topic due to the numerous high profile incidents that have been covered by the media. It is certainly a challenging topic, as it could even be said that the concept of societal bias is inherently biased in itself depending on an individual’s (or group’s) perspective. In this talk we avoid re-inventing the wheel; instead, we use traditional methods to simplify this issue so it can be tackled from a practical perspective.
# Content In this talk we will cover the high-level definitions of bias in machine learning to remove ambiguity, and we will demystify it through a hands-on example. Our objective will be to automate the loan approval process for a company using machine learning. This will allow us to go through this challenge step by step, using key tools and techniques from the latest research that will allow us to assess and mitigate undesired bias in our machine learning models. # Definitions We will begin by providing a high-level definition of undesired bias as two constituent parts: “a-priori societal bias” and “a-posteriori statistical bias”. We will provide tangible examples of how undesired bias is introduced in each step. This initial section will introduce very interesting research findings on this topic. Spoiler alert: We will take a pragmatic approach, showing how any non-trivial system will always have an inherent bias, so the objective is not to remove bias, but to make sure 1) you can get as close as possible to your objectives, and 2) you can make sure your objectives are as close as possible to the “ideal solution”. # Process In this talk we introduce a pragmatic process to assess bias in machine learning models through three key steps: 1) Data analysis, 2) Inference result analysis, and 3) Production metrics analysis. For each of these three steps we will walk through a real-life example. We will be tasked with the automation of a loan approval process. We will show how some bias may affect our results in a negative way, as well as how we can use various techniques to ensure we perform a reasonable analysis. Our objective is not to show how to completely remove bias from a machine learning model, but to show which tools and techniques are available, as well as the key touch-points and metrics to ensure the right domain experts are involved.
# Topics covered We will cover fundamental topics in data science such as feature importance analysis, class imbalance assessment, model evaluation metrics, partial dependence, feature correlation, etc. More importantly, we will cover how these fundamentals can interact at different touch-points with the right domain experts to ensure undesired bias is identified and documented. All of this will be covered with a hands-on example in a practical Jupyter notebook. false https://pretalx.com/euroscipy-2019/talk/SKAH3U/ https://pretalx.com/euroscipy-2019/talk/SKAH3U/feedback/ Track 1 (Mitxelena) Tracking migration flows with geolocated Twitter data Talk (long) 2019-09-05T15:15:00+00:00 15:15 00:30 Detect migration flows worldwide using geolocated Twitter data: routes, settlement areas, mobility to more than one country, spatial integration in cities, etc. euroscipy-2019-1247-tracking-migration-flows-with-geolocated-twitter-data Antònia Tugores en Traditionally, information on migration and refugee flows is obtained from surveys and border control operatives. Here we propose a method to detect migration flows worldwide using geolocated Twitter data. In particular, and as a practical example, we focus on the current migratory crisis in Venezuela. We study whether the calculated flows are quantitatively reliable when compared with official numbers at the country level. Our method is versatile and can be used to study different features of migration such as the routes, settlement areas, mobility to more than one country, spatial integration in cities, etc. false https://pretalx.com/euroscipy-2019/talk/HBHY9Q/ https://pretalx.com/euroscipy-2019/talk/HBHY9Q/feedback/ Track 1 (Mitxelena) Deep Learning for Understanding Human Multi-modal Behavior Talk 2019-09-05T15:45:00+00:00 15:45 00:15 Multi-modal sources of information are the next big step for AI.
In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks. euroscipy-2019-1343-deep-learning-for-understanding-human-multi-modal-behavior Ricardo Manhães Savii en Multimedia automatic learning has drawn attention from companies and governments for a significant number of applications in automated recommendations, classification, and human brain understanding. In recent years, an increasing amount of research has explored using deep neural networks for multimedia-related tasks. Some government security and surveillance applications are the automated detection of illegal and violent behaviors, child pornography and traffic infractions. Companies worldwide are looking for content-based recommendation systems that can personalize clients' consumption and interactions by understanding the human perception of memorability, interestingness, attractiveness, and aesthetics. Fields like event detection, multimedia affect analysis, and perceptual analysis are turning towards artificial neural networks. In this talk, I will present the theory behind multi-modal fusion using deep learning, along with some open challenges and their state of the art. false https://pretalx.com/euroscipy-2019/talk/UQHFD8/ https://pretalx.com/euroscipy-2019/talk/UQHFD8/feedback/ Track 1 (Mitxelena) How to process hyperspectral data from a prototype imager using Python Talk 2019-09-05T16:30:00+00:00 16:30 00:15 We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python utilising Xarray for metadata preservation from start to finish. euroscipy-2019-1451-how-to-process-hyperspectral-data-from-a-prototype-imager-using-python Matti Eskelinen en Our lab specializes in hyperspectral imaging using a spectral imager that combines tunable filters with colour sensors. Compared to simpler, more established imaging systems, this results in some unique challenges for the data processing.
In particular, many of the original imaging parameters need to be preserved and joined with calibration-derived values to actually compute radiance values from the raw sensor data, since they are not automatically handled by the hardware. Handling this metadata with the resulting hyperspectral images results in combined datasets consisting of a large 3-dimensional datacube and multiple smaller 2D and 1D arrays with linked dimensions. We have built our solution to this problem utilizing Xarray for handling the multiple arrays of data as well as the existing Dask integration for providing easy parallelization for the required preprocessing. Xarray also provides us with many other advantages, such as: * Exploration of very complex multi-dimensional datasets (especially when utilizing holoviews) * Interoperability with the scikit ecosystem * Serialization to NetCDF preserving all the data in a single file However, our extensive and somewhat non-conventional use of Xarray also brings out its shortcomings when trying to develop a library such as ours, such as indexing issues with multiple possibly overlapping coordinates and performance issues with complex datasets. false https://pretalx.com/euroscipy-2019/talk/VKDH9K/ https://pretalx.com/euroscipy-2019/talk/VKDH9K/feedback/ Track 1 (Mitxelena) Enhancing & re-designing the QGIS user interface – a deep dive Talk 2019-09-05T16:45:00+00:00 16:45 00:15 How can one of the largest code bases in open source Geographical Information Science – QGIS – be enhanced and re-designed? Through the powers of Python plugins. This talk demonstrates concepts on how to make QGIS more user-friendly. euroscipy-2019-1438-enhancing-re-designing-the-qgis-user-interface-a-deep-dive Sebastian M. Ernst en Having been around for two decades, QGIS clearly is an organically grown project. It has primarily been fulfilling the various special needs of its developers. From an outsider's perspective, it is an amazingly rich patchwork of features.
However, some are deeply hidden in numerous layers of user interface elements, requiring intense training to get used to. Others are only accessible through APIs, requiring not only training but also programming skills. Being confronted with QGIS as professional users on a regular basis, we thought about what would make working with QGIS more attractive. What if QGIS had a pleasant, coherent theme, including not only colors but also icons? What if QGIS had the ability to store workbench configurations? What if QGIS had dedicated interface configurations for specific workflows? What if much more of the API's functionality was accessible through the GUI in a well-organized way? How could QGIS work in a useful manner with ribbons? How could the incredible number of dialogs be tamed into tabs? We demonstrate (live) a series of user interface experiments – all of which are or will be [available online](https://github.com/qgist) as Python plugins. In this context, the current state of play with respect to Python and QGIS is explained in detail. The way QGIS is typically distributed puts quite a few unusual limitations on Python plugin code. The case is made that some of those limitations are simply out of date and must be overcome, which may require help from the broader (scientific) Python community. We seek a conversation with the audience. false https://pretalx.com/euroscipy-2019/talk/HXH3GN/ https://pretalx.com/euroscipy-2019/talk/HXH3GN/feedback/ Track 2 (Baroja) Visual Diagnostics at Scale Talk (long) 2019-09-05T10:30:00+00:00 10:30 00:30 Machine learning is a search for the best combination of features, model, and hyperparameters. But as data grow, so does the search space! Fortunately, visual diagnostics can focus our search and allow us to steer modeling purposefully, and at scale. euroscipy-2019-1286-visual-diagnostics-at-scale Dr. Rebecca Bilbro en Even with a modestly-sized dataset, the hunt for the most effective machine learning model is *hard*.
Arriving at the optimal combination of features, algorithm, and hyperparameters frequently requires significant experimentation and iteration. This leads some of us to stay inside algorithmic comfort zones, some to trail off on random walks, and others to resort to automated processes like gridsearch. But whatever path we take, we are often left in doubt about whether our final solution really is the optimal one. And as our datasets grow in size and dimension, so too does this ambiguity. Fortunately, many of us have developed strategies for steering model search. Open source libraries like [seaborn](https://seaborn.pydata.org/), [pandas](https://pandas.pydata.org/) and [yellowbrick](https://www.scikit-yb.org/en/latest/) can help make machine learning more informed with visual diagnostic tools like histograms, correlation matrices, parallel coordinates, manifold embeddings, validation and learning curves, residuals plots, and classification heatmaps. These tools enable us to tune our models with visceral cues that allow us to be more strategic in our choices. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the multi-dimensional realm in which our models operate. However, large, high-dimensional datasets can prove particularly difficult to explore. Not only do the majority of people struggle to visualize anything beyond two- or three-dimensional space, many of our favorite open source Python tools are not designed to be performant with arbitrarily big data. So how well *do* our favorite visualization techniques hold up to large, complex datasets? In this talk, we'll consider a suite of visual diagnostics, some familiar and some new, and explore their strengths and weaknesses with several publicly available datasets of varying size. Which suffer most from the curse of dimensionality in the face of increasingly big data? What are the workarounds (e.g. sampling, brushing, filtering, etc.)
and when should we use them? And most importantly, how can we continue to steer the machine learning process, not only purposefully but at scale? false https://pretalx.com/euroscipy-2019/talk/D7WAFW/ https://pretalx.com/euroscipy-2019/talk/D7WAFW/feedback/ Track 2 (Baroja) Histogram-based Gradient Boosting in scikit-learn 0.21 Talk (long) 2019-09-05T11:00:00+00:00 11:00 00:30 In this presentation we will present some recently introduced features of the scikit-learn Machine Learning library with a particular emphasis on the new implementation of Gradient Boosted Trees. euroscipy-2019-1536-histogram-based-gradient-boosting-in-scikit-learn-0-21 Olivier Grisel en scikit-learn 0.21 was recently released, and this presentation will give an overview of its main new features in general and present the new implementation of Gradient Boosted Trees. Gradient Boosted Trees (also known as Gradient Boosting Machines) are very competitive supervised machine learning models, especially on tabular data. Scikit-learn offered a traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive and was dramatically dominated by specialized state-of-the-art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned features to evaluate the tree node split candidates. This implementation can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM. We will also introduce pygbm, a numba-based implementation of gradient boosted trees that was used as a prototype for the scikit-learn implementation, and compare the numba vs cython developer experience. false https://pretalx.com/euroscipy-2019/talk/H3NTLX/ https://pretalx.com/euroscipy-2019/talk/H3NTLX/feedback/ Track 2 (Baroja) Recent advances in python parallel computing Talk (long) 2019-09-05T11:30:00+00:00 11:30 00:30 *Modern hardware is multi-core*. It is crucial for Python to provide efficient parallelism.
This talk presents the current state and advances in Python parallelism, in order to help practitioners and developers make better decisions on this matter. euroscipy-2019-1380-recent-advances-in-python-parallel-computing Pierre Glaser en # Parallel computing in Python: Current state and recent advances *Modern hardware is multi-core*. It is crucial for Python to provide high-performance parallelism. This talk will present to both data scientists and library developers the current state of affairs and the recent advances for parallel computing with Python. The goal is to help practitioners and developers make better decisions on this matter. I will first cover how Python can interface with parallelism, from leveraging external parallelism of C extensions (especially the BLAS family) to Python's multiprocessing and multithreading API. I will touch upon use cases, e.g. single vs multi machine, as well as the pros and cons of the various solutions for each use case. Most of these considerations will be backed by benchmarks from the [scikit-learn](https://scikit-learn.org/stable/) machine learning library. From these low-level interfaces emerged higher-level parallel processing libraries, such as concurrent.futures, [joblib](https://joblib.readthedocs.io/en/latest/) and [loky](https://loky.readthedocs.io/en/latest/) (used by [dask](https://dask.org/) and [scikit-learn](https://scikit-learn.org/stable/)). These libraries make it easy for Python programmers to use safe and reliable parallelism in their code. They can even work in more exotic situations, such as interactive sessions, in which Python’s native multiprocessing support tends to fail. I will describe their purpose as well as the canonical use cases they address. The last part of this talk will focus on the most recent advances in the Python standard library, addressing one of the principal performance bottlenecks of multi-core/multi-machine processing, which is data communication.
We will present a [new API](https://docs.python.org/3.8/library/multiprocessing.shared_memory.html) for shared-memory management between different Python processes, and performance improvements for the serialization of large Python objects ([PEP 574](https://www.python.org/dev/peps/pep-0574/), [pickle extensions](https://github.com/cloudpipe/cloudpickle)). These performance improvements will be leveraged by distributed data science frameworks such as dask, [ray](https://ray.readthedocs.io/en/latest/) and [pyspark](https://spark.apache.org/docs/latest/api/python/index.html). false https://pretalx.com/euroscipy-2019/talk/EQNGSQ/ https://pretalx.com/euroscipy-2019/talk/EQNGSQ/feedback/ Track 2 (Baroja) Data sciences in a polyglot world with xtensor and xframe Talk (long) 2019-09-05T12:00:00+00:00 12:00 00:30 The main scientific computing programming languages have different models of the main data structures of data science, such as dataframes and n-d arrays. In this talk, we present our approach to reconciling the data science tooling in this polyglot world. euroscipy-2019-2675-data-sciences-in-a-polyglot-world-with-xtensor-and-xframe Sylvain Corlay, Wolf Vollprecht en In this presentation, we demonstrate how xtensor can be used to implement numerical methods very efficiently in C++, with a high-level numpy-style API, and expose them to Python, Julia, and R for free. The resulting native extension operates in-place on Python, Julia, and R infrastructures without overhead. We then dive into the xframe package, a dataframe project for the C++ programming language, exposing an API very similar to Python's xarray. Features of xtensor and xframe will be demonstrated using the xeus-cling jupyter kernel, enabling interactive use of the C++ programming language in the notebook.
false https://pretalx.com/euroscipy-2019/talk/QU88B8/ https://pretalx.com/euroscipy-2019/talk/QU88B8/feedback/ Track 2 (Baroja) Understanding Numba Talk (long) 2019-09-05T14:45:00+00:00 14:45 00:30 In this talk I will take you on a whirlwind tour of Numba and you will be equipped with a mental model of how Numba works and what it is good at. At the end, you will be able to decide if Numba could be useful for you. euroscipy-2019-1418-understanding-numba Valentin Haenel en In this talk I will take you on a whirlwind tour of Numba, the just-in-time, type-specializing function compiler for accelerating numerically-focused Python. Numba can compile the computationally intensive functions of your numerical programs and libraries from Python/NumPy to highly optimized binary code. It does this by inferring the data types used inside these functions and using that information to generate code that is specific to those data types and specialised for your target hardware. On top of that, it does all of this on the fly, or just in time, as your program runs. This significantly reduces the potential complexity that traditionally comes with pre-compiling and shipping numerical code for a variety of operating systems, Python versions and hardware architectures. All you need, in principle, is to `conda install numba` and decorate your compute-intensive functions with `@numba.jit`! This talk will equip you with a mental model of how Numba is implemented and how it works at the algorithmic level. You will gain a deeper understanding of the types of use cases where Numba excels and why. Also, you will understand the limitations and caveats that exist within Numba, including any potential ideas and strategies that might alleviate these. At the end of the talk you will be in a good position to decide if Numba is for you and you will have learnt about the concrete steps you need to take to include it as a dependency in your program or library.
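As a rough illustration of the decorator workflow described in this abstract, the following minimal sketch compiles a small numerical loop with Numba. The function name `sum_of_roots` is hypothetical and not taken from the talk; `njit` is Numba's shorthand for `jit(nopython=True)`. The fallback decorator is an assumption added only so that the sketch still runs as plain Python where Numba is not installed.

```python
import math

try:
    # njit is Numba's shorthand for jit(nopython=True).
    from numba import njit
except ImportError:
    # Assumption for illustration only: without Numba, fall back to a
    # no-op decorator so the function runs as ordinary Python.
    def njit(func):
        return func


@njit
def sum_of_roots(n):
    """Sum the square roots of 0 .. n-1 in an explicit loop.

    With Numba available, the loop is compiled to machine code the first
    time the function is called with a given argument type.
    """
    total = 0.0
    for i in range(n):
        total += math.sqrt(i)
    return total


# The first call triggers compilation; later calls with the same
# argument type reuse the compiled version.
result = sum_of_roots(1_000)
```

Explicit loops of this kind are slow in the standard interpreter, which is why they are a common first candidate for just-in-time compilation.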
false https://pretalx.com/euroscipy-2019/talk/EDNVGJ/ https://pretalx.com/euroscipy-2019/talk/EDNVGJ/feedback/ Track 2 (Baroja) PyPy meets SciPy Talk (long) 2019-09-05T15:15:00+00:00 15:15 00:30 PyPy, the fast and compliant alternative implementation of Python, is now compatible with the SciPy ecosystem. We'll explore how scientific programmers can use it. euroscipy-2019-1825-pypy-meets-scipy Ronan Lamy en PyPy is a fast and compliant implementation of Python. In other words, it's an interpreter for the Python language that can act as a full replacement for the reference interpreter, CPython. It's optimised to enable efficient just-in-time compilation of Python code to machine code, and has releases matching versions 2.7 and 3.6. It now also supports the main pillars of the scientific ecosystem (numpy, Cython, scipy, pandas, ...) thanks to its emulation layer for the C API of CPython. Performance is a major concern for Python programmers. When using CPython, this leads to splitting out the performance-sensitive parts of the computation and rewriting them in a faster, but less convenient, language such as C or Cython. With PyPy, there is no need to choose between clear, Pythonic code and good performance. This talk aims to convince the audience that PyPy should be part of every scientific programmer's toolbox. false https://pretalx.com/euroscipy-2019/talk/ULT3M7/ https://pretalx.com/euroscipy-2019/talk/ULT3M7/feedback/ Track 2 (Baroja) High performance machine learning with dislib Talk 2019-09-05T15:45:00+00:00 15:45 00:15 This talk will present dislib, a distributed machine learning library built on top of the PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters. euroscipy-2019-1413-high-performance-machine-learning-with-dislib Javier Álvarez en PyCOMPSs is a distributed programming model and runtime for Python.
PyCOMPSs' main goal is to make distributed computing accessible to non-expert developers by providing a simple programming model and a runtime that automates many aspects of the parallel execution. In addition to this, PyCOMPSs is infrastructure agnostic, and can run on top of a wide range of platforms, from HPC clusters to clouds, and from GPUs to FPGAs. This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs. Inspired by scikit-learn, dislib's programming interface is based on the concept of *estimators*. This provides a clean and easy-to-use API that greatly increases the productivity of building large-scale machine learning pipelines. Thanks to PyCOMPSs, dislib can run on multiple distributed platforms without changes in the source code, and can handle up to billions of input samples using thousands of CPU cores. This makes dislib a perfect tool for scientists (and other users) who are not machine learning experts, but who still want to extract useful knowledge from extremely large data sets. false https://pretalx.com/euroscipy-2019/talk/SUFHZT/ https://pretalx.com/euroscipy-2019/talk/SUFHZT/feedback/ Track 2 (Baroja) Can we make Python fast without sacrificing readability? numba for Astrodynamics Talk 2019-09-05T16:30:00+00:00 16:30 00:15 There are several solutions to make Python faster, and choosing one is not easy: we would want it to be fast without sacrificing its readability and high-level nature. We tried to do it for an Astrodynamics library using numba. How did it turn out? euroscipy-2019-1824-can-we-make-python-fast-without-sacrificing-readability-numba-for-astrodynamics Juan Luis Cano Rodríguez en We are lucky that there are very diverse solutions to make Python faster that have been in use for a while: from wrapping compiled languages (NumPy), to altering the Python syntax to make it more suitable to compilers (Cython), to using a subset of it which can in turn be accelerated (numba).
However, each of these options has a tradeoff, and there is no silver bullet. poliastro is a library for Astrodynamics written in pure Python. All its core algorithms are accelerated with numba, which allows poliastro to be reasonably fast while keeping code complexity minimal and avoiding other languages. However, even though numba is quite mature as a library and most of the Python syntax and NumPy functions are supported, there are still some limitations that affect its usage. In particular, we strive to offer a high-level API with support for physical units and reusable functions which can be passed as arguments, which sometimes requires using complex objects or introspective Python behavior that is not available. In this talk we will discuss the strategies and workarounds we have developed to overcome these problems, and what advanced numba features we can leverage. false https://pretalx.com/euroscipy-2019/talk/WCXYRQ/ https://pretalx.com/euroscipy-2019/talk/WCXYRQ/feedback/ Track 2 (Baroja) PSYDAC: a parallel finite element solver with automatic code generation Talk 2019-09-05T16:45:00+00:00 16:45 00:15 PSYDAC takes input from SymPDE (a SymPy extension for partial differential equations), applies a finite-element discretization, generates MPI-parallel code, and accelerates it with Numba, Pythran, or Pyccel. We present design, usage and performance. euroscipy-2019-1809-psydac-a-parallel-finite-element-solver-with-automatic-code-generation Yaman Güçlü en PSYDAC is a Python 3 library for the solution of partial differential equations. Its current focus is on isogeometric analysis using B-spline finite elements, but extensions to other methodologies are under consideration. In order to use PSYDAC, the user defines geometry and model equations in an abstract form using SymPDE, an extension of SymPy that provides the mathematical expressions and checks their semantic validity.
Once a finite element discretization has been chosen, PSYDAC maps the abstract concepts into concrete objects, the basic building blocks being MPI-distributed vectors and matrices. Python code is generated for all the computationally intensive operations (matrix and vector assembly, matrix-vector products, etc.), and it is accelerated using either Numba, Pythran, or Pyccel. We present the library design, the user interface, and the performance results. false https://pretalx.com/euroscipy-2019/talk/7A3ZQF/ https://pretalx.com/euroscipy-2019/talk/7A3ZQF/feedback/ Track 3 (Oteiza) Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications Talk (long) 2019-09-05T11:00:00+00:00 11:00 00:30 We interact with an increasing amount of data, but classical data structures and algorithms no longer meet our requirements. This talk presents probabilistic algorithms and data structures and describes their main areas of application. euroscipy-2019-1131-exceeding-classical-probabilistic-data-structures-in-data-intensive-applications Andrii Gakhov en *Nowadays, research in every scientific domain, from medicine to astronomy, is impossible without processing huge amounts of data to check hypotheses, find new relations, and make discoveries. However, traditional technologies, including classical data structures and algorithms, become ineffective or require too many resources. This creates a demand for various optimization techniques, new data processing paradigms, and, finally, appropriate algorithms.* The presentation is dedicated to *probabilistic data structures*, a common name for advanced data structures based mostly on different hashing techniques. Unlike classical ones, these provide approximate answers, but with reliable ways to estimate the possible error and uncertainty. They are designed for extremely low memory requirements, constant query time, and scaling, factors that are essential for data applications.
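A minimal sketch of one hashing-based structure of this kind, a Bloom filter, may make the trade-off between exactness and memory concrete. It is illustrative only and is not taken from any particular library: the class name, the default sizes, and the choice of salting SHA-256 to obtain several hash positions are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: approximate set membership in fixed memory."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit position, chosen for clarity rather than compactness.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting a single hash function (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ("numpy", "scipy", "pandas"):
    bf.add(word)

print("scipy" in bf)    # True: an added item is always reported as present
print("fortran" in bf)  # normally False; false positives are possible but rare at this load
```

The filter never stores the items themselves, so memory use is fixed by `num_bits`, and each query touches only `num_hashes` positions regardless of how many items were added.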
It is hard to imagine a field that requires learning from data where they are not applicable. They are not necessarily new. Probably everybody knows the Bloom filter, designed in the 1970s, which efficiently solves the problem of membership queries (deciding whether some element belongs to the dataset or not) in constant time without having to store all elements. This is one example of a probabilistic data structure, but many more have been designed for various tasks in many domains. In this talk, I explain **the five most important problems in data processing** that occur in different domains but **can be efficiently solved with probabilistic data structures and algorithms**. We cover *membership querying*, *counting* of unique elements, *frequency* and *rank* estimation in data streams, and *similarity*. Everybody interested in the topic is welcome to contribute to a free and open-source Python (Cython) library called [PDSA](https://github.com/gakhov/pdsa). false https://pretalx.com/euroscipy-2019/talk/TXQW9H/ https://pretalx.com/euroscipy-2019/talk/TXQW9H/feedback/ Track 3 (Oteiza) Driving a 30m Radio Telescope with Python Talk (long) 2019-09-05T11:30:00+00:00 11:30 00:30 The IRAM 30m radio telescope is one of the best in the world. The telescope control software, monitoring, data archiving as well as some of the data processing code is written in Python. We will describe how and why Python is used at the telescope. euroscipy-2019-1774-driving-a-30m-radio-telescope-with-python Francesco Pierfederici en The IRAM 30m radio telescope is one of the best in the world. It has been in operation non-stop since the mid-1980s and is used to observe 24 hours a day, 365 days a year. All of the high-level telescope control software, monitoring, data archiving as well as some of the data processing software is written in Python.
This choice, controversial at first, proved to be extremely successful, making the IRAM 30m telescope extremely efficient. This talk will describe how Python is used at the telescope, the reasons behind these choices, lessons learned and future developments. false https://pretalx.com/euroscipy-2019/talk/8K7TA9/ https://pretalx.com/euroscipy-2019/talk/8K7TA9/feedback/ Track 3 (Oteiza) Matrix calculus with SymPy Talk (long) 2019-09-05T12:00:00+00:00 12:00 00:30 In this talk we explore a recent addition to SymPy which makes it possible to find closed-form solutions for matrix derivatives. As a consequence, generation of efficient code for optimization problems is now much easier. euroscipy-2019-1321-matrix-calculus-with-sympy Francesco Bonazzi en The recent popularization of libraries relying on tensor algebra operations has increased the need for computational tools to calculate the gradient and Hessian of tensorial expressions. The derivative of a tensor *A* by tensor *B* is the tensor containing all combinations of the elements of *A* differentiated with respect to the elements of *B*. While tensor derivative operations are commonly supported by most computer algebra systems and frameworks through iterative algorithms, these derivatives can be expressed mathematically in closed-form solutions, which are computationally many orders of magnitude faster. SymPy has been recently extended in order to support the computation of symbolic matrix derivatives, and is currently the only computer algebra system endowed with this feature (lacking even in Wolfram Mathematica). Matrix calculus indeed plays a central role in optimization and machine learning, but has unfortunately often been limited to pen and paper or chalk and blackboard. In this talk, we will introduce matrix expressions in SymPy, and address the three ways they can be represented: 1. explicit matrices with symbolic entries, 2. indexed symbols with proper summation convention, 3. implicit matrix expressions.
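As a worked illustration of what a closed-form matrix derivative looks like, consider the scalar *f* = tr(*AX*) for square matrices *A* and *X*. This is a standard textbook identity written out by hand, not output from SymPy, and it assumes the layout convention in which entry (*k*, *l*) of the derivative is the derivative of *f* with respect to *X* at (*k*, *l*). Writing the trace with indexed symbols and differentiating entry by entry gives:

```latex
\begin{aligned}
f &= \operatorname{tr}(AX) = \sum_{i,j} A_{ij} X_{ji}, \\
\frac{\partial f}{\partial X_{kl}} &= \sum_{i,j} A_{ij}\, \delta_{jk}\, \delta_{il} = A_{lk}, \\
\frac{\partial f}{\partial X} &= A^{\mathsf{T}}.
\end{aligned}
```

The entry-wise route needs one scalar derivative per element of *X*, while the closed form is a single transpose, which suggests why closed-form results can be so much cheaper to evaluate than iterating over elements.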
We illustrate the way matrix derivatives are implemented for all three representations, with special emphasis on the third, which is the fastest and most elegant. The derived expressions can then be passed to SymPy's code generation utilities and the resulting code can be compared in speed with other frameworks, such as TensorFlow. The support of matrix derivatives can turn SymPy into a simple tool to create the code for optimization algorithms or the code to train machine learning algorithms. The code generation utilities of SymPy are indeed aware of how to export matrix expressions into other programming languages and frameworks. We will give some examples using maximum likelihood estimation and the expectation-maximization algorithms. false https://pretalx.com/euroscipy-2019/talk/F8X9BY/ https://pretalx.com/euroscipy-2019/talk/F8X9BY/feedback/ Track 3 (Oteiza) VeloxChem: Python meets quantum chemistry and HPC Talk (long) 2019-09-05T14:45:00+00:00 14:45 00:30 A new and efficient Python/C++ modular library for real and complex response functions at the level of Kohn-Sham density functional theory euroscipy-2019-1188-veloxchem-python-meets-quantum-chemistry-and-hpc Olav Vahtras en Zilvinas Rinkevicius, Xin Li, Olav Vahtras, Manuel Brand, Karan Ahmadzadeh, Magnus Ringholm, Nanna List, and Patrick Norman. With the ease of Python library modules, VeloxChem offers a front end to quantum chemical calculations on contemporary high-performance computing (HPC) systems and aims at harnessing the future compute power within the EuroHPC initiative. At the heart of this software lies a module for the evaluation of electron-repulsion integrals (ERIs) using the Obara-Saika recurrence scheme, where a high degree of efficiency is achieved by employing architecture-independent vectorization via OpenMP SIMD pragmas in the auto-generated C++ source code.
The software is topology aware, and with a Python-controlled work and task flow, idle time is minimized using an MPI/OpenMP partitioning of resources. In the second software layer, we have implemented a highly accurate SCF start guess based on atomic densities and a first level of iterations in a reduced version of the user-defined basis set, leading to a very smooth convergence in the subsequent standard DIIS scheme. This layer also includes vectorized and OpenMP/MPI parallelized modules for efficient generation of DFT grid points and weights as well as kernel integration. In the third software layer, we present real and complex response functions to address dispersive and absorptive molecular properties in spectroscopy. The kernel module in this layer is the iterative linear response equation solver that we have formulated and implemented for a combination of multiple optical frequencies and multiple perturbation operators. With efficient use of computer memory, we enable the simultaneous reference to, and solving of, on the order of 1,000 response equations for sizable biochemical systems without spatial symmetry, and we can thereby determine electronic response spectra in arbitrary wavelength regions, including UV/vis and X-ray, without resolving the sometimes embedded excited states in the spectrum. For example, the electronic CD spectrum (involving the Cartesian sets of electric and magnetic perturbations) over a range of some 10 eV is obtained at a computational cost comparable to that of determining the transition energy of the lowest excited state, or optimizing the electronic structure of the reference state.
false https://pretalx.com/euroscipy-2019/talk/FLM8R7/ https://pretalx.com/euroscipy-2019/talk/FLM8R7/feedback/ Track 3 (Oteiza) emzed: a Python based framework for analysis of mass-spectrometry data Talk (long) 2019-09-05T15:15:00+00:00 15:15 00:30 This talk is about emzed, a Python library that helps biologists with little programming knowledge implement ad-hoc analyses as well as workflows for mass-spectrometry data. euroscipy-2019-1271-emzed-a-python-based-framework-for-analysis-of-mass-spectrometry-data Uwe Schmitt en Many of the existing mass spectrometry data analysis tools are desktop applications designed for specific applications without support for customization. Many of the commercial solutions also offer no or only limited functionality for exporting results. Moreover, the existing programming libraries in this area are scattered across different languages, mostly R, Java and Python. As a result, data analysis in this area often consists of manual import/export steps from/to various tools and self-developed scripts that prevent the reproducibility of results obtained or automated execution on high-performance infrastructures. emzed tries to avoid these problems by integrating existing libraries and tools from Python, R (and in the near future also Java) into an easy-to-use API. To support workflow development and increase confidence in end results, emzed also offers tools for interactive visualization of mass spectrometry related data structures. The presentation introduces the basics and concepts of emzed, some lessons learned and the current development of the next version of emzed. false https://pretalx.com/euroscipy-2019/talk/YJQH7M/ https://pretalx.com/euroscipy-2019/talk/YJQH7M/feedback/ Track 3 (Oteiza) vtext: fast text processing in Python using Rust Talk 2019-09-05T15:45:00+00:00 15:45 00:15 In this talk, we present some of the benefits of writing extensions for Python in Rust.
We then illustrate this approach on the [vtext](https://github.com/rth/vtext) project, which aims to be a high-performance library for text processing. euroscipy-2019-1456-vtext-fast-text-processing-in-python-using-rust Roman Yurchak en Scientific Python has historically relied on compiled extensions for performance-critical parts of the code. In this talk, we outline how to write Rust extensions for Python using the [rust-numpy](https://github.com/rust-numpy/rust-numpy) project. Advantages and limitations of this approach as compared to Cython or wrapping Fortran, C or C++ are also discussed. In the second part, we introduce the [vtext](https://github.com/rth/vtext) project that allows fast text processing in Python using Rust. In particular, we consider the problems of text tokenization and (parallel) token counting, resulting in a sparse vector representation of documents. These can then be used as input in machine learning or information retrieval applications. We outline the approach used in vtext and compare it to existing solutions to these problems in the Python ecosystem. false https://pretalx.com/euroscipy-2019/talk/PUCWVY/ https://pretalx.com/euroscipy-2019/talk/PUCWVY/feedback/ Track 3 (Oteiza) pystencils: Speeding up stencil computations on CPUs and GPUs Talk 2019-09-05T16:30:00+00:00 16:30 00:15 [pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) speeds up stencil computations on numpy arrays using a sympy-based high-level description that is compiled into optimized C code. euroscipy-2019-1421-pystencils-speeding-up-stencil-computations-on-cpus-and-gpus Martin Bauer en [Interactive Notebooks are available here](https://mybinder.org/v2/gh/mabau/pystencils/master?filepath=doc%2Fnotebooks). Many operations on structured arrays can be formulated as stencil codes, where the update of one array cell depends only on values in its local neighborhood.
Stencil codes arise in many different fields, for example in image processing or in computational fluid dynamics by discretizing partial differential equations (PDEs) using finite differences or finite volume schemes. We present the [pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) package that allows for fast execution of stencil codes on numpy arrays using code generation techniques. The stencil is formulated in sympy and transformed into an intermediate representation (IR). *pystencils* comes with a set of optimizing transformations that can be applied on this IR, for example cache blocking or explicit SIMD vectorization with intrinsics. The intermediate representation is transformed into C or CUDA code and automatically loaded as a C extension module. This approach yields highly efficient implementations, outperforming current acceleration techniques like Cython or numba. Additionally, together with the [waLBerla](https://www.walberla.net/) package, the resulting stencil codes can be run on large computing clusters, using MPI parallelization. *pystencils* also comes with functions to automatically derive the sympy-based stencil representation from a continuous PDE. Symbolic, continuous differential operators are automatically discretized by finite difference schemes of arbitrary order. We show two examples of large-scale setups run with *pystencils*: a phase-field method simulating solidification of alloys and a CFD simulation based on the lattice-Boltzmann method. 
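To make the notion of a stencil code concrete, here is a minimal pure-Python sketch of a five-point averaging stencil, the kind of update that arises from a finite-difference discretization of the Laplace equation. It does not use pystencils or its API and involves no code generation or optimization; the function name `apply_five_point_stencil` and the small test grid are assumptions made for illustration.

```python
def apply_five_point_stencil(grid):
    """Return a new grid where each interior cell is the mean of its 4 neighbours."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # The update of cell (i, j) reads only its local neighbourhood.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


# 4x4 grid with a "hot" top edge (1.0) and 0.0 everywhere else.
grid = [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]
result = apply_five_point_stencil(grid)
print(result[1][1])  # 0.25: only the neighbour above is non-zero
```

Because every interior cell is updated from the same fixed pattern of neighbours, loops like this are regular enough to be tiled, vectorized, or moved to a GPU, which is the kind of transformation a code generator can automate.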
false https://pretalx.com/euroscipy-2019/talk/7EYS3W/ https://pretalx.com/euroscipy-2019/talk/7EYS3W/feedback/ Track 3 (Oteiza) TelApy a Python module to compute free surface flows and sediments transport in geosciences Talk 2019-09-05T16:45:00+00:00 16:45 00:15 TelApy is a Python module to compute free surface flows and sediment transport in geosciences. The talk gives examples of how it is used to inter-operate with other Python libraries for Uncertainty Quantification, Optimization and Reduced Order Models. euroscipy-2019-1784-telapy-a-python-module-to-compute-free-surface-flows-and-sediments-transport-in-geosciences yoann audouin en This talk focuses on applications of the TelApy module (www.opentelemac.org). TelApy aims to provide a Python wrapper of the TELEMAC-MASCARET API (Application Program Interface). The goal of TelApy is to give full control over the simulation while a case is running. For example, it must allow the user to stop the simulation at any time step, get the values of some variables and change them. To make this possible, a Fortran structure called instantiation was developed with the API. It contains a list of strings pointing to TELEMAC variables. This gives direct access to the physical memory of variables, and therefore allows their values to be read and set. Furthermore, changes have been made in the TELEMAC-MASCARET main subroutines to make it possible to execute hydraulic cases time step by time step. It is useful to drive the TELEMAC-MASCARET SYSTEM APIs using the Python programming language. Python is a portable, dynamic, extensible, free language, which allows (without imposing) a modular approach and object-oriented programming. In addition to the benefits of the language itself, Python offers a large number of interoperable libraries. Linking these libraries with the TELEMAC-MASCARET SYSTEM APIs allows the creation of an ever more efficient computing chain able to respond more finely to various complex problems.
Therefore, the TelApy module aims to enable a new way of using the TELEMAC-MASCARET system. In particular, one can think of high performance computing for the calculation of uncertainties, optimization, code coupling and so on. The objective of this talk is to present some examples of the TelApy module in the context of Uncertainty Quantification, Optimization and Reduced Order Models. false https://pretalx.com/euroscipy-2019/talk/LE7AAH/ https://pretalx.com/euroscipy-2019/talk/LE7AAH/feedback/