{"$schema": "https://c3voc.de/schedule/schema.json", "generator": {"name": "pretalx", "version": "2024.2.0.dev0"}, "schedule": {"url": "https://pretalx.com/euroscipy-2019/schedule/", "version": "0.10", "base_url": "https://pretalx.com", "conference": {"acronym": "euroscipy-2019", "title": "EuroSciPy 2019", "start": "2019-09-02", "end": "2019-09-06", "daysCount": 5, "timeslot_duration": "00:05", "time_zone_name": "UTC", "colors": {"primary": "#8CAAE6"}, "rooms": [{"name": "Track 1 (Mitxelena)", "guid": "ce709522-7868-5963-994b-f188cf865f49", "description": null, "capacity": 441}, {"name": "Track 2 (Baroja)", "guid": "80b23020-5895-5cbf-a86d-d0fa63d36b2f", "description": null, "capacity": 161}, {"name": "Track 3 (Oteiza)", "guid": "357064ac-3ea3-528f-abe9-f3c171715b29", "description": null, "capacity": 150}, {"name": "Track4 (Chillida)", "guid": "d160d434-8351-5cc7-ad37-04fcbb7e597b", "description": null, "capacity": 80}, {"name": "Maintainer's track (Elhuyar)", "guid": "50a293dd-cc6d-529d-b871-2bf4e8c8f6a1", "description": null, "capacity": 24}, {"name": "Posters at 16:00", "guid": "704c4cdd-f0ca-5a86-9e38-22b95c1a2ad4", "description": null, "capacity": null}], "tracks": [], "days": [{"index": 1, "date": "2019-09-02", "day_start": "2019-09-02T04:00:00+00:00", "day_end": "2019-09-03T03:59:00+00:00", "rooms": {"Track 2 (Baroja)": [{"url": "https://pretalx.com/euroscipy-2019/talk/QPKHMG/", "id": 1396, "guid": "1d9c0ef9-059e-563f-b451-9a7fa013ea34", "date": "2019-09-02T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1396-getting-started-with-jupyterlab", "title": "Getting Started with JupyterLab", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "JupyterLab is used for essentially all other tutorials at EuroSciPy. 
This tutorial gives an overview of the basic functionality and shows how to use some of the many tools it provides to simplify your Python programming workflow.", "description": "This tutorial is hands-on. It is designed for participants who haven't used JupyterLab yet or have only minimal experience with it. Participants will work along with the trainer and learn how a Jupyter Notebook works by using some basic features.\r\n\r\nSome of the topics are:\r\n\r\n* Client-server concept\r\n* How cells work\r\n* Basic markdown\r\n* Magic commands overview\r\n* Some magic commands in more detail\r\n* Debugging basics\r\n* Basic timing and profiling\r\n* Extensions\r\n* History of variables\r\n* Saving to files\r\n* and more\r\n\r\nThere will be room for questions during the tutorial as well as a dedicated FAQ session at the end.\r\n\r\nAfter this tutorial participants should be able to comfortably follow the other tutorials that are delivered with a Jupyter Notebook.\r\n\r\n# Requirements and set up instructions\r\n\r\nTraining will be done with Python 3.7 and the latest JupyterLab version.\r\n\r\n*  Install Anaconda\r\n\r\nalternatively\r\n\r\n* Install Miniconda and `conda install jupyterlab`\r\n\r\nalternatively\r\n\r\n* Create a new conda environment: \r\n    + `conda create -n jupyterlabtutorial python=3.7 jupyterlab` and activate it with \r\n    + `conda activate jupyterlabtutorial`", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c84d882a-39c5-51be-b6d5-7b7408e7002b", "id": 4, "code": "9KSJ3K", "public_name": "Mike M\u00fcller", "avatar": "https://pretalx.com/media/avatars/4f3782b004830f622e19029e5f7fc146_41xklgK.jpg", "biography": "Founder and trainer of Python Academy", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/KRNP7Y/", "id": 2502, "guid": "9e585e96-c0a5-5f7a-ad92-6212b823e0dd", "date": "2019-09-02T11:00:00+00:00", "start": "11:00", "logo": null, "duration": 
"01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2502-never-get-in-a-battle-of-bits-without-ammunition", "title": "Never get in a battle of bits without ammunition", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "The `numpy` package takes a central role in the Python scientific ecosystem. \r\nThis is mainly because `numpy` code has been designed with\r\nhigh performance in mind. This tutorial will introduce the main features of numpy in `90` mins.", "description": "# Outline\r\n\r\n**Part 1** Numpy Basics\r\n\r\n- Introduction to NumPy Arrays\r\n    - numpy internals schematics\r\n    - Reshaping and Resizing\r\n- Numerical Data Types\r\n    - Record Array\r\n    \r\n**Part 2** Indexing and Slicing\r\n    \r\n- Indexing numpy arrays\r\n    - fancy indexing\r\n    - array masking\r\n    \r\n- Slicing & Stacking\r\n- Vectorization & Broadcasting\r\n\r\n**Part 3**  \"Advanced\" NumPy\r\n\r\n- Serialisation & I/O\r\n    - `.mat` files\r\n- Array and Matrix\r\n    - Matlab compatibility\r\n- Memmap \r\n- Bits of Data Science with NumPy\r\n- NumPy beyond classic `numpy`\r\n\r\n### Python version\r\n\r\nThe minimum recommended version of Python to use for this tutorial is **Python 3.5**, although \r\nPython 2.7 should be fine, as well as previous versions of Python 3. \r\n\r\nPy3.5+ is recommended due to a reference to the `@` operator in the linear algebra notebook.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "47dbdbbd-1657-53bd-8f04-80611fe324a5", "id": 1097, "code": "7FNYLG", "public_name": "Valerio Maggio", "avatar": "https://pretalx.com/media/avatars/me_Nwbnd3M.jpg", "biography": "Valerio Maggio is a Data Scientist and Post-doc Researcher. \r\nHe has a Ph.D. in Computer Science from the University of Naples \u201cFederico II\u201d, and he is currently enrolled as \r\nResearch/Cloud Software Engineer at FBK/MPBA. 
\r\nHis research interests focus on Reproducible Science and Machine/Deep Learning methods for Computational Biology and Precision Medicine.\r\nValerio is also a very active fellow in the Italian Python community and member of the organising committee of many \r\nPython Conferences (e.g. EuroPython, PyCon/PyData Italy, EuroSciPy).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/G7CTX8/", "id": 2641, "guid": "cb798535-d0f3-5f59-99fd-015dc4f2c543", "date": "2019-09-02T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2641-introduction-to-pandas", "title": "Introduction to pandas", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "This tutorial is an introduction to pandas for people new to it. We will cover how to open datasets, perform some analysis, apply some transformations and visualize the data.", "description": "pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with \u201crelational\u201d or \u201clabeled\u201d data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.\r\n\r\nThis tutorial will use a couple of example data sets to show what pandas can do and to give an idea of how to work with data using pandas.\r\n\r\nIt is recommended to bring your own laptop with the latest version of Anaconda, pandas, Jupyter, and the repository of the tutorial cloned. 
See the exact instructions here: https://github.com/datapythonista/pandas-tutorials", "recording_license": "", "do_not_record": false, "persons": [{"guid": "73a7f484-df4e-5926-ba4d-af8429e588d4", "id": 1100, "code": "SXWKLX", "public_name": "Marc Garcia", "avatar": "https://pretalx.com/media/avatars/photo5821414706567558354_s4wALp9.jpg", "biography": "Marc Garcia is a pandas core developer and Python fellow.\r\n\r\nHe has been working in Python for more than 12 years, and worked as data scientist and data engineer for different companies such as Bank of America, Tesco and Badoo.\r\n\r\nHe is a regular speaker at PyData and PyCon conferences, and a regular organizer of sprints.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 3 (Oteiza)": [{"url": "https://pretalx.com/euroscipy-2019/talk/A8KBUB/", "id": 1389, "guid": "0863ce8c-ff34-50f1-9df8-5ee1f437c20e", "date": "2019-09-02T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1389-hands-on-tensorflow-2-0", "title": "Hands-on TensorFlow 2.0", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level, with code examples for Deep Dream, Style Transfer, and Image Colorization.", "description": "A hands-on introduction to TensorFlow 2.0 at an intermediate difficulty level. In this 90 minute tutorial, we will briefly introduce TensorFlow 2.0, then dive in to writing code. We will complete four short exercises on Deep Dream, Style Transfer, Image colorization, and GANs (if time allows). This tutorial is intermediate level, for folks with prior Deep Learning experience. 
You will need a laptop with an internet connection; there is nothing to install in advance.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6fe3b4fb-dd25-521f-b528-58168a4acdfb", "id": 1480, "code": "QEVSWY", "public_name": "Josh Gordon", "avatar": "https://pretalx.com/media/avatars/j3.png", "biography": "Josh Gordon is a Developer Advocate at Google, and teaches Applied Deep Learning at Columbia University and Pace University. You can find him on Twitter at https://twitter.com/random_forests", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/Q79NND/", "id": 1796, "guid": "5ccfbcf7-7395-5845-bec6-ad8ff8034987", "date": "2019-09-02T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1796-deep-diving-into-gans-from-theory-to-production-with-tensorflow-2-0", "title": "Deep Diving into GANs: From Theory to Production with TensorFlow 2.0", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "GANs are one of the hottest topics in the ML arena; however, they present a challenge for researchers and engineers alike. This workshop will guide you through both the theory and the code needed to build a GAN and put it into production.", "description": "GANs are the new hottest topic in the ML arena; however, they present a challenge for researchers and engineers alike. Their design and, most importantly, their code implementation have been causing headaches for ML practitioners, especially when moving to production.\r\n\r\nThe workshop aims at providing a complete understanding of both the theory and the practical know-how to code and deploy this family of models in production. 
By the end of it, the attendees should be able to apply the concepts learned to other models without any issues.\r\n\r\nWe will be showcasing all the shiny new APIs introduced by TensorFlow 2.0 by showing how to build a GAN from scratch and how to \"productionize\" it by leveraging the AshPy Python package, which makes it easy to design, prototype, train and export Machine Learning models defined in TensorFlow 2.0.\r\n\r\n------\r\n\r\nThe workshop is composed of\r\n\r\n- Theoretical introduction\r\n- GANs from Scratch in TensorFlow 2.0\r\n- High-performance input data pipeline with TensorFlow Datasets\r\n- Introduction to the AshPy API\r\n- Implementing, training, and visualizing DCGAN using AshPy\r\n- Serving TF2 Models with Google Cloud Functions\r\n\r\nThe materials of the workshop will be openly provided via GitHub (https://github.com/zurutech/gans-from-theory-to-production) prior to the event and will be run on Colab leveraging the free GPU \r\n\r\n**Note**: the workshop requires Python 3.7 to run, therefore Colab support is still uncertain. The attendees are encouraged to bring their own devices with Python 3.7 installed and ready to use.\r\n\r\n## Requirements and set up instructions\r\n\r\nTwo options available:\r\n\r\n1. (recommended). Use Google Colab & Binder. Every notebook has a button to launch the correct tool. Just use it.\r\n2.  Local setup: follow the instructions in the README https://github.com/zurutech/gans-from-theory-to-production", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8a1b48e1-f1ea-5bbe-a80b-c04fbd0cfd91", "id": 82, "code": "GAWBSA", "public_name": "Michele \"Ubik\" De Simoni", "avatar": "https://pretalx.com/media/avatars/48255185872_a968d54df2_o1_con1iRU.jpg", "biography": "Lover of \ud83d\udc27\ud83d\udc27. Pythonista \ud83d\udc0d. Machine Learning Engineer \ud83e\udd16. Mad Scientist. Evil Mastermind. Walking Beard. Tinkerer. Nerd. 
Tech junkie.\r\n\r\nProgramming turned the tide of a crippling, panic-attacks inducing depression caused by a profound unsatisfaction with my choice of an academic career. During 2016 and 2017 I developed a burning passion for Python, robotics, Linux, Open Source/Hardware/Data/Science, and all things Machine Learning, I devoted myself day and night to learn the craft. My passion never waned and my drive toward knowing more grows each day.\r\n\r\nCurrently employed as a Machine Learning Engineer at zuru.tech, there I lead the research effort on GANs (and everything else relating to either the generative or the adversarial world of Deep Learning), help with Computer Vision and act as the Supreme Overlord of the Data Pipeline that feeds our AIs.", "answers": []}, {"guid": "1c28378b-adce-58d9-bcca-f85ad3ae4546", "id": 171, "code": "XT3JVZ", "public_name": "Paolo Galeone", "avatar": "https://pretalx.com/media/avatars/me_hk.jpg", "biography": "Computer Engineer + Machine Learning & Computer Vision researcher + Google Developer Expert in Machine Learning.\r\n\r\nHe received his MSc in 2016 with a thesis on the application of convolutional neural networks to the object detection and classification problems. After this, he took up research as a career and became a research fellow at the Computer Vision Laboratory at the University of Bologna, Italy, where he worked on a broad range of topics such as object detection, classification, coordinate regression, and anomaly detection. 
Currently, he leads the computer vision and machine learning department at ZURU Tech, Italy.\r\n\r\nWhile in school, university and at work, he developed several projects spanning a broad range of topics such as database abstraction layers, a complete social network covering both the back-end and front-end aspects, several tools for machine learning developers and researchers with the aim to simplify the machine learning pipeline.\r\nAll his computer vision and machine learning projects have been implemented using the framework he loves: Tensorflow.\r\n\r\nAll these projects were completely open-source and are available on his Github profile at https://github.com/galeone.\r\nHe also blogs about Computer Vision, Machine Learning and Linux system administration. You can find his blog at https://pgaleone.eu/", "answers": []}, {"guid": "75292022-09cd-53c7-9a04-58c6f6eb2782", "id": 2455, "code": "X3NHDJ", "public_name": "Federico Di Mattia", "avatar": null, "biography": "Machine Learning & Computer Vision Engineer.\r\n\r\nHe received his MSc in Computer Science and Engineering at the University of Modena and Reggio Emilia. He spent a period working with the Computer Vision team at the Queen Mary University in London where he worked on his research thesis on a cognitive people tracker.\r\n\r\nHe worked on different Computer Vision related projects regarding security systems and worked on image processing algorithms. With a passion for psychology and negotiation, he tries always to get the best from people and the work environment.\r\n\r\nRecently, working in Zuru, he could go in-depth in the research and studies of deep-learning algorithms applied to numerous areas using Tensorflow and Keras. 
During the last year, he had the chance to work on multiple tasks such as classification, segmentation, and anomaly detection.\r\n\r\nCurrently, he is working on Generative Adversarial Networks and many different Computer Vision tasks.\r\n\r\nTwitter: @iLeW Github: https://github.com/iLeW", "answers": []}, {"guid": "239893b9-9a23-5fd2-9e6e-053a70eec919", "id": 2456, "code": "HHHYER", "public_name": "Emanuele Ghelfi", "avatar": null, "biography": "Machine Learning and Computer Vision Engineer @ ZURU Tech Italy\r\n\r\nEmanuele received the M.Sc. Degree in Computer Science and Engineering at Politecnico di Milano with 110L/110 in December 2018. In particular, he followed the Artificial Intelligence (AI) track. The AI track includes courses like Game Theory, Machine Learning, Robotics, Image Analysis and Computer Vision, Autonomous Agent and Multi-Agent Systems and Natural Language Processing.\r\nHis thesis is located in the Machine Learning field, and more precisely in the Reinforcement Learning field. The paper from his thesis has been accepted at the International Conference on Machine Learning (ICML) 2019.\r\n\r\nSince November 2018, he has been working as a Machine Learning and Computer Vision Engineer at Zuru Tech Italy.\r\nCurrently, he's working on Generative Models (GANs) and on Recurrent Models (LSTM). 
In addition, he deals with Computer Vision tasks applied to complex industrial processes.\r\n\r\nGitHub: https://github.com/EmanueleGhelfi\r\nWebsite: emanueleghelfi.github.io", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/L8LMQR/", "id": 1494, "guid": "b5ba1512-1fb4-5fad-ac10-7ed00d53a9b6", "date": "2019-09-02T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1494-create-cuda-kernels-from-python-using-numba-and-cupy-", "title": "Create CUDA kernels from Python using Numba and CuPy.", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "We'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library.", "description": "### Abstract \r\nWe'll explain how to do GPU-Accelerated numerical computing from Python using the Numba Python compiler in combination with the CuPy GPU array library. Numba is an open source compiler that can translate Python functions for execution on the GPU without requiring users to write any C or C++ code. Numba's just-in-time compilation ability makes it easy to interactively experiment with GPU computing in the Jupyter notebook. Combining Numba with CuPy, a nearly complete implementation of the NumPy API for CUDA, creates a high productivity GPU development environment. Learn the basics of using Numba with CuPy, techniques for automatically parallelizing custom Python functions on arrays, and how to create and launch CUDA kernels entirely from Python. Access to appropriate hardware will be provided in the form of access to GPU based cloud resources.\r\n\r\n### Libraries\r\n* https://numba.pydata.org/\r\n* https://cupy.chainer.org/\r\n\r\n### Requirements and set up instructions\r\n* Cloud based access to GPUs will be provided, please bring a laptop with an operating system and a browser. 
Chrome is usually fine.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2e824709-7612-50cb-9d67-5bf963037d0f", "id": 1499, "code": "LFHYZA", "public_name": "Valentin Haenel", "avatar": "https://pretalx.com/media/avatars/1570bec4897bb18b702105182f2951b5_E7FScDr.jpg", "biography": "Valentin is a long-time \"Python for Data\" user and developer who still\r\nremembers hearing Travis Oliphant's keynote at the EuroScipy 2007. This was\r\nduring a time where he first became aware of the nascent scientific Python\r\nstack. He started using Python for simple modeling of spiking neurons and\r\nevaluation of data from perception experiments during his Masters degree in\r\ncomputational neuroscience. Since then he has been active as a contributor\r\nacross more than 75 open source projects. For example, within the Blosc\r\necosystem where he still maintains and contributes to Python-Blosc and\r\nBloscpack.  Furthermore, he has acquired significant experience as a Git\r\ntrainer and consultant and had published the first German language book about\r\nthe topic in 2011. In 2014 and 2015 he helped kickstart the PyData Berlin\r\ncommunity alongside a few other volunteers and co-organized the first two\r\neditions of the PyData Berlin Conference. 
He now works for Anaconda as a\r\nsoftware engineer / open source developer on the Numba project.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/MNAGWC/", "id": 1455, "guid": "c7d7436e-2c42-5514-b665-7578f1b396a4", "date": "2019-09-02T16:00:00+00:00", "start": "16:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1455-speed-up-your-python-code", "title": "Speed up your python code", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "In this tutorial we will see how to profile and speed up Python code, from a pure Python implementation to an optimized Cython code.", "description": "Through a simple example we will see how to optimize Python code. First we will introduce a few tools to profile and visualize the performances of our code, such as Perf and SnakeViz. Then we will incrementally optimize our code using Cython, a lower level compiled language designed to make a bridge between C and Python. As an alternative, we will also use Numba, a Python just in time compiler. 
Finally, we will see how to parallelize our code to speed it up further.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "930c3d07-cf05-5775-adcc-a89f02429ba9", "id": 1518, "code": "WZGMW9", "public_name": "J\u00e9r\u00e9mie du Boisberranger", "avatar": null, "biography": "I'm a software engineer at INRIA, essentially involved in scikit-learn, an open source Python library for machine learning.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track4 (Chillida)": [{"url": "https://pretalx.com/euroscipy-2019/talk/DU9CAN/", "id": 2696, "guid": "b854cec1-9f1c-5e96-9c97-eb79e17626b9", "date": "2019-09-02T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-2696-3d-image-processing-with-scikit-image", "title": "3D image processing with scikit-image", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image.", "description": "This tutorial will introduce how to analyze three dimensional stacked and volumetric images in Python, mainly using scikit-image. We start the tutorial checking a brief overview of scikit-image and how it relates to packages in the scientific Python ecosystem, such as NumPy, SciPy and matplotlib. Then, we discuss how to process two and three dimensional data through several steps: first, we will pre-process the data using filtering, binarization and segmentation techniques. After that, we cover how to inspect, count and measure attributes of objects and regions of interest in the data. At the end, we present the visualization of large 3D data. 
Real-world examples are given from domains such as materials science and biology.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5a221ca7-fc7e-5062-969a-8d6eb7e29004", "id": 2489, "code": "EM8RQM", "public_name": "Alexandre de Siqueira", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/TQH9FG/", "id": 1406, "guid": "8728b810-d9aa-55ea-af6e-08c80000ebb8", "date": "2019-09-02T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-1406-reproducible-data-science-in-python", "title": "Reproducible Data Science in Python", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "In this tutorial, we will take a detailed look at the concept of _reproducibility_, survey the landscape of existing solutions, and, using one solution in particular, [Renku](https://renkulab.io), we will do some hands-on work.", "description": "The expectation of reproducibility in scientific work has been established for several hundred years, and, increasingly, communities and funding sources are actually demanding it. Within the Python ecosystem, there are now a variety of tools available to support reproducible data science, but choosing and using one is not always straightforward. One source of confusion is simply the number of available options. Beyond that, the term \"reproducibility\" can mean multiple things, making it difficult to compare tools.\r\n\r\nIn this tutorial, we will examine _reproducibility_ from the perspective of the philosophy of science. That will give us the concepts and vocabulary necessary to precisely understand and discuss different definitions of the term and allow us to identify the technologies that provide the building blocks for reproducible data science. 
We will briefly survey the landscape of existing solutions and then spend the remaining time looking at one solution in particular, Renku, which we will use to work end-to-end through a reproducible data-science scenario.\r\n\r\n* 0:00 - 0:35 Introduction & Background\r\n\t* 0:00 - 0:15 Reproducibility, a philosophy of science perspective\r\n\t\t* Overview of reproducibility issues in different domains of science (Nature 2016 survey results)\r\n\t\t* Definition of different degrees of reproducibility: _Reproducibility_, _replicability_, and _repeatability_\r\n\t\t* Examine the function of reproducibility in the scientific process\r\n\t* 0:15 - 0:25 Building blocks for reproducibility: clean code, workflow automation, version control, containerization, provenance tracking\r\n\t* 0:25 - 0:35 Survey of the Tool Landscape: Binderhub, Pachyderm, Beaker, Gigantum, Whole Tale, SingularityHub, DVC, Stencila, dotscience, amie, CodeOcean, Renku\r\n\t\r\n* 0:35 - 1:30 Hands-on session with Renku where we will develop a typical data-science use-case, focusing on the building blocks of reproducibility along the way. \r\n\r\n## Requirements and set up instructions\r\n\r\nWe will run the tutorial on https://renkulab.io so please register and create an account following [these instructions](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/master/README-renkulab.md).\r\n\r\nTo follow along with the slides, go [here](https://github.com/SwissDataScienceCenter/reproducible-data-science/blob/euroscipy2019/presentation/index.ipynb)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "34f87b49-1e75-5ac6-8b13-90cdbfd5a167", "id": 1488, "code": "89TPKE", "public_name": "Chandrasekhar Ramakrishnan", "avatar": "https://pretalx.com/media/avatars/25aa43cb5b3358bf5ef306d0ff3acd71_w3GDGWo.jpg", "biography": "Chandrasekhar studied mathematics at the University of California, Berkeley (B.A. 
1997) and art and computer science at the University of California, Santa Barbara (M.A. 2003). He has worked as a software developer and consultant for companies, research institutions, and NGOs in the US, Germany, and Switzerland. Since 2009, he has been at ETH Z\u00fcrich supporting projects by developing software solutions for data management, analysis, and visualization. In addition to his work at ETH, he teaches data visualization at Propulsion Academy and, as [Illposed](http://illposed.com), works on artistic projects that incorporate data as a central component.", "answers": []}, {"guid": "e37a8d89-c629-577a-a2f6-6ec30c6aac43", "id": 1508, "code": "UDTFUP", "public_name": "Rok Ro\u0161kar", "avatar": "https://pretalx.com/media/avatars/91ca991159c4306c6c689c8730cbba69_Ljtc6H3.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/3MG8K3/", "id": 1394, "guid": "fd2efcf8-3aee-5135-9774-460169a6d3c1", "date": "2019-09-02T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-1394-building-data-pipelines-in-python-airflow-vs-scripts-soup", "title": "Building data pipelines in Python: Airflow vs scripts soup", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "In this workshop, you will learn how to migrate from \u2018scripts soups\u2019 (a set of scripts that should be run in a particular order) to robust, reproducible and easy-to-schedule data pipelines in Airflow.", "description": "## Introduction (5 minutes)\r\nFormat: presentation\r\nGo over the agenda\r\nList the relevant resources\r\nMake sure everyone has followed the installation instructions\r\n\r\n## Intro to data pipelines \r\nFormat: presentation\r\nGo over the components of traditional data science pipelines\r\nPresentation of the scripts soup antipattern\r\n\r\n## Creating a script soup \r\nFormat: hands-on \r\nThe 
attendees will perform an ETL task on some data using a set of independent scripts. \r\nIn this exercise, I will provide and explain the code and explain what we are trying to achieve with this pseudo-pipeline. The attendees will have a chance to try and reproduce it themselves.\r\n\r\n## Introduction to Airflow and DAGS\r\nFormat: presentation\r\nIntroduce the concept of DAGs (directed acyclic graphs) \r\nPresent and introduce the components of Airflow\r\nAirflow documentation\r\n\r\n## Set up a local instance of Airflow \r\nFormat: hands-on \r\nThe attendees will create a local instance of Airflow and explore the sample DAGS provided. \r\nThey will be introduced to the scheduling capabilities of the tool and track the status of the pipelines using the web GUI.\r\n\r\n## ETL task on Airflow\r\nFormat: hands-on \r\nI will provide hints on how to transform the scripts soup into Airflow DAGS. \r\nFor this, I will use the pseudo code and other pedagogical approaches inspired by the software carpentry lessons to direct the attendees to the deployment of their first DAG in Airflow. \r\n\r\n## Wrap up and questions\r\nFormat: Q&A\r\n\r\n## Setup\r\n\r\n<https://opendata-airflow-tutorial.readthedocs.io/en/latest/setup.html>", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f9849d9c-b12c-5e6d-b79d-fbc257982a5c", "id": 1483, "code": "M9T83C", "public_name": "Dr. Tania Allard", "avatar": "https://pretalx.com/media/avatars/Tania_Allard19-07-170429_RBUDAdg.jpg", "biography": "Tania is a Research Engineer and developer advocate with vast experience in academic research and industrial environments. Her main areas of expertise are within data-intensive applications, scientific computing, and machine learning. One of her main areas of expertise is the improvement of processes, reproducibility and transparency in research, data science and artificial intelligence.  
\r\nOver the last few years, she has trained hundreds of people on scientific computing reproducible workflows and ML models testing, monitoring and scaling and delivered talks on the topic worldwide. \r\n\r\nShe is passionate about mentoring, open source, and its community and is involved in a number of initiatives aimed to build more diverse and inclusive communities. She is also a contributor, maintainer, and developer of a number of open source projects and the Founder of Pyladies NorthWest UK.\r\n\r\n\r\nTania has vast experience providing both workshops and talks all over the world, from big conferences such as PyCon to smaller user groups or interest groups. She is interested in both technical talks as well as talks covering community aspects.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/J3HEDH/", "id": 1390, "guid": "5712e1a0-f831-5024-806c-a5d5958d7c37", "date": "2019-09-02T16:00:00+00:00", "start": "16:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-1390-performing-quantum-measurements-in-qutip", "title": "Performing Quantum Measurements in QuTiP", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you!\r\n\r\nNo previous quantum mechanics experience required!", "description": "Would you like to create (virtual) qubits and perform measurements on them using Python? Perhaps even explore entanglement and quantum teleportation? If so, this tutorial is for you!\r\n\r\nNo previous quantum mechanics experience required. 
It will be helpful to be comfortable with Python and only a little scared of matrix multiplication.\r\n\r\nThe goal of the workshop is for each participant to:\r\n\r\n  * Understand what a qubit is\r\n  * Be able to create a 1-qubit state\r\n  * Be able to measure a 1-qubit state\r\n  * Be able to create a 2-qubit state\r\n  * Be able to create an entangled 2-qubit state\r\n  * Be able to measure part of an entangled state\r\n  * Be able to teleport part a qubit using an entangled state\r\n\r\nTo each of these please add \"in Python with QuTiP\" and \"with a good understanding of what they're doing\".\r\n\r\nThe target audience is people who are:\r\n\r\n  * interested in quantum mechanics but are not experts\r\n  * comfortable with Python basics\r\n  * only a little scared of matrix multiplication (have learnt it at some point, even if they don't remember it well now)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "30e51757-4ebd-5726-a2b3-f4d335e9e9e2", "id": 1401, "code": "9HF87X", "public_name": "Simon Cross", "avatar": "https://pretalx.com/media/avatars/3958e748e0704904fc94840c6872566b_drMnCV8.jpg", "biography": "Some potentially relevant facts about me:\r\n\r\n* I've lectured mathematics and MATLAB at the University of Cape Town.\r\n* I've worked in bioinformatics and radio astronomy.\r\n* I've published some academic papers (and even a couple in vaguely respectable journals!)\r\n* I have an undergraduate degree in physics and a masters degree in applied mathematics.\r\n* I accidentally wrote a lot of games in pygame.\r\n* In 2012 I started PyConZA (PyCon South Africa).\r\n* I have a Python Community Service award (woot!).\r\n* I currently lead a small data science team at a financial startup.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 2, "date": "2019-09-03", "day_start": "2019-09-03T04:00:00+00:00", "day_end": "2019-09-04T03:59:00+00:00", "rooms": {"Track 2 (Baroja)": [{"url": 
"https://pretalx.com/euroscipy-2019/talk/RHUPZ3/", "id": 2383, "guid": "da867013-9c9a-56f3-953b-69dcdc0449e8", "date": "2019-09-03T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2383-a-tour-of-the-data-visualization-ecosystem-of-python", "title": "A Tour of the Data Visualization Ecosystem of Python", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "The tutorial will be a tour of the getting-started how-tos of the major Python data visualization libraries, such as Yt-Project, Seaborn, Altair, and Plotly.", "description": "Python and its ecosystem are nowadays used in many scientific contexts as an advanced data visualization tool.\r\nThere is a wide variety of visualization libraries. The tutorial will focus primarily on:\r\n\r\n* [Yt](https://yt-project.org)\r\n* [Seaborn](https://seaborn.pydata.org)\r\n* [Altair](https://altair-viz.github.io)\r\n* [Plotly](https://plot.ly)\r\n\r\nFor each one, the tutorial will show how to use it in Jupyter, explore the getting-started examples, and let the audience propose data sets to visualize.\r\nAt the end of the tutorial, the participants will fill in a pros/cons table using an online voting mechanism.\r\nIf time allows, a short overview of other libraries may be included.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7a678b54-7837-5e49-8e39-6308c659f9a5", "id": 60, "code": "EUT7ME", "public_name": "Giovanni De Gasperis", "avatar": null, "biography": "Assistant Professor of Computer Systems and Cognitive Robotics\r\n\r\nPython user since 2001. 
Core developer of Python3.7-based TaLTaC italian tool for text mining", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/WSNPK7/", "id": 2615, "guid": "7484a98e-c3d7-5862-824f-11abe9177ff6", "date": "2019-09-03T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2615-introduction-to-scipy", "title": "Introduction to SciPy", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "SciPy is a comprehensive library for scientific computing and one of the central components of the scientific Python ecosystem. As most of its functionality naturally involves NumPy arrays, SciPy works hand in hand with NumPy.", "description": "SciPy covers a broad variety of typical numerical tasks encountered in scientific computing ranging from the statistical analysis of data, curve fitting, and fast Fourier transform to numerical integration and special functions to name just a few topics. To avoid reinventing the wheel, it is always a good idea to check whether a desired functionality is already provided by SciPy.\r\n\r\nIn the main part of the tutorial, we will demonstrate how some real-world data taken with a smartphone can be analyzed by means of SciPy.\r\n\r\n#### Installation instructions\r\nThe tutorial requires the following packages on top of a Python 3 installation: \r\n\r\n* numpy\r\n* scipy\r\n* matplotlib\r\n* jupyter\r\n\r\nAny recent version of the [Anaconda distribution](https://anaconda.org) should allow to run the Jupyter notebooks used in this tutorial (see below) just fine. If you do not have the Anaconda distribution installed and are not short of disk space and want to do scientific work with Python, seriously consider installing it. 
It is free and pretty straightforward to install.\r\n\r\nAlternatively, you can install Miniconda and build a specific environment `euroscipy-scipy-tutorial` for the tutorial by running\r\n\r\n```\r\nconda env create -f environment.yml\r\n```\r\n\r\nwith the `environment.yml` file provided in the [repository of this tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial). For more detailed instructions on how to create a conda environment, see the [conda documentation](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). Note that you need to activate the environment by means of\r\n\r\n```\r\nconda activate euroscipy-scipy-tutorial\r\n```\r\n\r\nFinally, if nothing else works, the notebooks can also be run on [Binder](https://mybinder.org/v2/gh/gertingold/euroscipy-scipy-tutorial/master?filepath=notebooks) (provided wifi is available during the tutorial session).\r\n\r\n#### Get the tutorial notebooks\r\n\r\nUnless you are using Binder, you will need the notebooks of the tutorial to actively follow along.\r\n\r\nYou can either clone the repository [gertingold/euroscipy-scipy-tutorial](https://github.com/gertingold/euroscipy-scipy-tutorial) or go to https://github.com/gertingold/euroscipy-scipy-tutorial/archive/master.zip to download a zipped version of the repository. 
All files needed during the tutorial are located in the directory `notebooks`.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "805d87a4-b6c5-5223-98eb-2a4826e4204d", "id": 71, "code": "978CAZ", "public_name": "Gert-Ludwig Ingold", "avatar": "https://pretalx.com/media/avatars/GLIngold.jpg", "biography": "Professor for theoretical physics", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/XXJGGG/", "id": 1407, "guid": "7e1c28fd-ffe0-5090-a635-bc0ee0593089", "date": "2019-09-03T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1407-introduction-to-scikit-learn-from-model-fitting-to-model-interpretation", "title": "Introduction to scikit-learn: from model fitting to model interpretation", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "We will present scikit-learn by focusing on the available tools used to train a machine-learning model. 
Then, we will focus on the challenge linked to model interpretation and the available tools to understand these models.", "description": "Our introduction to scikit-learn will be subdivided into two parts.\r\n\r\nWe will give a general introduction to scikit-learn, presenting basic concepts around cross-validation, pipeline estimators, and hyperparameter search.\r\n\r\nThen, we will focus on model interpretation, presenting the challenges and the available tools to understand a trained machine-learning model: partial dependence plots, feature importance, LIME, Shapley values, etc.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d6297904-fa91-50b7-9510-cc23c0cf9edb", "id": 55, "code": "KMDJAL", "public_name": "Guillaume Lemaitre", "avatar": "https://pretalx.com/media/avatars/guillaumelemaitre.jpg__200x200_q85_crop_subsampling-2_upscale_9Ptqss3.jpg", "biography": "I am an engineer working for the scikit-learn foundation @ Inria.", "answers": []}, {"guid": "91114ee9-3e12-54e9-8119-9813674ba951", "id": 1530, "code": "NEUMLP", "public_name": "Olivier Grisel", "avatar": "https://pretalx.com/media/avatars/ogrisel_portrait_870x550_PMry4Oq.jpg", "biography": "Olivier is a Software Engineer at Inria working on scikit-learn and related projects of the Python Data ecosystem.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 3 (Oteiza)": [{"url": "https://pretalx.com/euroscipy-2019/talk/ZHQALW/", "id": 1803, "guid": "8789956b-565d-5465-b125-d9e6d17c3c70", "date": "2019-09-03T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1803-sufficiently-advanced-testing-with-hypothesis", "title": "Sufficiently Advanced Testing with Hypothesis", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "Testing research code can be difficult, but is essential for robust results.  
Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex \"black box\" codes.", "description": "Hypothesis is a testing package that will search for counterexamples to your\r\nassertions \u2013 so you can write tests that provide a high-level description of your\r\ncode or system, and let the computer attempt a Popperian falsification. If it\r\nfails, your code is (probably) OK\u2026 and if it succeeds you have a minimal input\r\nto debug.\r\n\r\nCome along and learn the principles of property-based testing, how to use\r\nHypothesis, and how to use it to check scientific code \u2013 whether highly-polished\r\nor quick-and-dirty!\r\n\r\nYou can even use it to test 'black boxes', such as simulations, where we have no\r\nway of independently verifying that some input leads to the right output!\r\nIntrigued?  Come and learn about the power of embedding assertions in your\r\ncode, and metamorphic relations in your tests!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3f0779c7-230b-53b2-863c-6a3cff408151", "id": 1604, "code": "LFU8AS", "public_name": "Zac Hatfield-Dodds", "avatar": "https://pretalx.com/media/avatars/zac-profile_elxMt0n.jpg", "biography": "Zac is a researcher at the Australian National University\u2019s 3A Institute, which is building a new applied science to 'manage the machines' - AI, cyber-physical systems, and other new technologies.\r\n\r\nHe started using Python to analyse huge environmental datasets, and contributing to libraries like Xarray to make such analysis easier for all scientists.  
Now, as a maintainer of Hypothesis, Pytest, and Trio, Zac is still passionate about making it easy to write software you can understand and rely on.\r\n\r\nWhen not at a computer he can usually be found surrounded by books of all kinds, the Australian bush, or both.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/M3RZXE/", "id": 1823, "guid": "425307d7-76c0-53ce-a408-984b7f02d254", "date": "2019-09-03T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1823-effectively-using-matplotlib", "title": "Effectively using matplotlib", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "It can sometimes be difficult and frustrating to know how to achieve a desired plot. \u2013 Have you made this experience as well? Then this tutorial is for you. It will make you more effective and help you generate better looking plots.", "description": "Matplotlib is one of the most-used and powerful visualization libraries for python. Nevertheless,  there has been and still is some confusion on how use it properly. This has a number of reasons ranging from an evolution of the API and lack of good documentation to the complexity that comes with the large feature set and flexibility. But these issues can be overcome.\r\n\r\nThis tutorial will explain the main concepts and intended usage patterns of matplotlib. Knowing these, lets you effectively use high-level functions for most of the cases. But you will be able to go into the details if you need to fine-tune certain aspects of the plot. 
We'll also touch on some now-discouraged ways of working from the past (you should know what not to do - even though that's still found in lots of examples on the web) and we may get a glimpse into the future.\r\n\r\nTim Hoffmann joined the matplotlib core development team almost two years ago with the mission to make matplotlib easier to use.\r\n\r\n*Requirements and set up instructions:*\r\nJupyter plus any recent (>=3.0) matplotlib version will do. To be on the safe side, you may set up a new conda environment using `conda create -n using-mpl \"matplotlib>=3\" jupyterlab pandas ipympl`.\r\nLink to tutorial notebook will be posted here soon.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b7178135-aeb4-58c6-9484-913dc118b78d", "id": 156, "code": "YWW9JH", "public_name": "Tim Hoffmann", "avatar": null, "biography": "Tim Hoffmann has been involved in several open source projects over time. Almost two years ago, he joined the matplotlib core development team with the mission to make matplotlib easier to use.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/NQMWSX/", "id": 1161, "guid": "9ed4ab2a-0ac0-5009-875b-209ba96e06f2", "date": "2019-09-03T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1161-cffi-ctypes-cython-cppyy-how-to-run-c-code-from-python", "title": "CFFI, Ctypes, Cython, Cppyy: how to run C code from Python", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "Python is flexible, C and C++ are fast. How to use them together? There are many ways to call C code from Python; we will learn about the major ones and find out when you would prefer one over another.", "description": "Using the Jupyter notebook and a compiler, we will start with a pure Python implementation of a Mandelbrot image. 
Then we will write the computationally heavy part of the code in C, and learn how to call it from Ctypes (part of the Python standard library), CFFI (a newer and better Ctypes alternative), Cython (a compiler from Python to C), and CPPYY (like Ctypes and CFFI, but for C++).\r\n\r\nAlong the way we will stop to reflect on the advantages and disadvantages of each technique in terms of speed of development, runtime overhead, maintainability, and readability.\r\n\r\n\r\n\r\nThe participants will come away with an understanding of the tools, their strengths and weaknesses, and how to use them.\r\n\r\nPlease be sure you have a computer with Anaconda Python installed and a compiler (for Windows users, Visual Studio 2019 is recommended; others should have a functioning gcc or clang). You should also download the [git repo](https://github.com/mattip/c_from_python) and be sure you can run the first few cells that involve compilation (before the `ctypes` discussion). Also please be sure to preinstall [`cppyy`](https://pypi.org/project/cppyy/).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e011e91-8d53-51d7-a3fe-bd87d772c045", "id": 63, "code": "UZVQTV", "public_name": "Matti Picus", "avatar": null, "biography": "Matti is a core developer of [PyPy](https://www.pypy.org), contributing to the internal numpy implementation _micronumpy and to the layer that allows python c-extension modules to run on the PyPy python interpreter. He has been active in the open source community as a contributor, teacher, and presenter at conferences. 
Since April 2018, he works full-time developing [NumPy](http://www.numpy.org/), employed by [BIDS](https://bids.berkeley.edu/people/matti-picus)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/HVEBGU/", "id": 1447, "guid": "a479fe08-5466-59b2-938f-c77446ecd625", "date": "2019-09-03T16:00:00+00:00", "start": "16:00", "logo": null, "duration": "01:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1447-kcsd-a-python-package-for-reconstruction-of-brain-activity", "title": "kCSD - a Python package for reconstruction of brain activity", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "_kCSD_ is a Python package for localization of sources of brain electric activity based on recorded electric potentials.", "description": "Electric potential measured in the brain is generated by transmembrane ionic currents of neural cells.  Due to the long range of electric field simultaneously recorded extracellular potential - EEG, local field potential (LFP) - at different places are typically strongly correlated which complicates their analysis.  It is thus useful to reconstruct their current sources which in practice means solving Poisson equation.  The first method for estimation of _Current Source Density_ (CSD) from measured potentials was proposed in the early 1950s (1).  Despite some developments, a number of limitations were present until recently, in particular, most previous methods required recordings with regular grids of electrodes and overfitted to noise.\r\n\r\nThe _kernel Current Source Density_ method (kCSD) developed in 2012 (2) uses kernel methods to estimate the potential and CSD in the whole space, from arbitrary distribution of electrodes using regularization to minimize the influence of noise on reconstruction.  
In this tutorial we will demonstrate kCSD-python package (3) which allows reconstruction of CSD in different dimensions.\r\n\r\nAfter this tutorial you will be able to:\r\n* estimate the distribution of current sources based on the exact values of the electric field potentials,\r\n* deal with measurement noise,\r\n* diagnose the quality of the obtained reconstruction.\r\n\r\n# Requirements:\r\n* Python 2.7/3.4+ environment (Anaconda with Jupyter Notebook recommended),\r\n* numpy, scipy, matplotlib packages installed,\r\n* kcsd package installed or possibility to download it from GitHub (4) (network connection etc.).\r\n\r\n\r\n# Authors\r\n\r\n* Chaitanya Chintaluri,\r\n* Marta Kowalska,\r\n* Micha\u0142 Czerwi\u0144ski,\r\n* W\u0142adys\u0142aw \u015aredniawa,\r\n* Joanna J\u0119drzejewska-Szmek,\r\n* Daniel K. W\u00f3jcik\r\n\r\n# Bibliography\r\n\r\n1. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166.\r\n2. Potworowski, J., Jakuczun, W., \u0141\u0119ski, S. & W\u00f3jcik, D. (2012) _Kernel current source density method_. Neural Comput 24(2), 541-575.\r\n3. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python>\r\n\r\n# Acknowledgement\r\n\r\nProject funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) and OPUS (2015/17/B/ST7/04123) grants.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "16ced360-f474-5475-b640-77435989a3c8", "id": 1505, "code": "APW3XT", "public_name": "Marta Kowalska", "avatar": "https://pretalx.com/media/avatars/MartaKowalska.jpg", "biography": "I am a PhD student at the Laboratory of Neuroinformatics at Nencki Institute of Experimental Biology. I work with methods for current source density reconstruction in a brain tissue.", "answers": []}, {"guid": "1f576562-fd3c-50c5-8781-e3f3795c584f", "id": 1512, "code": "3XH3MM", "public_name": "Jakub M. 
Dzik", "avatar": "https://pretalx.com/media/avatars/IMG_0187-US_Wiza_format_cyfrowy-900x900_px.jpg", "biography": "Since 2011 I am a Scientific Programmer in Laboratory of Neuroinformatics (Nencki Institute).\r\n\r\n#Education\r\n\r\n* MSc in Computer Science (2011; University of Wroclaw)\r\n* PhD in Neuroinformatics (2019; Nancki Institute)", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track4 (Chillida)": [{"url": "https://pretalx.com/euroscipy-2019/talk/YKPNEE/", "id": 1433, "guid": "f9b887dc-d518-51ee-89be-ccb4f2ae51c1", "date": "2019-09-03T09:00:00+00:00", "start": "09:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-1433-introduction-to-geospatial-data-analysis-with-geopandas-and-the-pydata-stack", "title": "Introduction to geospatial data analysis with GeoPandas and the PyData stack", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "This tutorial is an introduction to geospatial data analysis, with a focus on tabular vector data using GeoPandas. It will show how GeoPandas and related libraries can improve your GIS workflow and fit nicely in the traditional PyData stack.", "description": "This tutorial is an introduction to geospatial data analysis in Python, with a focus on tabular vector data using GeoPandas. The content focuses on introducing the participants to the different libraries to work with geospatial data and will cover munging geo-data and exploring relations over space. This includes importing data in different formats (e.g. shapefile, GeoJSON), visualizing, combining and tidying them up for analysis, and will use libraries such as pandas, geopandas, shapely, pyproj, matplotlib, cartopy, ... The tutorial will cover the following topics, each of them using Jupyter notebooks and hands-on exercises with real-world data:\r\n\r\n    1. Introduction to vector data and GeoPandas\r\n    2. Visualizing geospatial data\r\n    3. 
Spatial relationships and operations\r\n    4. Spatial joins and overlays\r\n\r\nMaterials of previous versions of this tutorial: https://github.com/jorisvandenbossche/geopandas-tutorial", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/profile_Rc56sfi.png", "biography": "I am a core contributor to Pandas and maintainer of GeoPandas. I have given several tutorials at international conferences and a course on python for data analysis for PhD students at Ghent University. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and, currently I am a freelance software developer and teacher.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/SMLGVL/", "id": 2667, "guid": "c7953afc-db52-5e30-b7a9-000e2d060392", "date": "2019-09-03T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-2667-astronomical-image-processing", "title": "Astronomical Image Processing", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "This tutorial will introduce the concept of *sparsity* and demonstrate how it can be used to remove noise from signals. These concepts will then be expanded to demonstrate how noise can be removed from astronomical images in particular.", "description": "### Programme\r\n\r\n- The tutorial will begin with short introduction to the basic premise of sparsity and highlight some problems in astronomical image processing that can be solved using this methodology.  (~15-20min; slides)\r\n- Tutees will then follow a hands-on demonstration of how the concept of sparsity can be used to denoise signals. 
(~30-35min; interactive jupyter notebook with exercises)\r\n- Finally the tutees will learn how to denoise an astronomical image and use their newfound skills to recover a nice picture of Saturn. (~35-40min; interactive jupyter notebook with  an exercise)\r\n\r\n### Requirements\r\n\r\n- The tutorial contents are available on [GitHub](https://github.com/sfarrens/euroscipy).\r\n- Provided tutees have a stable internet connection, the entire tutorial can be run online using [Binder](https://mybinder.org/v2/gh/sfarrens/euroscipy/master).\r\n- However, to be safe, tutees should download and install the tutorial materials beforehand.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "84a6bf8e-430c-53ac-8e53-ab1f878d0bc6", "id": 232, "code": "VFV3HW", "public_name": "Samuel FARRENS", "avatar": "https://pretalx.com/media/avatars/office_wave_reoyMY9.jpeg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/CQCKY9/", "id": 1295, "guid": "50a9bb38-12ad-55c9-8c58-f2f3aaa1c280", "date": "2019-09-03T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "01:30", "room": "Track4 (Chillida)", "slug": "euroscipy-2019-1295-parallelizing-python-applications-with-pycompss", "title": "Parallelizing Python applications with PyCOMPSs", "subtitle": "", "track": null, "type": "Tutorial", "language": "en", "abstract": "PyCOMPSs is a task-based programming model that enables the parallel execution of Python scripts by annotating methods with task decorators. At run time, it identifies tasks' data-dependencies, schedules and executes them in distributed environments.", "description": "## PyCOMPSs!\r\n\r\nCOMPSs is a **task-based programming model that aims to ease the development of parallel applications and their execution in distributed computing environments**, which provides a binding for Python (aka **PyCOMPSs**). 
It is based on sequential programming, which helps application developers on parallelization and distribution efforts (e.g. thread/process creation, synchronization, data movements, etc.). Application developers simply need to identify which methods will be considered tasks, and the runtime exploits the inherent parallelism of the application at execution time by detecting the task calls and the data dependencies among them. To this end, the runtime is able to spawn the tasks asynchronously on the available resources and orchestrate their data transfers guaranteeing the validity of the execution.\r\n\r\nPyCOMPSs relies on the usage of decorators for task selection and a tiny API for synchronization. Moreover, it has also integration with Jupyter notebooks, and provides a wide range of supported features, such as task constraint definition, multiple implementations (so that the runtime can choose the most appropriate considering the available resources), and binary tasks (e.g. binary, MPI and OmpSs) among others.\r\n\r\nIn addition, PyCOMPSs' runtime enables to run the applications on top of different infrastructures (such as multi-core machines, clusters, grids, clouds or containers) without modifying a single line of the application. It also provides fault-tolerant mechanisms, a live monitoring tool, it is able to generate post-mortem performance traces using Extrae that can be later analyzed with Paraver, and it is extendible through pluggable connectors (e.g. 
clouds and schedulers).\r\n\r\nThis rich number of features enables the quick and easy parallelization of Python code, its execution in distributed environments and performance analysis, with current success in scientific fields like numeric algorithms, AI, life and earth sciences.\r\n\r\nThis tutorial has as main objective to instruct **how to program and decorate Python applications using PyCOMPSs** in order to enable them **to run in parallel**.\r\nMore in detail, the tutorial objectives are:\r\n\r\n* To give an overview of PyCOMPSs task-based programming model syntax.\r\n* To demonstrate how to use PyCOMPSs to parallelize and run applications in distributed platforms.\r\n* To illustrate how sample benchmarks from linear algebra and big data can benefit of PyCOMPSs as a programming model. Also, from real use cases from AI, Life and Earth sciences.\r\n* To give practical insight of how to use PyCOMPSs programming model with the Jupyter notebook.\r\n* To give an overview of the PyCOMPSs runtime and how it interacts with clusters, clusters of docker containers and clouds.\r\n\r\n**The attendees will benefit by learning how to parallelize their Python application with PyCOMPSs with a simple interface, run them in distributed parallel platforms, the integration with Jupyter notebooks, and how to analyze the execution behaviour.**\r\n\r\n#### Requirements and setup instructions\r\n\r\nThis tutorial can be followed using a virtual machine or using a docker container. 
Attendees can choose the best option considering their system.\r\n\r\n- Using Virtual Appliance:\r\n    - Install VirtualBox\r\n    - Download and import the COMPSs 2.5 VM image from http://compss.bsc.es (Downloads section)\r\n    - Import the VM image\r\n    - Start the VM image (user: compss password: compss19)\r\n    - Update the tutorial apps folder: rm -rf tutorial_apps && git clone https://github.com/bsc-wdc/tutorial_apps.git\r\n\r\n\r\n- Using Docker:\r\n    - Install docker\r\n    - git clone https://github.com/bsc-wdc/tutorial_apps.git\r\n    - docker pull compss/compss-tutorial:patc2019\r\n    - docker run --name mycompss -p 8888:8888 -p 8080:8080 -v /path/to/tutorial_apps:/home/tutorial_apps -itd compss/compss-tutorial:patc2019", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b3b89d95-37f4-5755-ae3d-3efe06bc1ed9", "id": 1414, "code": "U3JBJX", "public_name": "Javier Conejero", "avatar": "https://pretalx.com/media/avatars/FJ.jpg", "biography": "Javier Conejero is a Senior Researcher at the Barcelona Supercomputing Center. He holds a PhD on\r\nAdvanced Computer Technologies (2014) from the University of Castilla-La Mancha (UCLM), Spain.\r\nDuring his PhD, he was awarded by the Ministry of Economy and Competitiveness (MINECO) of the\r\nSpanish Government with a FPI fellowship grant. Previously, he worked at CERN for one year\r\n(2009) into WLCG software development and management. Since 2015, he is a Senior Researcher\r\nof the Workflows and Distributed Computing research group at the Barcelona Supercomputing\r\nCenter (BSC). He is leading the efforts on the PyCOMPSs binding at BSC. In 2016 he was awarded\r\nby the MINECO with the Juan de la Cierva grant.\r\n\r\nJavier lectured and ran practical exercises on PyCOMPSs development within the PATC:\r\nProgramming Distributed Computing Platforms with COMPSs tutorial annually since 2016. 
He has\r\nalso participated in PyCOMPSs tutorials in various conferences and workshops: EuroPython 2017,\r\nCCGrid 2017, EuroPar2017 and SIAM 2018.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 3, "date": "2019-09-04", "day_start": "2019-09-04T04:00:00+00:00", "day_end": "2019-09-05T03:59:00+00:00", "rooms": {"Track 1 (Mitxelena)": [{"url": "https://pretalx.com/euroscipy-2019/talk/H8VPAY/", "id": 2637, "guid": "48a02219-1f7f-51f0-b49f-e863a84a1822", "date": "2019-09-04T10:15:00+00:00", "start": "10:15", "logo": null, "duration": "00:45", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2637-from-galaxies-to-brains-image-processing-with-python", "title": "From Galaxies to Brains! - Image processing with Python", "subtitle": "", "track": null, "type": "Keynote", "language": "en", "abstract": "From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to.", "description": "From the smallest microscopic objects to the largest scales of the Universe, our ability to study the world around us is predicated on the quality of the data we have access to. In other words, cleaner and higher resolution images will provide us with more detailed and accurate information. Obtaining the necessary image quality, however, is extremely difficult, particularly as we push instruments to their limits and have to deal with larger and larger amounts of data. In this talk I will introduce some of the current challenges in the realms of astrophysical and biomedical imaging. 
I will then present some interesting new ideas for tackling these problems and how Python facilitates their implementation.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "84a6bf8e-430c-53ac-8e53-ab1f878d0bc6", "id": 232, "code": "VFV3HW", "public_name": "Samuel FARRENS", "avatar": "https://pretalx.com/media/avatars/office_wave_reoyMY9.jpeg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/9DPFGM/", "id": 1405, "guid": "4d3e73ed-a74b-515f-bebd-d25ac522ebe9", "date": "2019-09-04T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1405-distributed-gpu-computing-with-dask", "title": "Distributed GPU Computing with Dask", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Dask has evolved over the last year to leverage multi-GPU computing alongside its existing CPU support. We present how this is possible with the use of NumPy-like libraries and how to get started writing distributed GPU software.", "description": "The need for speed remains important for scientific computing. Historically, computers were limited to few dozens of processors, but with modern GPUs, we can have thousands, or even millions of cores running in parallel on distributed systems.\r\n\r\nHowever, developing software for distributed GPU systems can be difficult, both because writing GPU code can be challenging for non-experts, and because distributed systems are inherently complex. 
We can work to address these challenges by using GPU-enabled libraries that mimic parts of the SciPy ecosystem, such as CuPy, RAPIDS, and Numba, abstracting GPU programming complexity, combined with Dask to abstract distributed computing complexity.\r\n\r\nWe talk about how Dask has come a long way to support distributed GPU-enabled systems by leveraging community standards and protocols, reusing open source libraries for GPU computing, and keeping it simple and complication-free to build highly-configurable accelerated distributed software.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "df7927ae-8348-5f57-938c-4f4c2ddf0a36", "id": 1342, "code": "N9CZU7", "public_name": "Peter Andreas Entschev", "avatar": "https://pretalx.com/media/avatars/DSC_0354.JPG", "biography": "Peter Andreas Entschev is a senior system software engineer in the AI Infrastructure group at NVIDIA, where he works on the RAPIDS stack, building GPU-enabled distributed software. Before NVIDIA, he worked on real-time computer vision systems for various applications. He holds an MSc in electrical engineering and applied computer science from the Federal University of Technology - Paran\u00e1, Brazil.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/YRJNR8/", "id": 1440, "guid": "4fa0bca8-5f31-5c38-865c-5e15b6fd9865", "date": "2019-09-04T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1440-modern-data-science-a-new-approach-to-dataframes-and-pipelines", "title": "Modern Data Science: A new approach to DataFrames and pipelines", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "We will demonstrate how to explore and analyse massive datasets (>150GB) on a laptop with the Vaex library in Python. 
Using computational graphs, efficient algorithms and storage (Apache Arrow / hdf5) Vaex can easily handle up to a billion rows.", "description": "Working with datasets comprising millions or billions of samples is an increasingly common task, one that is typically tackled with distributed computing. Nodes in high-performance computing clusters have enough RAM to run intensive and well-tested data analysis workflows. More often than not, however, this is preceded by the scientific process of cleaning, filtering, grouping, and other transformations of the data, through continuous visualizations and correlation analysis. In today\u2019s work environments, many data scientists prefer to do this on their laptops or workstations, as to more effectively use their time and not to rely on spotty internet connection to access their remote data and computation resources. Modern laptops have sufficiently fast I/O SSD storage, but upgrading RAM is expensive or impossible. \r\n\r\nApplying the combined benefits of computational graphs, which are common in neural network libraries, with delayed (a.k.a lazy) evaluations to a DataFrame library enables efficient memory and CPU usage. Together with memory-mapped storage (Apache Arrow, hdf5) and out-of-core algorithms, we can process considerably larger data sets with fewer resources. As an added bonus, the computational graphs \u2018remember\u2019 all operations applied to a DataFrame, meaning that data processing pipelines can be generated automatically.\r\n\r\nIn this talk, we will demonstrate Vaex, an open-source DataFrame library that embodies these concepts. Using data from the New York City YellowCab taxi service comprising 1.1 billion samples and taking up over 170 GB on disk, we will showcase how one can conduct an exploratory data analysis, complete with filtering, grouping, calculations of statistics and interactive visualisations on a single laptop in real time. 
Finally we will show an example of how one can automatically build a machine learning pipeline as a by-product of the exploratory data analysis using the computational graphs in Vaex.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "53c7e8ce-8a0f-5771-aeb3-df7936e16b4b", "id": 1506, "code": "FTP8ZM", "public_name": "Jovan Veljanoski", "avatar": "https://pretalx.com/media/avatars/ebdf4cae49cc296078aab1615b40d6c1_jNsUPF5.jpg", "biography": "Jovan is a senior data scientist and researcher at XebiaLabs, where he creates predictive models related to DevOps pipelines. Working mostly with Python in the Jupyter ecosystem, he has considerable experience in clustering analysis and predictive modeling. Jovan has a PhD in Astrophysics, is a co-founder of vaex.io, and is interested in novel machine learning technologies and applications.", "answers": []}, {"guid": "5ec0270e-50b8-55b2-96ab-ccef2e5d93a2", "id": 1510, "code": "38EYUA", "public_name": "Maarten Breddels", "avatar": "https://pretalx.com/media/avatars/DSC_0701_bright_small.jpeg", "biography": "Maarten Breddels is an entrepreneur and freelance developer/consultant/data scientist working mostly with Python, C++ and JavaScript in the Jupyter ecosystem. Creator of ipyvolume and vaex, founder of vaex.io. His expertise ranges from fast numerical computation and API design to 3D visualization. 
He has a Bachelor in ICT, a Master and PhD in Astronomy, likes to code and solve problems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/KZGLXR/", "id": 1443, "guid": "c22db99f-e698-5d02-93c1-210bb39bd635", "date": "2019-09-04T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1443-apache-arrow-a-cross-language-development-platform-for-in-memory-data", "title": "Apache Arrow: a cross-language development platform for in-memory data", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Apache Arrow, defining a columnar, in-memory data format standard and communication protocols, provides a cross-language development platform that already has several applications in the PyData ecosystem.", "description": "This talk discusses the Apache Arrow project and how it already interacts with the Python ecosystem.\r\n\r\nThe Apache Arrow project specifies a standardized language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware. On top of that standard, it provides computational libraries and zero-copy streaming messaging and interprocess communication protocols, and as such, it provides a cross-language development platform for in-memory data. It has support for many languages, including C, C++, Java, JavaScript, MATLAB, Python, R, Rust, ...\r\n\r\nThe Apache Arrow project, although still in active development, already has several applications in the Python ecosystem. For example, it provides the IO functionality for pandas to read the Parquet format (a columnar, binary file format used a lot in the Hadoop ecosystem). Thanks to the standard memory format, it can help improve interoperability between systems, and this is already seen in practice for the Spark / Python interface, by increasing the performance of PySpark. 
Further, it has the potential to provide a more performant string data type and nested data types (like dicts or lists) for Pandas dataframes, which is already being experimented with in the fletcher package (using the pandas ExtensionArray interface).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/profile_Rc56sfi.png", "biography": "I am a core contributor to Pandas and maintainer of GeoPandas. I have given several tutorials at international conferences and a course on python for data analysis for PhD students at Ghent University. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and, currently I am a freelance software developer and teacher.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/BLPA7N/", "id": 1412, "guid": "31e9ceff-6dd1-5823-a95e-864c618a8f2c", "date": "2019-09-04T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1412-caterva-a-compressed-and-multidimensional-container-for-big-data", "title": "Caterva: A Compressed And Multidimensional Container For Big Data", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Caterva is a library on top of the Blosc2 compressor that implements a simple multidimensional container for compressed binary data. It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.", "description": "# Caterva: A Compressed And Multidimensional Container For Big Data\r\n\r\n[Caterva](https://github.com/Blosc/Caterva) is a C library on top of [C-Blosc2](https://github.com/Blosc/c-blosc2) that implements a simple multidimensional container for compressed binary data. 
It adds the capability to store, extract, and transform data in these containers, either in-memory or on-disk.\r\n\r\nWhile there are several existing solutions for this scenario (HDF5 is one of the best known), Caterva brings novel features that, when taken together, set it apart from them:\r\n\r\n* __Leverage important features of C-Blosc2__.  C-Blosc2 is the next generation of the well-known, high performance C-Blosc compression library (see below for a more in-depth description). \r\n\r\n* __Fast and seamless interface with the compression engine__.  While in other solutions compression seems an after-thought and can imply several copies of buffers internally, the interface of Caterva and C-Blosc2 (its internal compression engine) is meant to be as direct as possible, minimizing copies and hence increasing performance.\r\n\r\n* __Both in-memory and on-disk paradigms are supported the same way__.  This allows for using the same API for data that can be either in-memory or on-disk.\r\n\r\n* __Support for a plain buffer data layout__.  This allows for essentially no-copy data sharing among existing libraries (NumPy), allowing existing functionality to be used directly in Caterva without losing performance.  \r\n\r\nAlong with these features, there is an important 'mis-feature': Caterva is __type-less__.  Lacking the notion of data type means that Caterva containers are not meant to be used in computations directly, but rather in combination with other higher-level libraries.  While this can be seen as a drawback, it actually favors simplicity and leaves up to the user the addition of the types that they are most interested in, which is far more flexible than type-aware libraries (HDF5, NumPy and many others).\r\n\r\nDuring our talk, we will describe all these Caterva features by using [cat4py](https://github.com/Blosc/cat4py), a Python wrapper for Caterva.  
The points to be discussed include:\r\n\r\n* Introduction to the main features of Caterva.\r\n\r\n* Description of the basic data container and its usage.\r\n\r\n* Short discussion of different use cases:\r\n\r\n  * Create and fill high dimensional arrays.\r\n  * Get multi-dimensional slices out of the arrays.\r\n  * How different compression codecs and filters in the pipeline affect store/retrieval performance.\r\n  \r\nWe have been using Caterva in one of our internal projects for several months now, and we are pretty happy with the flexibility and ease of use that it brings to us.  This is why we decided to open-source it in the hope that it would benefit others, but also that others may help us in developing it further ;-)\r\n\r\n## About C-Blosc and C-Blosc2\r\n[C-Blosc](https://github.com/Blosc/c-blosc) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that we are aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate memory-bound computations.\r\n\r\n[C-Blosc2](https://github.com/Blosc/c-blosc2) is the new major version of C-Blosc, with a revamped API and support for new compressors and new filters (data transformations), including filter pipelining, that is, the capability to apply different filters during the compression pipeline, allowing for more adaptability to the data to be compressed.  Dictionaries are also introduced, allowing better handling of redundancies among independent blocks and generally increasing compression ratio and performance.  Last but not least, there are new data containers that are meant to overcome the 32-bit limitation of the original C-Blosc. 
Furthermore, the new data containers are available in various formats, including in-memory and on-disk implementations.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3e6a29ad-c337-5c06-96de-a34f3494d390", "id": 1492, "code": "QNMMGN", "public_name": "Francesc Alted", "avatar": "https://pretalx.com/media/avatars/65868d36f26f237938997dd28c2b2453_vYeukrV.jpg", "biography": "After more than a decade working in developing different Data Oriented libraries (PyTables, Blosc, bcolz), and High Performance Computing (numexpr) I am offering consulting and developer services for all the skills that I have cumulated through the years. I can also act as a teacher in Python and data handling; my courses can be tailored to the needs of the customer. \r\n\r\nI am also an open source developer and highly interested in Data Oriented Programming. Most of my current work in this area happens at Blosc2 (https://github.com/Blosc/c-blosc2), and Caterva (https://github.com/Blosc/Caterva) with some maintenance work on existing PyTables and Blosc packages.\r\n\r\nAreas of expertise: C and Python programming, compression, large databases, optimization, SQL, NoSQL.\r\n\r\nFormal resum\u00e9: http://www.blosc.org/pages/francesc-alted-resume/", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/H3DRAV/", "id": 2582, "guid": "57d214ce-3844-55c3-8f16-dd6592f6aa59", "date": "2019-09-04T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2582-modin-scaling-the-capabilities-of-the-data-scientist-not-the-machine", "title": "Modin: Scaling the Capabilities of the Data Scientist, not the machine", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Modern data systems tend to heavily focus on optimizing for the system\u2019s time. 
In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system.", "description": "Modern data systems tend to heavily focus on optimizing for the system\u2019s time. Some of these optimizations, however, are counterproductive to the end user\u2019s workflow and thought process. In this talk, we discuss the design of Modin, a DataFrame library, and how to optimize for the human system.\r\n\r\nModin is a project at UC Berkeley's RISELab designed to optimize for the data scientist\u2019s time. Often when building a data system, the system designers will follow a set of \u201cbest practices\u201d in order to optimize performance. These \u201cbest practices\u201d often require data scientists to understand and personally optimize concepts and system components that are not central to extracting value from their data.\r\n\r\nThe fundamental goal of data science is to extract value from data. Despite this, data systems are being built with user requirements such as: (1) knowledge of partitioning, (2) understanding laziness and what triggers computation, (3) an entirely new API, and (4) where their code is running (e.g. locally, on-prem cluster, cloud). This overhead is passed to the data scientist, even though there is no overlap between these new requirements and the fundamental goal of their profession.\r\n\r\nIn this talk, we will discuss how we think about the problem of large scale data science and optimizing for the human system. We will discuss the system design of Modin, which enables pluggable backends, runtimes, and APIs. The system is designed to solve the needs of the data science community regardless of an individual user\u2019s environment. Currently, Modin supports the pandas API, and a proof of concept for SQL has been implemented. 
Modin is completely open-source and can be found on GitHub: https://github.com/modin-project/modin.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b4455655-6675-5217-ac86-e9ebe4b7f24c", "id": 2380, "code": "PVZBRE", "public_name": "Devin Petersohn", "avatar": null, "biography": "Devin Petersohn", "answers": []}, {"guid": "449ab546-a9c0-5c8f-acad-e9ddaebb6af8", "id": 2381, "code": "QCHQEZ", "public_name": "Devin Petersohn", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/XBGYZB/", "id": 1420, "guid": "0c6bed7a-a8ca-5be8-86c9-499c52bbc857", "date": "2019-09-04T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1420-best-coding-practices-in-jupyterlab", "title": "Best Coding Practices in Jupyterlab", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Jupyter notebooks are often a mess. The code produced works for one notebook, but it's hard to maintain or to re-use. In this talk I will present some best practices to make code more readable, easier to maintain and re-usable.", "description": "Jupyter notebooks are often a mess. The code produced works for one notebook, but it's hard to maintain or to re-use. \r\nIn this talk I will present some best practices to make code more readable, easier to maintain and re-usable.\r\n\r\nThis will include:\r\n- versioning best practices\r\n- how to use submodules\r\n- coding methods to avoid (e.g. 
closures)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "df0f8c22-edcd-5284-8a9b-119ca1de8d97", "id": 3, "code": "78ZETH", "public_name": "Alexander CS Hendorf", "avatar": "https://pretalx.com/media/avatars/Alexander_Hendorf_square_MRk6WO4.jpg", "biography": "Alexander's professional career was always about digitalization: starting from vinyl records in the nineties to advanced data analytics nowadays. He's a Python Software Foundation fellow, program chair of Europe's main Python conference EuroPython, PyConDE and the scientific Python conference EuroSciPy. He\u2019s one of the 25 MongoDB Masters and a regular contributor to the tech community. As a regular speaker at international conferences, he loves to talk about, discuss and teach tech.\r\nBeing a partner at [K\u00f6nigsweg][1] - a boutique consultancy based in Mannheim, Germany - he advises and trains industry clients in AI, data science and big data matters.\r\n\r\n  [1]: https://www.koenigsweg.com", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/UHMWGH/", "id": 1767, "guid": "510a0638-5d2e-5169-aac3-0ac3a24ee12f", "date": "2019-09-04T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1767-lessons-learned-from-comparing-numba-cuda-and-c-cuda", "title": "Lessons learned from comparing Numba-CUDA and C-CUDA", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "We compared the performance of GPU applications written in C-CUDA and Numba-CUDA. By analyzing the GPU assembly code, we learned about the reasons for the differences. This helped us to optimize our codes written in Numba-CUDA and Numba itself.", "description": "Numba allows the development of GPU code in Python style.  When a Python script using Numba is executed, the code is compiled just-in-time (JIT) using the LLVM framework. 
Using Python for GPU programming can mean a considerable simplification in the development of parallel applications compared to C and C-CUDA.\r\n\r\nPython, however, has to live with the prejudice of low performance, especially in High Performance Computing.\r\nWe wanted to get to the bottom of whether this is really true and where these differences come from. For this reason, we first analyzed the performance of typical micro benchmarks used in HPC. By analyzing the assembly codes, we learned a lot about the difference between codes produced by C-CUDA and Numba-CUDA.  Some of these insights have helped us to improve the performance of our application - and also of Numba-CUDA. With a few tricks it is possible to achieve very good performance with our Numba codes, which are very close to, or sometimes even better than, the C-CUDA versions.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d3da3c80-48df-5178-80db-91897aaabb2e", "id": 1751, "code": "NJFPZ3", "public_name": "Lena Oden", "avatar": null, "biography": "Lena Oden recently became a Junior Professor for Computer Architecture at the FernUniversit\u00e4t Hagen. Before that, she worked as a postdoctoral researcher at the Forschungszentrum J\u00fclich and at Argonne National Laboratory in the USA. She received her PhD in Computer Science from the Ruprecht-Karls-Universit\u00e4t Heidelberg and a Diploma in Electrical Engineering from RWTH Aachen. During her PhD, she worked at the Fraunhofer Institute for Industrial Mathematics. Her main research areas are Computer Architectures and Runtime Systems for HPC.\r\nHer interest in Python started when she worked with people from other scientific areas. She likes the simplicity of Python, and started to use it as her main programming language for teaching parallel programming. 
\r\nShe is interested in improving the performance of Python, to make it more usable in HPC.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 2 (Baroja)": [{"url": "https://pretalx.com/euroscipy-2019/talk/YU8EML/", "id": 2679, "guid": "ece1093a-1dcb-5acc-b821-80db57b25eb2", "date": "2019-09-04T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2679-how-a-voice-assistant-works", "title": "How a voice assistant works", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "This talk will focus on the technologies needed to build a voice assistant. It will keep as center point Samsung\u2019s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) in a variety of Samsung mobile phones.", "description": "This talk will focus on the technologies needed to build a voice assistant. It will keep as center point Samsung\u2019s voice assistant Bixby, which is available in 8 languages across the world (5 EU languages) in a variety of Samsung mobile phones.\r\nFirst an overview of the needed infrastructure and the challenges regarding user education will be presented.\r\nThen, the talk will offer an overview of the technologies needed in a voice assistant:\r\n1.       Automatic Speech Recognition: how a sound wave is transcribed into words\r\n2.       Natural Language Understanding: extraction of meaning from a sentence\r\n3.       Natural Language Generation: response generation\r\n4.       
Text To Speech: speech synthesis\r\nDuring the talk the new Bixby IDE will also be presented, with which any developer can create a \u201cvoice capsule\u201d that processes natural language to send/retrieve information from their API.\r\nBixby Developers site: https://bixbydevelopers.com/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b6f1474b-0fea-54b6-849b-5f01997d2ab6", "id": 2439, "code": "3BFXHU", "public_name": "Miren Urteaga Aldalur", "avatar": "https://pretalx.com/media/avatars/231f49d5298344283e2785498fc34447_EkOImKM.jpg", "biography": "todo", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/JJCQQJ/", "id": 1311, "guid": "00202586-2ba2-50a1-8500-cd616c60d6e4", "date": "2019-09-04T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1311-qutip-the-quantum-toolbox-in-python-as-an-ecosystem-for-quantum-physics-exploration-and-quantum-information-science", "title": "QuTiP: the quantum toolbox in Python as an ecosystem for quantum physics exploration and quantum information science", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "In this talk you will learn how QuTiP, the quantum toolbox in Python (http://qutip.org), has grown from a library into an *ecosystem*. QuTiP is used in education to teach quantum physics, and in research and industry for quantum computing simulation.", "description": "QuTiP is emerging as a library at the center of a lively ecosystem. In this talk you will learn about the ongoing projects that build on this project, from providing the framework to simulate quantum machine learning for quantum computers to the development of efficient numerical solvers tackling dynamical problems that are inherently hard to simulate classically. 
\r\n\r\nIt can be noted that [Astropy](https://www.astropy.org/affiliated/index.html) is a community effort to develop a common core package for Astronomy in Python and \"foster an ecosystem of interoperable astronomy packages\", \r\n\r\nIt seems an interesting model for the quantum tech landscape. [Qiskit]() did build its own ecosystem of sub-libraries for quantum computing. The physics library for quantum tech is http://qutip.org .\r\n\r\nAbout the idea of QuTiP as a super-library, here are some details:\r\n\r\n- `krotov`, a very recent package for optimal control built on top of QuTiP ( https://arxiv.org/abs/1902.11284). \r\n[https://github.com/qucontrol/krotov].\r\n\r\n- `piqs`, the permutational invariant quantum solver, now a QuTiP module (see also https://arxiv.org/abs/1805.05129 );\r\n\r\n- `matsubara`, a plugin to study the ultrastrong coupling regime with structured baths,  http://matsubara.readthedocs.io/ \r\n\r\n- `QNET`, a computer algebra package for quantum mechanics and photonic quantum networks, which actually calls QuTiP as a plugin, mainly developed at Stanford in Mabuchi Lab https://github.com/mabuchilab/QNET \r\n\r\n- `qptomographer`,  https://qptomographer.readthedocs.io/en/latest/install, a library to derive error bars for experiments in quantum computing and quantum information processing. \r\n\r\n- `tiqs`, a library to study open quantum systems on extended lattices exploiting the symmetries of such systems, https://github.com/fminga/tiqs \r\n\r\n- other upcoming integrations relative to pulse control, such as `qupulse`, https://github.com/qutech/qupulse/wiki/Architecture-Proposal\r\n\r\nThis talk will be of interest to the curious coder and researcher, analyzing how QuTiP's impact in the research community has fostered a [*lingua franca* for quantum tech research](https://twitter.com/goerz/status/1118739088595652611). 
We will also draw comparisons with other larger ecosystems in Python-based scientific projects, such as astropy and scikit-learn.  \r\n\r\n# More about QuTiP\r\n- QuTiP is the open-source software to study quantum physics. It develops both an intuitive playground to understand quantum mechanics and cutting-edge tools to investigate it. \r\n- QuTiP provides the most comprehensive toolbox to characterize noise and dissipation \u2013realistic processes\u2013 affecting quantum systems, as well as tools not only to monitor but also to minimize their impact (quantum optimal control, description of decoherence-free spaces). \r\n- For this reason QuTiP is a software born out of the quantum optics community and that has become increasingly relevant for the quantum computing community, as current quantum computing devices are noisy (NISQ definition by Preskill).  \r\n- `pypinfo` data shows that QuTiP is popular in countries that are strong in quantum tech and quantum computing research, eg, The Netherlands in the top five, as well as countries that benefit in the use of open source software (OSS) for university coursework, eg, India. \r\n- In the past three years, there has been an evolution in the quantum tech community, which has embraced OSS. 
\r\n- OSS libraries are used as a means to grow the user base, as well as in a more structural way for quantum computers, as they provide cloud access to quantum devices, e.g., IBM Q.\r\n- QuTiP is the only major library that has continued to thrive in this ecosystem, competing with other library packages that are funded by corporations or VC-backed startups. \r\n- Since the tools of QuTiP provide a common ground to study quantum mechanics, it is important that this independent project is provided with the necessary support to thrive.\r\n- As access to quantum computers becomes more and more widespread, including for data scientists, and QuTiP's popularity grows even more for undergraduate and graduate courses, becoming the de-facto standard OSS to study quantum optical systems, it is imperative that the QuTiP library makes a quality jump to provide a comprehensive introduction to its tools for a much broader community of users. \r\n\r\n\r\n- QuTiP website: http://www.qutip.org/\r\n- GitHub repository: https://github.com/qutip\r\n- GitHub repository (QuTiP code): https://github.com/qutip/qutip\r\n- GitHub repository (QuTiP documentation): https://github.com/qutip/qutip-doc\r\n- GitHub repository (QuTiP tutorials): https://github.com/qutip/qutip-notebooks\r\n- Latest version of the documentation:\r\n  http://qutip.org/docs/latest/index.html\r\n- Historical archive of released documentation: http://qutip.org/documentation.html\r\n\r\n\r\n## QuTiP core development team\r\n\r\nQuTiP core development team: (Alex Pitchford, alex.pitchford@gmail.com). Additional mentors will be the project's core contributors Nathan Shammah (nathan.shammah@gmail.com), Shahnawaz Ahmed (shahnawaz.ahmed95@gmail.com) and Eric Giguere (eric.giguere@usherbrooke.ca). \r\n\r\nQuTiP is a project started by Robert J. Johansson and Paul Nation. Other core developers have been Arne Grimso, Chris Granade and over 44 other contributors. \r\n\r\n## References\r\n[1] J. R. Johansson, P. D. 
Nation, and F. Nori: \u201cQuTiP: An open-source Python framework for the dynamics of open quantum systems.\u201d, Comp. Phys. Comm. 183, 1760\u20131772 (2012)\r\n\r\n[2] J. Robert Johansson, Paul D. Nation, and Franco Nori: \u201cQuTiP 2: A Python framework for the dynamics of open quantum systems.\u201d, Comp. Phys. Comm. 184, 1234 (2013)\r\n\r\n[3] J. Preskill, \"Quantum Computing in the NISQ era and beyond.\" Quantum **2**, 79 (2018)\r\n\r\n[4] Mark Fingerhuth, Tom\u00e1\u0161 Babej, and Peter Wittek, Open source software in quantum computing, PLoS ONE 13 (12): e0208561 (2018).\r\n\r\n[5] N. Shammah, S. Ahmed, N. Lambert, S. De Liberato, and F. Nori, \"Open quantum systems with local and collective incoherent processes: Efficient numerical simulation using permutational invariance \" Phys. Rev. A 98, 063815 (2018). Code at [http://piqs.readthedocs.io](http://piqs.readthedocs.io)\r\n\r\n[6] N. Lambert, S. Ahmed, M. Cirio, and F. Nori, \"Virtual excitations in the ultra-strongly-coupled spin-boson model: physical results from unphysical modes\", arXiv preprint arXiv:1903.05892. Also [http://matsubara.readthedocs.io](http://matsubara.readthedocs.io)\r\n\r\n**Other relevant material**:\r\n\r\n- Slides on QuTiP and the quantum-tech open source ecosystem (Nathan Shammah @ Berkeley Lab, 2019). [PDF](https://conferences.lbl.gov/event/195/session/6/contribution/13/material/slides/0.pdf)\r\n\r\n- [\"The rise of open source in quantum physics research\"](http://blogs.nature.com/onyourwavelength/2019/01/09/the-rise-of-open-source-in-quantum-physics-research/), Nathan Shammah and Shahnawaz Ahmed, Nature's physics blog, January 9, 2019. \r\n\r\n- \"Bit to QuBit: Data in the age of quantum computers\", Shahnawaz Ahmed, PyData 2018, Warsaw, Poland, 2019. 
[YouTube video](https://www.youtube.com/watch?v=6GAXJhL1mSs).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7df8b8e5-a1fc-5fc6-89c3-dfe056ef6620", "id": 35, "code": "3KH7ZM", "public_name": "Nathan Shammah", "avatar": null, "biography": "I work for the development of open-source software for quantum physics research and its role in quantum technology transfer. I am also interested in the study of quantum information processing and light-matter interaction in solid-state cavity quantum electrodynamics (QED). My research focus is on open quantum systems dynamics, and the interplay between cooperative effects and dissipative mechanisms in many-body quantum systems. In particular, I investigate how fingerprints of the ultrastrong coupling regime between light and matter can be addressed. I am also interested in the characterization of the light-matter physics in physical devices, such as superconducting circuits and semiconductor quantum wells, for technology applications such as quantum information processing and Terahertz light emission.", "answers": []}, {"guid": "e9ffe006-eb9e-586c-96b0-292a9f7b2808", "id": 1428, "code": "LWLGFE", "public_name": "Alexander Pitchford", "avatar": "https://pretalx.com/media/avatars/fuji-pitchfork-zoom.png", "biography": "Currently working as a postgraduate researcher in quantum control theory and optimisation algorithms. I am employed by the [Mathematics Dept of Aberystwyth University](https://www.aber.ac.uk/en/maths/). I am also associated with the [Controlled Quantum Dynamics Group at Imperial College](https://www.imperial.ac.uk/controlled-quantum-dynamics). \r\nI am part of the Administration Team for [QuTiP](http://qutip.org/) - the Quantum Toolkit in Python. I introduced the quantum control sub-library into QuTiP. 
Through this I also have close ties with the [Theoretical Quantum Physics Lab at RIKEN](http://www.riken.jp/en/research/labs/chief/theor_qtm_phys/).\r\n\r\nI spent the last 9 years doing undergraduate, then PhD Physics at Aberystwyth University. Prior to that I worked as a software developer / consultant in manufacturing simulation and finance process automation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/PDYER8/", "id": 1822, "guid": "687cf29f-73ab-51c0-b74a-a504f5433e9f", "date": "2019-09-04T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1822-constrained-data-synthesis", "title": "Constrained Data Synthesis", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "We introduce a method for creating synthetic data \"to order\" based on learned (or provided) constraints and data classifications. This includes \"good\" and \"bad\" data.", "description": "Synthetic data is useful in many contexts, including\r\n\r\n  * providing \"safe\", non-private alternatives to data containing personally identifiable information\r\n  * software and pipeline testing\r\n  * software and service development\r\n  * enhancing datasets for machine learning.\r\n\r\nSynthetic data is often created on a bespoke basis, and since the advent of generative adversarial networks (GANs) there has been considerable interest and experimentation with using those as the basis for creating synthetic data.\r\n\r\nWe have taken a different approach. We have worked for some years on developing methods for automatically finding constraints that characterise data, and which can be used for testing data validity (so-called \"test-driven data analysis\", TDDA). Such constraints form (by design) a useful characterisation of the data from which they were generated. 
As a result, methods that generate datasets that match the constraints necessarily construct datasets that match many of the original characteristics of the data from which the constraints were extracted.\r\n\r\nAn important aspect of datasets is the relationship between \"good\" (~ valid) and \"bad\" (~ invalid) data, both of which are typically present. Systems for creating useful, realistic synthetic data generally need to be able to synthesize both kinds, in realistic mixtures.\r\n\r\nThis talk will discuss data synthesis from constraints, describing what has been achieved so far (which includes synthesizing good and bad data) and future research directions.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "59c564f1-00b7-5285-a30d-a6d37d21928a", "id": 1793, "code": "R89Z7R", "public_name": "Nick Radcliffe", "avatar": "https://pretalx.com/media/avatars/NickRadcliffe600x600.jpeg", "biography": "Nick is a practising data scientist with over 30 years' experience, from neural networks and genetic algorithms on parallel systems in the late 1980s, through parallel machine learning and 3D visualisation software as a founder of Quadstone, from 1995, to novel modelling methods (e.g. uplift modelling) in the early 2000s. Since 2007, he has run Edinburgh data science specialists Stochastic Solutions. 
\r\n\r\nNick enjoys using his deep knowledge of underlying algorithms to fashion tailored solutions to practical business problems for clients including Barclays, Sainsburys, T-Mobile and Skyscanner, and has a particular interest in testing and correctness in data science.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/PEJPDG/", "id": 1445, "guid": "af9924cb-d992-5c91-ac58-9c8c14f4195e", "date": "2019-09-04T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1445-tofu-an-open-source-python-cython-library-for-synthetic-tomography-diagnostics-on-tokamaks", "title": "ToFu - an open-source python/cython library for synthetic tomography diagnostics on Tokamaks", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "We present an open-source parallelized and cythonized python library, ToFu, for modeling tomography diagnostics on nuclear fusion reactors. Its functionalities (with realistic examples), its architecture and its design will be shown.", "description": "Nuclear fusion comes along with great promises of almost limitless energy with little risks and waste. But it also comes with significant scientific and technological complexities. Decades of efforts may find an echo in ITER, an international tokamak being built to address this challenge. A tokamak is a particular kind of advanced experimental nuclear fusion reactor. It is a torus-shaped vacuum vessel in which a hydrogen plasma of very low density is heated up to temperatures (10-100 millions of degrees Celsius) allowing nuclear fusion reactions to occur. The torus-shaped plasma radiates light, which is measured in various wavelength domains by dedicated sets of detectors (called diagnostics), like 2D cameras observing visible light, 1D arrangements of diodes sensitive to X-rays, ultra-violet spectrometers... 
Due to the torus shape, the plasma is axisymmetric, and as in medical imaging, tomography methods can be used to diagnose the light radiated in a plasma cross-section.\r\nFor all diagnostics, one can seek to solve the direct or the inverse problem. The direct problem consists in computing the measurements from a known plasma light emissivity, provided by a plasma simulation for example. \r\nThe inverse problem consists in computing the plasma light emissivity from experimental measurements. The algorithms involved in solving both the direct and inverse problem are very similar, no matter the wavelength domain.\r\n\r\nLike many, the fusion community tends to suffer from a lack of reproducibility of the results it publishes. This problem is particularly acute in the case of tomography diagnostics since the inverse problem is ill-posed and the uniqueness of the solution is not guaranteed. There are also many possible simplifying hypotheses that may, or may not, be relevant for each diagnostic. In this regard, the historical practices of the community display a large variety of single-user black-box codes, each typically designed by a student, and often forgotten or left as is until a new student is hired and starts all over again. \r\n\r\nIn this context, a machine-independent, open-source and documented python library, ToFu, was started to provide the fusion community with a common and free reference tool. \r\nWe thus aim at improving reproducibility by providing a known and transparent tool, able to efficiently solve both the direct and inverse problem for tomography diagnostics. It can use very simple hypotheses or very complete diagnostics descriptions alike, one of the ideas being that it should allow users to perform accurate calculations easily, sparing them the need for simplifying hypotheses that are not always valid. 
\r\n\r\nA zero version of tofu, fully operational but not user-friendly enough, was first developed between 2014 and 2016 when it was used for published results. Strong with this first proof of principle, a significant effort was initiated in 2017 to completely re-write the code with a stronger emphasis on python community standards (PEP8), version control (Github), performance (cython), packaging (pip and conda), continuous integration (nosetests and travis), modularity (architecture refurbishing), user-friendliness (renamings, utility tools) and object-oriented coding (class inheritance). \r\nThis effort is still ongoing to this day and is scheduled to go on for the next 2.5 years. However, the first milestones have been reached, and we would like to present the first re-written modules to the python community, for publicity, advice, feedback, mutually enriching exchanges and more generally because we feel tofu is part of the large open-source python scientific community.\r\n\r\nThe code is composed of several modules: a geometry module, a data visualization module, a meshing module, and an inversion module. We will present the geometry module (containing ray-tracing tools, spatial integration algorithms...) and the data module (making use of matplotlib for pre-defined interactive figures). Using profiling tools, the numerical core of the geometry module was optimized and parallelized recently in `Cython` making the code more than ten thousand times faster than the previous version on some test cases. Memory usage has also been reduced by half on the largest test cases.\r\n\r\nsee [ToFu](https://github.com/ToFuProject/tofu)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6b4cc5f6-c9d1-5953-8d32-0422dcbec946", "id": 1513, "code": "ZKBLZU", "public_name": "Laura Mendoza", "avatar": "https://pretalx.com/media/avatars/new_pp.jpg", "biography": "I was born in Strasbourg, France, but was raised in Guatemala city where I went to a French school. 
I did my undergraduate degree at the University of Strasbourg, where I obtained a bachelor's degree in Mathematics with a minor in Computer Science followed by a Master's degree in Applied Mathematics specialized in Scientific Computing and Computer Science Security. Then, I obtained my Ph.D. in Numerical Methods in Plasma Physics at the Technical University of Munich while I was based at the Max-Planck Institute for Plasma Physics. I went back to Strasbourg for a 2-year post-doc. Currently, I am working on the development and optimization of the ToFu library as a Research Engineer at the INRIA institute. This research is being funded by a 3-year Engineering grant of EUROFUSION obtained in June 2018.", "answers": []}, {"guid": "4676076a-dbe8-5994-a34f-a274e5e93777", "id": 1312, "code": "8YRZ7W", "public_name": "Didier VEZINET", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/HGNPFF/", "id": 2748, "guid": "2beccc72-e284-5d8f-b76f-dba46ed5c96f", "date": "2019-09-04T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2748-debugging-in-jupyterlab", "title": "Debugging in JupyterLab", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Debugging Jupyter Notebooks has been one of the most requested features. In this presentation we give an overview of the current state and tools for debugging in Jupyter, and offer a glimpse of what is coming next.", "description": "Layout:\r\n\r\n##### 1. Current tools for debugging Jupyter Notebooks\r\n\r\n- print statements\r\n- ipdb\r\n- PixieDebugger (IBM)\r\n- Visual Studio Code cell debugging\r\n\r\n##### 2. Native debugging support for Jupyter Kernels\r\n\r\n- Jupyter protocol extension\r\n- Debug Adapter Protocol in xeus-python\r\n\r\n##### 3. 
Debugger extension for JupyterLab\r\n\r\n- An IDE-like debugging experience in JupyterLab\r\n- Active development, current prototype\r\n- Demo", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6dd57d12-0432-5dd5-8eec-5649265edf22", "id": 2552, "code": "MAZXW8", "public_name": "Jeremy Tuloup", "avatar": "https://pretalx.com/media/avatars/JT2019.png", "biography": "Scientific Software Developer at QuantStack", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/XZLXZM/", "id": 1567, "guid": "d470fc0b-849f-5919-9961-e633d4cd5d64", "date": "2019-09-04T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1567-controlling-a-confounding-effect-in-predictive-analysis-", "title": "Controlling a confounding effect in predictive analysis.", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Confounding effects are often present in observational data: the effect or association studied is observed jointly with other effects that are not desired.", "description": "For instance, when predicting the salary to offer given the descriptions of professional experience, the risk is to capture indirectly a gender bias present in the distribution of salaries. Another example is found in biomedical applications, where for an automated radiology diagnostic system to be useful, it should use more than socio-demographic information to build its prediction.\r\n\r\nHere I will talk about confounds in predictive models. I will review classic deconfounding techniques developed in a well-established statistical literature, and how they can be adapted to predictive modeling settings. 
Departing from deconfounding, I will introduce a non-parametric approach \u2013that we named \u201cconfound-isolating cross-validation\u201d\u2013 adapting cross-validation experiments to measure the performance of a model independently of the confounding effect.\r\n\r\nThe examples mentioned in this work relate to common issues in neuroimage analysis, although the approach is not limited to neuroscience and can be useful in other domains.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6d5b48e4-55a9-5787-81a0-73e32ddc722e", "id": 1103, "code": "7N3KYY", "public_name": "Darya Chyzhyk", "avatar": null, "biography": "I\u2019m Darya, a researcher in artificial intelligence and machine learning, in particular feature selection, clustering, pattern recognition, segmentation and statistical analysis. In recent years I have been working on computer-aided diagnostic systems for brain diseases that allow identification of the anatomical location of image biomarkers, lesion segmentation and phenotype prediction.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/DVDLRG/", "id": 1429, "guid": "8aabfffe-488a-5648-9e2e-488e78563f4e", "date": "2019-09-04T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1429-the-rapid-analytics-and-model-prototyping-ramp-framework-tools-for-collaborative-data-science-challenges", "title": "The Rapid Analytics and Model Prototyping (RAMP) framework: tools for collaborative data science challenges", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "The RAMP (Rapid Analytics and Model Prototyping) framework provides a platform to organize reproducible and transparent data challenges. 
We will present the different framework bricks.", "description": "We will give an overview of the RAMP framework, which provides a platform to organize reproducible and transparent data challenges.\r\n\r\nRAMP workflow is a python package used to define and formalize the data science problem to be solved. It can be used as a standalone package and allows a user to prototype different solutions. In addition to RAMP workflow, a set of packages have been developed allowing to share and collaborate around the developer solutions. Therefore, RAMP database provides a database structure to store the solutions of different users and the performance of these solutions. RAMP engine is the package to run the user solutions (possibly on the cloud) and populate the database. Finally, RAMP frontend is the web frontend where users can upload their solutions and which shows the leaderboard of the challenge.\r\n\r\nThe project is open-source and can be deployed on any local server. The framework has been used at the Paris-Saclay Center for Data Science for setting up and solving about twenty scientific problems, for organizing collaborative data challenges, for organizing scientific sub-communities around these events, and for training novice data scientists.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d6297904-fa91-50b7-9510-cc23c0cf9edb", "id": 55, "code": "KMDJAL", "public_name": "Guillaume Lemaitre", "avatar": "https://pretalx.com/media/avatars/guillaumelemaitre.jpg__200x200_q85_crop_subsampling-2_upscale_9Ptqss3.jpg", "biography": "I am an engineer working for the scikit-learn foundation @ Inria.", "answers": []}, {"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/profile_Rc56sfi.png", "biography": "I am a core contributor to Pandas and maintainer of GeoPandas. 
I have given several tutorials at international conferences and a course on python for data analysis for PhD students at Ghent University. I did a PhD at Ghent University and VITO in air quality research, worked at the Paris-Saclay Center for Data Science, and, currently I am a freelance software developer and teacher.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 3 (Oteiza)": [{"url": "https://pretalx.com/euroscipy-2019/talk/SZ8S8G/", "id": 1801, "guid": "8e1293ca-3c89-5427-a872-5c2f39208a1d", "date": "2019-09-04T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1801-sufficiently-advanced-testing-with-hypothesis", "title": "Sufficiently Advanced Testing with Hypothesis", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Testing research code can be difficult, but is essential for robust results.  Using Hypothesis, a tool for property-based testing, I'll show how testing can be both easier and dramatically more powerful - even for complex \"black box\" codes.", "description": "Code is now a critical part of almost all research, whether for communication or\r\nfor data collection and analysis. Unfortunately, producing reliably error-free\r\ncode remains an open problem in science to an even greater extent than other\r\napplications. Soergal (2014) estimates that \"any reported scientific result could\r\nvery well be wrong if data have passed through a computer, and that these\r\nerrors may remain largely undetected.\" - though some software errors are\r\nmuch more dramatic, as with the crash of the Mars Climate Orbiter.\r\n\r\nWhat can we do to reduce the rate of errors in our own code? There is no silver\r\nbullet, but a more efficient way to create tests would certainly help...\r\n\r\nThe answer is to have a computer write your tests for you! 
Using Hypothesis,\r\nyou describe valid inputs - from 'an integer' to 'dataframes like this', as\r\ncomplex and precise as needed - and write a test which should always pass...\r\nthen Hypothesis searches for the smallest inputs that cause an error.\r\n\r\nThis approach is called property based testing, and it regularly catches errors\r\nthat evaded every human review and hand-written test case (even in Numpy).\r\nEven better, it rewards well-designed software - but can also do a quick check\r\nof a script in just a few lines of code.\r\n\r\nWe'll cover the theory of property-based testing, a worked example, and then\r\njump into a whirlwind tour of the Hypothesis API: how to use, define, compose,\r\nand infer strategies for input; properties and testing tactics for your code; and\r\nhow to debug your tests if everything seems to go wrong.\r\n\r\nBy the end of this talk, you'll be ready to find real bugs with Hypothesis in\r\nanything from data pipelines to the core scientific Python libraries. Be the\r\nchange you want to see in your team's code - or test someone else's and help\r\npush the world into a new age of reliable research software!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3f0779c7-230b-53b2-863c-6a3cff408151", "id": 1604, "code": "LFU8AS", "public_name": "Zac Hatfield-Dodds", "avatar": "https://pretalx.com/media/avatars/zac-profile_elxMt0n.jpg", "biography": "Zac is a researcher at the Australian National University\u2019s 3A Institute, which is building a new applied science to 'manage the machines' - AI, cyber-physical systems, and other new technologies.\r\n\r\nHe started using Python to analyse huge environmental datasets, and contributing to libraries like Xarray to make such analysis easier for all scientists.  
Now, as a maintainer of Hypothesis, Pytest, and Trio, Zac is still passionate about making it easy to write software you can understand and rely on.\r\n\r\nWhen not at a computer he can usually be found surrounded by books of all kinds, the Australian bush, or both.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/YHCP9C/", "id": 1810, "guid": "3b5336b9-8930-5392-b533-cc24a34e0aa4", "date": "2019-09-04T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1810-what-about-tests-in-machine-learning-projects-", "title": "What about tests in Machine Learning projects?", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Good practices tell you must write tests! But testing Machine Learning projects can be really complicated. Test writing seems often inefficient. Which kind of test should be written? How to write them? What are the benefits?", "description": "Once your machine learning POC seems promising and your development environment is set up, the next step is to refactor your code and write TESTS. We know that a lot of people think tests are too complicated and boring to write and they are not very useful. Some manual checks can address the need.\r\n\r\nIt is not totally false. Tests can be really boring and time consuming to write when you don't have the right tools, the right APIs, the right environments or the right code structure.\r\nBut it is always a bad idea to ignore tests or to perform them manually. If you want to be involved in your project life cycle, if you want to bring it from POC to production you need to care about tests. After some years tackling production bugs, you can't feel safe delivering without tests as you can't start driving until your seat belt is fastened.\r\n\r\nThere is more than one way to test. 
Tests can be split on several levels (unit, component, functional, performances, etc...) to be able to quickly identify the faulty code/data/parameter. Tests must also be automated in a Continuous Integration and run at least on each experiment before merging it in the baseline pipeline as it is done in software engineering (the CI is triggered on each feature branch).\r\n\r\nThis talk is about how to easily write tests and testable code, how to avoid most common traps and what are the benefits of tests on unrealistic data in your Machine Learning project. \r\n\r\n(Tests on real data are also really important but they are not the main purpose of this talk.)\r\n\r\nSlides are here: sdg.jlbl.net/slides/tests_for_datascientist/presentation.html", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3008d29d-aa8c-565c-91d3-a086e52cc362", "id": 1731, "code": "X7BWJC", "public_name": "Sarah Diot-Girard", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/QVCFGE/", "id": 1436, "guid": "6e172ed5-2139-599e-8248-3316828ee999", "date": "2019-09-04T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1436-scientific-devops-designing-reproducible-data-analysis-pipelines-with-containerized-workflow-managers", "title": "Scientific DevOps:  Designing Reproducible Data Analysis Pipelines with Containerized Workflow Managers", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "A review of DevOps tools as applied to data analysis pipelines, including workflow managers, software containers, testing frameworks, and online repositories for performing reproducible science that scales.", "description": "Open source and open science come together when the software is accessible, transparent, and owned by all.  
For data analysis pipelines that grow in complexity beyond a single Jupyter notebook, this can become a challenge as the number of steps and software dependencies increase.  In this talk, Nicholas Del Grosso will review a variety of tools for packaging and managing a data analysis pipeline, showing how they fit together and benefit the development, testing, deployment, and publication processes and the scientific community.  In particular, this talk will cover:\r\n\r\n  - **Workflow managers** (e.g. Snakemake, PyDoit, Luigi) to combine complex pipelines into single applications.  \r\n\r\n  - **Container Solutions** (e.g. Docker and Singularity) to package and deploy the software on others' computers, including high-performance computing clusters.\r\n\r\n  - **The Scientific Filesystem** to build explorable and multi-purpose applications.\r\n\r\n  - **Testing Frameworks** (e.g. PyTest, Hypothesis) to declare and confirm the assumptions and functionality of the analysis pipeline.\r\n\r\n  - **Ease-of-Use Utilities** to share the pipeline online and make it accessible to non-programmers.\r\n\r\nBy writing software that stays manageable, reproducible, and deployable continuously throughout the development cycle, we can better fulfill the goals of open science and good scientific practice in a digital era.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "4fa481d1-dd10-5a8f-9219-c2f9e35f4662", "id": 1502, "code": "TVGSW8", "public_name": "Nicholas Del Grosso", "avatar": null, "biography": "Nicholas Del Grosso is an American neuroscientist post-doc in Germany who is passionate about open, reproducible science.   
Besides teaching data analysis and programming to scientists in courses, workshops, and at PyData Munich, he builds scientific software to study the learning process itself--from understanding the brain's responses to exposure to machine-brain interfaces, rat's understanding of 3D virtual environments, and scientists's responses to the stress of managing their own experiments!\r\n\r\n**Note**: Nick is currently looking for a post-doctoral position to work on problems related to reproducible science!  If you're looking for someone like him, send him a message or come say hello!", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/UMWUTW/", "id": 1813, "guid": "050d4697-9cd8-5ce4-9462-008a8c2fb131", "date": "2019-09-04T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1813-dashboarding-with-jupyter-notebooks-voila-and-widgets", "title": "Dashboarding with Jupyter notebooks, voila and widgets", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Turn your Jupyter notebook into a beautiful modern React or Vue based dashboard using voila and Jupyter widgets.", "description": "Sharing the result of a Jupyter notebook is currently not an easy path. With voila we are changing this. Voila is a small but important ingredient in the Jupyter ecosystem. Voila can execute notebooks, keeping the kernel connected but does not allow for arbitrary code execution, making it safe to share your notebooks with others.\r\nWith new libraries built on top of Jupyter widgets/ipywidgets (ipymaterialui and ipyvuetify) we allow beautiful modern React and Vue components to enter the Jupyter notebook. 
Using voila we can integrate the ipywidgets seamlessly into modern React and Vue pages, to build modern dashboards directly from a Jupyter notebook.\r\nI will give a live example on how to transform a Jupyter notebook into a fully functional single page application with a modern (Material Design) look.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5ec0270e-50b8-55b2-96ab-ccef2e5d93a2", "id": 1510, "code": "38EYUA", "public_name": "Maarten Breddels", "avatar": "https://pretalx.com/media/avatars/DSC_0701_bright_small.jpeg", "biography": "Maarten Breddels is an entrepreneur and freelance developer/consultant/data scientist working mostly with Python, C++ and Javascript in the Jupyter ecosystem. Creator of ipyvolume and vaex, founder of vaex.io. His expertise ranges from fast numerical computation, API design, to 3d visualization. He has a Bachelor in ICT, a Master and PhD in Astronomy, likes to code and solve problems.", "answers": []}, {"guid": "8f45d923-a237-55f6-8ecb-74c87e5f0d41", "id": 2416, "code": "DXWAL8", "public_name": "Martin Renou", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/PFH3QK/", "id": 1403, "guid": "7c9654fa-3aab-5c1c-a0ce-787c82cecbf2", "date": "2019-09-04T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1403-make-your-python-code-fly-at-transonic-speeds-", "title": "Make your Python code fly at transonic speeds!", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "[Transonic](http://transonic.readthedocs.io) is a new pure Python package to easily accelerate modern Python-Numpy code with different accelerators (like Cython, Pythran, Numba, Cupy, etc...).", "description": "Slides available at https://tiny.cc/euroscipy2019-transonic\r\n\r\n[Transonic](http://transonic.readthedocs.io/) is a pure Python package (requiring 
Python >= 3.6) to easily accelerate modern Python-Numpy code with different accelerators (like Cython, [Pythran](https://github.com/serge-sans-paille/pythran), Numba, Cupy, etc...) opportunistically (i.e. if/when they are available).\r\n\r\nWe will first present the context of the creation of this package, i.e. the Python's High Performance Computing (HPC) Landscape. We will show how Transonic can be used to write elegant and very efficient HPC codes with Python, with examples taken from real-life research simulation codes ([fluidfft](https://fluidfft.readthedocs.io) and [fluidsim](https://fluidsim.readthedocs.io)). We will discuss the advantages of using Transonic instead of writing big Cython extensions or using Numba or Pythran directly.\r\n\r\nA strategy to quickly develop a very efficient scientific application/library with Python and Transonic could be:\r\n\r\n1. Use modern Python coding, standard Numpy/Scipy for the computations and all the cool libraries you want.\r\n\r\n2. Profile your applications on real cases, detect the bottlenecks and apply standard optimizations with Numpy.\r\n\r\n3. Add few lines of Transonic to compile the hot spots.\r\n\r\nWe won't forget to also discuss some limitations of Transonic, and more generally of Python and its numerical ecosystem for High Performance Computing.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "69db8997-4dfc-571d-95e0-aed628c39e6d", "id": 1486, "code": "83M9HS", "public_name": "Pierre Augier", "avatar": null, "biography": "Researcher in fluid dynamics at LEGI (Grenoble). 
Use Python a lot, in particular for the [FluidDyn project](https://fluiddyn.readthedocs.io).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/JSCWY7/", "id": 1573, "guid": "ad5d4bbd-7302-5e6f-9bce-b828f50dc484", "date": "2019-09-04T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1573-pyfeti-an-easy-and-massively-dual-domain-decomposition-solver-for-python", "title": "PyFETI - An easy and massively Dual Domain Decomposition Solver for Python", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "PyFETI is a Python implementation of Finite-Element-Tearing-Interconnecting methods. The library provides a massive linear solver using a Domain Decomposition method, where problems are solved locally by a direct solver and iteratively at the interface.", "description": "PyFETI is a Python implementation of Finite-Element-Tearing-Interconnecting (FETI) methods. The library provides a massive linear solver that uses Domain Decomposition techniques. FETI methods solve a linear system by combining two solver strategies, direct and iterative. A large problem is decomposed into subdomains, generating an additional set of constraints at the interfaces between subdomains. The local problem solution is formulated in terms of a new interface force that must connect the subdomains. Therefore, given an interface force, the local problems are solved with a direct solver, e.g. SuperLU, and the interface force is updated by the Preconditioned Conjugate Projected Gradient method.  
The library has been tested for large linear elastic problems at the IT4I supercomputer center.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7d9dfa34-a68e-55de-b73f-7bce43b9b6a9", "id": 1625, "code": "KJZ9JQ", "public_name": "Guilherme Jenovencio", "avatar": "https://pretalx.com/media/avatars/Dia_3_-_885.jpg", "biography": "Ph.D. candidate at Applied Mechanics TUM focused on Computational Solid Mechanics. Strong experience in Structural analysis and Optimization.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/JHGWWN/", "id": 1450, "guid": "31e39f11-cfec-5d28-a6e9-122afdb42e1e", "date": "2019-09-04T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1450-high-voltage-lab-common-code-basis-library-a-uniform-user-friendly-object-oriented-api-for-a-high-voltage-engineering-research-", "title": "High Voltage Lab Common Code Basis library: a uniform user-friendly object-oriented API for a high voltage engineering research.", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "The library leverages Python's richness to provide a uniform user-friendly API for a zoo of industrial communication protocols used to control high voltage engineering devices, together with abstractions and implementations for such devices.", "description": "At the heart of ETH High Voltage Lab's (HVL) research are industrial devices put\r\ntogether into code-automated experiments. It's a zoo of industrial communication\r\nprotocols one needs to handle when controlling these devices. HVL decided to switch from\r\nMATLAB to Python as a programming and analysis tool. The Python community provides solutions\r\nto the majority of technicalities involved in handling the multitude of industrial communication\r\nprotocols used to control high voltage research experiment devices. 
Moreover\r\nPython seems to be a more future-proof choice, meeting industry demand for a more\r\ncost-effective and collaborative solution.\r\n\r\nThe HVL Common Code Basis library (`hvl_ccb`) provides a uniform user-friendly\r\nobject-oriented API as well as implementations for multiple high voltage engineering\r\ndevices and their respective communication protocols. The library leverages Python's\r\nopen source community - implementations of specific communication protocols, but also\r\nrelies heavily on some of the language's newer features such as typing hints, dataclasses\r\nor enums.\r\n\r\nPython typing hints are used not only for their static type checking and autocompletion\r\nsupport from IDEs, but also for dynamic type checking of the communication protocols'\r\nand devices' configurations. The configurations themselves are a customized\r\nimplementation of Python 3.7's dataclasses. Configuration properties rely heavily on\r\nPython (advanced) enumerations.\r\n\r\nCurrently, the library supports serial port, VISA over TCP, Modbus TCP, LabJack LJM and\r\nOPC UA communication protocols. These protocols are used within code abstractions of\r\ndevices such as the MBW973 SF6 Analyzer / dew point mirror, LabJack (T7-PRO) device, Schneider\r\nElectric ILS2T stepper motor drive, Elektro-Automatik PSI9000 DC power supply, Rohde &\r\nSchwarz RTO 1024 oscilloscope, or the Lab's state-of-the-art Supercube platform, which\r\nencapsulates safety components, the voltage source, as well as other auxiliary devices.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e305aa3c-dcb9-5ece-9283-eba69eb03cb2", "id": 1514, "code": "FZGGZF", "public_name": "Miko\u0142aj Rybi\u0144ski", "avatar": null, "biography": "I have MSc degrees in Mathematics and Computer Science as well as a PhD degree in computational Mathematics, all from the University of Warsaw (Poland). 
My theoretical and research background is backed up with many years of experience in industrial and scientific software development.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Posters at 16:00": [{"url": "https://pretalx.com/euroscipy-2019/talk/WGE8NA/", "id": 1248, "guid": "a2c28767-8392-53fa-b260-ad7c6803a711", "date": "2019-09-04T08:25:00+00:00", "start": "08:25", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-1248-scikit-fdiff-a-new-tool-for-pde-solving", "title": "scikit-fdiff, a new tool for PDE solving", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "Scikit-fdiff (formerly Triflow) has been developed to facilitate building mathematical models. It was made to quickly build and try many asymptotic falling-film models with different couplings of phenomena (energy and mass transfer).", "description": "Scikit-FDiff (formerly known as Triflow) is a new tool, written in pure Python, that focuses on reducing the time between the development of the mathematical model and its numerical solution. It allows an easy and automatic finite difference discretization, thanks to a symbolic processing that can deal with systems of multi-dimensional partial differential equations with complex boundary conditions.\r\n\r\nUsing finite differences and the method of lines, it allows the transformation of the original PDE into an ODE, providing a fast computation of the temporal evolution vector and the Jacobian matrix. The latter is pre-computed in a symbolic way and sparse by nature. It can be evaluated with as few computational resources as possible, allowing the use of implicit and explicit solvers at a reasonable cost.\r\n\r\nClassic ODE solvers have been implemented (or made available from dedicated Python libraries), including backward and forward Euler schemes, Crank-Nicolson, explicit Runge-Kutta. 
More complex ones, like improved Rosenbrock-Wanner schemes up to the 6th order, are also available. The time-step is managed by a built-in error computation, which ensures the accuracy of the solution. The main goal of the software is to minimize the time spent writing numerical solvers to focus on model development and data analysis.\r\n\r\nScikit-Fdiff is then able to solve toy cases in a few lines of code as well as complex models. Extra tools are available, such as data saving during the simulation, real-time plotting and post-processing. It has been validated with the shallow-water equation on dam-breaks and the steady-lake case. It has also been applied to heated falling-films, droplet spread and simple moisture flow in a porous medium.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "834d0f62-7045-56ca-bec7-c3e0a9014d7e", "id": 40, "code": "F7R79L", "public_name": "Nicolas Cellier", "avatar": "https://pretalx.com/media/avatars/profile_N59aOoS.jpeg", "biography": "Postdoc working in the Alps, mostly doing numerical support for the research. 
Specialized in PDE solving, I also have a strong numerical analysis background, and can use stat and machine learning tools.\r\nI mainly do Python (for the last 8 years), but I can switch to other tools if I need to: lower-level languages such as C or Fortran, or specialized ones like R and Julia.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/CUXPCN/", "id": 2272, "guid": "94203d7f-aca2-53da-9874-31e8543f3677", "date": "2019-09-04T11:40:00+00:00", "start": "11:40", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-2272-phonolammps-phonopy-with-lammps-made-easy", "title": "PhonoLAMMPS: Phonopy with LAMMPS made easy", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "PhonoLAMMPS is a Phonopy interface with LAMMPS that allows calculating the interatomic force constants and other phonon properties from a usual LAMMPS input file.", "description": "In recent years Phonopy[1] has become a very well known software in the materials science field for calculating the phonon properties of crystals. While Phonopy provides interfaces for many popular First Principles calculation software packages such as VASP, WIEN2K, SIESTA, etc., the implementation of interfaces for software based on empirical potentials is usually more challenging. This is due to the large variability of input structure and potential definitions that this kind of software requires in comparison to software based on First Principles.\r\n\r\nIn this poster I present PhonoLAMMPS[2], a Phonopy interface with LAMMPS[3] written in Python that makes use of the official LAMMPS Python API to calculate the interatomic 2nd order force constants from a usual LAMMPS input file.\r\n\r\nPhonoLAMMPS can be used either as a Python module with a phonopy-like \r\ninterface or as a simple command-line script.\r\n\r\n[1] A. Togo and I. Tanaka, Scr. 
Mater., 108, 1-5 (2015)  \r\n[2] https://github.com/abelcarreras/phonolammps  \r\n[3] S. Plimpton, J Comp Phys., 117, 1-19 (1995)", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ff8b22ba-ac28-52d2-a846-fe5bc5353c28", "id": 2168, "code": "8VJXJQ", "public_name": "Abel Carreras", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/HUPE99/", "id": 1446, "guid": "c1c71ae6-49e9-5b33-9e5f-f86fe8291f16", "date": "2019-09-04T13:15:00+00:00", "start": "13:15", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-1446-really-reproducible-behavioural-paper", "title": "Really reproducible behavioural paper", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "A heavily _XKCD_ themed poster about writing a really reproducible behavioural paper in a Python environment.\r\n[The poster is also available online.](https://tinyurl.com/y35otadt)", "description": "In recent years the replication crisis in life sciences has received significant attention.  Reproducibility of behavioural experiments may be affected by many factors, such as lack of standardisation of experimental conditions or human errors.  While use of standardized systems for automated phenotyping (such as _IntelliCage_) leads to interlaboratory replicability of experiments (1), manual analysis of the obtained data still remains a potential source of irreproducibility due to human errors.  Luckily, a countermeasure for that issue has been known for at least twenty years: automation of data analysis with a non-interactive computer program (2).\r\n\r\nTo facilitate development of Python programs for automated analysis of mice behavioural data obtained from the IntelliCage system, the _PyMICE_ library (RRID:nlx\\_158570) has been developed.  The title paper is the publication presenting the library to the scientific community (3).  
As it has been written according to literate programming paradigm (4), all programs used for analysing the experimental data are embedded in [the source code of the paper itself](https://github.com/Neuroinflab/PyMICE_SM/) which makes the presented results highly reproducible and the methodology of analysis transparent.\r\n\r\n\r\n# Authors\r\n\r\n* Jakub M. Dzik,\r\n* Alicja Pu\u015bcian,\r\n* Zofia Mijakowska,\r\n* Kasia Radwanska,\r\n* Szymon \u0141\u0119ski\r\n\r\n\r\n# Bibliography\r\n\r\n1. A. Codita, A. H. Mohammed, A. Willuweit, A. Reichelt, E. Alleva, I. Branchi, F. Cirulli, G. Colacicco, V. Voikar, D. P. Wolfer, F. J. U. Buschmann, H.-P. Lipp, E. Vannoni, S. Krackow (2012)\r\n   Effects of Spatial and Cognitive Enrichment on Activity Pattern and Learning Performance in Three Strains of Mice in the IntelliMaze.\r\n   Behavior Genetics [doi:10.1007/s10519-011-9512-z](https://dx.doi.org/10.1007/s10519-011-9512-z)\r\n2. J. B. Buckheit, D. L. Donoho (1995)\r\n   WaveLab and Reproducible Research. Lecture Notes in Statistics.\r\n   [doi:10.1007/978-1-4612-2544-7\\_5](https://dx.doi.org/10.1007/978-1-4612-2544-7\\_5)\r\n3. J. M. Dzik, A. Pu\u015bcian, Z. Mijakowska, K. Radwanska, S. \u0141\u0119ski (2017)\r\n   PyMICE: A Python library for analysis of IntelliCage data.\r\n   Behavior Research Methods. [doi:10.3758/s13428-017-0907-5](https://dx.doi.org/10.3758/s13428-017-0907-5)\r\n4. D. E. Knuth (1984) Literate Programming.\r\n   The Computer Journal. [doi:10.1093/comjnl/27.2.97](https://dx.doi.org/10.1093/comjnl/27.2.97)\r\n\r\n\r\n# Acknowledgement\r\n\r\nProject funded from the Polish National Science Centre's SYMFONIA (2013/08/W/NZ4/00691) grant.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1f576562-fd3c-50c5-8781-e3f3795c584f", "id": 1512, "code": "3XH3MM", "public_name": "Jakub M. 
Dzik", "avatar": "https://pretalx.com/media/avatars/IMG_0187-US_Wiza_format_cyfrowy-900x900_px.jpg", "biography": "Since 2011 I have been a Scientific Programmer in the Laboratory of Neuroinformatics (Nencki Institute).\r\n\r\n#Education\r\n\r\n* MSc in Computer Science (2011; University of Wroclaw)\r\n* PhD in Neuroinformatics (2019; Nencki Institute)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/FZTEQ9/", "id": 1444, "guid": "e921c0cb-75c4-58ca-801f-551e7a693206", "date": "2019-09-04T14:50:00+00:00", "start": "14:50", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-1444-kesi-a-kernel-based-method-for-reconstruction-of-sources-of-brain-electric-activity-in-realistic-brain-geometries", "title": "kESI - a kernel-based method for reconstruction of sources of brain electric activity in realistic brain geometries", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "_kESI_ is a new Python package for kernel-based reconstruction of brain electric activity from recorded electric field potentials using realistic assumptions about brain geometry and conductivity.", "description": "Epilepsy affects around 50 million people worldwide (1).\r\n30% of epilepsy cases are drug-resistant and surgical removal of the neural tissue generating seizures (epileptogenic) may be the only way to prevent seizures.  When removing the epileptogenic tissue it is crucial to minimize the lesioned area, because removing too much of the brain may lead to serious impairment of its function.\r\n\r\nTo identify the epileptogenic zone, a neurosurgeon typically implants electrodes on the cortex (ECoG) or deep in the brain (SEEG).  The measured potentials are used as indicators localizing the epileptic source.  We argue that reconstructed sources of this brain activity are better predictors of areas for resection.  
Here we present a method - kernel Electrical Source Imaging (kESI) - and its Python implementation which allow reconstruction of current sources taking into account the actual geometry of the patient's brain and the conductivity distribution.  This method extends the _kernel Current Source Density_ (kCSD) method (3, 4) to realistic geometries and complex conductivity models.\r\n\r\nIn the poster we present our most recent results in development of Python tools for reconstruction of brain activity and the progress report of kESI development.\r\n\r\n\r\n# Authors\r\n\r\n* Marta Kowalska,\r\n* Jakub M. Dzik,\r\n* Chaitanya Chintaluri,\r\n* Daniel K. W\u00f3jcik\r\n\r\n\r\n# Bibliography\r\n\r\n1. World Health Organization, _Epilepsy_, available at: <https://www.who.int/news-room/fact-sheets/detail/epilepsy>\r\n2. Pitts, W. H. (1952), _Investigations on synaptic transmission_, in 'Cybernetics, Trans. 9th Conf. Josiah Macy Foundation H. von Foerster', pp. 159-166.\r\n3. Potworowski, J., Jakuczun, W., \u0141\u0119ski, S. & W\u00f3jcik, D. (2012) _Kernel current source density method_. Neural Comput 24(2), 541-575.\r\n4. _Kernel Current Source Density_ <https://github.com/Neuroinflab/kCSD-python>\r\n\r\n\r\n# Acknowledgement\r\n\r\nProject funded from the Polish National Science Centre's OPUS grant (2015/17/B/ST7/04123).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1f576562-fd3c-50c5-8781-e3f3795c584f", "id": 1512, "code": "3XH3MM", "public_name": "Jakub M. 
Dzik", "avatar": "https://pretalx.com/media/avatars/IMG_0187-US_Wiza_format_cyfrowy-900x900_px.jpg", "biography": "Since 2011 I have been a Scientific Programmer in the Laboratory of Neuroinformatics (Nencki Institute).\r\n\r\n#Education\r\n\r\n* MSc in Computer Science (2011; University of Wroclaw)\r\n* PhD in Neuroinformatics (2019; Nencki Institute)", "answers": []}, {"guid": "16ced360-f474-5475-b640-77435989a3c8", "id": 1505, "code": "APW3XT", "public_name": "Marta Kowalska", "avatar": "https://pretalx.com/media/avatars/MartaKowalska.jpg", "biography": "I am a PhD student at the Laboratory of Neuroinformatics at Nencki Institute of Experimental Biology. I work with methods for current source density reconstruction in brain tissue.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/CW97MN/", "id": 1416, "guid": "2cd3115e-8d80-56dd-8c78-7dc337a2baaf", "date": "2019-09-04T16:25:00+00:00", "start": "16:25", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-1416-from-modeler-to-programmer", "title": "From Modeler to Programmer", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "The modeling system ueflow allows for customizable, dynamic boundary conditions.\r\nThe modeler can write Python plugins to implement the behavior of these boundary conditions.", "description": "Boundary conditions are essential for groundwater models.\r\nThe user can specify values for these boundary conditions such as a well\r\nat a certain location with a given pumping rate for a specified duration.\r\nFor some special applications, however, the specified values may further\r\ndepend on internal model conditions.\r\nFor example, the flow rate of an infiltration well that re-infiltrates water is equal to\r\nthe pumping rate of the extraction well.\r\nThis can be useful for geothermal applications within groundwater bodies.\r\nThe newly developed model, ueflow, allows the 
user to implement such a scheme by writing a plugin.\r\nIn addition to just using the pumping rate as infiltration rate, the user can incorporate\r\nother constraints such as energy costs for pumping, capacities of water treatment facilities,\r\nmaintenance schedules for pumps based on pumping regimes, or other technical constraints.\r\n\r\nThe poster gives a short overview of ueflow that is based on the finite volume model framework\r\nFiPy (Guyer et al. 2009).\r\nFiPy is implemented in Python and offers multiple, high-performance solvers as well as\r\nseveral tools for generating grids and other input data.\r\n\r\n\r\n\r\nGuyer, J. E., Wheeler, D., Warren, J. A. (2009). FiPy: Partial Differential Equations with Python. Computing in Science & Engineering 11(3) pp. 6\u201415 (2009), doi:10.1109/MCSE.2009.52, http://www.ctcms.nist.gov/fipy", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1a747dad-e662-5191-9389-86ade8347a86", "id": 1495, "code": "QWPUQW", "public_name": "Mike M\u00fcller", "avatar": null, "biography": "CEO of hydrocomputing.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/LWHPHN/", "id": 1441, "guid": "8c6d1a42-cd39-504f-b35e-46a9ae1d61a7", "date": "2019-09-04T18:00:00+00:00", "start": "18:00", "logo": null, "duration": "01:30", "room": "Posters at 16:00", "slug": "euroscipy-2019-1441-mne-python-a-toolkit-for-neurophysiological-data", "title": "MNE-Python, a toolkit for neurophysiological data", "subtitle": "", "track": null, "type": "Poster", "language": "en", "abstract": "A summary of the MNE-Python changes introduced during the two last releases and highlights for future directions.", "description": "MNE-Python software is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, EEG, sEEG, ECoG, and more. 
It includes modules for data input/output, preprocessing, visualization, source estimation, time-frequency analysis, connectivity analysis, machine learning, and statistics.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1a161f4b-e5e4-5c98-a611-46b1d2aaead4", "id": 67, "code": "DQYNNA", "public_name": "Joan Massich", "avatar": null, "biography": "I am a research engineer in the Parietal team at INRIA-Saclay working on human neuro-physiological data and machine learning. Contributing to open-source projects like: MNE-Python, OpenMEEG, scikit-learn, and others.\r\n\r\nI obtained my PhD in computer vision applied to medical imaging, jointly from the Universitat de Girona and the Universite de Bourgogne France-Comte in 2013. After my PhD, and before coming to Parietal as an engineer, I was a postdoctoral fellow and teaching assistant at Universite de Bourgogne France-Comte.\r\n\r\nI enjoy following technology trends, learning new skills, and sharing them. I also care about pedagogy and education as I strongly believe that any skill can be acquired by anyone with ease if transferred properly. 
This is why I have been involved in organizing pedagogical activities such as underwater robotics workshops for kids and enthusiasts, First LEGO League competitions, and lately software carpentry workshops.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 4, "date": "2019-09-05", "day_start": "2019-09-05T04:00:00+00:00", "day_end": "2019-09-06T03:59:00+00:00", "rooms": {"Track 1 (Mitxelena)": [{"url": "https://pretalx.com/euroscipy-2019/talk/PRGASS/", "id": 2636, "guid": "0e9eb8f3-0ef5-5a80-a0b6-bde2d2af1de0", "date": "2019-09-05T09:15:00+00:00", "start": "09:15", "logo": null, "duration": "00:45", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2636-hpc-and-python-intel-s-work-in-enabling-the-scientific-computing-community", "title": "HPC and Python: Intel\u2019s work in enabling the scientific computing community", "subtitle": "", "track": null, "type": "Keynote", "language": "en", "abstract": "High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development.  However, one of the fundamental links in performance is the relationship between h", "description": "High Performance Computing (HPC) has been a pillar of the scientific community for years, with many in the Python community contributing to its continued development.  However, one of the fundamental links in performance is the relationship between hardware and software.  \r\n \r\nIntel is hard at work on the Intel\u00ae Distribution for Python*, producing optimized packages and upstreaming changes to open source that help take advantage of current and future Intel\u00ae Architecture, and hardware that is purpose built to target HPC, Machine Learning, and AI workloads. \r\n \r\nGetting the performance out of these workloads has been a challenging journey, one in which good lessons and learnings were made.  
From Intel\u2019s Python community contributions to the new architectures Intel created for a generation of more accessible scientific compute, Intel\u2019s work continues on delivering more approachable HPC in Python.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "85b11eb2-9d1b-5c02-81dd-1b3588563990", "id": 2409, "code": "38MMCZ", "public_name": "David Liu", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/R3TJLP/", "id": 1162, "guid": "aec33a15-8ee3-55b9-9667-304bb6f5bc86", "date": "2019-09-05T10:30:00+00:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1162-inside-numpy-preparing-for-the-next-decade", "title": "Inside NumPy: preparing for the next decade", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know?", "description": "Over the past year, and for the first time since its creation, NumPy has been operating with dedicated funding. NumPy developers think it has invigorated the project and its community. But is that true, and how can we know?\r\n\r\nWe will give an overview of the actions we\u2019ve taken, both successful and unsuccessful, to improve sustainability of the NumPy project and its community. 
We will draw some lessons from a first year of grant-funded activity, discuss key obstacles faced, attempt to quantify what we need to operate sustainably, and present a vision for the project and how we plan to realize it.\r\nTopics we will cover include the following:\r\n- Invigorating the community - what did we do, and are we correct in our opinion that it invigorated the community?\r\n  - doing things in the open as much as possible\r\n  - creating a roadmap\r\n  - NumPy Enhancement Proposal process\r\n  - commit rights\r\n  - in-person meetings\r\n\r\n- Measuring community/project health. We will use a number of published or proposed metrics to quantify this. Which ones do we think accurately represent the state of the project?\r\n- Lessons from the first grant and introducing paid work into a previously fully volunteer-driven project.\r\n  - What is the best profile for a salaried employee?\r\n    - Social profile\r\n    - From inside or outside?\r\n  - Have we succeeded in encouraging diversity?\r\n\r\n- A vision for future sustainability\r\n  - Models for obtaining and funneling funding", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e011e91-8d53-51d7-a3fe-bd87d772c045", "id": 63, "code": "UZVQTV", "public_name": "Matti Picus", "avatar": null, "biography": "Matti is a core developer of [PyPy](https://www.pypy.org), contributing to the internal numpy implementation _micronumpy and to the layer that allows python c-extension modules to run on the PyPy python interpreter. He has been active in the open source community both as a contributor, teacher, and presenter at conferences. 
Since April 2018, he works full-time developing [NumPy](http://www.numpy.org/), employed by [BIDS](https://bids.berkeley.edu/people/matti-picus)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/3LXMC8/", "id": 2749, "guid": "0f32c765-33d6-5844-b2ef-b033ae070854", "date": "2019-09-05T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2749-deep-learning-without-a-phd", "title": "Deep Learning without a PhD", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. No prior experience with machine learning or with deep learning required, and no need to instal", "description": "In this talk, you'll learn how to transition from traditional machine learning tools, like scikit-learn, to deep learning with Keras, TensorFlow, and JAX. 
No prior experience with machine learning or with deep learning required, and no need to install anything to follow along - all examples will be run on Google Colab.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2b809844-8659-5d1d-8b97-04f12204bf0c", "id": 2553, "code": "MKKMBE", "public_name": "Paige Bailey", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/QGZTDZ/", "id": 1400, "guid": "d64c4ee5-ab39-50f3-907d-abb08ec16090", "date": "2019-09-05T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1400-the-magic-of-neural-embeddings-with-tensorflow-2", "title": "The Magic of Neural Embeddings with TensorFlow 2", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Neural Embeddings are a powerful tool of turning categorical into numerical values. Given reasonable training data semantics present in the categories can be preserved in the numerical representation.", "description": "Symbols, words, categories etc. need to be converted into numbers before they can be processed by neural networks or used into other ML methods like clustering or outlier detection. \r\n\r\nIt is desirable to have the converted numbers represent semantics of the encoded categories. That means, numbers close to each other indicate similar semantics.\r\n\r\nIn this session you will learn what you need to train a neural network for such embeddings. I will bring a complete example including code that I will share using TensorFlow 2 functional API and the Colab service. 
\r\n\r\nI will also share some tricks how to stabilize embeddings when either the model changes or you get more training data.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5841dd8e-098b-5959-93f8-e234a489db3e", "id": 1298, "code": "YMNMWY", "public_name": "Oliver Zeigermann", "avatar": "https://pretalx.com/media/avatars/olli-opa_S8QF38Q.jpeg", "biography": "Oliver Zeigermann is a developer and consultant from Hamburg, Germany. He has written several books and has recently published the \"Deep Learning Crash Course\" with Manning. More on http://zeigermann.eu/", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/SKNH3X/", "id": 2541, "guid": "c3ab7738-962e-5fca-86b9-0b6ac6f5e6aa", "date": "2019-09-05T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2541-high-quality-video-experience-using-deep-neural-networks", "title": "High quality video experience using deep neural networks", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Video compression algorithms used to stream videos are lossy, and when compression rates increase they result in strong degradation of visual quality. We show how deep neural networks can eliminate compression artefacts and restore lost details.", "description": "Video compression algorithms result in a reduction of image quality, because of their lossy approach to reduce the required bandwidth. This affects commercial streaming services such as Netflix, or Amazon Prime Video, but affects also video conferencing and video surveillance systems. In all these cases it is possible to improve the video quality, both for human view and for automatic video analysis, without changing the compression pipeline, through a post-processing that eliminates the visual artefacts created by the compression algorithms. 
In this presentation we show how deep convolutional neural networks implemented in Python using TensorFlow, Scikit-Learn and Scipy can be used to reduce compression artefacts and reconstruct missing high frequency details that were eliminated by the compression algorithm.\r\n\r\nIn particular, we follow an approach based on Generative Adversarial Networks, that in the scientific literature have obtained extremely high quality results in image enhancement tasks. However, to obtain these results, typically, large generators are employed, resulting in high computational costs and processing time, and thus the method can be implemented using GPUs usually available only on desktop machines.\r\nIn this presentation we show also an architecture that can be used to reduce the computational cost and that can be implemented also on mobile devices. \r\n\r\nA possible application is to improve video conferencing, or live streaming. Since in these cases there is no original uncompressed video stream available, we report results using no-reference video quality metric showing high naturalness and quality even for efficient networks.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "dfc23d57-805e-529a-9655-5c2524074aee", "id": 2329, "code": "PFHTQ8", "public_name": "Marco Bertini", "avatar": null, "biography": null, "answers": []}, {"guid": "c619215b-576c-5fd8-8be3-127c70d83904", "id": 2417, "code": "KWBFEL", "public_name": "Tiberio Uricchio", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/GLSVQA/", "id": 2638, "guid": "2e44d67f-2623-5037-9aa4-d6fcb2f9ca2a", "date": "2019-09-05T14:00:00+00:00", "start": "14:00", "logo": null, "duration": "00:45", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-2638-in-the-shadow-of-the-black-hole", "title": "In the Shadow of the Black Hole", "subtitle": "", "track": null, "type": "Keynote", "language": "en", 
"abstract": "I will walk through the entire Event Horizon Telescope experiment and the global effort that led to the first-ever direct image of a black hole revealed to the world on April 10th of this year.", "description": "The Event Horizon Telescope (EHT) is a global network of millimeter-wavelength radio telescopes that uses Very Long Baseline Interferometry (VLBI) to synthesize the resolution of a single, Earth-sized telescope. In April 2017 the EHT observed the black hole at the center of the giant galaxy M87. Turning these observations into an image required the development of new software tools across the global EHT collaboration, and relied on a wealth of open-source software made available to the broader scientific community. In this talk, I will walk through the entire EHT experiment from the individual telescopes that record the data through the calibration, imaging, and interpretation of the observations that lead to the first-ever direct image of a black hole released to the world on April 10th of this year.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7eb930c0-3fb8-5bed-a0a7-f6af89cde438", "id": 2410, "code": "3GR9NW", "public_name": "Sara Issaoun", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/SKAH3U/", "id": 1132, "guid": "0b9d7c0a-a6a4-5656-a5db-d1b1a4159462", "date": "2019-09-05T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1132-a-practical-guide-towards-algorithmic-bias-and-explainability-in-machine-learning", "title": "A practical guide towards algorithmic bias and explainability in machine learning", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Undesired bias in machine learning has become a worrying topic due to the numerous high profile incidents. 
In this talk we demystify machine learning bias through a hands-on example. We'll be tasked to automate the loan approval process for a company", "description": "Undesired bias in machine learning has become a worrying topic due to the numerous high profile incidents that have been covered by the media. It is certainly a challenging topic, as it could even be said that the concept of societal bias is inherently biased in itself depending on an individual\u2019s (or group\u2019s) perspective. In this talk we avoid re-inventing the wheel, instead we use traditional methods to simplify this issue so it can be tackled from a practical perspective.\r\n\r\n# Content\r\nIn this talk we will cover the high level definitions of bias in machine learning to remove ambiguity, and we will demystify it through a hands-on example. Our objective will be to automate the loan approval process for a company using machine learning. This will allow us to go through this challenge step by step, using key tools and techniques from latest research that will allow us to assess and mitigate undesired bias in our machine learning models.\r\n\r\n# Definitions \r\nWe will begin by providing a high level definition of undesired bias as two constituent parts: \u201ca-priori societal bias\u201d and \u201ca-posteriori statistical bias\u201d. We will provide tangible examples of how undesired bias is introduced in each step. This initial section will introduce very interesting research findings in this topic. 
Spoiler alert: We will take a pragmatic approach, showing how any non-trivial system will always have an inherent bias, so the objective is not to remove bias, but to make sure 1) you can get as close as possible to your objectives, and 2) you can make sure your objectives are as close as possible to the \u201cideal solution\u201d.\r\n\r\n# Process\r\nIn this talk we introduce a pragmatic process to assess bias in machine learning models through three key steps: 1) Data analysis, 2) Inference result analysis, and 3) Production metrics analysis. For each of these three steps we will walk through a real life example. We will be tasked with the automation of a loan approval process. We will show how some bias may affect our results in a negative way, as well as how we can use various techniques to ensure we perform a reasonable analysis. Our objective is not to show how to completely remove bias from a machine learning model, but instead what are the tools and techniques available, as well as the key touch-points & metrics to ensure the right domain experts are involved.\r\n\r\n# Topics covered\r\nWe will cover fundamental topics in data science such as feature importance analysis, class imbalance assessment, model evaluation metrics, partial dependence, feature correlation, etc. More importantly, we will cover how these fundamentals can interact at different touch-points with the right domain experts to ensure undesired bias is identified and documented. All will be covered with a hands-on example through a practical Jupyter notebook experience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "299b0cfe-84f3-54bd-9886-79b9da644565", "id": 37, "code": "HEZDEY", "public_name": "Alejandro Saucedo", "avatar": "https://pretalx.com/media/avatars/aletechuk-high-res_FINwk4s.png", "biography": "Alejandro is currently leading the research and development at the Institute for Ethical AI & Machine Learning as their Chief Scientist. 
With over 10 years of software development experience Alejandro has held technical leadership positions across hyper-growth scale-ups and tech giants including Eigen Technologies, Bloomberg LP and Hack Partners. Alejandro has a strong track record building departments of machine learning engineers from scratch, and leading the delivery of large-scale machine learning systems across the financial, insurance, legal, transport, manufacturing and construction sectors (in Europe, US and Latin America). Alejandro has given multiple talks at international scientific, technical and business conferences, and has chaired panels & events with government ministers, senior executives and domain experts.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/HBHY9Q/", "id": 1247, "guid": "19a81e72-81db-5cb1-a071-f9600b114c3d", "date": "2019-09-05T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1247-tracking-migration-flows-with-geolocated-twitter-data", "title": "Tracking migration flows with geolocated Twitter data", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Detect migration flows worldwide using geolocated Twitter data: routes, settlement areas, mobility to more than one country, spatial integration in cities, etc.", "description": "Traditionally, migration and refugee flows information is obtained from surveys and border control operatives. Here we propose a method to detect migration flows worldwide using geolocated Twitter data. In particular and as a practical example, we focus on the current migratory crisis in Venezuela. We study if the flows calculated are quantitatively reliable when compared with official numbers at the country level. 
Our method is versatile and can be used to study different features of migration such as the routes, settlement areas, mobility to more than one country, spatial integration in cities, etc.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8cbf8682-574b-5d09-83b3-46c5e43e0b0b", "id": 1304, "code": "ARGGAY", "public_name": "Ant\u00f2nia Tugores", "avatar": "https://pretalx.com/media/avatars/949b85a364f47ad90586cd0ce709e2ac_kThJggH.jpg", "biography": "Mathematician by formation, she spent most of her life developing software. She started collaborating with the creation of an open source game engine and framework in Tragnarion Studios. Later on, she moved to GridSystems and got involved in the development of an open source grid middleware. Some years later, she started working at IFISC (CSIC-UIB), a research institute. First, she was working on a grid project, but she got interested in data mining and now she is a data specialist working on human mobility and social sciences.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/UQHFD8/", "id": 1343, "guid": "570abacc-d1b5-5917-87f4-3a0d74f7fba0", "date": "2019-09-05T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1343-deep-learning-for-understanding-human-multi-modal-behavior", "title": "Deep Learning for Understanding Human Multi-modal Behavior", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "Multi-modal sources of information are the next big step for AI. In this talk, I will present the use of deep learning techniques for automated multi-modal applications and some open benchmarks.", "description": "Multimedia automatic learning has drawn attention from companies and governments for a significant number of applications for automated recommendations, classification, and human brain understatement. 
In recent years, an increasing amount of research has explored using deep neural networks for multimedia related tasks. \r\nSome government security and surveillance applications are automated detections of illegal and violent behaviors, child pornography and traffic infractions. Companies worldwide are looking for content-based recommendation systems that can personalize clients' consumption and interactions by understanding the human perception of memorability, interestingness, attractiveness, aesthetics. Fields like event detection, multimedia affect and perceptual analysis are turning towards Artificial Neural Networks. In this talk, I will present the theory behind multi-modal fusion using deep learning and some open challenges and their state-of-the-art.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "020ff393-60dc-5c87-870d-de99e2a4148c", "id": 1451, "code": "PV7GD7", "public_name": "Ricardo Manh\u00e3es Savii", "avatar": "https://pretalx.com/media/avatars/4f74e02cf9e59e1ae1a8f793fa0c61d4_tafpTZk.jpg", "biography": "I am a former hotel manager; nowadays I am a student pursuing degrees as Computer Engineer B.S. and Computer Science M.S. 
I am a researcher and developer working in the fashion industry with Dafiti Group and Udacity\u2019s reviewer for Machine Learning and Deep Learning Nanodegrees.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/VKDH9K/", "id": 1451, "guid": "d9886f8b-395c-5429-8fb9-772561ba94d8", "date": "2019-09-05T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1451-how-to-process-hyperspectral-data-from-a-prototype-imager-using-python", "title": "How to process hyperspectral data from a prototype imager using Python", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "We present a collection of software for handling hyperspectral data acquisition and preprocessing fully in Python utilising Xarray for metadata preservation from start to finish.", "description": "Our lab specializes in hyperspectral imaging using a spectral imager that combines tunable filters with colour sensors. Compared to simpler, more established imaging systems, this results in some unique challenges for the data processing. Especially, many of the original imaging parameters need to be preserved and joined with calibration-derived values to actually compute radiance values from the raw sensor data since they are not automatically handled by the hardware. Handling this metadata with the resulting hyperspectral images results in combined datasets of a large 3-dimensional datacube and multiple smaller 2D and 1D arrays with linked dimensions.\r\n\r\nWe have built our solution to this problem utilizing Xarray for handling the multiple arrays of data as well as the existing Dask integration for providing easy parallelization for the required preprocessing. 
Xarray also provides us many other advantages, such as:\r\n\r\n * Exploration of very complex multi-dimensional datasets (especially when utilizing holoviews)\r\n * Interoperability with the scikit ecosystem\r\n * Serialization to NetCDF preserving all the data in a single file\r\n\r\nHowever, our extensive and somewhat non-conventional use of Xarray does also bring out its shortcomings when trying to develop such a library as ours, such as indexing issues with multiple possible overlapping coordinates and performance issues with complex datasets.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "325a00b4-7c1a-5591-86ba-ce62458d3f40", "id": 1515, "code": "KQCRDQ", "public_name": "Matti Eskelinen", "avatar": null, "biography": "PhD student working on computational methods for hyperspectral imaging.\r\n\r\nLogio @ FreeNode, IRCNet etc.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/HXH3GN/", "id": 1438, "guid": "867454d2-f92a-5875-8173-adfb016d370f", "date": "2019-09-05T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 1 (Mitxelena)", "slug": "euroscipy-2019-1438-enhancing-re-designing-the-qgis-user-interface-a-deep-dive", "title": "Enhancing & re-designing the QGIS user interface \u2013 a deep dive", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "How can one of the largest code bases in open source Geographical Information Science \u2013 QGIS \u2013 be enhanced and re-designed? Through the powers of Python plugins. This talk demonstrates concepts on how to make QGIS more user-friendly.", "description": "Having been around for two decades, QGIS clearly is an organically grown project. It has primarily been fulfilling the various special needs of its developers. From an outsider's perspective, it is an amazingly rich patchwork of features. 
However, some are deeply hidden in numerous layers of user interface elements, requiring intense training for getting used to. Others are only accessible through APIs, requiring not only training but also programming skills.\r\n\r\nBeing confronted with QGIS as professional users on a regular basis, we thought about what would make working with QGIS more attractive. What if QGIS had a pleasant, coherent theme, including not only colors but also icons? What if QGIS had the ability to store workbench configurations? What if QGIS had dedicated interface configurations for specific workflows? What if much more of the API's functionality was accessible through the GUI in a well-organized way? How could QGIS work in a useful manner with ribbons? How could the incredible amount of dialogs be tamed into tabs? \r\n\r\nWe demonstrate (live) a series of user interface experiments \u2013 all of which are or will be [available online](https://github.com/qgist) as Python plugins. \r\n\r\nIn this context, the current state of play with respect to Python and QGIS is explained in detail. The way QGIS is typically being distributed puts quite a few unusual limitations on Python plugin code. The case is made that some of those limitations are simply out of date and must be overcome, which may require help from the broader (scientific) Python community. \r\n\r\nWe seek a conversation with the audience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8c58154d-62fd-59a6-907b-7ab119532ec0", "id": 535, "code": "JNFFHN", "public_name": "Sebastian M. Ernst", "avatar": "https://pretalx.com/media/avatars/favicon_kreis_cBuncka.jpg", "biography": "I am a free [scientist without specialization](http://www.pleiszenburg.de/?pda#retrospect). I have more than one and a half decades of experience in various scientific disciplines, related data analysis & computing from embedded systems to super computers as well as the development of instrumentation (hardware & sensors). 
A lot of my work has evolved around geophysics, [aero-] space engineering and (many) related disciplines. It can be broadly described as data science with strong ties to the mentioned domains. Python has been a critical part of my work for more than twelve years.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 2 (Baroja)": [{"url": "https://pretalx.com/euroscipy-2019/talk/D7WAFW/", "id": 1286, "guid": "db2e6e6c-ff39-53cf-a455-24d7739cf99c", "date": "2019-09-05T10:30:00+00:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1286-visual-diagnostics-at-scale", "title": "Visual Diagnostics at Scale", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "Machine learning is a search for the best combination of features, model, and hyperparameters. But as data grow, so does the search space! Fortunately, visual diagnostics can focus our search and allow us to steer modeling purposefully, and at scale.", "description": "Even with a modestly-sized dataset, the hunt for the most effective machine learning model is *hard*. Arriving at the optimal combination of features, algorithm, and hyperparameters frequently requires significant experimentation and iteration. This leads some of us to stay inside algorithmic comfort zones, some to trail off on random walks, and others to resort to automated processes like gridsearch. But whatever path we take, we are often left in doubt about whether our final solution really is the optimal one. And as our datasets grow in size and dimension, so too does this ambiguity.\r\n\r\nFortunately, many of us have developed strategies for steering model search. 
Open source libraries like [seaborn](https://seaborn.pydata.org/), [pandas](https://pandas.pydata.org/) and [yellowbrick](https://www.scikit-yb.org/en/latest/) can help make machine learning more informed with visual diagnostic tools like histograms, correlation matrices, parallel coordinates, manifold embeddings, validation and learning curves, residuals plots, and classification heatmaps. These tools enable us to tune our models with visceral cues that allow us to be more strategic in our choices. Visualizing feature transformations, algorithmic behavior, cross-validation methods, and model performance allows us a peek into the multi-dimensional realm in which our models operate. \r\n\r\nHowever, large, high-dimensional datasets can prove particularly difficult to explore. Not only do the majority of people struggle to visualize anything beyond two- or three-dimensional space, many of our favorite open source Python tools are not designed to be performant with arbitrarily big data. So how well *do* our favorite visualization techniques hold up to large, complex datasets? \r\n\r\nIn this talk, we'll consider a suite of visual diagnostics &mdash; some familiar and some new &mdash; and explore their strengths and weaknesses with several publicly available datasets of varying size. Which suffer most from the curse of dimensionality in face of increasingly big data? What are the workarounds (e.g. sampling, brushing, filtering, etc.) and when should we use them? And most importantly, how can we continue to steer the machine learning process &mdash; not only purposefully but at scale?", "recording_license": "", "do_not_record": false, "persons": [{"guid": "047c1526-e7c3-5d44-a773-0b3c56d04d38", "id": 1407, "code": "CNDFND", "public_name": "Dr. Rebecca Bilbro", "avatar": "https://pretalx.com/media/avatars/Rebecca_Bilbro1_small_86sNPj9.png", "biography": "Dr. Rebecca Bilbro is a data scientist, Python and Go programmer, teacher, speaker, and author in Washington, DC. 
She specializes in visual diagnostics for machine learning, from feature analysis to model selection and hyperparameter tuning, and has conducted research on natural language processing, semantic network extraction, entity resolution, and high dimensional information visualization. An active contributor to the open source software community, Rebecca enjoys collaborating with other developers on inclusive projects like Scikit-Yellowbrick - a pure Python visualization package for machine learning that extends scikit-learn and Matplotlib to support model selection and diagnostics. In her spare time, she can often be found either out-of-doors riding bicycles with her family or inside practicing the ukulele. Rebecca earned her doctorate from the University of Illinois, Urbana-Champaign, where her research centered on communication and visualization in engineering.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/H3NTLX/", "id": 1536, "guid": "44fad5a7-8fa3-5113-a61d-ff240fbc8b74", "date": "2019-09-05T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1536-histogram-based-gradient-boosting-in-scikit-learn-0-21", "title": "Histogram-based Gradient Boosting in scikit-learn 0.21", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "In this presentation we will present some recently introduced features of the scikit-learn Machine Learning library with a particular emphasis on the new implementation of Gradient Boosted Trees.", "description": "scikit-learn 0.21 was recently released and this presentation will give an overview of its main new features in general and present the new implementation of Gradient Boosted Trees.\r\n\r\nGradient Boosted Trees (also known as Gradient Boosting Machines) are very competitive supervised machine learning models especially on tabular data.\r\n\r\nScikit-learn offered a 
traditional implementation of this family of methods for many years. However, its computational performance was no longer competitive and was dramatically dominated by specialized state of the art libraries such as XGBoost and LightGBM. The new implementation in version 0.21 uses histograms of binned features to evaluate the tree node split candidates. This implementation can efficiently leverage multi-core CPUs and is competitive with XGBoost and LightGBM.\r\n\r\nWe will also introduce pygbm, a numba-based implementation of gradient boosted trees that was used as prototype for the scikit-learn implementation and compare the numba vs cython developer experience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "91114ee9-3e12-54e9-8119-9813674ba951", "id": 1530, "code": "NEUMLP", "public_name": "Olivier Grisel", "avatar": "https://pretalx.com/media/avatars/ogrisel_portrait_870x550_PMry4Oq.jpg", "biography": "Olivier is a Software Engineer at Inria working on scikit-learn and related projects of the Python Data ecosystem.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/EQNGSQ/", "id": 1380, "guid": "ae978bb1-96bc-5573-aded-1b044d90ff49", "date": "2019-09-05T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1380-recent-advances-in-python-parallel-computing", "title": "Recent advances in python parallel computing", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "*Modern hardware is multi-core*. It is crucial for Python to provide\r\nefficient parallelism. This talk exposes the current state and advances\r\nin Python parallelism, in order to help practitioners and developers take\r\nbetter decisions on this matter.", "description": "# Parallel computing in Python: Current state and recent advances\r\n\r\n*Modern hardware is multi-core*. 
It is crucial for Python to provide\r\nhigh-performance parallelism. This talk will expose to both data-scientists and\r\nlibrary developers the current state of affairs and the recent advances for\r\nparallel computing with Python. The goal is to help practitioners and\r\ndevelopers to make better decisions on this matter.\r\n\r\nI will first cover how Python can interface with parallelism, from leveraging\r\nexternal parallelism of C-extensions \u2013especially the BLAS family\u2013 to Python's\r\nmultiprocessing and multithreading API. I will touch upon use cases, e.g. single\r\nvs multi machine, as well as pros and cons of the various solutions for\r\neach use case. Most of these considerations will be backed by benchmarks from\r\nthe [scikit-learn](https://scikit-learn.org/stable/) machine\r\nlearning library.\r\n\r\nFrom these low-level interfaces emerged higher-level parallel processing\r\nlibraries, such as concurrent.futures,\r\n[joblib](https://joblib.readthedocs.io/en/latest/) and\r\n[loky](https://loky.readthedocs.io/en/latest/) (used by\r\n[dask](https://dask.org/) and [scikit-learn](https://scikit-learn.org/stable/)). These\r\nlibraries make it easy for Python programmers to use safe and reliable\r\nparallelism in their code. They can even work in more exotic situations, such\r\nas interactive sessions, in which Python\u2019s native multiprocessing support tends\r\nto fail. I will describe their purpose as well as the canonical use-cases they\r\naddress.\r\n\r\nThe last part of this talk will focus on the most recent advances in the Python\r\nstandard library, addressing one of the principal performance bottlenecks of\r\nmulti-core/multi-machine processing, which is data communication. 
We will\r\npresent a [new\r\nAPI](https://docs.python.org/3.8/library/multiprocessing.shared_memory.html)\r\nfor shared-memory management between different Python processes, and\r\nperformance improvements for the serialization of large Python objects ([PEP\r\n574](https://www.python.org/dev/peps/pep-0574/), [pickle\r\nextensions](https://github.com/cloudpipe/cloudpickle)). These performance\r\nimprovements will be leveraged by distributed data science frameworks such as\r\ndask, [ray](https://ray.readthedocs.io/en/latest/) and\r\n[pyspark](https://spark.apache.org/docs/latest/api/python/index.html).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "db858a50-f255-51a7-9889-f331952d4758", "id": 1475, "code": "TKZ7QY", "public_name": "Pierre Glaser", "avatar": "https://pretalx.com/media/avatars/XzAlLLPy_400x400.jpg", "biography": "Hi! My name is Pierre. I currently work as a research engineer in the Parietal Team at a French research institute called INRIA. You may know my team as we created many machine-learning and scientific computing libraries among which scikit-learn, joblib, nilearn and others. I am currently improving Python's multiprocessing tools across the whole scientific computing ecosystem. I notably contributed to scikit-learn, joblib, numpy, cpython, cloudpickle and many other libraries. 
You can follow me on twitter (https://twitter.com/PierreGlaser) and github (https://github.com/pierreglaser).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/QU88B8/", "id": 2675, "guid": "aaea6080-0532-5df8-9f72-119fa787c960", "date": "2019-09-05T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-2675-data-sciences-in-a-polyglot-world-with-xtensor-and-xframe", "title": "Data sciences in a polyglot world with xtensor and xframe", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "The main scientific computing programming languages have different models of the main data structures of data science such as dataframes and n-d arrays. In this talk, we present our approach to reconcile the data science tooling in this polyglot world.", "description": "In this presentation, we demonstrate how xtensor can be used to implement numerical methods very efficiently in C++, with a high-level numpy-style API, and expose it to Python, Julia, and R for free. 
The resulting native extension operates in-place on Python, Julia, and R infrastructures without overhead.\r\n\r\nWe then dive into the xframe package, a dataframe project for the C++ programming language, exposing an API very similar to Python's xarray.\r\n\r\nFeatures of xtensor and xframe will be demonstrated using the xeus-cling jupyter kernel, enabling interactive use of the C++ programming language in the notebook.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "766dbe51-f9ff-5b16-91f8-eac165988618", "id": 2425, "code": "YJNSWT", "public_name": "Sylvain Corlay", "avatar": null, "biography": null, "answers": []}, {"guid": "348256f4-5a67-5ef2-9228-07cece452a45", "id": 2426, "code": "DHTCZF", "public_name": "Wolf Vollprecht", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/EDNVGJ/", "id": 1418, "guid": "b6a4281a-6988-5b76-82c5-f1b9811d1783", "date": "2019-09-05T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1418-understanding-numba", "title": "Understanding Numba", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "In this talk I will take you on a whirlwind tour of Numba and you will be equipped with a mental model of how Numba works and what it is good at. At the end, you will be able to decide if Numba could be useful for you.", "description": "In this talk I will take you on a whirlwind tour of Numba, the just-in-time,\r\ntype-specializing, function compiler for accelerating numerically-focused\r\nPython. Numba can compile the computationally intensive functions of your\r\nnumerical programs and libraries from Python/NumPy to highly optimized binary\r\ncode. 
It does this by inferring the data types used inside these functions and\r\nuses that information to generate code that is specific to those data types\r\nand specialised for your target hardware.  On top of that, it does all of this\r\non-the-fly---or just-in-time---as your program runs. This significantly reduces\r\nthe potential complexity that traditionally comes with pre-compiling and\r\nshipping numerical code for a variety of operating systems, Python versions and\r\nhardware architectures. All you need, in principle, is to `conda install numba`\r\nand decorate your compute-intensive functions with `@numba.jit`!\r\n\r\nThis talk will equip you with a mental model of how Numba is implemented and\r\nhow it works at the algorithmic level. You will gain a deeper understanding of\r\nthe types of use-cases where Numba excels and why. Also, you will understand\r\nthe limitations and caveats that exist within Numba, including any potential\r\nideas and strategies that might alleviate these. At the end of the talk you\r\nwill be in a good position to decide if Numba is for you and you will have\r\nlearnt about the concrete steps you need to take to include it as a dependency\r\nin your program or library.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2e824709-7612-50cb-9d67-5bf963037d0f", "id": 1499, "code": "LFHYZA", "public_name": "Valentin Haenel", "avatar": "https://pretalx.com/media/avatars/1570bec4897bb18b702105182f2951b5_E7FScDr.jpg", "biography": "Valentin is a long-time \"Python for Data\" user and developer who still\r\nremembers hearing Travis Oliphant's keynote at the EuroScipy 2007. This was\r\nat a time when he first became aware of the nascent scientific Python\r\nstack. He started using Python for simple modeling of spiking neurons and\r\nevaluation of data from perception experiments during his Master's degree in\r\ncomputational neuroscience. 
Since then he has been active as a contributor\r\nacross more than 75 open source projects. For example, within the Blosc\r\necosystem where he still maintains and contributes to Python-Blosc and\r\nBloscpack.  Furthermore, he has acquired significant experience as a Git\r\ntrainer and consultant and had published the first German language book about\r\nthe topic in 2011. In 2014 and 2015 he helped kickstart the PyData Berlin\r\ncommunity alongside a few other volunteers and co-organized the first two\r\neditions of the PyData Berlin Conference. He now works for Anaconda as a\r\nsoftware engineer / open source developer on the Numba project.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/ULT3M7/", "id": 1825, "guid": "00382bd2-c99d-531b-b064-14f6d08073b6", "date": "2019-09-05T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1825-pypy-meets-scipy", "title": "PyPy meets SciPy", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "PyPy, the fast and compliant alternative implementation of Python, is now compatible with the SciPy ecosystem. We'll explore how scientific programmers can use it.", "description": "PyPy is a fast and compliant implementation of Python. In other words, it's an interpreter for the Python language that can act as a full replacement for the reference interpreter, CPython. It's optimised to enable efficient just-in-time compilation of Python code to machine code, and has releases matching versions 2.7, and 3.6. It now also supports the main pillars of the scientific ecosystem (numpy, Cython, scipy, pandas, ...) thanks to its emulation layer for the C API of CPython.\r\n\r\nPerformance is a major concern for Python programmers. 
When using CPython, this leads to splitting out the performance-sensitive parts of the computation and rewriting them in a faster, but less convenient, language such as C or Cython. With PyPy, there is no need to choose between clear, Pythonic code and good performance. This talk aims to convince the audience that PyPy should be part of every scientific programmer's toolbox.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6f73a153-b831-5158-bbfc-fe93d9fa3e96", "id": 186, "code": "T8FUFL", "public_name": "Ronan Lamy", "avatar": "https://pretalx.com/media/avatars/a8f4ba9cf0d41e36ee03ae2c487b170c_Zclz8II.jpg", "biography": "I'm an open-source developer and consultant. I've been working on PyPy since 2012, with particular focus on the RPython annotator, Python 3 features, and cpyext.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/SUFHZT/", "id": 1413, "guid": "033398cc-bb11-5b7f-9275-c0de33a1553a", "date": "2019-09-05T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1413-high-performance-machine-learning-with-dislib", "title": "High performance machine learning with dislib", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "This talk will present dislib, a distributed machine learning library built on top of PyCOMPSs programming model. One of the main focuses of dislib is solving large-scale scientific problems on high performance computing clusters.", "description": "PyCOMPSs is a distributed programming model and runtime for Python. PyCOMPSs' main goal is to make distributed computing accessible to non-expert developers by providing a simple programming model, and a runtime that automates many aspects of the parallel execution. 
In addition to this, PyCOMPSs is infrastructure agnostic, and can run on top of a wide range of platforms, from HPC clusters to clouds, and from GPUs to FPGAs.\r\n\r\nThis talk will present dislib, a distributed machine learning library built on top of PyCOMPSs. Inspired by scikit-learn, dislib's programming interface is based on the concept of *estimators*. This provides a clean and easy-to-use API that greatly increases the productivity of building large-scale machine learning pipelines. Thanks to PyCOMPSs, dislib can run on multiple distributed platforms without changes in the source code, and can handle up to billions of input samples using thousands of CPU cores. This makes dislib a perfect tool for scientists (and other users) who are not machine learning experts, but who still want to extract useful knowledge from extremely large data sets.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ab00708c-fdac-5a85-819f-9b2bd1bfb9d8", "id": 1491, "code": "VLTEFG", "public_name": "Javier \u00c1lvarez", "avatar": null, "biography": "Javier \u00c1lvarez is a researcher at the Workflows and Distributed Computing group of the Barcelona Supercomputing Center. His research interests include parallel programming models for distributed infrastructures and large-scale distributed machine learning. Javier received his Ph.D. in computer science from the University of Adelaide in 2018.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/WCXYRQ/", "id": 1824, "guid": "64be42c6-ff6f-5ea8-8a1e-81da68643c43", "date": "2019-09-05T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1824-can-we-make-python-fast-without-sacrificing-readability-numba-for-astrodynamics", "title": "Can we make Python fast without sacrificing readability? 
numba for Astrodynamics", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "There are several solutions to make Python faster, and choosing one is not easy: we would want it to be fast without sacrificing its readability and high-level nature. We tried to do it for an Astrodynamics library using numba. How did it turn out?", "description": "We are lucky there are very diverse solutions to make Python faster that have been in use for a while: from wrapping compiled languages (NumPy), to altering the Python syntax to make it more suitable to compilers (Cython), to using a subset of it which can in turn be accelerated (numba). However, each of these options has a tradeoff, and there is no silver bullet.\r\n\r\npoliastro is a library for Astrodynamics written in pure Python. All its core algorithms are accelerated with numba, which allows poliastro to be decently fast while having minimal code complexity and avoid using other languages. \r\n\r\nHowever, even though numba is quite mature as a library and most of the Python syntax and NumPy functions are supported, there are still some limitations that affect its usage. 
In particular, we strive to offer a high-level API with support for physical units and reusable functions which can be passed as arguments, which sometimes require using complex objects or introspective Python behavior which is not available.\r\n\r\nIn this talk we will discuss the strategies and workarounds we have developed to overcome these problems, and what advanced numba features we can leverage.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b55ad0f4-e31d-5970-9414-45c1514cd7a3", "id": 1791, "code": "PCVGYT", "public_name": "Juan Luis Cano Rodr\u00edguez", "avatar": "https://pretalx.com/media/avatars/oscw19_centered-cropped-small_FzT601M.jpg", "biography": "Juan Luis Cano is an Aerospace Engineer based in Barcelona, Spain working as a Software Engineer at Satellogic, where he develops Python tools for geospatial data processing and scheduling algorithms for satellite operations. He also freelances for R&D Aerospace companies and Business schools, and in his spare time he contributes to open source, chairs Python Espa\u00f1a non-profit, rides his bicycle, listens to '70s British Hard Rock, and pursues impossible dreams.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/7A3ZQF/", "id": 1809, "guid": "7146d217-7b68-5d03-85fb-7463f28fd09e", "date": "2019-09-05T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 2 (Baroja)", "slug": "euroscipy-2019-1809-psydac-a-parallel-finite-element-solver-with-automatic-code-generation", "title": "PSYDAC: a parallel finite element solver with automatic code generation", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "PSYDAC takes input from SymPDE (a SymPy extension for partial differential equations), applies a finite-element discretization, generates MPI-parallel code, and accelerates it with Numba, Pythran, or Pyccel. 
We present design, usage and performance.", "description": "PSYDAC is a Python 3 library for the solution of partial differential equations. Its current focus is on isogeometric analysis using B-spline finite elements, but extensions to other methodologies are under consideration. In order to use PSYDAC, the user defines geometry and model equations in an abstract form using SymPDE, an extension of Sympy that provides the mathematical expressions and checks their semantic validity. Once a finite element discretization has been chosen, PSYDAC maps the abstract concepts into concrete objects, the basic building blocks being MPI-distributed vectors and matrices. Python code is generated for all the computationally intensive operations (matrix and vector assembly, matrix-vector products, etc.), and it is accelerated using either Numba, Pythran, or Pyccel. We present the library design, the user interface, and the performance results.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "22ca0d2c-2920-5d4a-bf2d-bc4cb0b27c20", "id": 1782, "code": "PRDZK8", "public_name": "Yaman G\u00fc\u00e7l\u00fc", "avatar": "https://pretalx.com/media/avatars/cv_photo_0.png", "biography": "I am a post-doctoral researcher at the Max Planck Institute for Plasma Physics (IPP) in Garching, Germany, since 2014. I work in the division of Numerical Methods for Plasma Physics, where my research has focused on semi-Lagrangian methods for the gyrokinetic description of strongly magnetized plasmas.\r\n\r\nI graduated in Aerospace Engineering at the University of Padova (Italy) in 2007, where I also obtained a Ph.D. in Science, Technologies and Measurements for Space in 2011. From 2011 to 2014, year when I moved to Germany, I was a post-doc at the Department of Mathematics of the Michigan State University (USA).\r\n\r\nIn my career Python has always been an invaluable language for fast prototyping of numerical algorithms, as well as for data visualization. 
The final code was usually written in C, C++, or Fortran.\r\n\r\nLately I have been progressively more interested in using Python also for high-performance scientific computing. Together with my colleagues at IPP, in the last few years I investigated how to ease the transition from prototype to production in academic research codes. PSYDAC, our Python parallel environment for spline finite elements, is the product of our recent efforts.", "answers": []}], "links": [], "attachments": [], "answers": []}], "Track 3 (Oteiza)": [{"url": "https://pretalx.com/euroscipy-2019/talk/TXQW9H/", "id": 1131, "guid": "ef745d81-5ca5-583f-9f67-9d4e2f3421a7", "date": "2019-09-05T11:00:00+00:00", "start": "11:00", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1131-exceeding-classical-probabilistic-data-structures-in-data-intensive-applications", "title": "Exceeding Classical: Probabilistic Data Structures in Data Intensive Applications", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "We interact with an increasing amount of data but classical data structures and algorithms can't fit our requirements anymore. This talk is to present the probabilistic algorithms and data structures and describe the main areas of their applications.", "description": "*Nowadays, research in every scientific domain, from medicine to astronomy, is impossible without processing huge amounts of data to check hypotheses, find new relations, and make discoveries. However, the traditional technologies which include data structures and algorithms, become ineffective or require too many resources. This creates a demand for various optimization techniques, new data processing paradigms, and, finally, appropriate algorithms.*\r\n\r\nThe presentation is dedicated to *probabilistic data structures*, that is a common name for advanced data structures based mostly on different hashing techniques. 
Unlike classical ones, these provide approximate answers but with reliable ways to estimate possible errors and uncertainty. They are designed for extremely low memory requirements, constant query time, and scaling, the factors that are essential for data applications. It is hard to imagine a field that requires learning from data where they are not applicable.\r\n\r\nThey are not necessarily new. Probably everybody knows about the Bloom filter, a data structure designed in the 70s that efficiently solves the problem of performing membership queries (a task to decide whether some element belongs to the dataset or not) in constant time without having to store all elements. This is an example of a probabilistic data structure, but there are many more that have been designed for various tasks in many domains.\r\n\r\nIn this talk, I explain **the five most important problems in data processing** that occur in different domains but **can be efficiently solved with probabilistic data structures and algorithms**. We cover the *membership querying*, *counting* of unique elements, *frequency* and *rank* estimation in data streams, and *similarity*. \r\n\r\nEverybody interested in such a topic is welcome to participate in contributing to a free and open-source Python (Cython) library called [PDSA](https://github.com/gakhov/pdsa).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "37d8d2e1-2a1c-5db3-8440-a660832e7c48", "id": 1221, "code": "3GJQNE", "public_name": "Andrii Gakhov", "avatar": "https://pretalx.com/media/avatars/gakhov_big_head_sq.png", "biography": "Andrii Gakhov is a mathematician and software engineer holding a Ph.D. in mathematical modeling and numerical methods. He has been a teacher in the School of Computer Science at V. 
Karazin Kharkiv National University in Ukraine for a number of years and currently works as a software practitioner for ferret go GmbH, the leading community moderation, automation, and analytics company in Germany. His fields of interests include machine learning, stream mining, and data analysis.\r\n\r\nThe author of \"Probabilistic Data Structures and Algorithms for Big Data Applications\"  (ISBN: 9783748190486)", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/8K7TA9/", "id": 1774, "guid": "24ac8f1c-2b1b-5b43-a4fc-502a20f31078", "date": "2019-09-05T11:30:00+00:00", "start": "11:30", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1774-driving-a-30m-radio-telescope-with-python", "title": "Driving a 30m Radio Telescope with Python", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "The IRAM 30m radio telescope is one of the best in the world. The telescope control software, monitoring, data archiving as well as some of the data processing code is written in Python. We will describe how and why Python is used at the telescope.", "description": "The IRAM 30m radio telescope is one of the best in the world. It has been in operation non-stop since the mid 80s and is used to observe 24-hours a day, 365 days a year. All of the high-level telescope control software, monitoring, data archiving as well as some of the data processing software is written in Python. 
This choice, controversial at first, proved to be extremely successful making the IRAM 30m telescope extremely efficient.\r\n\r\nThis talk will describe how Python is used at the telescope, the reasons behind these choices, lessons learned and future developments.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "098fc160-d6f4-51c4-a8f7-3e7fe999b964", "id": 1228, "code": "TZTL7A", "public_name": "Francesco Pierfederici", "avatar": "https://pretalx.com/media/avatars/b20d5ee150462f7c487c7660574633b9_zZ6N91W.jpg", "biography": "Launched genomics data processing platform for the largest food company in the world. Helped shoot satellites in orbit at NASA. Optimised numerical weather forecast models on 200k cores. Asteroid 22435 Pierfederici named in recognition of his contributions to astronomy software development. Author of \"Distributed Computing with Python\", 2016 PACKT Publishing.\r\n\r\nLoves Python.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/F8X9BY/", "id": 1321, "guid": "f7f06f56-fdb7-5051-b5c8-8bc545995b8a", "date": "2019-09-05T12:00:00+00:00", "start": "12:00", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1321-matrix-calculus-with-sympy", "title": "Matrix calculus with SymPy", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "In this talk we explore a recent addition to SymPy which allows to find closed-form solutions to matrix derivatives. As a consequence, generation of efficient code for optimization problems is now much easier.", "description": "The recent popularization of libraries relying on tensor algebra operations has led to a rise in the requirement of computational tools to calculate the gradient and hessian of tensorial expressions. The derivative of a tensor *A* by tensor *B* is the tensor containing all combinations of the elements of *A* derived by the elements of *B*. 
While tensor derivative operations are commonly supported by most computer algebra systems and frameworks through iterative algorithms, these derivatives can be expressed mathematically in closed-form solutions, which are computationally many orders of magnitude faster.\r\n\r\nSymPy has been recently extended in order to support the computation of symbolic matrix derivatives, and is currently the only computer algebra system endowed with this feature (lacking even in Wolfram Mathematica). Matrix calculus plays indeed a central role in optimization and machine learning, but was unfortunately often limited to pen on papers or chalk on blackboards.\r\n\r\nIn this talk, we will introduce matrix expressions in SymPy, and address the three ways they can be represented:\r\n\r\n1. explicit matrices with symbolic entries,\r\n2. indexed symbols with proper summation convention,\r\n3. implicit matrix expressions.\r\n\r\nWe illustrate the way matrix derivatives are implemented for all three representations, with special emphasis to the third way, the fastest and most elegant. The derived expressions can then be passed to SymPy's code generation utilities and the resulting code can be compared in speed with other frameworks, such as TensorFlow.\r\n\r\nThe support of matrix derivatives can turn SymPy into a simple tool to create the code for optimization algorithms or the code to train machine learning algorithms. The code generation utilities of SymPy are indeed aware of how to export matrix expressions into other programming languages and frameworks. We will give some examples using maximum likelihood estimation and the expectation-maximization algorithms.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b5dca714-5f27-5773-9f40-0e0dd6840f14", "id": 1284, "code": "NYVKBG", "public_name": "Francesco Bonazzi", "avatar": null, "biography": "MSc. 
in physics from the University of Milano, Italy (2012).\r\nSoftware engineer in the industry (2012-2015).\r\nResearcher at the Max Planck Institute of Colloids and Interfaces, Potsdam, Germany (2015-2018).\r\nData scientist (2018-2019).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/FLM8R7/", "id": 1188, "guid": "31276e06-e576-59b4-b039-231d68067031", "date": "2019-09-05T14:45:00+00:00", "start": "14:45", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1188-veloxchem-python-meets-quantum-chemistry-and-hpc", "title": "VeloxChem: Python meets quantum chemistry and HPC", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "A new and efficient Python/C++ modular library for real and complex response functions at the\r\nlevel of Kohn-Sham density functional theory", "description": "Zilvinas Rinkevicius, Xin Li, Olav Vahtras, Manuel Brand, Karan Ahmadzadeh, Magnus\r\nRingholm, Nanna List, and Patrick Norman\r\n\r\nWith the ease of Python library modules, VeloxChem offers a front end to quantum chemical\r\ncalculations on contemporary high-performance computing (HPC) systems and aims at\r\nharnessing the future compute power within the EuroHPC initiative. At the heart of this\r\nsoftware lies a module for the evaluation of electron-repulsion integrals (ERIs) using the Obara-Saika recurrence scheme, where a high degree of efficiency is achieved by employing\r\narchitecture-independent vectorization via OpenMP SIMD pragmas in the auto-generated C++\r\nsource code. 
The software is topology aware and with a Python-controlled work and task flow,\r\nthe idle time is minimized using an MPI/OpenMP partitioning of resources.\r\nIn the second software layer, we have implemented a highly accurate SCF start guess based\r\non atomic densities and a first-level of iterations in a reduced version of the user-defined basis\r\nset, leading to a very smooth convergence in the subsequent standard DIIS scheme. This layer\r\nalso includes vectorized and OpenMP/MPI parallelized modules for efficient generation of DFT\r\ngrid points and weights as well as kernel integration.\r\nIn the third software layer, we present real and complex response functions as to address\r\ndispersive and absorptive molecular properties in spectroscopy. The kernel module in this layer\r\nis the iterative linear response equation solver that we have formulated and implemented for a\r\ncombination of multiple optical frequencies and multiple perturbation operators. With efficient\r\nuse of computer memory, we enable the simultaneous reference to, and solving of, in the order\r\nof 1,000 response equations for sizable biochemical systems without spatial symmetry, and we\r\ncan thereby determine electronic response spectra in arbitrary wavelength regions, including\r\nUV/vis and X-Ray, without resolving the sometimes embedded excited states in the spectrum.\r\nE.g. 
the electronic CD spectrum (involving the Cartesian sets of electric and magnetic\r\nperturbations) over a range of some 10 eV is obtained at a computational cost comparable to\r\nthat of determining the transition energy of the lowest excited state, or optimizing the electronic\r\nstructure of the reference state.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e55d00b-8b75-5937-881a-c6af91cb2122", "id": 62, "code": "ZRZEXZ", "public_name": "Olav Vahtras", "avatar": "https://pretalx.com/media/avatars/f5c1e8bcae4e86abfd073a5b344fde40_6tdgnLu.jpg", "biography": "Professor of Theoretical Chemistry at KTH Royal Institute of Technology, Stockholm, Sweden.\r\nPhD in Quantum Chemistry 1992 \r\nMSc in Engineering Physics 1988\r\n\r\nResearch interest, method development in quantum chemistry. Coauthor of the Dalton Program package.\r\n\r\nInvolvement in the Python community as co-editor of Scipy Lecture Notes and instructor for Software Carpentry/Data Carpentry.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/YJQH7M/", "id": 1271, "guid": "fa8afe50-10ee-5901-99d4-9db834419df2", "date": "2019-09-05T15:15:00+00:00", "start": "15:15", "logo": null, "duration": "00:30", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1271-emzed-a-python-based-framework-for-analysis-of-mass-spectrometry-data", "title": "emzed: a Python based framework for analysis of mass-spectrometry data", "subtitle": "", "track": null, "type": "Talk (long)", "language": "en", "abstract": "This talk is about emzed, a Python library  to support biologists with little programming knowledge to implement ad-hoc analyses as well as workflows for mass-spectrometry data.", "description": "Many of the existing mass spectrometry data analysis tools are desktop applications designed for specific applications without support for customization. 
In addition, many of the commercial solutions offer no or only limited functionality for exporting results.\r\n\r\nIn addition, the existing programming libraries in this area are scattered across different languages, mostly R, Java and Python.\r\n\r\nAs a result, data analysis in this area often consists of manual import/export steps from/to various tools and self-developed scripts that prevent the reproducibility of results obtained or automated execution on high-performance infrastructures.\r\n\r\nemzed tries to avoid these problems by integrating existing libraries and tools from Python, R (and in the near future also Java) into an easy-to-use API. \r\n\r\nTo support workflow development and increase confidence in end results \r\nemzed also offers tools for interactive visualization of mass spectrometry related data structures.\r\n\r\nThe presentation introduces basics and concepts of emzed, some lessons learned and current development of the next version of emzed.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "9fc396e0-1e6f-57a1-8d16-45029d0c5590", "id": 1123, "code": "89DLZK", "public_name": "Uwe Schmitt", "avatar": "https://pretalx.com/media/avatars/uwe_photo_id_smaller_more_quadratic.jpg", "biography": "- master in mathematics 1994 at University of Saarbr\u00fccken, Germany.\r\n- PHD in applied mathematics since 2001 at University of Saarbr\u00fccken, Germany.\r\n- Postdoc position until 2008 at University of Saarbr\u00fccken\r\n- 2008-2014 working as software developer and data scientist for mineway GmbH\r\n- since 2014 senior software developer at Scientific IT Services of ETH Zurich, Switzerland.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/PUCWVY/", "id": 1456, "guid": "20629f65-e405-5a82-83ab-07fdcb239720", "date": "2019-09-05T15:45:00+00:00", "start": "15:45", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": 
"euroscipy-2019-1456-vtext-fast-text-processing-in-python-using-rust", "title": "vtext: fast text processing in Python using Rust", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "In this talk, we present some of the benefits of writing extensions for Python in Rust. We then illustrate this approach on the [vtext](https://github.com/rth/vtext) project, that aims to be a high-performance library for text processing.", "description": "Scientific Python has historically relied on compiled extensions for performance critical parts of the code. In this talk, we outline how to write Rust extensions for Python using [rust-numpy](https://github.com/rust-numpy/rust-numpy),\r\nproject. Advantages and limitations of this approach as compared to Cython or wrapping Fortran, C or C++ are also discussed.\r\n\r\nIn the second part, we introduce the [vtext](https://github.com/rth/vtext) project that allows fast text processing in Python using Rust. In particular, we consider the problems of text tokenization, and (parallel) token counting resulting in a sparse vector representation of documents. These can then be used as input in machine learning or information retrieval applications. We outline the approach used in vtext and compare to existing solutions of these problems in the Python ecosystem.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "667b8437-a201-5b23-bf37-a515a33ec779", "id": 1519, "code": "BXVSCG", "public_name": "Roman Yurchak", "avatar": "https://pretalx.com/media/avatars/90e4251aee531aa36f6e09d7b935378f_X6t0Gow.jpg", "biography": "Roman Yurchak has a background in computational physics, and is currently working\r\nas an independent consultant for data science related projects. 
He also contributes to several open-source projects, mostly in Python.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/7EYS3W/", "id": 1421, "guid": "3a589ee1-2919-58ae-ac64-4a5edae394b7", "date": "2019-09-05T16:30:00+00:00", "start": "16:30", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1421-pystencils-speeding-up-stencil-computations-on-cpus-and-gpus", "title": "pystencils: Speeding up stencil computations on CPUs and GPUs", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "[pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) speeds up stencil computations on numpy arrays using a sympy-based high-level description that is compiled into optimized C code.", "description": "[Interactive Notebooks are available here](https://mybinder.org/v2/gh/mabau/pystencils/master?filepath=doc%2Fnotebooks).\r\n\r\nMany operations on structured arrays can be formulated as stencil codes, where the update of one array cell depends only on values in its local neighborhood. Stencil codes arise in many different fields, for example in image processing or in computational fluid dynamics by discretizing partial differential equations (PDEs) using finite differences or finite volume schemes. \r\n\r\nWe present the [pystencils](https://i10git.cs.fau.de/pycodegen/pystencils) package that allows for fast execution of stencil codes on numpy arrays using code generation techniques.\r\nThe stencil is formulated in sympy and transformed into an intermediate representation (IR).\r\n*pystencils* comes with a set of optimizing transformations that can be applied on this IR, for example cache blocking or explicit SIMD vectorization with intrinsics. The intermediate representation is transformed into C or CUDA code and automatically loaded as a C extension module. 
This approach yields highly efficient implementations, outperforming current acceleration techniques like Cython or numba. Additionally, together with the [waLBerla](https://www.walberla.net/) package, the resulting stencil codes can be run on large computing clusters, using MPI parallelization. \r\n\r\n*pystencils* also comes with functions to automatically derive the sympy-based stencil representation from a continuous PDE. Symbolic, continuous differential operators are automatically discretized by finite difference schemes of arbitrary order. \r\n\r\nWe show two examples of large-scale setups run with *pystencils*: a phase-field method simulating solidification of alloys and a CFD simulation based on the lattice-Boltzmann method.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b7844e7e-6918-5535-939d-ed106dde133a", "id": 1496, "code": "JZH83K", "public_name": "Martin Bauer", "avatar": null, "biography": "Martin Bauer is a PhD student at the chair for system simulation at the University Erlangen-Nuremberg. \r\nHis research interests are CFD simulations with the lattice Boltzmann method,  meta-programming techniques and high performance computing. 
He is one of the core developers of the waLBerla lattice Boltzmann framework.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2019/talk/LE7AAH/", "id": 1784, "guid": "bf25ae85-d253-5b7f-a7f2-077b3f787e87", "date": "2019-09-05T16:45:00+00:00", "start": "16:45", "logo": null, "duration": "00:15", "room": "Track 3 (Oteiza)", "slug": "euroscipy-2019-1784-telapy-a-python-module-to-compute-free-surface-flows-and-sediments-transport-in-geosciences", "title": "TelApy a Python module to compute free surface flows and sediments transport in geosciences", "subtitle": "", "track": null, "type": "Talk", "language": "en", "abstract": "TelApy a Python module to compute free surface flows and sediments transport in geosciences and examples of how it is used to inter-operate with other Python libraries for Uncertainty Quantification, Optimization, Reduced Order Model.", "description": "This talk focuses on applications of the TelApy module (www.opentelemac.org). TelApy aims to provide a Python wrapper for the TELEMAC-MASCARET API (Application Programming Interface). The goal of TelApy is to give the user full control over the simulation while a case is running. For example, it must allow the user to stop the simulation at any time step, get the values of some variables and change them. To make this possible, a Fortran structure called instantiation was developed with the API. It contains a list of strings pointing to TELEMAC variables. This gives direct access to the physical memory of variables, and therefore makes it possible to get and set their values. Furthermore, changes have been made in the TELEMAC-MASCARET main subroutines to make it possible to execute hydraulic cases time step by time step. It is useful to drive the TELEMAC-MASCARET SYSTEM APIs using the Python programming language. In fact, Python is a portable, dynamic, extensible, free language, which allows (without imposing) a modular approach and object-oriented programming. 
In addition to the benefits of the language itself, Python offers a large number of interoperable libraries. Linking these libraries with the TELEMAC-MASCARET SYSTEM APIs allows the creation of an ever more efficient computing chain able to respond more finely to various complex problems. Therefore, the TelApy module has the ambition to enable a new way of using the TELEMAC-MASCARET system. In particular one can think about high performance computing for the calculation of uncertainties, optimization, code coupling and so on. The objective of this talk is to present some examples of the TelApy module in the case of Uncertainty Quantification, Optimization, and Reduced Order Model.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "30cfbe11-5ae0-53c5-8197-cf901b5b2da8", "id": 1766, "code": "ZL7ERQ", "public_name": "yoann audouin", "avatar": null, "biography": "In charge of the architecture and environment of the open source code TELEMAC-MASCARET (www.opentelemac.org) since 2012.\r\nDiscovered Python in 2010.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 5, "date": "2019-09-06", "day_start": "2019-09-06T04:00:00+00:00", "day_end": "2019-09-07T03:59:00+00:00", "rooms": {}}]}}}