{"$schema": "https://c3voc.de/schedule/schema.json", "generator": {"name": "pretalx", "version": "2024.2.0.dev0"}, "schedule": {"url": "https://pretalx.com/euroscipy-2023/schedule/", "version": "Re-Revised Final Revision Final Fun 2.2 last-minute-fix", "base_url": "https://pretalx.com", "conference": {"acronym": "euroscipy-2023", "title": "EuroSciPy 2023", "start": "2023-08-14", "end": "2023-08-18", "daysCount": 5, "timeslot_duration": "00:05", "time_zone_name": "Europe/Zurich", "colors": {"primary": "#3aa57c"}, "rooms": [{"name": "HS 120", "guid": "36dc1d19-2daa-5722-b3dc-00881fbfae6b", "description": "Big room II", "capacity": 101}, {"name": "HS 118", "guid": "8587992e-8091-5201-9c6c-62e46de7ca6b", "description": "Small room I", "capacity": 117}, {"name": "Aula", "guid": "de96f281-5239-5232-a89c-04a338ed7f6c", "description": null, "capacity": 340}, {"name": "HS 119 - Maintainer track", "guid": "f67cb0bc-d6ad-5f84-af1f-193585de6ce3", "description": "Maintainer Room", "capacity": 30}, {"name": "Rosshof", "guid": "f52ccea8-924e-5e68-8923-72eaf4cb38ec", "description": "Not a real room: just a placeholder for sprints until we'll fix the website", "capacity": 1}], "tracks": [{"name": "Community, Education, and Outreach", "color": "#8E7510"}, {"name": "Data Science and Visualisation", "color": "#000000"}, {"name": "High Performance Computing", "color": "#D149DF"}, {"name": "Scientific Applications", "color": "#5C85EA"}, {"name": "Machine and Deep Learning", "color": "#9CEECD"}], "days": [{"index": 1, "date": "2023-08-14", "day_start": "2023-08-14T04:00:00+02:00", "day_end": "2023-08-15T03:59:00+02:00", "rooms": {"Aula": [{"url": "https://pretalx.com/euroscipy-2023/talk/ZZWYAT/", "id": 31636, "guid": "720bd483-4257-5875-95f5-05ff371c20a1", "date": "2023-08-14T08:30:00+02:00", "start": "08:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-31636-network-analysis-made-simple-and-fast-", "title": "Network Analysis Made Simple (and fast!)", 
"subtitle": "", "track": "Data Science and Visualisation", "type": "Tutorial", "language": "en", "abstract": "Through the use of NetworkX's API, tutorial participants will learn about the basics of graph theory and its use in applied network science. Starting with a computationally-oriented definition of a graph and its associated methods, we will build out into progressively more advanced concepts (path and structure finding). We will also discuss new advances to speed up NetworkX code with dispatching to alternate computation backends like GraphBLAS. This will be a hands-on tutorial, so stretch your muscles and get ready to go through the exercises!", "description": "Have you ever wondered about how those data scientists at Facebook and LinkedIn make friend recommendations? Or how epidemiologists track down patient zero in an outbreak? If so, then this tutorial is for you. This tutorial covers the basics of network analysis. We will use a variety of datasets to help you understand the fundamentals of network thinking, with a particular focus on constructing, summarizing, and visualizing complex networks. We will also cover recent changes in NetworkX that enable users to dispatch their code to more efficient backends like GraphBLAS to speed up their code.\r\n\r\nBy the end of the tutorial, participants will have learned how to use network thinking to better understand relationship problems while analyzing data. 
They will also be comfortable using the NetworkX API to model their data.\r\n\r\nPart 1: Introduction (30 min)\r\n\r\n- Networks of all kinds: biological, transportation, web.\r\n- Representation of networks, NetworkX data structures\r\n- Introduction to NetworkX API for modelling and graph operations.\r\n\r\nPart 2: Hubs and Paths (30 min)\r\n\r\n- Finding important nodes; applications\r\n- Pathfinding algorithms and their applications\r\n- Hands-on: implementing path-finding algorithms\r\n- Visualize degree and betweenness centrality distributions.\r\n\r\nPart 3: Speed up your code with NetworkX dispatching (30 min)\r\n\r\n- Quick introduction to GraphBLAS\r\n- Moving between GraphBLAS and NetworkX.\r\n- Speed up your NetworkX code by changing one line of code!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "18c9d15e-9959-521b-ba27-bc3c849b197e", "id": 30173, "code": "UAM73R", "public_name": "Mridul Seth", "avatar": "https://pretalx.com/media/avatars/picture_zEBWWdC.jpg", "biography": "I am currently working on the NetworkX open source project (work funded through a grant from Chan Zuckerberg Initiative!). Also collaborating with folks from the Scientific Python project (Berkeley Institute of Data Science), Anaconda Inc. Before this I used to work on the GESIS notebooks and gesis.mybinder.org.\r\nI am also interested in the development and maintenance of the open source data & science software ecosystem. I try to help around with the Scientific Open Source ecosystem wherever possible. 
To share my love of Python and Network Science, I have presented workshops at multiple conferences like PyCon, (Euro)SciPy, PyData London and many more!", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/X8LYJY/", "id": 32175, "guid": "83357020-9da3-5d27-b74d-b4aeafcda3b5", "date": "2023-08-14T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-32175-introduction-to-geospatial-machine-learning-with-srai", "title": "Introduction to Geospatial Machine Learning with SRAI", "subtitle": "", "track": "Scientific Applications", "type": "Tutorial", "language": "en", "abstract": "This tutorial offers a thorough introduction to the srai library for Geospatial Artificial Intelligence. Participants will learn how to use this library for geospatial tasks like downloading and processing OpenStreetMap data, extracting features from GTFS data, dividing an area into smaller regions, and representing regions in a vector space using various spatial features. Additionally, participants will learn to pre-train embedding models and train predictive models for downstream tasks.", "description": "In this tutorial, we intend to provide a comprehensive introduction to the Spatial Representations for Artificial Intelligence (srai) library. Participants will learn how to utilize this library for various geospatial applications, such as downloading and processing OpenStreetMap data, extracting features from GTFS data, splitting a given area into smaller regions, and embedding regions into a vector space based on different spatial features. Moreover, users will learn how to pre-train a model of their choice and build predictive models for use in downstream tasks.\r\n\r\nBy the end of the tutorial, attendees will be able to:\r\n1. Install and set up the SRAI library.\r\n2. Use SRAI to download and process geospatial data.\r\n3. 
Apply various regionalization and embedding techniques to geospatial data.\r\n4. Utilize pre-trained embedding models for clustering and similarity search.\r\n5. Build predictive models on top of SRAI embeddings\r\n6. Pre-train available models from scratch.\r\n7. Understand the potential applications and future enhancements of the SRAI library.\r\n\r\nIf you want to follow along, please find the material and installation instructions at https://github.com/kraina-ai/srai-tutorial. We encourage you to set up the repository and install the dependencies before the tutorial.\r\n\r\nLastly, if you're not familiar with geospatial data, we can recommend a great tutorial by Joris Van den Bossche - [Introduction to geospatial data analysis with GeoPandas](https://github.com/jorisvandenbossche/geopandas-tutorial).   \r\nIt is not required to understand this tutorial, but it might allow you to build a deeper understanding of geospatial data and tooling in this domain. Consider it an optional pre-reading.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7ef54ffa-0587-5d71-8ded-fdd8ca5ccf65", "id": 30520, "code": "DLQCAA", "public_name": "Piotr Szyma\u0144ski", "avatar": "https://pretalx.com/media/avatars/195597154_10227086175086495_942861357450494441_n_1kuipAd.jpg", "biography": "Piotr Szyma\u0144ski is a scientist with a mathematical and computer science background. He obtained his Ph.D. in Computer Science at Wroc\u0142aw University of Science and Technology in 2020. As a scholar, he visited Stanford University, Hasso Plattner Institute in Potsdam, Technical University of Sydney, Dortmund Technical University, and Josef Stefan Institute in Ljubljana. He is the primary author of the scikit-multilearn library for multi-label classification. He also has extensive corporate R&D experience. He was one of the authors of the ML/AI layers of Avaya Conversational Intelligence, a contact-center personnel support solution used widely in American call centers. 
Currently, he leads a Spatial AI group at the Department of Artificial Intelligence at Wroc\u0142aw University of Science and Technology, Poland.", "answers": []}, {"guid": "9bcf3c91-8f0c-5920-b983-138c5fe445dc", "id": 30514, "code": "NEMRFE", "public_name": "Szymon Wo\u017aniak", "avatar": "https://pretalx.com/media/avatars/PAWL8668_1_1_1_1_1_uXnNEHn.jpg", "biography": "Co-creator of SRAI library,\r\nPassionate AI Researcher and ML Engineer working in NLP and GeoAI.\r\nGraduated from the Wroc\u0142aw University of Science and Technology with a Bachelor's in Computer Science and a Master's Degree in Data Science", "answers": []}, {"guid": "11eac674-6eae-5bed-ae78-d010f878e297", "id": 30522, "code": "CEEVGF", "public_name": "Piotr Gramacki", "avatar": "https://pretalx.com/media/avatars/20210920_173029_sJrRixS.jpg", "biography": "Co-creator of SRAI library. Master of Science in Data Science @ Wroc\u0142aw University of Science and Technology. Machine Learning Engineer @ Brand24", "answers": []}, {"guid": "cdbe27e7-54e5-51f8-9555-abcc57e7acc7", "id": 32129, "code": "AEKH3Q", "public_name": "Kamil Raczycki", "avatar": "https://pretalx.com/media/avatars/ProfilePicturePhoto_5Irk1Jh.jpg", "biography": "Co-creator of SRAI library.\r\nSpatial Data Scientist working at Allegro during the day and passionate open-source developer and geospatial researcher at night.\r\nGraduated Master of Science in Data Science @ Wroc\u0142aw University of Science and Technology.", "answers": []}, {"guid": "b23acd82-a7fc-58ac-a008-24ab0c6516ea", "id": 32191, "code": "PHTF7Q", "public_name": "Kacper Le\u015bniara", "avatar": "https://pretalx.com/media/avatars/IMG_20220501_1914402_n9wO0P9.jpg", "biography": "Co-creator of the SRAI library,\r\nAn ML Engineer passionate about the geospatial domain and an author of highway2vec.\r\nBackground in Computer and Data Science from the Wroc\u0142aw University of Science and Technology and a proud member of the KRAINA Lab tackling geospatial problems. 
MLOps Engineer @ GetInData", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/9VVRFQ/", "id": 35942, "guid": "76710c23-cea7-532e-89ae-b9001737dccb", "date": "2023-08-14T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-35942-developing-pandas-extensions-in-rust", "title": "Developing pandas extensions in Rust", "subtitle": "", "track": "High Performance Computing", "type": "Tutorial", "language": "en", "abstract": "pandas is a batteries-included dataframe library, implementing hundreds of generic operations for tabular data, such as math or string operations, aggregations and window functions. In some cases, domain-specific code may benefit from user-defined functions (UDFs) that implement some particular logic. These functions can sometimes be implemented using more basic pandas vectorized operations, and they will be reasonably fast, but in other cases a Python function working with the individual values needs to be implemented, and those will execute orders of magnitude slower than their equivalent vectorized versions. In this tutorial we will see how to implement functions in Rust that can be used with dataframe values at the individual level, but run at the speed of vectorized code, and in some cases faster.", "description": "While this tutorial will cover complex topics of low-level programming languages like Rust, it will be presented for a beginner audience. 
No previous knowledge of Rust is required; only a basic understanding of pandas is needed to follow the tutorial.\r\n\r\nThe tutorial will cover how libraries developed in a low-level programming language like Rust can be called from Python, the basics of the internal representation of pandas dataframes, the Apache Arrow C data interface, and how to write a simple function in Rust.\r\n\r\nTo be able to follow the hands-on part of this tutorial, participants should bring their laptops and have a working Python installation with a recent version of pandas and PyArrow, as well as a Rust compiler.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "73a7f484-df4e-5926-ba4d-af8429e588d4", "id": 1100, "code": "SXWKLX", "public_name": "Marc Garcia", "avatar": "https://pretalx.com/media/avatars/photo5821414706567558354_s4wALp9.jpg", "biography": "Marc is a pandas core developer and the release manager for pandas 1.5 and 2.0. He is also an Ibis and ASV core developer, a fellow of the Python Software Foundation, and the VP of infrastructure at NumFOCUS. Marc works as an independent software and data consultant for clients such as Bank of America, Unilever, Bumble, Tesco and NTT Communications.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/WBSYCM/", "id": 32157, "guid": "d3d3508f-f9eb-524f-a73f-c3251f4aa0a3", "date": "2023-08-14T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-32157-predictive-survival-analysis-with-scikit-learn-scikit-survival-and-lifelines", "title": "Predictive survival analysis with scikit-learn, scikit-survival and lifelines", "subtitle": "", "track": "Machine and Deep Learning", "type": "Tutorial", "language": "en", "abstract": "This tutorial will introduce how to train machine learning models for time-to-event prediction tasks (health care, predictive maintenance, marketing, insurance...) 
without introducing a bias from censored training (and evaluation) data.", "description": "Tutorial notebooks:\r\n\r\n- https://vincent-maladiere.github.io/survival-analysis-demo\r\n\r\nAccording to Wikipedia:\r\n\r\nSurvival analysis is a branch of statistics for analyzing the expected duration of time until one event occurs, such as deaths in biological organisms and failure in mechanical systems. [...]. Survival analysis attempts to answer certain questions, such as what is the proportion of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the probability of survival?\r\n\r\nIn this tutorial we will deep dive into a practical case study of predictive maintenance using tools from the scientific Python ecosystem. Here is a tentative agenda:\r\n\r\n- What is time-censored data and why it is a problem to train time-to-event regression models.\r\n- Single event survival analysis with Kaplan-Meier using scikit-survival.\r\n- Evaluation of the calibration of survival analysis estimators using the integrated brier score (IBS) metric.\r\n- Predictive survival analysis modeling with Cox Proportional Hazards, Survival Forests using scikit-survival, GradientBoostedIBS implemented from scratch with scikit-learn.\r\n- How to use a trained GradientBoostedIBS model to estimate the median survival time and the probability of survival at a fixed time horizon.\r\n- Inspecting the learned statistical association between input features and survival probabilities using partial dependence plot.\r\n\r\nThe tutorial notebooks also contain additional material that we probably won't have time to present in 90 min, namely:\r\n\r\n- Competing risks modeling with Nelson\u2013Aalen, Aalen-Johansen using lifelines.\r\n- Estimation of the cause-specific cumulative incidence function (CIF) using our 
GradientBoostedIBS model.\r\n- Extracting implicit failure data from operation logs using sessionization with Ibis and DuckDB.\r\n\r\nTarget audience: good familiarity with machine learning concepts, with prior experience using scikit-learn (you know what cross-validation means and how to fit a Random Forest on a Pandas dataframe).", "recording_license": "", "do_not_record": false, "persons": [{"guid": "91114ee9-3e12-54e9-8119-9813674ba951", "id": 1530, "code": "NEUMLP", "public_name": "Olivier Grisel", "avatar": "https://pretalx.com/media/avatars/ogrisel_portrait_870x550_PMry4Oq.jpg", "biography": "Machine Learning software engineer at Inria and member of the maintainers' team of the scikit-learn open source project.", "answers": []}, {"guid": "3a5f85c2-418b-562a-8207-830f0672e63a", "id": 22185, "code": "QCVBZD", "public_name": "Vincent Maladiere", "avatar": "https://pretalx.com/media/avatars/vincent_maladiere_vBmX9Nu.jpg", "biography": "Machine Learning Engineer at Inria \u2022 Contributor of scikit-learn, skrub and hazardous \u2022 Eager to talk about deploying stuff and MLOps :)", "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 120": [{"url": "https://pretalx.com/euroscipy-2023/talk/HJLJCQ/", "id": 32144, "guid": "7b0803c8-a25f-55ff-b87e-5070df192505", "date": "2023-08-14T08:30:00+02:00", "start": "08:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-32144-getting-started-with-jupyterlab", "title": "Getting started with JupyterLab", "subtitle": "", "track": "Scientific Applications", "type": "Tutorial", "language": "en", "abstract": "JupyterLab is very widely used in the Python scientific community. Most, if not all, of the other tutorials will use Jupyter as a tool. 
Therefore, a solid understanding of the basics is very helpful for the rest of the conference as well as for your later daily work.\r\nThis tutorial provides an overview of  important basic Jupyter features.", "description": "# Outline\r\n\r\n## Introduction\r\n\r\n* Terminology: JupyterLab, Notebook, IPython (10 min)\r\n* Notebook approach - cells, code, markdown and more (15 min)\r\n\r\n## Tools\r\n\r\n* Help system and history (10 min)\r\n* Magic functions basics (15 min)\r\n\r\n## Development\r\n\r\n* Runtime measurements and profiling (20 min)\r\n* Exceptions and debugging (20 min)\r\n\r\nThe tutorial will be hands on.\r\nWhile the students will receive a comprehensive PDF with all course content,\r\nI will not distribute pre-filled Notebooks.\r\nInstead, I will start with a blank Notebook for each topic and develop the\r\ncontent step-by-step.\r\nThe participants are encouraged to type along.\r\nMy typing speed is usually appropriate and allows participants to follow.\r\nIn addition, the supplied PDF contains all needed code and commands to get back\r\non track, if I should be too fast.\r\nI also explicitly ask for feedback if I am too fast or things are unclear.\r\nI encourage questions at any time.\r\nIn fact, questions and my answers are often an important part of my teaching,\r\nmaking the learning experience much more lively and typically more useful.\r\n\r\n### Software Requirements\r\n\r\nYou need to have Python and JupyterLab installed. I will use Python 3.11. Older versions such as 3.8., 3.9 or 3.10 should work too. If you use Anaconda, you should be all set. 
Otherwise, if you use conda, install with `conda install -c conda-forge jupyterlab` (or use `mamba` instead of `conda`); if you use `pip`, install with `pip install jupyterlab`.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c84d882a-39c5-51be-b6d5-7b7408e7002b", "id": 4, "code": "9KSJ3K", "public_name": "Mike M\u00fcller", "avatar": "https://pretalx.com/media/avatars/4f3782b004830f622e19029e5f7fc146_41xklgK.jpg", "biography": "I have been a Python user since 1999 and have been teaching Python since 2004, including 60+ conference tutorials.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/PQRRLN/", "id": 34812, "guid": "3850ad78-d6e4-5eaa-8047-10466b71d9ac", "date": "2023-08-14T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-34812-introduction-to-python-for-scientific-programming", "title": "Introduction to Python for scientific programming", "subtitle": "", "track": "Scientific Applications", "type": "Tutorial", "language": "en", "abstract": "This tutorial will provide an introduction to Python intended for beginners.\r\n\r\nIt will notably introduce the following aspects:\r\n\r\n- built-in types\r\n- control flow (conditions, loops, etc.)\r\n- built-in functions\r\n- basic Python classes", "description": "This tutorial will provide an introduction to Python intended for beginners.\r\n\r\nIt will notably introduce the following aspects:\r\n\r\n- built-in types\r\n- control flow (conditions, loops, etc.)\r\n- built-in functions\r\n- basic Python classes\r\n\r\nHere we introduce the Python language. Only the bare minimum necessary for getting started with NumPy and SciPy is covered. 
To learn more about the language, consider going through the excellent tutorial https://docs.python.org/tutorial.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5e57e4e9-d142-5b33-bc07-b6094f9f4c16", "id": 20272, "code": "JZ3LXH", "public_name": "Milton Gomez", "avatar": null, "biography": "PhD student from Nicaragua studying the application of machine learning to environmental sciences (more specifically tropical meteorology) in Lausanne.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/EX3C7K/", "id": 34810, "guid": "d2a84464-af91-5898-bad2-db9b41881435", "date": "2023-08-14T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-34810-introduction-to-numpy", "title": "Introduction to NumPy", "subtitle": "", "track": "Scientific Applications", "type": "Tutorial", "language": "en", "abstract": "NumPy is one of the foundational packages for doing data science with Python. It enables numerical computing by providing powerful N-dimensional arrays and a suite of numerical computing tools. In this tutorial, you'll be introduced to NumPy arrays and learn how to create and manipulate them. Then, you'll see some of the tools that NumPy provides, including random number generators and linear algebra routines.", "description": "This tutorial will introduce numerical computing with Python and the NumPy library. It's intended for people new to NumPy and Python's scientific stack or those needing a refresher. \r\n\r\nIn this tutorial, you'll learn about different ways to create NumPy arrays. You'll see how to pull out individual elements or sub-arrays from them. Even more importantly, you'll learn about broadcasting and vectorization. These are NumPy's superpowers that help you write readable and fast code.\r\n\r\nNumPy's arrays are the foundational building blocks. 
However, you'll also dip your toes into some of the numerical computing tools that are part of the library. In particular, you'll learn the best practices for working with random numbers in NumPy. Furthermore, you'll see how to perform linear algebra operations like matrix addition, multiplication, and inversion.\r\n\r\nThe workshop consists of 90 minutes of live code demonstrations and hands-on exercises.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "828241c7-9154-5d26-8477-3f23b87743ad", "id": 58, "code": "CSTAZX", "public_name": "Geir Arne Hjelle", "avatar": "https://pretalx.com/media/avatars/geirarne13_square_ZTQtWjH.jpg", "biography": "Geir Arne teaches Python at Real Python. He has a background in mathematics and has worked with data analysis in different fields, such as electricity markets, satellite geodesy, and computer vision. In his spare time, Geir Arne enjoys hammock camping, square roots, and aimless forest wandering.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/PWER3Z/", "id": 31497, "guid": "d8cf4d5d-7414-5c1f-8e49-dcd42bb6b152", "date": "2023-08-14T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-31497-introduction-to-data-analysis-using-pandas", "title": "Introduction to Data Analysis Using Pandas", "subtitle": "", "track": "Data Science and Visualisation", "type": "Tutorial", "language": "en", "abstract": "Working with data can be challenging: it often doesn\u2019t come in the best format for analysis, and understanding it well enough to extract insights requires both time and the skills to filter, aggregate, reshape, and visualize it. 
This session will equip you with the knowledge you need to effectively use pandas \u2013 a powerful library for data analysis in Python \u2013 to make this process easier.", "description": "#### Section 1: Getting Started With Pandas\r\nWe will begin by introducing the Series, DataFrame, and Index classes, which are the basic building blocks of the pandas library, and showing how to work with them. By the end of this section, you will be able to create DataFrames and perform operations on them to inspect and filter data.\r\n\r\n#### Section 2: Data Wrangling\r\nTo prepare our data for analysis, we need to perform data wrangling. We will learn how to clean and reformat data (e.g. renaming columns, fixing data type mismatches), restructure/reshape it, and enrich it (e.g. discretizing columns, calculating aggregations, combining data sources).\r\n\r\n#### Target Audience\r\nThis tutorial is for anyone with basic knowledge of Python and an interest in learning how to analyze data in Python. We will be working with Jupyter Notebooks, so attendees should familiarize themselves with the interface (i.e., know how to run/edit a cell) beforehand.\r\n\r\n#### Prerequisites\r\nBring a laptop (preferably your personal one) with the virtual environment configured as indicated [here](https://github.com/stefmolin/pandas-workshop#setup-instructions). Come to the session with your environment set up so we can dive right into the material.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "42641dd4-5365-5ec3-9f16-e9b41187ad42", "id": 15572, "code": "9WJJPL", "public_name": "Stefanie Molin", "avatar": "https://pretalx.com/media/avatars/li_7EgJHSM.jpeg", "biography": "Stefanie Molin is a software engineer and data scientist at Bloomberg in New York City, where she tackles tough problems in information security, particularly those revolving around data wrangling/visualization, building tools for gathering data, and knowledge sharing. 
She is also the author of [Hands-On Data Analysis with Pandas](https://www.amazon.com/Hands-Data-Analysis-Pandas-visualization/dp/1800563450), which is currently in its second edition. She holds a bachelor\u2019s of science degree in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, as well as a master\u2019s degree in computer science, with a specialization in machine learning, from Georgia Tech. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 2, "date": "2023-08-15", "day_start": "2023-08-15T04:00:00+02:00", "day_end": "2023-08-16T03:59:00+02:00", "rooms": {"Aula": [{"url": "https://pretalx.com/euroscipy-2023/talk/HCATHS/", "id": 31551, "guid": "46885286-d5c5-565c-93a3-485b9206297b", "date": "2023-08-15T08:30:00+02:00", "start": "08:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-31551-ibis-a-fast-flexible-and-portable-tool-for-data-analytics-", "title": "Ibis: A fast, flexible, and portable tool for data analytics.", "subtitle": "", "track": "Data Science and Visualisation", "type": "Tutorial", "language": "en", "abstract": "Ibis provides a common dataframe-like interface to many popular databases and analytics tools  (BigQuery, Snowflake, Spark, DuckDB, \u2026). This lets users analyze data using the same consistent API, regardless of which backend they\u2019re using, and without ever having to learn SQL. No more pains rewriting pandas code to something else when you run into performance issues; write your code once using Ibis and run it on any supported backend. In this tutorial users will get experience writing queries using Ibis on a number of local and remote database engines.", "description": "Tabular data is ubiquitous, and Pandas has been the de facto tool in Python for analyzing it. 
However, as data size scales, analysis using Pandas may become untenable. Luckily, modern analytical databases (like DuckDB) are able to analyze this same tabular data, but perform orders-of-magnitude faster than Pandas, all while using less memory. Many of these systems only provide a SQL interface though; something far different from Pandas\u2019 dataframe interface, requiring a rewrite of your analysis code.\r\n\r\nThis is where Ibis comes in. Ibis provides a common dataframe-like interface to many popular databases and analytics tools (BigQuery, Snowflake, Spark, DuckDB, \u2026). This lets users analyze data using the same consistent API, regardless of which backend they\u2019re using, and without ever having to learn SQL. No more pains rewriting pandas code to something else when you run into performance issues; write your code once using Ibis and run it on any supported backend.\r\n\r\nIn this tutorial we\u2019ll cover:\r\n\r\n- The basic operations of Ibis (select, filter, group_by, join, and aggregate), and how these operations may be composed to form more complicated queries.\r\n- How Ibis may be used on a number of different local and remote backend engines to execute the same queries on different systems.\r\n- The tradeoffs of different database engines, and recommendations for how to choose the best tool for the job.\r\n- How Ibis integrates into the larger Python data ecosystem, including tools like Scikit-Learn or Matplotlib\r\n\r\nThis is a hands-on tutorial, with numerous examples to get your hands dirty. Participants should ideally have some experience using Python and Pandas, but no SQL experience is necessary.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b564c119-c171-5527-8c27-a2a87d8f3e3b", "id": 30121, "code": "YE9YBK", "public_name": "Phillip Cloud", "avatar": null, "biography": "I'm fascinated by a variety of problems related to computers. 
I've solved hard problems in a variety of software engineering domains including digital video, Rust, systems programming, computer vision, and analytics. I'm currently helping build next generation Python analytics tooling at Voltron Data.", "answers": []}, {"guid": "192fdaa7-5869-5c3c-a8a2-d0758a23a3ea", "id": 24705, "code": "WUEKGF", "public_name": "Gil Forsyth", "avatar": "https://pretalx.com/media/avatars/ivoted_HeGGGyj.png", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/7P3AYM/", "id": 34247, "guid": "98c56a4c-fcad-5edf-b05a-102bcbb7efbf", "date": "2023-08-15T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-34247-ppml-machine-learning-on-data-you-cannot-see", "title": "PPML: Machine Learning on data you cannot see", "subtitle": "", "track": "Machine and Deep Learning", "type": "Tutorial", "language": "en", "abstract": "A privacy guarantee is **the** most crucial requirement when it comes to analysing sensitive data. However, data anonymisation techniques alone do not always provide complete privacy protection; moreover, Machine Learning models could also be exploited to _leak_ sensitive data when _attacked_ and no countermeasure is applied. *Privacy-preserving machine learning* (PPML) methods hold the promise to overcome all these issues, making it possible to train machine learning models with full privacy guarantees. In this tutorial we will explore several methods for privacy-preserving data analysis, and how these techniques can be used to safely train ML models _without_ actually seeing the data.", "description": "Privacy guarantees are **the** most crucial requirement when it comes to analysing sensitive data. These requirements can sometimes be so stringent that they become a real barrier for the entire pipeline. 
Reasons for this are manifold, and involve the fact that data may not be _shared_ or moved from the silos where they reside, let alone analysed in their _raw_ form. As a result, _data anonymisation techniques_ are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the _memorisation_ effect of Deep learning models could be maliciously exploited to _attack_ the models and _reconstruct_ sensitive information about samples used in training, even if this information was not originally provided.\r\n\r\n*Privacy-preserving machine learning* (PPML) methods hold the promise to overcome all those issues, allowing machine learning models to be trained with full privacy guarantees.\r\n\r\nThis workshop will be organised in **two** main parts. In the first part, we will focus on Machine learning, demonstrating how DL models could be exploited (i.e. _inference attack_) to reconstruct original data solely by analysing the models' predictions; we will then explore how **differential privacy** can help us protect the privacy of our model, with _minimum disruption_ to the original pipeline.\r\n\r\nIn the second part we will consider more complex ML scenarios to train Deep learning networks on encrypted data, with specialised **distributed federated learning** strategies.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "47dbdbbd-1657-53bd-8f04-80611fe324a5", "id": 1097, "code": "7FNYLG", "public_name": "Valerio Maggio", "avatar": "https://pretalx.com/media/avatars/me_Nwbnd3M.jpg", "biography": "Valerio Maggio is a Researcher, and a Data scientist Advocate at Anaconda. He is well versed in open science and research software, supporting the adoption of best software development practice (e.g. [Code Review](https://www.software.ac.uk/blog/2022-03-18-treat-your-research-code-code-review)) in Data Science. 
He was recently awarded a fellowship from the Software Sustainability Institute ([profile](https://www.software.ac.uk/about/fellows/valerio-maggio)) focused on developing open teaching modules [[1](https://github.com/leriomaggio/privacy-preserving-data-science)][[2](https://github.com/leriomaggio/ppml-tutorial)] on Privacy-Preserving Machine learning technologies. Valerio is also an open-source contributor, and an active member of the Python community. Over the last twelve years he has contributed and volunteered in the organization of many international conferences and community meetups like PyCon Italy, PyData, EuroPython, and EuroSciPy. All his talks, workshop materials and open source contributions are publicly available on his [Speaker Deck](https://speakerdeck.com/leriomaggio) and [GitHub](https://github.com/leriomaggio) profiles.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/WPEXXC/", "id": 31508, "guid": "66497d34-55e8-54da-8e6f-1aa0f0f0bcc2", "date": "2023-08-15T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-31508-generating-data-frames-for-your-test-using-pandas-stratgies-in-hypothesis", "title": "Generating Data Frames for your test - using Pandas strategies in Hypothesis", "subtitle": "", "track": "Machine and Deep Learning", "type": "Tutorial", "language": "en", "abstract": "Do you test your data pipeline? Do you use Hypothesis? In this workshop, we will use Hypothesis, a property-based testing framework, to generate Pandas DataFrames for your tests, without involving any real data.", "description": "In this short 90-minute workshop, we will first go through the basics of Hypothesis and what property-based testing is. After that, we will introduce the strategies for Pandas objects - available via the extras in Hypothesis. 
We will have a glimpse of what the strategies are doing to generate the testing object, including Pandas Series and DataFrames. In the end, we will apply what we learn in real testing applications - testing a data pipeline that involves DataFrames.\r\n\r\n## Preparation\r\n\r\nNo preparation is needed. However, if you want to avoid relying on the wifi at the venue for installation and downloads, you can [clone the workshop repo](https://github.com/Cheukting/hypothesis-dataframe) and [follow the setup instructions](https://github.com/Cheukting/hypothesis-dataframe#installation).\r\n\r\n## Outline\r\n- Introduction of Property-based testing (15 mins)\r\n- Introduction and basic use of Hypothesis exercises (30 mins)\r\n- Deep dive into Pandas strategies (20 mins)\r\n- Do it yourself - apply property-based testing to data pipelines (20 mins)\r\n- Conclusion (5 mins)\r\n\r\n## Prerequisites\r\nNo prior knowledge of property-based testing or Hypothesis is required. However, we assume the attendee has experience using Pandas and has a basic understanding of Pandas objects. Knowledge about NumPy arrays and typing would also be beneficial in understanding the Pandas strategies.\r\n\r\n## Goal\r\nWe hope the attendee will learn about property-based testing and see how it can benefit their work involving data - especially work that uses Pandas. After the workshop, attendees should be able to understand how the Pandas strategies in Hypothesis work and to use Hypothesis to test code that involves Pandas Series or DataFrame input.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "id": 54, "code": "8EGVC9", "public_name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/IMG_1037_vjqZpqv.jpg", "biography": "Before working in Developer Relations, Cheuk was a Data Scientist at various companies, roles that demanded strong numerical and programming skills, especially in Python. 
To follow her passion for the tech community, Cheuk is now the Developer Advocate at Anaconda. Cheuk also contributes to multiple Open Source libraries like Hypothesis and Pandas.\r\n\r\nBesides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also been a speaker at Universities and various conferences. Besides speaking at conferences, Cheuk also organises events for developers. Conferences that Cheuk has organized include EuroPython (of which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organizes workshops and mentored sprints for minority groups. In 2021, Cheuk became a Python Software Foundation fellow.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/LYNMEB/", "id": 31737, "guid": "7a365191-34c8-5926-ad1f-4a5cacab254f", "date": "2023-08-15T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "01:30", "room": "Aula", "slug": "euroscipy-2023-31737-from-complex-scientific-notebook-to-user-friendly-web-application", "title": "From Complex Scientific Notebook to User-Friendly Web Application", "subtitle": "", "track": "Data Science and Visualisation", "type": "Tutorial", "language": "en", "abstract": "Learn how to show your work with the MERCURY framework. This open-source tool perfectly matches your computed notebook (e.g., written in Jupyter Notebook). Without knowledge of frontend technologies, you can present your results as a web app (with interactive widgets), report, or dashboard. Learn how to improve your notebook and make your work understandable for non-technical mates. Python only!", "description": "Mercury is a tool that lets you add interactive widgets to your Jupyter Notebook. With these widgets, you can easily turn your notebook into a web application for creating dashboards and presentations. You can even schedule automatic updates. 
Mercury also provides a way to control who can access your notebooks with a built-in authentication module. Best of all, it's free and open-source.\r\n\r\nThe tutorial will include the following:\r\n\r\n1. Start with Jupyter Notebook.\r\n2. How to start with MERCURY (installing and setting up the needed environment).\r\n3. Overview of features such as downloading results as PDF, restricting access via authentication, and showing/hiding code.\r\n4. Add widgets to your notebook. Select the right widgets.\r\n5. Set up a web app with MERCURY.\r\n6. Deploy and share your web app with others.\r\n\r\nIt would be great if you could install Mercury before the tutorial. Please check the installation instructions in our repository https://github.com/mljar/mercury", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c73d6e85-e360-5103-8ddb-92112467689a", "id": 27523, "code": "B9JJ3N", "public_name": "Aleksandra Plonska", "avatar": "https://pretalx.com/media/avatars/me1_L5qNAtk.png", "biography": "Lawyer, a graphic designer with a passion for promoting data science tools. Open source enthusiast. Since 2019, Executive Director at MLJAR - (the best open-source AutoML available).", "answers": []}, {"guid": "24bfe709-bce6-509e-8d2e-6d72a0d92e49", "id": 17586, "code": "JWK7RV", "public_name": "Piotr P\u0142o\u0144ski", "avatar": "https://pretalx.com/media/avatars/pplonski_smile_Mz6QCW4.jpeg", "biography": "Software engineer trying to make data science tools easier to use for everyone. 
Working on open source tools: mljar-supervised and mercury.", "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 120": [{"url": "https://pretalx.com/euroscipy-2023/talk/AHXAXM/", "id": 34813, "guid": "e117f064-26b1-565d-ba22-806b9464c120", "date": "2023-08-15T08:30:00+02:00", "start": "08:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-34813-introduction-to-matplotlib-for-visualization-in-python", "title": "Introduction to matplotlib for visualization in Python", "subtitle": "", "track": "Data Science and Visualisation", "type": "Tutorial", "language": "en", "abstract": "This tutorial explains the fundamental ideas and concepts of matplotlib. It's suited for complete beginners to get started as well as existing users who want to improve their plotting abilities and learn about best practices.", "description": "Matplotlib is one of the most-used and powerful visualization libraries for Python. Nevertheless, there has been and still is some confusion about how to use it properly. This has a number of reasons, ranging from an evolution of the API and lack of good documentation to the complexity that comes with the large feature set and flexibility. But these issues can be overcome.\r\n\r\nThis tutorial will explain the main concepts and intended usage patterns of matplotlib. Knowing these lets you effectively use high-level functions for most cases, while still being able to go into the details if you need to fine-tune certain aspects of the plot. 
We'll also touch on some now-discouraged ways of working from the past (you should know what not to do - even though that's still found in lots of examples on the web) and we may get a glimpse into the future.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7b094ab3-5091-538c-b07d-e77ca0006b55", "id": 20316, "code": "X8YQJR", "public_name": "Tim Hoffmann", "avatar": "https://pretalx.com/media/avatars/WIN_20211021_12_24_09_Pro_3_BeaLi6Q.jpg", "biography": "Tim Hoffmann is a physicist and software expert passionate about bringing science and high-quality software together. He works as Simulation Architect at Carl Zeiss Semiconductor Manufacturing Technology, where he covers all aspects from coding, architecture and training up to software strategy. Tim is an active contributor to the Python open source community. In particular, he is core developer and API lead for the visualization library matplotlib.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/AQUCLV/", "id": 34814, "guid": "7175cd1a-e1eb-5d3d-9990-f7cc2b4a5bcb", "date": "2023-08-15T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-34814-introduction-to-scikit-learn", "title": "Introduction to scikit-learn", "subtitle": "", "track": "Machine and Deep Learning", "type": "Tutorial", "language": "en", "abstract": "Update: Here, I provide a prepared Jupyter notebook for you to fill with code during the tutorial: https://github.com/StefanieSenger/Talks/blob/main/2023_EuroSciPy/2023_EuroSciPy_Intro_to_scikit-learn_fillout-notebook.ipynb. Please download it and have it at hand when the tutorial starts. You can still download it during the introduction part of the tutorial.\r\n\r\nThis tutorial will provide a beginner introduction to scikit-learn. Scikit-learn is a Python package for machine learning. 
We will talk about what Machine Learning is and how scikit-learn can implement it. In the practical part we will learn how to create a predictive modelling pipeline and how to fine-tune its hyperparameters to improve the model's score.", "description": "### Workshop Outline\r\n\r\n<ul>\r\n  <li>Machine Learning 101 (10 min.)</li>\r\n  <li>What is scikit-learn? (5 min.)</li>\r\n  <li>Practical Part (+60 min.)\r\n    <ul>\r\n      <li>Predictive modeling pipeline</li>\r\n      <li>Evaluation of models</li>\r\n      <li>Hyperparameter tuning</li>\r\n    </ul>\r\n  </li>\r\n</ul> \r\n\r\n### Description\r\nWe will start by covering the main ideas behind Machine Learning and then introduce scikit-learn as a machine learning library. There will be plenty of room to ask questions.\r\n\r\nThe practical part of this tutorial will be subdivided into three parts. First, we will present how to design a predictive modeling pipeline that deals with heterogeneous types of data. Then, we will go into more detail on the evaluation of models and the types of trade-offs to consider. Finally, we will show how to tune the hyperparameters of the pipeline. \r\n\r\nYou are encouraged to code along with me.\r\n\r\n### Prerequisites\r\nThis workshop will serve you best when you have some basic knowledge of Python and know how to use a Jupyter Notebook. We will start from a prepared notebook and add code at every step.\r\n\r\nBring your laptop.\r\n\r\nHave a virtual environment with numpy, pandas and scikit-learn installed.\r\n\r\nHave the prepared notebook at hand.", "recording_license": "", "do_not_record": true, "persons": [{"guid": "0b369ed8-7d15-586f-b55c-a5bd8531bfec", "id": 33092, "code": "G3NKSV", "public_name": "Stefanie Sabine Senger", "avatar": "https://pretalx.com/media/avatars/26b685dc_b_gHvoOnP.jpg", "biography": "Historian (PhD) that went astray. 
I'm teaching Data Science to career changers at Le Wagon and started contributing to scikit-learn during the last months.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/SE9WTD/", "id": 31429, "guid": "64947c84-2f0e-56f3-995d-18cbb916a21d", "date": "2023-08-15T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-31429-introduction-to-numerical-optimization", "title": "Introduction to numerical optimization", "subtitle": "", "track": "High Performance Computing", "type": "Tutorial", "language": "en", "abstract": "In this hands-on tutorial, participants will delve into numerical optimization fundamentals and engage with the optimization libraries scipy.optimize and estimagic. estimagic provides a unified interface to many popular libraries such as nlopt or pygmo and provides additional diagnostic tools and convenience features. Throughout the tutorial, participants will get the opportunity to solve problems, enabling the immediate application of acquired knowledge. Topics covered include core optimization concepts, running an optimization with scipy.optimize and estimagic, diagnostic tools, algorithm selection, and advanced features of estimagic, such as bounds, constraints, and global optimization.", "description": "In this focused tutorial, participants will be introduced to the fundamentals of numerical optimization and various optimization libraries, including scipy.optimize and estimagic. The session is divided into three blocks, with each focusing on a specific aspect of optimization:\r\n\r\nIntroduction to numerical optimization and scipy.optimize\r\nIntroduction to estimagic and how to pick optimizers\r\nStrategies and tools for advanced optimization\r\n\r\nThe tutorial is designed to be hands-on, dedicating ample time to practice sessions. 
Including numerous smaller practice sessions allows participants to apply their knowledge of each topic immediately.\r\n\r\nDuring the first block, participants will learn the basics of numerical optimization, code up their first optimization problem and solve it with scipy.optimize.\r\n\r\nIn the second block, we introduce estimagic, as a unified interface to algorithms from scipy, nlopt, pygmo, and others. Participants will try different optimizers and learn how estimagic's diagnostic tools can be used to select the suitable algorithm for a given problem.\r\n\r\nThe third block will focus on practical strategies for advanced optimization problems. Participants will learn about logging and restarting optimizations, global optimization, bounds, and constraints, as well as derivative-free and noisy optimization problems.\r\n\r\nThe session will conclude with a summary of the main learnings, ensuring that participants have gained a solid understanding of numerical optimization and the various optimization libraries covered.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "59ed7187-535d-5550-af07-41c089387ce9", "id": 30074, "code": "STKQES", "public_name": "Tim Mensinger", "avatar": "https://pretalx.com/media/avatars/me_M8fQCIx.jpeg", "biography": "I'm a Ph.D. candidate in economics at the University of Bonn, currently working on topics related to computational econometrics. My projects range from contributing to optimization libraries to implementing statistical methods or models of human behavior. I try to develop software that is easy to use and extend. 
Besides that, I'm a big advocate for reproducibility and the open-source philosophy, which I try to support by being an active member of the Open Source Economics initiative.", "answers": []}, {"guid": "94bfba81-5ace-5f50-a282-8b96cd4f01be", "id": 30075, "code": "DMPC8P", "public_name": "Janos Gabler", "avatar": "https://pretalx.com/media/avatars/janos_T1rmlew.jpg", "biography": "Author of estimagic | PhD in economics | Expert in numerical optimization | Building Bandsaws, Pizza Ovens and Furniture", "answers": []}, {"guid": "9eb0eb72-5024-56b2-9622-0ff816ecc233", "id": 30137, "code": "EF8NZ3", "public_name": "Tobias Raabe", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/URZEJE/", "id": 35606, "guid": "2a7bc0ee-aab5-51c5-b952-edc9654e02e8", "date": "2023-08-15T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "01:30", "room": "HS 120", "slug": "euroscipy-2023-35606-image-processing-with-scikit-image", "title": "Image processing with scikit-image", "subtitle": "", "track": "Scientific Applications", "type": "Tutorial", "language": "en", "abstract": "This tutorial explores scikit-image, the numpy-native library in the scientific python ecosystem, for visual data analysis and manipulation.\r\nDesigned for beginners and advanced users, it empowers image analysis skills and offers insights into scikit-image documentation.\r\n\r\nIt covers basic concepts like image histogram, contrast, filtering, segmentation, and descriptors through practical exercises.\r\nThe tutorial concludes with advanced performance optimization techniques.\r\n\r\nFamiliarity with numpy arrays is essential, as they are the underlying data representation.", "description": "Manipulating and analyzing visual data is key in many scientific fields such as astronomy, life sciences or material sciences. 
This tutorial explores the scikit-image library, the numpy-native library in the\r\nscientific python ecosystem for image processing.\r\n\r\nThe tutorial aims to empower beginners to analyze images using scikit-image. It facilitates understanding of fundamental image processing concepts, and guides participants in how to find help and documentation to go further after the tutorial. For more advanced users, a last part will focus on performance aspects.\r\n\r\nBasic familiarity with manipulating numpy arrays is required, as we will begin by manipulating image pixels as elements within numpy arrays and conducting fundamental image transformations using these arrays.\r\n\r\nNext, we'll transition to fundamental image processing concepts, offering practical exercises and guidance on navigating the scikit-image documentation. These concepts might include:\r\n\r\n- image histogram and contrast\r\n- image filtering: transformations of an image resulting in a new image of similar size (for example, thresholding, edge enhancement, etc.)\r\n- image segmentation: partitioning an image into several regions (objects)\r\n- image descriptors\r\n\r\nThe last part is devoted to advanced topics, particularly focusing on image processing performance and acceleration.\r\n\r\nmaterial:  https://github.com/glemaitre/euroscipy-2023-scikit-image", "recording_license": "", "do_not_record": true, "persons": [{"guid": "d6297904-fa91-50b7-9510-cc23c0cf9edb", "id": 55, "code": "KMDJAL", "public_name": "Guillaume Lemaitre", "avatar": "https://pretalx.com/media/avatars/guillaumelemaitre.jpg__200x200_q85_crop_subsampling-2_upscale_9Ptqss3.jpg", "biography": null, "answers": []}, {"guid": "b0570f78-b208-5361-9da8-3f95be1dbd84", "id": 33316, "code": "DG7LSG", "public_name": "Joan Massich", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 3, "date": "2023-08-16", "day_start": "2023-08-16T04:00:00+02:00", "day_end": 
"2023-08-17T03:59:00+02:00", "rooms": {"Aula": [{"url": "https://pretalx.com/euroscipy-2023/talk/V7KNNE/", "id": 34862, "guid": "7fae4560-e343-5a8f-8df8-a854df1adfcb", "date": "2023-08-16T09:00:00+02:00", "start": "09:00", "logo": null, "duration": "01:00", "room": "Aula", "slug": "euroscipy-2023-34862-integrating-ethics-in-ml-from-philosophical-foundations-to-practical-implementations", "title": "Integrating Ethics in ML: From Philosophical Foundations to Practical Implementations", "subtitle": "", "track": null, "type": "Keynote", "language": "en", "abstract": "In the rapidly evolving landscape of Machine Learning (ML), significant advancements like Large Language Models (LLMs) are gaining critical importance in both industrial and academic spheres. However, the rush towards deploying advanced models harbors inherent ethical tensions and potential adverse societal impacts. The keynote will start with a brief introduction to the principles of ethics, viewed through the lens of philosophy, emphasizing how these fundamental concepts find application within ML. Grounding our discussion in tangible realities, we will delve into pertinent case studies, including the BigScience open science initiative, elucidating the practical application of ethical considerations. Additionally, the keynote will touch upon findings from my recent research, which investigates the synergy between ethical charters, legal tools, and technical documentation in the context of ML development and deployment.", "description": "In the rapidly evolving landscape of Machine Learning (ML), significant advancements like Large Language Models (LLMs) are gaining critical importance in both industrial and academic spheres. However, the rush towards deploying advanced models harbors inherent ethical tensions and potential adverse societal impacts. 
The keynote will start with a brief introduction to the principles of ethics, viewed through the lens of philosophy, emphasizing how these fundamental concepts find application within ML. Grounding our discussion in tangible realities, we will delve into pertinent case studies, including the BigScience open science initiative, elucidating the practical application of ethical considerations. Additionally, the keynote will touch upon findings from my recent research, which investigates the synergy between ethical charters, legal tools, and technical documentation in the context of ML development and deployment.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e70398dc-9338-53fc-9fb2-69d464eb1c6c", "id": 32573, "code": "KGWFV8", "public_name": "Giada Pistilli", "avatar": null, "biography": "Giada Pistilli is a philosophy researcher specializing in ethics applied to Conversational AI. Her research is mainly focused on ethical frameworks, value theory, and applied and descriptive ethics. After obtaining a master\u2019s degree in ethics and political philosophy at Sorbonne University, she pursued her doctoral research in the same faculty. Giada is also Principal Ethicist at Hugging Face, where she conducts philosophical and interdisciplinary research on AI Ethics and content moderation. 
Her publications, resume, and contact information are available on [her website](https://www.giadapistilli.com/).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/BYFB3Y/", "id": 31524, "guid": "84de3181-e509-57a0-aaea-bf5396bf75a7", "date": "2023-08-16T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-31524-ibis-because-sql-is-everywhere-but-you-don-t-want-to-use-it", "title": "Ibis: Because SQL is everywhere but you don't want to use it", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "We love to use Python in our day jobs, but that enterprise database you run your ETL job against may have other ideas. It probably speaks SQL, because SQL is ubiquitous, it\u2019s been around for a while, it\u2019s standardized, and it\u2019s concise.\r\nBut is it really standardized? And is it always concise? No!\r\n\r\nDo we still need to use it? Probably!\r\n\r\nWhat\u2019s a data-person to do? String-templated SQL?\r\nprint(f\u201dThat way lies {{ m\u0334\u030f\u0301\u0355\u0330\u0345\u033ba\u0338\u0351\u031f\u031c\u0349d\u0335\u0311\u0328\u032bn\u0335\u0312\u0351\u033e\u0316\u0332e\u0338\u034c\u0318\u033c\u032ds\u0335\u033d\u0347\u0316\u031cs\u0338\u0357\u034c\u030f\u030a\u0332\u035c\u0322\u0316 }}\u201d.)\r\n\r\nInstead, come and learn about Ibis! It offers a dataframe-like interface to construct concise and composable queries and then executes them against a wide variety of backends (Postgres, DuckDB, Spark, Snowflake, BigQuery, you name it.).", "description": "Ibis is a pure Python library that lets you write Python to build up expressions that can be executed on a wide array of backends (SQLite, DuckDB, Postgres, Spark, Clickhouse, Snowflake, BigQuery, and more!). 
It offers a dataframe-like interface and helps you to write concise and composable interactive analytics code.\r\n\r\nHave you ever had to translate a proof-of-concept from Pandas to PySpark to run on the \u201creal data\u201d?\r\n\r\nOr download a huge parquet file because the upstream data is the result of 500 lines of dense SQL and you\u2019re afraid to mess with it?\r\n\r\nHave a love/hate relationship with SQL because it lets you get your job done, but think, there must be a better way?\r\n\r\nWell, if you\u2019re a data-engineer, data-scientist, data-hobbyist, or data-anything, come and join us for a tour of what Ibis can do for you!", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b564c119-c171-5527-8c27-a2a87d8f3e3b", "id": 30121, "code": "YE9YBK", "public_name": "Phillip Cloud", "avatar": null, "biography": "I'm fascinated by a variety of problems related to computers. I've solved hard problems in a variety of software engineering domains including digital video, Rust, systems programming, computer vision, and analytics. I'm currently helping build next generation Python analytics tooling at Voltron Data.", "answers": []}, {"guid": "192fdaa7-5869-5c3c-a8a2-d0758a23a3ea", "id": 24705, "code": "WUEKGF", "public_name": "Gil Forsyth", "avatar": "https://pretalx.com/media/avatars/ivoted_HeGGGyj.png", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/YTW3UF/", "id": 32172, "guid": "6c7f1f5d-0f3c-561d-912c-bc59ea6633a5", "date": "2023-08-16T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-32172-pandas-2-0-and-beyond", "title": "Pandas 2.0 and beyond", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? 
This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on", "description": "The pandas 2.0 release is targeted for the first quarter of 2023. This is a major milestone for the pandas project, and this talk will start with an overview of this release. Pandas 2.0 includes some new (experimental) features, but mostly means enforcing deprecations that have been accumulated in the 1.x series, along with some necessary breaking changes.\r\n\r\nBut that doesn\u2019t mean there are no interesting features to talk about! The main part of the presentation will showcase some new features, both already released as opt-in features or to come in future releases.\r\nSupport for non-nanosecond resolution datetimes, allowing time spans ranging over a billion of years. Improved support for nullable data types, including easy opt-in options for I/O functions. Experimental integration with pyarrow to back columns of a DataFrame (beyond the string dtype).\r\nA major change that is under way is a change to the copy and view semantics of operations in pandas (related to the well-known (or hated) SettingWithCopyWarning). This is already available as an experimental opt-in to test and use the new behaviour, and will probably be a highlight of pandas 3.0.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/profile_Rc56sfi.png", "biography": "I am a core contributor to Pandas and Apache Arrow, and maintainer of GeoPandas. I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. 
Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of python (pandas).", "answers": []}, {"guid": "62547b76-1391-5f2b-ace7-9bd4d276daeb", "id": 34048, "code": "AWZYBX", "public_name": "Richard Shadrach", "avatar": "https://pretalx.com/media/avatars/45562402_9WddwZQ.jpeg", "biography": "I am a core contributor to pandas. I earned a PhD in Mathematics from Michigan State University studying Arithmetic Geometry and am now a Director of Data Science at 84.51\u00b0 specializing in large scale optimization problems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/N8YGEW/", "id": 32162, "guid": "4fb5ec15-b0b9-5803-8f52-5e7cd1217f70", "date": "2023-08-16T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-32162-dataframe-agnostic-code-are-we-there-yet-", "title": "DataFrame-agnostic code: are we there yet?", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Have you ever wanted to write a DataFrame-agnostic function, which should perform the same operation regardless of whether the input is pandas / polars / something else? Did you get stuck with special-casing to handle all the different APIs? All is good, the DataFrame Standard is here to help!", "description": "If you want to write a DataFrame-agnostic function, you currently have three choices:\r\n- convert the input DataFrame to pandas (say), perform operations, then convert\r\n  back to the original DataFrame library;\r\n- write the same code multiple times, with if-then statements to deal with the differences between APIs;\r\n- give up, and only support a single DataFrame (usually pandas).\r\n\r\nHowever, there's a new solution in town: use the DataFrame Standard. The DataFrame Standard provides you with a minimal, strict, and predictable API. 
It allows you to develop with confidence, knowing that your code will work regardless of whether the caller uses pandas, polars, or some other DataFrame library.\r\n\r\nTalk outline will be (roughly):\r\n5 mins: motivation - why do we even need this?\r\n5 mins: demo - let's write a DataFrame-agnostic function!\r\n5 mins: stability, usability, future plans", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6fa6c8b8-7b91-5fe9-8b55-ae35ecc17ec8", "id": 30515, "code": "KEUJ9U", "public_name": "Marco Gorelli", "avatar": "https://pretalx.com/media/avatars/53773211732_a06cc7fd9d_k_69RhAwM.jpg", "biography": "Marco works as a Senior Software Engineer at Quansight Labs. He mainly works on pandas and the DataFrame Consortium (as part of work) and on polars (as a volunteer).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/ZA3FUY/", "id": 31779, "guid": "2f1601e2-df86-5d22-8c0c-fbb1d1187f2d", "date": "2023-08-16T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-31779-timing-and-benchmarking-scientific-python", "title": "Timing and Benchmarking Scientific Python", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Scientific code is often complex, resource-intensive, and sensitive to performance issues, making accurate timing and benchmarking critical for optimising performance and ensuring reproducibility. However, benchmarking scientific code presents several challenges, including variability in input data, hardware and software dependencies, and optimisation trade-offs. In this talk, I discuss the importance of timing and benchmarking for scientific code and outline strategies for addressing these challenges. 
Specifically, I emphasise the need for representative input data, controlled benchmarking environments, appropriate metrics, and careful documentation of the benchmarking process. By following these strategies, developers can effectively optimise code performance, select efficient algorithms and data structures, and ensure the reliability and reproducibility of scientific computations.", "description": "Scientific code plays a crucial role in advancing scientific research and discovery, but its complexity, resource-intensiveness, and sensitivity to performance issues make accurate timing and benchmarking critical for optimal performance and reproducibility. To this end, this talk addresses the importance of timing and benchmarking for scientific code, and outlines the challenges and strategies associated with it.\r\n\r\nOne of the main challenges in benchmarking scientific code is the variability of input data, which can influence the benchmarking results. To overcome this, it is essential to use representative input data that accurately reflects real-world scenarios. In addition, it is crucial to establish a controlled benchmarking environment to minimise the impact of external variables on the results. This includes running benchmarks on the same hardware and software configurations, using the same input data, and running multiple trials to ensure consistency.\r\n\r\nAnother challenge is the choice of appropriate metrics to measure performance. Depending on the specific requirements of the application, this may involve measuring execution time, memory usage, or other metrics. In addition, optimisation trade-offs can also affect benchmarking results, highlighting the importance of carefully balancing performance with other factors such as accuracy and maintainability.\r\n\r\nTo ensure reproducibility, careful documentation of the benchmarking process is necessary, including the input data, hardware and software configurations, and benchmarking methodology. 
By following best practices such as these, developers can effectively optimise code performance, select efficient algorithms and data structures, and ensure the reliability and reproducibility of scientific computations.\r\n\r\nIn summary, this talk highlights the significance of accurate timing and benchmarking for scientific code, and presents strategies and best practices for overcoming the challenges associated with it. By implementing these strategies, researchers and developers can accelerate scientific progress and drive innovation through robust and reliable scientific computations.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b557a6e6-20a6-5d93-b31d-e4ea048cd131", "id": 30284, "code": "EQJWS9", "public_name": "Kai Striega", "avatar": null, "biography": "Kai is a SciPy maintainer and a software developer at BHP. He is interested in all things Python, particularly in pushing Python's performance to the language's limits.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/FU9SYJ/", "id": 31553, "guid": "5f2d24ed-e9ec-5eed-bfea-a270b05c1705", "date": "2023-08-16T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-31553-accelerating-your-python-code-a-systematic-overview", "title": "Accelerating your Python code - a systematic overview", "subtitle": "", "track": "High Performance Computing", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Python is slow. We feel the performance limitations when doing computationally intensive work. There are many libraries and methods to accelerate your computations, but which way to go? This talk serves as a navigation guide through the world of speeding up Python. 
At the end, you should have a high-level understanding of performance aspects and know which way to go when you want to speed up your code next time.", "description": "We start with the fundamental reasons why Python is slow by design - and why it's nevertheless often a good language choice. From there we'll cover basic Python programming paradigms, standard data libraries (NumPy, pandas), Just-in-time compilation (PyPy, numba), GPU-Acceleration, Multithreading, Multiprocessing, calling other languages (C/C++, Julia, Rust) as well as distributed computing. We'll discuss the benefits and costs of all these technologies, so that you know which way to go in different usage scenarios.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "7b094ab3-5091-538c-b07d-e77ca0006b55", "id": 20316, "code": "X8YQJR", "public_name": "Tim Hoffmann", "avatar": "https://pretalx.com/media/avatars/WIN_20211021_12_24_09_Pro_3_BeaLi6Q.jpg", "biography": "Tim Hoffmann is a physicist and software expert passionate to bring science and high-quality software together. He works as Simulation Architect at Carl Zeiss Semiconductor Manufacturing Technology, where he covers all aspects from coding, architecture and training up to software strategy. Tim is an active contributor to the Python open source community. 
In particular, he is core developer and API lead for the visualization library matplotlib.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/JACZHL/", "id": 31494, "guid": "840e9f5d-e825-5f3f-93dd-f32c70092038", "date": "2023-08-16T14:40:00+02:00", "start": "14:40", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-31494-estimagic-a-library-that-enables-scientists-and-engineers-to-solve-challenging-numerical-optimization-problems", "title": "Estimagic: A library that enables scientists and engineers to solve challenging numerical optimization problems", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "estimagic is a Python package for nonlinear optimization with or without constraints. It is particularly suited to solving difficult nonlinear estimation problems. On top, it provides functionality to perform statistical inference on estimated parameters.\r\n\r\nIn this presentation, we give a tour through estimagic's most notable features and explain its position in the ecosystem of Python libraries for numerical optimization.", "description": "Challenging numerical optimization problems arise in many places in science and industry, for example, in the calibration of scientific models, engineering, and statistics. Solving them requires high-quality optimizers and diagnostic tools that help select a suitable algorithm and monitor the optimization's progress. \r\n\r\nEstimagic provides a unified interface to optimization algorithms from scipy, nlopt, pygmo, and many other libraries. The minimize function feels familiar to users of scipy.optimize who are looking for a more extensive set of supported optimizers. Advanced users can use optional arguments to configure every aspect of the optimization, create a persistent log file, turn local optimizers global with a multistart framework, and more. 
Estimagic can calculate numerical derivatives in parallel, and many optimizers can leverage parallel hardware without requiring changes to the user's criterion function. \r\n\r\nIn this presentation, we give a tour through estimagic's most notable features and explain its position in the ecosystem of Python libraries for numerical optimization.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "94bfba81-5ace-5f50-a282-8b96cd4f01be", "id": 30075, "code": "DMPC8P", "public_name": "Janos Gabler", "avatar": "https://pretalx.com/media/avatars/janos_T1rmlew.jpg", "biography": "Author of estimagic | PhD in economics | Expert in numerical optimization | Building Bandsaws, Pizza Ovens and Furniture", "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 120": [{"url": "https://pretalx.com/euroscipy-2023/talk/SYEFDW/", "id": 32244, "guid": "3d62285c-4a06-5221-9fc9-d194b4084a86", "date": "2023-08-16T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-32244-anomaly-detection-in-time-series-techniques-tools-and-tricks", "title": "Anomaly Detection in Time Series: Techniques, Tools and Tricks", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "From sensor data to epidemic outbreaks, particle dynamics to environmental monitoring, much crucial real-world data is temporal in nature. Fundamental challenges facing data specialists dealing with time series include not only predicting future values, but also determining when these values are alarming. Standard anomaly detection algorithms and common rule-based heuristics often fall short in addressing this problem effectively. In this talk, we will closely examine this domain, exploring its unique characteristics and challenges. 
You will learn to apply some of the most promising techniques for detecting time series anomalies, along with relevant scientific Python tools that can help you do so.", "description": "This talk will walk you through several of the most common and effective approaches for tackling anomaly detection in time series, while explaining why traditional anomaly detection techniques might not be very applicable here. Among these approaches we will discuss rule-based anomaly detection, Error-Trend-Seasonality decomposition, structural modelling, and short-term forecasting model solutions. For each, we will distinguish between different types of temporal anomalies and explain why each method may or may not be suited for them. Further, for each approach we will consider several open-source scientific Python tools such as scipy, statsmodels, Prophet, tensorflow / keras and more. At the center of our conversation will be a real-world dataset from the field of environmental monitoring, which can also be easily translated into other fields.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8bebefbf-3f16-5068-8773-a39c1f17f4a0", "id": 25891, "code": "JF8QLJ", "public_name": "Vadim Nelidov", "avatar": "https://pretalx.com/media/avatars/Vadim_main_cropped_u0hy7sO.png", "biography": "Vadim Nelidov is a Lead Data Science consultant at Xebia Data with diverse data & research experience in a variety of industries from the energy sector to skincare and agriculture. He also has a research background in decision making sciences as well as several publications in this domain. Throughout his years in the data world, Vadim has been combining advanced data science with practical insights to make data work with an impact for the world. He aspires to see far beyond what is on the surface and get to the essence of the problems.\r\n\r\nVadim is passionate about sharing his knowledge and insights, believing that data literacy should not be a privilege of a few. 
And his goal is to be there to make this a reality. Making the intricacies of data science intelligible and uncovering the regularities hiding in the data is a major source of inspiration for Vadim. With this goal in mind, he combines his years of experience in consulting with his background in statistics, research and teaching to make this knowledge accessible to businesses and individuals in need.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/GYYTCH/", "id": 32122, "guid": "fdc3b98a-8742-5bd6-8d2f-2a11f04084b7", "date": "2023-08-16T11:05:00+02:00", "start": "11:05", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-32122-get-the-best-from-your-scikit-learn-classifier-trusted-probabilties-and-optimal-binary-decision", "title": "Get the best from your scikit-learn classifier: trusted probabilties and optimal binary decision", "subtitle": "", "track": "Machine and Deep Learning", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "When operating a classifier in a production setting (i.e. predictive phase), practitioners are potentially interested in two different outputs: a \"hard\" decision used to leverage a business decision and/or a \"soft\" decision to get a confidence score linked to each potential decision (usually related to class probabilities).\r\n\r\nScikit-learn does not provide any flexibility to go from \"soft\" to \"hard\" predictions: it uses a cut-off point at a confidence score of 0.5 (or 0 when using `decision_function`) to get class labels. However, optimizing a classifier to get a confidence score close to the true probabilities (i.e. a calibrated classifier) does not guarantee accurate \"hard\" predictions using this heuristic. 
Conversely, training a classifier for an optimum \"hard\" prediction accuracy (with the cut-off constraint at 0.5) does not guarantee obtaining a calibrated classifier.\r\n\r\nIn this talk, we will present a new scikit-learn meta-estimator allowing us to get the best of the two worlds: a calibrated classifier providing optimum \"hard\" predictions. This meta-estimator will land in a future version of scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/26120.\r\n\r\nWe will provide some insights regarding the way to obtain accurate probabilities and predictions and also illustrate how to use this model in practice on different use cases: cost-sensitive problems and imbalanced classification problems.", "description": "When operating a classifier in a production setting (i.e. predictive phase), practitioners are potentially interested in two different outputs: a \"hard\" decision used to leverage a business decision and/or a \"soft\" decision to get a confidence score linked to each potential decision (usually related to class probabilities).\r\n\r\nScikit-learn does not provide any flexibility to go from \"soft\" to \"hard\" predictions: it uses a cut-off point at a confidence score of 0.5 (or 0 when using `decision_function`) to get class labels. However, optimizing a classifier to get a confidence score close to the true probabilities (i.e. a calibrated classifier) does not guarantee accurate \"hard\" predictions using this heuristic. Conversely, training a classifier for an optimum \"hard\" prediction accuracy (with the cut-off constraint at 0.5) does not guarantee obtaining a calibrated classifier.\r\n\r\nIn this talk, we will present a new scikit-learn meta-estimator allowing us to get the best of the two worlds: a calibrated classifier providing optimum \"hard\" predictions. 
This meta-estimator will land in a future version of scikit-learn: https://github.com/scikit-learn/scikit-learn/pull/26120.\r\n\r\nWe will provide some insights regarding the way to obtain accurate probabilities and predictions and also illustrate how to use this model in practice on different use cases: cost-sensitive problems and imbalanced classification problems.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d6297904-fa91-50b7-9510-cc23c0cf9edb", "id": 55, "code": "KMDJAL", "public_name": "Guillaume Lemaitre", "avatar": "https://pretalx.com/media/avatars/guillaumelemaitre.jpg__200x200_q85_crop_subsampling-2_upscale_9Ptqss3.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/YP9N8H/", "id": 33270, "guid": "5eb014cb-ac7e-5a42-89ea-9a989154f14c", "date": "2023-08-16T11:40:00+02:00", "start": "11:40", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-33270-gpt-generated-text-detection-problems-and-solution-in-the-scientific-publishing", "title": "GPT generated text detection: problems and solution in the scientific publishing", "subtitle": "", "track": "Machine and Deep Learning", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Since its release, ChatGPT has been widely adopted as \"the\" text generation tool used across all industries and businesses. This also includes the domain of scientific research, where we observe more and more scientific papers partially or even fully generated by AI. The same also applies to the peer-review reports created while reviewing a paper.\r\n\r\nWhat are the guidelines in the scientific research world? What is now the meaning of the written word and how do we build a model that can identify whether a text is AI-generated? 
What are the potential solutions to this important issue?\r\n\r\nIn this talk, we discuss how to detect AI-generated text and how to create a scalable architecture integrating this tool.", "description": "ChatGPT and its open-source alternatives are now being integrated into many text-writing workflows. Scientific research is also heavily affected by these new technologies, which has sparked debate about whether they should be used for writing scientific papers. The COPE guidelines are very clear about it; however, there are currently no effective tools that can efficiently identify whether a text is AI-generated.\r\n\r\nThere are already some promising approaches to this problem, which can be categorized as watermarking, likelihood detection, and classification. How can these algorithms be used in scientific writing to detect AI-generated text, and how can they be integrated into an AI infrastructure using Python and other modern tools (such as a vector search engine, for example)?", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b8fa7222-900b-5f98-a7e1-6912cc2fd836", "id": 31348, "code": "PP3NZ9", "public_name": "Dr. 
Milos Cuculovic", "avatar": "https://pretalx.com/media/avatars/Milos_HD_YKoqIBV.png", "biography": "PhD in Computer Science, IT executive, certified project and product manager oriented to complex assignments with 12 years' working experience in the academic publishing business, focusing on distinct R&D, technology innovation, system administration and information security projects covering ML/AI, Web development and Linux infrastructure.", "answers": []}, {"guid": "29983033-53e9-5710-b39e-8745e30b6c79", "id": 33422, "code": "SBADQX", "public_name": "Andrea Guzzo", "avatar": "https://pretalx.com/media/avatars/me_pc_light_resized_PPqmfzE.jpeg", "biography": "AI Tech Leader at MDPI, Founder and organiser of PythonBiellaGroup, Computer scientist and Nerd by Night.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/UBT8PH/", "id": 31445, "guid": "c75523c5-31e7-5318-a227-311fb531c7c4", "date": "2023-08-16T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-31445-why-i-follow-ci-cd-principles-when-writing-code-building-robust-and-reproducible-applications", "title": "Why I Follow CI/CD Principles When Writing Code: Building Robust and Reproducible Applications", "subtitle": "", "track": "Scientific Applications", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "This talk will discuss the importance of Continuous Integration and Continuous Delivery (CI/CD) principles in the development of scientific applications, with a focus on creating robust and reproducible code that can withstand rigorous testing and scrutiny. The presentation will cover best practices for project structure and code organization, as well as strategies for ensuring reproducibility, collaboration, and managing dependencies. 
By implementing CI/CD principles in scientific application development processes, researchers can improve efficiency, reliability, and maintainability, ultimately accelerating research.", "description": "Scientists often face the challenge of creating robust and reproducible code that can withstand rigorous testing and scrutiny, which is essential to ensure the validity and reliability of research findings. This talk will explore how Continuous Integration and Continuous Delivery (CI/CD) principles can be applied in the context of scientific application development to help address these challenges.\r\n\r\nThe presentation will begin with the code organization for maintainability and testability. Reproducibility is a critical aspect of scientific research, and the talk will explore how version control and automated testing can help ensure that code is reproducible and that results can be validated by other researchers. Collaboration is also essential, and strategies for setting up a repository and introducing automated testing to ensure the codebase is consistent and minimise errors will be discussed.\r\n\r\nThe talk will also show the benefits of running automated tests at every stage of the development process, which can catch errors and defects early on, leading to higher quality and reduced risk of errors and inconsistencies. \r\n\r\nFinally, during the presentation I will explore effective strategies for managing dependencies using tools like Renovate, which can automate dependency updates. 
By implementing CI/CD principles in scientific application development processes, scientists can improve efficiency, reliability, and maintainability, ultimately accelerating research.\r\n\r\nIn this talk, I will cover:\r\n* The basic principles of CI/CD and how they apply to scientific application development\r\n* Best practices for project structure and code organization to enhance maintainability and testability\r\n* The benefits of running automated tests and ensuring reproducibility in scientific research\r\n* How CI/CD facilitates collaboration and enables robust, error-free applications\r\n* Effective strategies for managing dependencies with tools like Renovate", "recording_license": "", "do_not_record": false, "persons": [{"guid": "6b68583c-3b7f-5d6c-b7d3-4c1f218f9f2b", "id": 25704, "code": "3W79QG", "public_name": "Artem Kislovskiy", "avatar": "https://pretalx.com/media/avatars/IMG_6965_2_HPqmwVP.jpg", "biography": "As a software engineer with a degree in computational physics, Artem brings a unique perspective to the intersection of science and technology. With a passion for both computations and physics, Artem has applied his knowledge and skills to various projects, including computational fluid dynamics and fluid-structure interactions. While Artem's background is in physics, he has found a love for software development and is now a full-time software engineer. 
However, his passion for physics and desire to make a meaningful impact in scientific research has led him to explore how software engineering principles can best be applied in scientific projects.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/CJCV9W/", "id": 32174, "guid": "076f0de0-9030-5bad-8e9e-903737c521fa", "date": "2023-08-16T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-32174-solara-a-pure-python-react-style-framework-for-scaling-your-data-apps", "title": "Solara: A Pure Python, React-style Framework for Scaling Your Data Apps", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Solara is a pure Python web framework designed to scale complex applications. Leveraging a React-like API, Solara offers the scalability, component-based coding, and simple state management that have made React a standard for large web applications. Solara uses a pure Python implementation of React, Reacton, to create ipywidgets-based applications that work both in the Jupyter Notebook environment and as standalone web apps with frameworks like FastAPI. This talk will explore the design principles of Solara, illustrate its potential with case studies and live examples, and provide resources for attendees to incorporate Solara into their own projects. Whether you're a researcher developing interactive visualizations or a data scientist building complex web applications, Solara provides a Python-centric solution for scaling your projects effectively.", "description": "Python's growing prominence in the scientific community has led to a demand for more sophisticated and scalable web applications. 
While several Python web frameworks exist, most are designed for smaller data applications or employ paradigms that haven't proven their scalability for larger projects.\r\n\r\nIn this talk, we introduce Solara, a pure Python web framework developed to address this gap. Solara employs a React-like API, leveraging the scalability, component-based code, and simple state management principles that have made React a go-to for large-scale web applications.\r\n\r\nOur framework uses a pure Python implementation of React, named Reacton, to create applications based on ipywidgets. These applications function both within the Jupyter Notebook environment and as standalone web apps in conjunction with frameworks like FastAPI. By building atop ipywidgets, we harness an existing ecosystem of widgets and support a range of platforms, including JupyterLab, Jupyter Notebook, Voila, Google Colab, DataBricks, and JetBrains Datalore.\r\n\r\nWe'll explore the design principles of Solara, the benefits of a React-like approach in Python, and demonstrate its potential through case studies and live examples. Our goal is to showcase how Solara can help the Python scientific community develop more robust and scalable web applications, thereby expanding the reach and potential of their work.\r\n\r\nAttendees will leave with an understanding of Solara's capabilities and potential applications, as well as resources to begin exploring its utility in their own projects. 
Whether you're a researcher seeking to develop interactive visualizations or a data scientist looking to build complex web applications, Solara offers a Python-centric solution to scale your projects effectively.\r\n\r\n * https://github.com/widgetti/solara/\r\n * https://solara.dev", "recording_license": "", "do_not_record": false, "persons": [{"guid": "5ec0270e-50b8-55b2-96ab-ccef2e5d93a2", "id": 1510, "code": "38EYUA", "public_name": "Maarten Breddels", "avatar": "https://pretalx.com/media/avatars/DSC_0701_bright_small.jpeg", "biography": "Maarten Breddels is an entrepreneur and ex-scientist mainly working with Python, C++, and Javascript in the Jupyter ecosystem. He is the creator of Solara, ipyvolume, and Vaex and Co-founder of Widgetti. His expertise includes fast numerical computation, API design, 3D visualization, and building data apps. He has a Bachelor's in ICT, a Master's, and Ph.D. in Astronomy, and he likes to solve real problems.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/UCAYVT/", "id": 31501, "guid": "95191f93-7272-5b80-9cac-35a5c51ce279", "date": "2023-08-16T14:40:00+02:00", "start": "14:40", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-31501-chalk-it-an-open-source-framework-for-rapid-web-applications", "title": "Chalk\u2019it: an open-source framework for rapid web applications", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Chalk'it is an open-source framework that transforms Python scripts into distributable web app dashboards. It utilizes drag-and-drop widgets to establish an interface linked to a dataflow connecting Python code and various data sources. Chalk'it supports multiple Python graphics libraries, including Plotly, Matplotlib and Folium for interactive mapping and visualization. The framework operates entirely in web browsers using Pyodide. 
In our presentation, we will showcase Chalk'it, emphasizing its primary features, software architecture, and key applications, with a special focus on geospatial data visualization.", "description": "Chalk'it is a new open-source framework designed for effortlessly converting Python scripts into shareable standalone web application dashboards. Powered by Pyodide, Chalk'it allows for installing and running Python packages directly in the browser. It supports various graphical libraries like Matplotlib, Plotly, and JavaScript libraries such as ECharts, Vega, Plotly.js, and Leaflet. Dashboard design is streamlined through intuitive widget drag-and-drop functionality, while data and widget interactions are established via bidirectional bindings called connections. Scripts execution and orchestration are organized as a directed acyclic graph, and besides Python, Chalk'it can also use JavaScript libraries.\r\n\r\nChalk'it's web-application development revolves around the following concepts:\r\n\r\n* Workspace: Comparable to Matlab or IPython workspaces, it contains JSON or Python objects with specific states, accessible to other tool entities.\r\n* DataNode: Similar to a Jupyter Notebook cell, it functions as a JSON result producer or a Python object output. DataNodes are typically evaluated based on user interactions or at predetermined intervals.\r\n* \u201cDataNodes\u201d keyword: Facilitates dataflow creation between dataNodes (nodes) and accesses data from other nodes, establishing execution dependencies on preceding nodes.\r\n* Execution graph: Represents the sum of dependency relationships implicitly defined by using the \u201cdataNodes\u201d keyword, organizing application logic. User actions on widgets trigger the relevant dataflow execution.\r\n* Widget: A graphic object defined by an HTML page layout rectangle, interacting with the workspace through connection points called actuators. 
Actions include clicking buttons, entering text/numeric values, scrolling cursors, and selecting items from drop-down lists.\r\n* Connection: Defines the link between a widget (via its actuator) and a dataNode.\r\n* Document: A JSON document that describes the entire working model, including the Chalk'it dashboard. The document has an \u201cxprjson\u201d extension.\r\n* Page: The result of the xprjson document, an HTML page containing the dashboard, acting as the web application. It can be hosted on a static page server and shared via its URL.\r\n\r\nWhile popular Python dashboard-building packages like Streamlit and Plotly Dash exist, Chalk'it differentiates itself with four main features: drag-and-drop dashboard editing, document-based dashboard serialization, dataflow application description, and full in-browser execution. \r\n\r\nChalk'it originates from an IFPEN tool called xDash, utilized internally for creating hundreds of dashboards on various topics, such as CO2 reduction at city level, real-driving emissions characterization, image-based rock geological feature prediction, or catalyst performance estimation.\r\n\r\nDuring our presentation, we will introduce the Chalk'it tool and demonstrate its primary features. We will then explore its software architecture and present two application examples: geospatial data analysis and car total cost of ownership estimation. 
We will conclude with a discussion of the roadmap.\r\n\r\nPlease visit our Github repository: https://github.com/ifpen/chalk-it\r\n\r\nPlease also visit the templates galleries, that can be explored online using the hosted version of Chalk\u2019it : https://ifpen.github.io/chalk-it/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "ed54ec53-aad0-50fb-8518-a4f6b2332209", "id": 30111, "code": "JJ8PPC", "public_name": "Mongi BEN GAID", "avatar": null, "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 119 - Maintainer track": [{"url": "https://pretalx.com/euroscipy-2023/talk/YBQTGU/", "id": 36478, "guid": "f3f995a3-f2f8-5f34-9516-38ecd034d854", "date": "2023-08-16T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "HS 119 - Maintainer track", "slug": "euroscipy-2023-36478-contributor-developer-and-volunteer-experience-navigating-challenges-beyond-code", "title": "Contributor, Developer and Volunteer Experience: Navigating Challenges Beyond Code", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Maintainer track long", "language": "en", "abstract": "Let's Talk Inclusivity and Mental Health.\r\n\r\nWhat's beyond the lines of code? Let's explore the spectrum of experiences, from contributors to volunteers, developers to conference attendees.\r\n\r\nJoin us to share your insights, experiences, and solutions for a more supportive and inclusive scientific Python ecosystem. Let's empower one another and shape a community that thrives on empathy, understanding, and collaboration.", "description": "In the ever-evolving realm of the scientific Python community, we often emphasize the importance of inclusivity and accessibility. 
But what does it truly mean to be inclusive, especially when it comes to mental health?\r\n\r\nFrom contributors shaping open-source projects to volunteers lending their time, and developers harnessing powerful libraries, every experience is a piece of the puzzle.\r\n\r\nHow can we strike a balance between accommodating mental health needs and maintaining the productivity and efficiency of open source projects or community contributions?\r\nHow do we ensure that individuals with mental health challenges are heard, and their perspectives are valued equally?\r\nWhat strategies can open source maintainers and community leaders implement to create a more inclusive space where individuals feel comfortable discussing their mental health concerns?\r\nAre there specific practices or policies that organizations or communities should adopt to support contributors or volunteers facing mental health challenges without compromising the quality of their work?\r\nHow can we foster an atmosphere of empathy and understanding within developer communities, where individuals feel comfortable expressing their struggles without fear of judgment?\r\nWhat role should conference organizers play in ensuring that their events are accessible and accommodating to participants with various mental health needs?\r\nHow can we ensure that the dialogue around mental health remains respectful and productive, even when there are disagreements or differing opinions within the community?\r\n\r\nJoin us to share your insights, experiences, and solutions for a more supportive and inclusive scientific Python ecosystem. 
Let's empower one another and shape a community that thrives on empathy, understanding, and collaboration.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "e70398dc-9338-53fc-9fb2-69d464eb1c6c", "id": 32573, "code": "KGWFV8", "public_name": "Giada Pistilli", "avatar": null, "biography": "Giada Pistilli is a philosophy researcher specializing in ethics applied to Conversational AI. Her research is mainly focused on ethical frameworks, value theory, and applied and descriptive ethics. After obtaining a master\u2019s degree in ethics and political philosophy at Sorbonne University, she pursued her doctoral research in the same faculty. Giada is also Principal Ethicist at Hugging Face, where she conducts philosophical and interdisciplinary research on AI Ethics and content moderation. Her publications, resume, and contact information are available on [her website](https://www.giadapistilli.com/).", "answers": []}, {"guid": "716d26c2-170b-5a5e-86e5-9d4cecf3bbdd", "id": 54, "code": "8EGVC9", "public_name": "Cheuk Ting Ho", "avatar": "https://pretalx.com/media/avatars/IMG_1037_vjqZpqv.jpg", "biography": "Before working in Developer Relations, Cheuk has been a Data Scientist in various companies which demands high numerical and programmatical skills, especially in Python. To follow her passion for the tech community, Cheuk is now the Developer Advocate at Anaconda. Cheuk also contributes to multiple Open Source libraries like Hypothesis and Pandas.\r\n\r\nBesides her work, Cheuk enjoys talking about Python on personal streaming platforms and podcasts. Cheuk has also been a speaker at Universities and various conferences. Besides speaking at conferences, Cheuk also organises events for developers. Conferences that Cheuk has organized include EuroPython (which she is a board member), PyData Global and Pyjamas Conf. Believing in Tech Diversity and Inclusion, Cheuk constantly organizes workshops and mentored sprints for minority groups. 
In 2021, Cheuk has become a Python Software Foundation fellow.", "answers": []}, {"guid": "16190769-82a1-5bc5-b251-2de032c5ed41", "id": 20240, "code": "R9KUCJ", "public_name": "Maren Westermann", "avatar": "https://pretalx.com/media/avatars/profile-photo2_MW_JHouYo9.jpg", "biography": "Dr Maren Westermann works as a machine learning engineer at DB Systel GmbH and holds a PhD in environmental science. She is a self taught Pythonista, an active open source contributor, especially to the library scikit-learn, and is a co-organiser of PyLadies Berlin where she hosts monthly open source hack nights.", "answers": []}, {"guid": "9f40d627-ea8f-57ae-afe4-1d3f6c8055d2", "id": 31830, "code": "RJBKQN", "public_name": "Stefania Delprete", "avatar": "https://pretalx.com/media/avatars/StefaniaDelprete_STbuUQ9.png", "biography": "Stefania studied physics and worked in IT and data science in the UK, Germany and Italy. She's involved with Python, Mozilla and data science communities,  and data science projects.\r\n\r\nShe manages the Italian chapter of effective altruism and a professional group of experienced or aspiring people in the field of data science, machine learning and artificial intelligence involved in that community of effective altruists. 
She recently joined ENAIS (European Network for AI Safety) as executive director.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/8ZANVV/", "id": 32170, "guid": "391e0e12-03a5-581f-b627-c20af9796676", "date": "2023-08-16T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "00:45", "room": "HS 119 - Maintainer track", "slug": "euroscipy-2023-32170-sparse-data-in-the-scientific-python-ecosystem-current-needs-recent-work-and-future-improvements", "title": "Sparse Data in the Scientific Python Ecosystem: Current Needs, Recent Work, and Future Improvements", "subtitle": "", "track": "High Performance Computing", "type": "Maintainer track", "language": "en", "abstract": "This maintainer track aims to lead discussions about the current needs for sparse data in the scientific Python ecosystem. It will present the achievements and continuation of the work initiated at the first Scientific Python Developer Summit, which took place from 22nd May to 28th May 2023.", "description": "Sparse data refers to datasets where a high percentage of the values are zero or empty. Sparse arrays are one possible data structure for efficiently handling such datasets. SciPy's sparse matrices have been used extensively within the scientific Python ecosystem since its beginning.\r\n\r\nWhile those foundational representations are still relevant for most use cases, edge cases and recent downstream libraries' needs remain to be considered. Moreover, the generalization of sparse matrices to sparse arrays comes with a significant refactoring whose changes impact existing workflows as well as historical decisions and implementations.\r\n \r\nThis maintainer track aims to lead discussions about the current needs for sparse data in the scientific Python ecosystem. 
It will present the achievements and continuation of the work initiated at the first Scientific Python Developer Summit, which took place from 22nd May to 28th May 2023.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "b2c14bc1-2dec-5d1d-8721-6d9d438fbf54", "id": 28448, "code": "PHNY9R", "public_name": "Julien Jerphanion", "avatar": "https://pretalx.com/media/avatars/me_zg4Ipcd.jpg", "biography": "Julien is a Scientific Software Engineer at QuantStack. He holds an MSc in Computer Science & Engineering from Universit\u00e9 de Technologie de Compi\u00e8gne and an MSc in Applied Mathematics, Computer Vision and Machine Learning from \u00c9cole Normale Sup\u00e9rieure Paris-Saclay.\r\n\r\nJulien is involved in the Scientific Python ecosystem and co-maintains scikit-learn.\r\n\r\nPrior to joining QuantStack, Julien worked as a Research Software Engineer at Inria.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/WJTLSW/", "id": 32179, "guid": "722dde65-4d73-5620-9d5c-e245489b4fca", "date": "2023-08-16T14:15:00+02:00", "start": "14:15", "logo": null, "duration": "00:45", "room": "HS 119 - Maintainer track", "slug": "euroscipy-2023-32179-what-not-to-expect-from-numpy-2-0", "title": "What-not to expect from NumPy 2.0", "subtitle": "", "track": "Scientific Applications", "type": "Maintainer track", "language": "en", "abstract": "NumPy is planning a 2.0 release early next year, replacing the 1.X release.  While we hope that the release will not be disruptive to most users, we do plan some larger changes that may affect many.  These changes include modifications to the Python and C-API, for example making the NumPy promotion rules more consistent around scalar values.", "description": "The release of a NumPy 2.0 has long been avoided since NumPy tries to have a high threshold for breaking changes.  
But due to its age, NumPy also has numerous issues which are difficult to change through a slow deprecation process.\r\nWe have finally reached the decision to release a NumPy 2.0 in order to address some of these issues.  A main item is the adoption of NEP 50, which changes the scalar promotion rules to make them more consistent and is also part of the push towards Array-API adoption.  Further clean-ups of the Python API and changes such as making 64-bit integers the default on 64-bit Windows are also planned.\r\nWhile our C-API will hopefully get some additions to start supporting our new API around ufuncs and DTypes, we also plan to remove or change some API to allow evolution and simplify it.  Let's review some of these changes and open discussion about these or other changes.\r\n\r\nWhile it is good to move forward, it is also very important that a majority of users will not have difficulties with updating or transitioning.  Let's discuss potential issues and solutions.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8c4f09e1-846b-5229-a564-fdb188f27fe0", "id": 30521, "code": "QKVYNA", "public_name": "Sebastian Berg", "avatar": null, "biography": "Sebastian Berg is a NumPy maintainer and steering council member working at NVIDIA. 
He started contributing to NumPy during his undergrad and PhD in Physics and continued working on NumPy at the Berkeley Institute for Data Science before continuing to contribute at NVIDIA.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 4, "date": "2023-08-17", "day_start": "2023-08-17T04:00:00+02:00", "day_end": "2023-08-18T03:59:00+02:00", "rooms": {"Aula": [{"url": "https://pretalx.com/euroscipy-2023/talk/UX8CTK/", "id": 34864, "guid": "9d3fc965-e174-526a-8531-c23d0c7506a8", "date": "2023-08-17T09:00:00+02:00", "start": "09:00", "logo": null, "duration": "01:00", "room": "Aula", "slug": "euroscipy-2023-34864-keynote-on-polars", "title": "Keynote on polars", "subtitle": "", "track": null, "type": "Keynote", "language": "en", "abstract": "Polars is the \"relatively\" new fast dataframe implementation that redefines what DataFrames are able to do on a single machine, both in regard to performance and dataset size.\r\nIn this talk, we will dive into polars and see what makes it so efficient. It will touch on technologies like Arrow, Rust, parallelism, data structures, query optimization and more.", "description": "Polars is the \"relatively\" new fast dataframe implementation that redefines what DataFrames are able to do on a single machine, both in regard to performance and dataset size.\r\nIn this talk, we will dive into polars and see what makes it so efficient. 
It will touch on technologies like Arrow, Rust, parallelism, data structures, query optimization and more.\r\nCome to this keynote to learn more about the DataFrame world.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "f240ef3a-74da-5c8e-9b19-a2ce7b9f9c7f", "id": 28095, "code": "37ZAQC", "public_name": "Ritchie Vink", "avatar": "https://pretalx.com/media/avatars/headshot_c6AvDG7.jpg", "biography": "Ritchie Vink is the author of the Polars DataFrame library and query engine.\r\nHe has been working as a software engineer and machine learning engineer for 8 years.\r\nBefore he started polars, he did many side projects on varying topics in computer science and statistics.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/LGUKNE/", "id": 32145, "guid": "6f1df311-2ff4-528e-bb6a-15c33c16949e", "date": "2023-08-17T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-32145-from-implementation-to-ecosystem-the-journey-of-zarr", "title": "From Implementation to Ecosystem: The Journey of Zarr", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Zarr is an API and cloud-optimized data storage format for large, N-dimensional, typed arrays, based on an open-source technical specification. In the last 4 years it grew from a Python implementation to a large ecosystem. In this talk, we want to share how this transformation happened and our lessons learned from this journey. 
Today, Zarr is driven by an active community, defined by an extensible specification, has implementations in C++, C, Java, Javascript, Julia, and Python, and is used across domains such as Geospatial, Bio-imaging, Genomics and other Data Science domains.", "description": "This talk covers the following points:\r\n\r\n* What is Zarr & how does it work?\r\n    * Illustrated Mechanisms of Zarr & Examples\r\n    * When and Why should you use Zarr?\r\n    * Cloud-optimized file/object-storage systems\r\n* Early Development of Zarr and Adaption Across Implementations & Domains\r\n  * Implementations in C++, C, Java, Javascript, Julia, and Python\r\n  * Usage across Geospatial, Bio-imaging, Genomics and other Data Science domains\r\n* The Zarr Enhancement Proposal ([ZEP](https://zarr.dev/zeps/)) process\r\n* [Zarr v3](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html) & [ZEP0001](https://zarr.dev/zeps/draft/ZEP0001.html): From Implementation-driven Development to Spec first\r\n* Lessons learned while developing Zarr v3\r\n\r\nIn this talk you will\r\n\r\n* understand the basics of Zarr and its specification,\r\n* find inspiration for processes and tools in growing projects and ecosystems, and\r\n* get essential takeaways regarding OSS project transitions from a young to a mature stage.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "d0f99712-dbfc-57a7-a6ff-0a83e74321e6", "id": 16246, "code": "YWKXWU", "public_name": "Jonathan Striebel", "avatar": "https://pretalx.com/media/avatars/profile_cropped_bw_97GMX9a.jpg", "biography": "Jonathan is a ML software engineer at Aignostics in Berlin, Germany. He works on machine-learning pipelines for medical image analysis, ensuring scalability and maintainability. 
Also, he\u2019s an active member of the Zarr community, and one of the authors of the Zarr v3 specification.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/ZNFWD3/", "id": 32141, "guid": "8a560d54-6b18-5238-8045-38e4233f4df3", "date": "2023-08-17T10:55:00+02:00", "start": "10:55", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-32141-building-divserve-open-source-communities-learnings-from-pyladies-berlin-s-monthly-open-source-hack-nights", "title": "Building diverse open source communities - learnings from PyLadies Berlin\u2019s monthly open source hack nights", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Today, state-of-the-art scientific research as well as industrial software development strongly depend on open source libraries. The demographic of the contributors to these libraries is predominantly white and male. In order to increase participation of groups who have been historically underrepresented in this domain, PyLadies Berlin, a volunteer-run community group focussed on helping marginalised people to professionally establish themselves in tech, has been running hands-on monthly open source hack nights for more than a year. After some initial challenges, the initiative yielded encouraging results. This talk summarises the learnings and teaches how they can be applied in the wider open source community.", "description": "Contributing to open source projects is a highly rewarding activity that benefits not only society as a whole but also individual contributors. For example, open source contributions provide a great opportunity for skill development and building a public portfolio. However, contributing to open source projects can be challenging, especially for people belonging to underrepresented groups in tech, because the majority of these people are career changers. 
In order to lower the barriers of contributing to open source projects PyLadies Berlin are running monthly open source hack nights during which participants are guided through making contributions to open source projects. After facing some hurdles in the early phase of this initiative, the hack nights have become popular events that attract people from all across Germany and even Europe. In this talk Dr Maren Westermann, the creator of this event series will walk the audience through the work involved behind the scenes, the challenges, and the lessons learned to make this initiative a success.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "16190769-82a1-5bc5-b251-2de032c5ed41", "id": 20240, "code": "R9KUCJ", "public_name": "Maren Westermann", "avatar": "https://pretalx.com/media/avatars/profile-photo2_MW_JHouYo9.jpg", "biography": "Dr Maren Westermann works as a machine learning engineer at DB Systel GmbH and holds a PhD in environmental science. She is a self taught Pythonista, an active open source contributor, especially to the library scikit-learn, and is a co-organiser of PyLadies Berlin where she hosts monthly open source hack nights.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/3ENHXR/", "id": 33262, "guid": "c5f9b132-201e-5794-8967-ad29b4602a37", "date": "2023-08-17T11:20:00+02:00", "start": "11:20", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-33262-exploring-geospatial-data-for-machine-learning-using-google-earth-engine-an-introduction", "title": "Exploring Geospatial data for Machine Learning using Google Earth Engine: An introduction", "subtitle": "", "track": "Scientific Applications", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Have you ever wondered what type of data you can get about a certain location on the globe? 
What if I told you that you can access an enormous amount of information while sitting right there at your laptop? In this talk, I'll show you how to use Google Earth Engine to enrich your dataset. Whether you're exploring or planning your next ML project, Geospatial data can provide you with a lot of information you did not know you had access to. Let me show you how!", "description": "This talk is aimed at Machine Learning Engineers, Data Scientists, and researchers that have good experience with Python. The goal is to teach these professionals how they can leverage Google Earth Engine (GEE) to enrich and explore their datasets. By the end of this talk, I expect that the audience has a good grasp of what GEE is, and how they can use it in their next project!\r\n\r\nWe'll go through a small introduction of Geospatial data, and the different types of providers out there. I'll introduce Google Earth Engine and the Python API that has been recently in the works. Finally, I'll go over some use cases that users might explore, as well as some examples from the trenches. \r\n\r\nHere's the outline: \r\n\r\n- Geospatial data: What is it? \r\n- A myriad of providers: From Landsat, to MODIS, to Sentinel \r\n- Bringing it all together: Google Earth Engine\r\n- An intro to Google Earth Engine\r\n- The Python API, how to use it? \r\n- Example: Enriching a dataset for carbon stock measurements in Farms\r\n- Tips and tricks\r\n- Where to go from here", "recording_license": "", "do_not_record": false, "persons": [{"guid": "761edf82-6cce-527e-bdb5-e9eefc28241e", "id": 24788, "code": "KZDHYW", "public_name": "Duarte Carmo", "avatar": "https://pretalx.com/media/avatars/duarte_RRRTuOk.jpg", "biography": "I'm a technologist, born and raised in sunny Portugal, now based in Copenhagen. My work lies in the intersection of Machine Learning, Data, Software Engineering, and People. 
I'm in love with Technology, and how it can improve people's lives.\r\n\r\nIn the past, I've worked in Consumer Electronics, Public Institutions, Big Three Management Consulting, and Startups. The common thread? Solving problems end-to-end. \r\n\r\nNow, I run my own ML consulting shop, where I focus on solving tough problems end-to-end.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/HXD9EH/", "id": 31625, "guid": "53a62636-e727-5605-b32b-8be2cd301244", "date": "2023-08-17T11:45:00+02:00", "start": "11:45", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-31625-deploying-multi-gpu-workloads-on-kubernetes-in-python", "title": "Deploying multi-GPU workloads on Kubernetes in Python", "subtitle": "", "track": "High Performance Computing", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "By using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. In this talk, we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.", "description": "The RAPIDS suite of open-source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs with minimal code changes and no new tools to learn.\r\n\r\nDask is an open-source library which provides advanced parallelism for Python by breaking functions into a task graph that can be evaluated by a task scheduler that has many workers.\r\n\r\nBy using Dask to scale out RAPIDS workloads on Kubernetes you can accelerate your workloads across many GPUs on many machines. 
In this talk, we will discuss how to install and configure Dask on your Kubernetes cluster and use it to run accelerated GPU workloads on your cluster.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "effa7a13-dc1e-59d2-ad5d-09840937dc0c", "id": 20296, "code": "EE7H7J", "public_name": "Jacob Tomlinson", "avatar": "https://pretalx.com/media/avatars/profile-2024_x1uY6kh.png", "biography": "Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with Opsdroid in his spare time. He lives in Exeter, UK.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/FRHXYW/", "id": 31502, "guid": "cdfcfe9d-b699-582a-9996-2eac8a55548a", "date": "2023-08-17T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-31502--in-complete-introduction-to-ai-safety", "title": "(in)Complete introduction to AI Safety", "subtitle": "", "track": "Machine and Deep Learning", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "AI is poised to be \"Our final invention,\" either the key to a never-ending utopia or a direct road to dystopia (or apocalypse). Even without the eschatological framing, it's still a revolutionary technology increasingly embedded in every aspect of our life, from smartphones to smart cities, from autonomous agents to autonomous weapons. In the face of acceleration, there can be no delay: if we want AI to shape a better tomorrow, we must discuss safety today.", "description": "AI is our generation's most important technological breakthrough; beyond all the discussion about hype and Doom lie serious safety, technical and ethical considerations. 
In the face of accelerating AI capabilities, if we want to create a better world, we must also accelerate our safety efforts, exploring ethics and biases, deep learning failures and their alignment implications, or AI and computing policies.\r\n\r\nThis talk aims to provide a brief introduction and overview of the principal axes of AI safety: Ethics, Alignment, and Policies. Hopefully, it will work as a gateway for further forays into these crucial and intertwined areas; references and conceptual maps will accompany the talk's slides.\r\n\r\nThe talk will be roughly structured like this (sections and subsections may vary, but a higher focus will be put on Alignment):\r\n\r\n- Risks: From misuse to Doom\r\n- Alignment:\r\n  - Black Boxes: Interpretability or the lack thereof\r\n  - Mesa-Optimizer and Reward Hacking\r\n  - Of RLHF and Waluigis: on the brittleness of LLMs\r\n- Policies: Can we Regulate A(G)I?\r\n- Ethics: breaking the vicious cycle of unfair models, datasets, and societies.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "8a1b48e1-f1ea-5bbe-a80b-c04fbd0cfd91", "id": 82, "code": "GAWBSA", "public_name": "Michele \"Ubik\" De Simoni", "avatar": "https://pretalx.com/media/avatars/48255185872_a968d54df2_o1_con1iRU.jpg", "biography": "\"Ever Tried. Ever Failed. Try Again. Fail Again. Fail Better. (Beckett)\"\r\n\r\nI work full-time as a Senior Machine Learning Scientist (handling many Data Engineering tasks as well) with a focus on ML for medical imaging at Align Tech.\r\n\r\nCurrent tech hobbies: working with LLMs, worrying about AI risks (focusing on x-risks and Alignment, but it's not looking too good even for more \"mundane\" threats), and contributing to AI Safety.\r\n\r\nI was active as a Python & Data Science/Machine Learning teacher and speaker for local and European meetups and conferences, but that ground to a halt due to the plague. 
I plan to resume in 2023, as I love traveling and teaching!\r\n\r\nI can usually be found next to some source of caffeine, be it a chawan of Matcha or a cup of V60, bookstores & libraries, cooking classes, tabletop RPGs, and Python/ML/Data meetups.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/ARBBQF/", "id": 32357, "guid": "965bb5ca-b484-521c-9cbd-70352c70e97f", "date": "2023-08-17T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-32357-let-s-exploit-pickle-and-skops-to-the-rescue-", "title": "Let\u2019s exploit pickle, and `skops` to the rescue!", "subtitle": "", "track": "Machine and Deep Learning", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Pickle files can be evil and simply loading them can run arbitrary code on your system. This talk presents why that is, how it can be exploited, and how `skops` is tackling the issue for scikit-learn/statistical ML models. We go through some lower level pickle related machinery, and go in detail how the new format works.", "description": "The pickle format has many vulnerabilities and loading them alone can run arbitrary code on the user\u2019s system [1]. In this session we go through the process used by the pickle module to persist python objects, while demonstrating how they can be exploited. We go through how `__getstate__` and `__setstate__` are used, and how the output of a `__reduce__` method is used to reconstruct an object, and how one can have a malicious implementation of these methods to create a malicious pickle file without knowing how to manually create a pickle file by manipulating a file on a lower level. We also briefly touch on other known exploits and issues related to the format [2]. 
\r\n\r\nWe also show how one can look inside a pickle file and the operations run by it while loading it, and how one could get an equivalent python script which would result in the output of the pickle file [3]\r\nThen I present an alternative format from the `skops` library [4] which can be used to store scikit-learn based models. We talk about what the format is, and how persistence and loading is done, and what we do to prevent loading malicious objects or to avoid running arbitrary code. This format can be used to store almost any scikit-learn estimator, as well as xgboost, lightgbm, and catboost models.\r\n\r\n- [1] https://peps.python.org/pep-0307/#security-issues\r\n- [2] https://github.com/moreati/pickle-fuzz\r\n- [3] https://github.com/trailofbits/fickling\r\n- [4] https://skops.readthedocs.io/en/stable/persistence.html", "recording_license": "", "do_not_record": false, "persons": [{"guid": "a4868d84-4229-51c6-9af9-2ed9d356b361", "id": 1219, "code": "HGSWKF", "public_name": "Adrin Jalali", "avatar": "https://pretalx.com/media/avatars/Farb3SW3229-w-small_mpaHfiN.png", "biography": "Adrin works on a few projects, including skops which tackles some of the MLOps challenges related to scikit-learn models. He has a PhD in Bioinformatics, has worked as a consultant, as well as working in an algorithmic privacy and fairness team. 
He's also a core developer of scikit-learn and fairlearn.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/A3EMMC/", "id": 31552, "guid": "12c31a1f-0c26-559f-a616-60730644e0cb", "date": "2023-08-17T14:40:00+02:00", "start": "14:40", "logo": null, "duration": "00:20", "room": "Aula", "slug": "euroscipy-2023-31552-python-versioning-in-a-changing-world", "title": "Python versioning in a changing world", "subtitle": "", "track": "Scientific Applications", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Python versioning is a critical aspect of maintaining a consistent ecosystem of packages, yet it can be challenging to get right. In this talk, we will explore the difficulties of Python versioning, including the need for upper bounds, and discuss mitigation strategies such as lockfiles in the Python packaging ecosystem (pip, poetry, and conda / mamba). We will also highlight a new community effort to analyze Python libraries dynamically and statically to detect the symbols (or libraries) they are using. By analyzing symbol usage, we can predict when package combinations will start breaking with each other, achieving a high rate of correct predictions. Our goal is to gather more community inputs to create a robust compatibility matrix. Additionally, we are doing similar work in C/C++ using libabigail to address ABI problems.", "description": "Python versioning is crucial for ensuring compatibility between different packages, but it can be challenging to get right. In this talk, we will discuss the challenges of Python versioning and present mitigation strategies in different Python packaging systems (lockfiles in poetry and conda-lock, repodata patching in the conda-forge ecosystem). We then introduce a new community effort to analyze Python libraries dynamically and statically to detect the symbols (or libraries) they are using. 
\r\nBy analyzing symbol usage, we can predict when package combinations will start breaking with each other, achieving a high rate of correct predictions. For this we are using the `caliper` tooling.\r\n\r\nOur approach relies on building a compatibility matrix that takes into account the usage of symbols in Python libraries. We will discuss how this matrix can be used to help developers make informed decisions about which package combinations to use to avoid compatibility issues. We will also present some preliminary results showing the effectiveness of our approach.\r\n\r\nIn addition to Python, we are also working on similar efforts for C/C++ using libabigail to address ABI problems. We believe that this approach will be useful for developers across a wide range of industries and projects, and we are eager to gather more community inputs to further improve the accuracy of our compatibility matrix.\r\n\r\nOverall, this talk will be of interest to anyone working with Python or C/C++ who wants to ensure compatibility between different packages and maintain a consistent ecosystem of tools and libraries. 
We look forward to sharing our work with the SciPy community and hearing your feedback and suggestions.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "c1a59892-19f6-59e5-bdfd-e50a1ff815c3", "id": 12256, "code": "M7CWJZ", "public_name": "Wolf Vollprecht", "avatar": "https://pretalx.com/media/avatars/photo_wolf_V2aSSVN.jpg", "biography": "Wolf is the CEO of prefix.dev, a company that specializes in cross-platform package management with the open source mamba package manager and more.\r\nHe is a core member of the conda-forge project, the RoboStack project and main author of the mamba package manager.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/UVBBQZ/", "id": 31500, "guid": "48600f22-a5cb-5bdf-9cc7-74f8f3384f3c", "date": "2023-08-17T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-31500-exploring-gpu-powered-backends-for-scikit-learn", "title": "Exploring GPU-powered backends for scikit-learn", "subtitle": "", "track": "High Performance Computing", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Could scikit-learn future be GPU-powered ? This talk will discuss the performance improvements that GPU computing could bring to existing scikit-learn algorithms, and will describe a plugin-based design that is being foresighted to open-up scikit-learn compatibility to faster compute backends, with special concern for user-friendliness, ease of installation, and interoperability.", "description": "GPUs are known to be the preferred hardware for deep-learning based applications, but their use for a wide range of other algorithms has also been proved to be relevant: k-means, random forests, nearest neighbors search,... CPU-based implementations can be outshined and more particularly so where the data is plentiful to the point where the duration for training an estimator becomes a bottleneck. 
But at what point does it really start to matter, and can it really be a concern for scikit-learn users? We explore a few use cases to highlight what is at stake.\r\n\r\nBut bringing more options for accelerated computing backends could challenge the principles of ease of use, ease of installation, and user-friendliness that are at the core of scikit-learn's design. As of today, GPU computing software doesn't benefit from seamless cross-vendor portability as much as CPU software does. As a result, end users risk confusion when choosing hardware and compatible libraries, some of which could be proprietary, and could face high interoperability costs if changing the hardware requires changing the software stack. The talk introduces the open-source SYCL-based software toolchain that aims at unlocking interoperability across all hardware accelerators and all manufacturers.\r\n\r\nThe scikit-learn library furthermore envisions a plugin-based system that enables external projects to provide alternative compute backends to existing estimators. The plugin-based system eases the development and distribution of backends that could be maintained under the umbrella of the scikit-learn project, while also opening up to third-party providers. Plugins should be easily pip- or conda-installable, seamlessly unlock better performance for scikit-learn estimators, conform to the same specifications and the same quality standards as scikit-learn's default engines, and be swappable so that all users can keep porting and sharing their estimators regardless of the compute backend they were trained with. 
Several plugins are being experimented with currently, such as the `sklearn_numba_dpex` project that uses the OneAPI-based toolchain.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2de17fba-7b7a-557b-9ef9-9ce34847c0b9", "id": 30110, "code": "LPYNCD", "public_name": "Franck Charras", "avatar": "https://pretalx.com/media/avatars/065b203af552cb548fa015aafe14b3ae_daJEz44.jpg", "biography": "I graduated as a machine learning research engineer in 2016, with a specialization in NLP. I co-founded Sancare a start-up company that aims at bringing NLP-based solutions for medical data analysis to hospitals, and that has made a place for itself in the market with a performant NLP-powered billing assistant for medical stays. I'm now working at INRIA, France as a Machine Learning Research Engineers, focused on performance computing.", "answers": []}, {"guid": "91114ee9-3e12-54e9-8119-9813674ba951", "id": 1530, "code": "NEUMLP", "public_name": "Olivier Grisel", "avatar": "https://pretalx.com/media/avatars/ogrisel_portrait_870x550_PMry4Oq.jpg", "biography": "Machine Learning software engineer at Inria and member of the maintainers' team of the scikit-learn open source project.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/SZZ8Z7/", "id": 35627, "guid": "159a6e9f-4d9f-5305-86e0-7934709ffe47", "date": "2023-08-17T16:05:00+02:00", "start": "16:05", "logo": null, "duration": "00:30", "room": "Aula", "slug": "euroscipy-2023-35627-scaling-pandas-to-any-size-with-pyspark", "title": "Scaling pandas to any size with PySpark", "subtitle": "", "track": "High Performance Computing", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "This talk discusses using the pandas API on Apache Spark to handle big data, and the introduction of Pandas Function APIs. 
Presented by an Apache Spark committer and a product manager, it offers technical and managerial insights.", "description": "Undoubtedly, pandas plays a crucial role in data wrangling and analysis tasks. However, its limitation lies in handling big data processing. This creates a dilemma for data practitioners: should they sacrifice information by downsampling the data, or should they explore distributed processing frameworks to handle larger workloads? One popular option is Apache Spark, a mainstream distributed processing tool. Yet, using Spark means learning a new language, PySpark, which can be a challenge.\r\n\r\nThankfully, there is a silver lining. The pandas API on Spark offers equivalent functionalities to pandas in PySpark. This allows pandas users to seamlessly transition from single-node to distributed environments by merely replacing the pandas package with pyspark.pandas.\r\n\r\nConversely, existing PySpark users may need to create custom user-defined functions (UDFs) that are not available in the PySpark API. With the introduction of Pandas Function APIs in Spark 3.0+, users can now apply arbitrary Python native functions with type hints, using pandas instances as input and output, on a PySpark dataframe. This empowers data scientists to train ML models based on each data group with just a single line of code.\r\n\r\nAnd, you don't even need to write PySpark code now! English is the new programming language and we will introduce the English SDK for PySpark. The English SDK understands Spark tables and DataFrames, handles the complexity for you behind the scenes, and returns a DataFrame directly based on your English questions and directions.\r\n\r\nIn a joint presentation by a top open-source Apache Spark committer and a product manager, this talk has both the software engineer and product manager perspectives. 
Prior working knowledge of pandas, basic Spark, and machine learning will be helpful for the audience.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "3cadf86a-6fa3-533c-99a1-154600b6ff72", "id": 33154, "code": "SDEGBS", "public_name": "Hyukjin Kwon", "avatar": "https://pretalx.com/media/avatars/cd8a5b46da3297a7146b2a42296faa20_FGRmfhy.jpg", "biography": "Hyukjin is a Databricks software engineer as the tech-lead in OSS PySpark team, Apache Spark PMC member and committer, working on many different areas in Apache Spark such as PySpark, Spark SQL, SparkR, infrastructure, etc. He is the top contributor in Apache Spark, and leads efforts such as Project Zen, Pandas API on Spark, and Python Spark Connect.", "answers": []}, {"guid": "bc4410b4-7e2c-5025-b072-dff4eaa1777d", "id": 33166, "code": "ERMJYC", "public_name": "Allan Folting", "avatar": "https://pretalx.com/media/avatars/AllanFoltingProfilePictureSmall_ybQ7CjM.jpeg", "biography": "Allan is a product manager at Databricks mainly working on PySpark.\r\nHe is passionate about helping people make sense of data and has focused on that his whole career.", "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 120": [{"url": "https://pretalx.com/euroscipy-2023/talk/98ZVJH/", "id": 31868, "guid": "96082f85-54a7-5809-80e6-1370ba0c4d00", "date": "2023-08-17T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-31868-build-drug-discovery-web-applications-with-pyscript-ketcher-and-rdkit", "title": "Build Drug Discovery web applications with PyScript, Ketcher and rdkit", "subtitle": "", "track": "Scientific Applications", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "So you don't know JavaScript but know how to use python? Do you want to build an app where you can draw molecules for some application like properties prediction? 
Then come to this talk where I'll show you how to use Ketcher, EPAM's tool for small molecule drawing, PyScript and RDKit for your next drug discovery app.", "description": "In this talk, the speaker will demonstrate how to build an app using Ketcher, a tool for small molecule drawing, PyScript, and RDKit for drug discovery applications. Even if you don't know JavaScript, you can still learn how to create an app for predicting molecular properties by following this tutorial. With Ketcher's intuitive interface, you can draw small molecules easily, while PyScript provides the backend programming capabilities. RDKit is a powerful toolkit for working with chemical structures and can be used to predict molecular properties. By the end of this talk, you will have the skills necessary to develop a drug discovery app that could be used to predict the properties of molecules. This talk is ideal for those interested in drug discovery, molecular modeling, and computational chemistry, as well as those who are comfortable with the Python programming language.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2fe0e102-4f40-59c4-b3cc-7b934bad4790", "id": 30352, "code": "LXQDTL", "public_name": "Nikita Churikov", "avatar": "https://pretalx.com/media/avatars/1676711748342_Fg9N3m6.jpg", "biography": "For 7 years I worked primarily with python and applied it to a variety of tasks: Machine Learning in NLP, Drug discovery and Computer vision. 
Developed REST APIs for these models, web applications using Flask and Django and cli apps with python std library or click library.\r\n\r\nI\u2019m also interested in Julia and rust languages in general and in C++ for computer vision.\r\n\r\nMy full CV as a developer is available on [LinkedIn](https://www.linkedin.com/in/churnikov/).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/P38Y7L/", "id": 32226, "guid": "568d4e9b-b615-51bd-8648-ab20432d4ec8", "date": "2023-08-17T10:55:00+02:00", "start": "10:55", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-32226-where-is-the-flock-the-use-of-graph-neural-networks-for-bird-identification-with-meteorological-radar-", "title": "Where is the flock? The use of graph neural networks for bird identification with meteorological radar.", "subtitle": "", "track": "Machine and Deep Learning", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "In this project we generate tools to identify birds within the spatial extent of a meteorological radar.  Using the opportunities created by modern dual-polarization radars we build graph neural networks to identify bird flocks. For this, the original point cloud data is converted to multiple undirected graphs following a set of predefined rules, which are then used as an input in graph convolutional neural network (Kipf and Welling, 2017, https://doi.org/10.48550/arXiv.1609.02907). Each node has a set of features such as range, x, y, z coordinates and several radar specific parameters e.g. differential reflectivity and phase shift which are used to build model and conduct graph-level classification. This tool will alleviate problem of manual identification and labelling which is tedious and time intensive. Going forward we also focus on using the temporal information in the radar data. Repeated radar measurements enable us to track these movements across space and time. 
This makes it possible for regional movement studies to bridge the methodological gap between fine-scale, individual-based tracking studies and continental-scale monitoring of bird migration. In particular, it enables novel studies of the roles of habitat, topography and environmental stressors on movements that are not feasible with current methodology. Ultimately, we want to apply the methodology to data from continental radar networks to study movement across scales.", "description": "In this project we generate tools and a python package to identify birds within the spatial extent of a meteorological radar.  Using the opportunities created by modern dual-polarization radars we develop graph neural networks (GNN) to identify bird flocks. For this, the original point cloud data is converted to multiple undirected graphs following a set of predefined rules. For example, each point of interest needs to have a label and a minimum number of neighbours within a specified range. Graphs are then used as an input in graph convolutional neural network (Kipf and Welling, 2017, https://doi.org/10.48550/arXiv.1609.02907). Each node has a set of features such as range, x, y, z coordinates and several radar specific parameters e.g. differential reflectivity and phase shift which are used to build model and conduct graph-level classification.  Model learns hidden layer representations that encode both local graph structure and features of nodes and is based on an efficient variant of convolutional neural network. \r\n\r\nThis tool will alleviate problem of manual identification and labelling which is tedious and time intensive. Going forward we also focus on using the temporal information in the radar data. Repeated radar measurements enable us to track these movements across space and time. 
This makes it possible for regional movement studies to bridge the methodological gap between fine-scale, individual-based tracking studies and continental-scale monitoring of bird migration. In particular, it enables novel studies of the roles of habitat, topography and environmental stressors on movements that are not feasible with current methodology. Ultimately, we want to apply the methodology to data from continental radar networks to study movement across scales. This project is a collaboration between the Netherlands eScience Center and the University of Amsterdam. This is an ongoing project and feedback provided by the community is highly valued. \r\n\r\nGit Repo of the package: https://github.com/point-cloud-radar/bird-cloud-gnn", "recording_license": "", "do_not_record": false, "persons": [{"guid": "cdc8c46f-296c-5c64-ad60-c5c01ab9dc10", "id": 30556, "code": "QVP8EU", "public_name": "Olga Lyashevska", "avatar": "https://pretalx.com/media/avatars/Portretten_eScienceCenter_2022_fotoAnneliesVerhelst_277_8YNP4pS.jpg", "biography": "Currently I work as a Research Software Engineer at the Netherlands eScience Center in Amsterdam. Prior to that I worked as a Research Fellow at the Atlantic Technological University in Galway, Ireland. I have a passion for scientific programming, open source software and Linux. 
I am also a lecturer and a PhD supervisor.", "answers": []}, {"guid": "d0ad7f04-4dab-598b-b3bd-7596d8e9c618", "id": 32516, "code": "3RVYDP", "public_name": "Abel Soares Siqueira", "avatar": "https://pretalx.com/media/avatars/me_9r7HH3K.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/K9MHRH/", "id": 31326, "guid": "c0237cb6-74d7-5fc6-ac47-7a442e289070", "date": "2023-08-17T11:45:00+02:00", "start": "11:45", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-31326-content-based-recommendation-system-for-the-examples-in-sphinx-gallery", "title": "Content-based recommendation-system for the examples in sphinx-gallery", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "The gallery of your project might group the examples by module, by use case, or some other logic. But as examples grow in complexity, they may be relevant for several groups. In this talk we discuss some possible solutions and their drawbacks to motivate the introduction of a new feature to sphinx-gallery: a content-based recommendation system.", "description": "Imagine a scikit-learn example on text clustering using silhouette scores. As a maintainer, would you assign it to the `sklearn.cluster`, the `sklearn.feature_extraction.text` or the `sklearn.metrics` group of examples? As a user, where would you look for it?\r\n\r\nSome solutions, such as adding manually assigned tags, have been proposed to cross-link examples that can be grouped by different logics, with the disadvantage of requiring maintenance and consensus. Instead we could have a recommender system based on similarity (nearest neighbors tf-idf model) to automatically link to the most relevant related content. 
This could be introduced at the end of each example.\r\n\r\nLibraries with several examples such as scikit-learn and matplotlib may benefit from this new feature.\r\n\r\nFor more information visit https://github.com/sphinx-gallery/sphinx-gallery/pull/1125", "recording_license": "", "do_not_record": false, "persons": [{"guid": "10a4ab97-6c56-521b-87a9-76a474e5034a", "id": 20333, "code": "HRFVLY", "public_name": "Arturo Amor", "avatar": "https://pretalx.com/media/avatars/52222317140_895f7057a8_o_cWueXBn.jpg", "biography": "I did my PhD in theoretical quantum physics at the National Autonomous University of Mexico (UNAM). I currently work at the INRIA foundation as part of the scikit-learn consortium, mostly in charge of maintaining the scikit-learn documentation.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/Q9F8GA/", "id": 30948, "guid": "97a8b807-e617-5936-80a4-f29c3a303174", "date": "2023-08-17T13:30:00+02:00", "start": "13:30", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-30948-my-foray-from-scientific-python-into-the-pyodide-webassembly-universe", "title": "My foray from Scientific Python into the Pyodide / WebAssembly universe", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Pyodide is a Python distribution for the browser and Node.js based on WebAssembly / Emscripten.\r\nPyodide supports most commonly used scientific Python packages, like numpy, scipy, scikit-learn, matplotlib and there is growing interest to use it for improving package documentation through interactivity.\r\n\r\nIn this talk we will describe the work we have done in the past nine months to improve the state of Pyodide in a scientific Python context, namely:\r\n- running the scikit-learn and scipy test suites with Node.js to get a view of what currently works, what does not, and what can hopefully be fixed one 
day\r\n- packaging OpenBLAS in Pyodide and use it for Pyodide scipy package to improve its stability, maintainability and performance\r\n- adding JupyterLite functionality to sphinx-gallery, which is used for example galleries of popular scientific Python package like scikit-learn, matplotlib, scikit-image, etc ...\r\n- adding the sphinx-gallery Jupyterlite functionality for scikit-learn example gallery\r\n\r\nWe will also mention some of the Pyodide sharp bits and conclude with some of the ideas we have to use it even more widely.", "description": "Pyodide is a Python distribution for the browser and Node.js based on WebAssembly. Pyodide supports most commonly used scientific Python packages, like numpy, scipy, scikit-learn, matplotlib and there is growing interest to use it for improving documentation through interactivity.\r\n\r\nIn this talk we will describe the work we have done in the past six months, including:\r\n- regularly running the scikit-learn and scipy test suites with Node.js to get  a view of what currently works, what does not, and what can be  hopefully be fixed one day\r\n- packaging OpenBLAS in Pyodide and use it for Pyodide scipy package to improve  its stability and maintainability\r\n- adding JupyterLite functionality to sphinx-gallery, which is used for example  galleries of popular scientific Python package like scikit-learn, matplotlib,  scikit-image, etc ...\r\n- adding the sphinx-gallery Jupyterlite functionality for scikit-learn example  gallery\r\n\r\nWe will also mention some of the Pyodide sharp bits and conclude with some of the ideas we have to use it even more widely.\r\n\r\nHere are the references for the work mentioned above:\r\n- running scipy and scikit-learn test suite inside Pyodide: https://github.com/lesteve/scipy-tests-pyodide and https://github.com/lesteve/scikit-learn-tests-pyodide\r\n- OpenBLAS Pyodide PR: https://github.com/pyodide/pyodide/pull/3331\r\n- sphinx-gallery JupyterLite-related PRs: 
https://github.com/pyodide/pyodide/pulls?q=is%3Apr+sort%3Aupdated-desc+author%3Alesteve\r\n- scikit-learn JupyterLite examples PR: https://github.com/scikit-learn/scikit-learn/pull/25887", "recording_license": "", "do_not_record": false, "persons": [{"guid": "2215d535-d2fc-5570-8835-1e541e4a967e", "id": 29607, "code": "N7TBEV", "public_name": "Lo\u00efc Est\u00e8ve", "avatar": "https://pretalx.com/media/avatars/loic-esteve_E0UpYb8.jpg", "biography": "Lo\u00efc has a background in Particle Physics, which is how he discovered Python towards the end of his PhD. After a few year stint in an investment fund of writing mostly C++ and as much Python as possible,\r\nhe was lured back to an academic environment at Inria.\r\n\r\nHe is a scikit-learn and joblib core contributor and has been involved in a number of Python open-source projects in the past 10 years, amongst which Pyodide, dask-jobqueue, sphinx-gallery and nilearn.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/EXVUBJ/", "id": 32136, "guid": "ac5371f1-3359-5da6-a6cc-caf702da23ad", "date": "2023-08-17T14:05:00+02:00", "start": "14:05", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-32136-myst-thebe-community-driven-tools-for-awesome-open-science-communication-with-jupyter-lite-backed-computation", "title": "MyST & Thebe: Community-driven tools for awesome open science communication with Jupyter[lite] backed computation", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Imagine a world where there are tools allowing any researcher to easily produce high quality scientific websites. 
Where it's trivial to include rich interactive figures that connect to Jupyter servers or run in-browser with `WASM` & `pyodide`, all from a local folder of markdown files and Jupyter notebooks.\r\n\r\nWe introduce MyST Markdown (https://mystmd.org/), a set of open-source, community-driven tools designed for open scientific communication. \r\n\r\nIt's a powerful authoring framework that supports blogs, online books, scientific papers, preprints, reports and journals articles. It includes `thebe` a minimal connector library for Jupyter, and `thebe-lite` that bundles a JupyterLite server with `pyodide` into any web page for in-browser `python`. It also provides publication-ready tex and pdf generation from the same content base, minimising the rework of publishing to the web and traditional services.", "description": "The MyST (Markedly Structured Text) project grew out of the ExecutableBooks team, developers of MyST Markdown and JupyterBook. Originally based on Sphinx and RST, over the past year the ExecutableBooks team has been working on a MyST Specification to coordinate development of the markup language and extensions across multiple languages & parsers (e.g. implementations in Python & Javascript). \r\n\r\nThe new javascript based MyST tools run directly in the browser, opening up new workflows for components to be used in web-based editors, directly in Jupyter and in JupyterLite. The libraries work with current MyST Markdown documents/projects and can export to LaTeX/PDF, Microsoft Word and JATS as well as multiple website templates using a modern React-based renderer. 
There are currently over 400 scientific journals that are supported through templates, with new LaTeX templates that can be added easily (using Jinja-based templating) and contributed via an open community repository (https://github.com/myst-templates).\r\n\r\nThebe lets you easily add interactive visualisations, reproducible figures and interactive code editors to any static HTML web page -- backed by a kernel from a Jupyter server or an in-browser WASM kernel with thebe-lite.\r\n\r\nIt\u2019s a compact and versatile library that makes it easy to add Jupyter based interactivity: by default, code blocks are turned into editors and made ready for execution, just by adding a couple of additional script tags. `thebe` enables the flexible interleaving of Jupyter cells and output areas with other content (e.g. from myst markdown files), maintaining the ability to run the underlying notebooks or individual code cells. \r\n\r\nOver the last year, `thebe` has seen major upgrades allowing it to be integrated into modern web frameworks and it has been integrated tightly with MyST's web themes, making it easy to add computation into any MyST based online article or publication. Now creating and deploying an interactive `ipywidgets` based figure as part of an online paper can be achieved in minutes.\r\n\r\nAnd while we've seen interactive figures in web based scientific papers before, the MyST toolchain democratizes their creation and makes it easy for any researcher to publish publication-quality web-based articles from their desktop (or from CI) that the large online journals would need a post-production web development team to help deliver. 
MyST's architecture also allows for multiple researchers to contribute to lab websites and computational journals.\r\n\r\nIn our presentation we will give an overview of the MyST ecosystem, how to use MyST tools in conjunction with existing Jupyter Notebooks, markdown documents, and JupyterBooks to create interactive websites, books, blogs and scientific articles as well as professional PDFs. We give special attention to the additions around structured data, standards in publishing (e.g. efforts in representing Notebooks as JATS XML), rich frontmatter, bringing cross-references and persistent IDs to life with interactive hover-tooltips and making connections to Jupyter based and in-browser `python` kernels to run interactive figures through the additional of a few simple configuration options. We'll share some compelling examples of online papers and journals published with MyST.\r\n\r\nOur presentation is aimed at attendees who are looking to incorporate Jupyter Notebooks with other materials in new and novel ways - to create compelling scientific communication materials whether in the form of books, blogs or articles, for education or research. The talk will cover how and where `thebe` can be applied effectively, as well as going into some different configurations. Some knowledge of basic web development with HTML, will be beneficial for walk-through element of the talk as we'll cover some code and configuration, but we aim for the talk to be accessible by anyone interested in putting interactive scientific communication on the web, whether they develop themselves or not.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "773bdd69-6b9c-50e7-a258-b3d99461c1af", "id": 19630, "code": "EHAJXJ", "public_name": "Steve Purves", "avatar": "https://pretalx.com/media/avatars/steve-mugshot-square_m02nZxk.jpg", "biography": "I am a scientific software developer, data scientist/researcher and software product developer rolled into one. 
A team member of the Executable Books project where I work on thebe and CTO and co-founder of Curvenote where we are building tools and infrastructure for [much] better scientific communication and publishing.\r\n\r\nAn (electronic) engineer by background (Newcastle University, UK), I specialized in signal processing, computer vision, data science and machine learning and spent 20+ years helping both research and industry scientists (a lot of earth and geoscientists, but also data scientists in healthcare, finance, manufacturing, even dentists) build software to solve highly technical and scientific problems. I build apps that worked with huge datasets, 3d visualization and GPU-based HPC for server, desktops and the web.\r\n\r\nNow I'm applying all of my time and experience to building software that can help change how we communicate, re-use and build on scientific work for a better future.", "answers": []}, {"guid": "ef8db8fa-acd5-5315-b1f6-7bc281114c35", "id": 30519, "code": "9ENEUE", "public_name": "Rowan Cockett", "avatar": "https://pretalx.com/media/avatars/rowan_SPsKHEP.jpg", "biography": "Rowan is on the [Executable Books](https://executablebooks.org/) team where he develops MyST Markdown ([https://myst-tools.org](https://myst-tools.org/)) in the context of scientific writing. Rowan is also the CEO and cofounder of [Curvenote](https://curvenote.com), which is an interactive, online writing platform for science, engineering & research teams, with dedicated integrations to Jupyter. Rowan has a Ph.D. in computational geophysics from the University of British Columbia (UBC). While at UBC, Rowan helped start [SimPEG](https://simpeg.xyz), a large-scale simulation and parameter estimation package for geophysical processes (electromagnetics, fluid-flow, gravity, etc.), which is used in industry, national labs, and universities globally. 
He has won multiple awards for innovative dissemination of research and open-educational resources, including a geoscience modelling application, Visible Geology, that has been used by more than a million geoscience students to interactively explore conceptual geologic models.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/TVJEAD/", "id": 30643, "guid": "013cef8e-1524-5bdf-9676-04e23e634d12", "date": "2023-08-17T14:40:00+02:00", "start": "14:40", "logo": null, "duration": "00:20", "room": "HS 120", "slug": "euroscipy-2023-30643-transformations-in-three-dimensions", "title": "Transformations in Three Dimensions", "subtitle": "", "track": "Scientific Applications", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "Rigid transformation in 3D are complicated due to the multitude of different conventions and because they often form complex graphs that are difficult to manage. In this talk I will give a brief introduction to the topic and present the library pytransform3d as a set of tools that can help you to tame the complexity. Throughout the talk I will use examples from robotics (imitation learning, collision detection, state estimation, kinematics) to motivate the discussed features, even though presented solutions are useful beyond robotics.", "description": "This talk focuses on rotation and translation, that is, rigid transformations, in three  dimensions. There are various representations of these. We often combine several software components with different conventions. Furthermore, we usually combine multiple transformations that form complex graphs of transformations, and we are often interested in transformations that are not directly available, but can be computed from a combination of multiple transformations. 
Both problems can be handled with pytransform3d, a Python library for transformations in three dimensions.\r\n\r\npytransform3d offers...\r\n\r\n* operations for most common representations of rotation / orientation and\r\n  translation / position\r\n* conversions between those representations\r\n* clear documentation of conventions\r\n* tight coupling with matplotlib to quickly visualize (or animate)\r\n  transformations\r\n* the TransformManager which organizes complex chains of transformations\r\n* the UrdfTransformManager which is able to load transformations from URDF\r\n  files\r\n* a matplotlib-like interface to Open3D\u2019s visualizer to display geometries and\r\n  transformations\r\n\r\nI will present several features of the library in this talk and I will use examples from robotics for illustration, for example,\r\n\r\n* imitation learning - learning robotic motion from human demonstration\r\n* kinematics - translation of a human hand motion to a robotic hand\r\n* collision detection - between a robot arm and its environment\r\n* state estimation - estimation of a robot's location and its uncertainty\r\n\r\nThere are several pitfalls that we will discuss as well.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "9b646967-61a1-56d2-8731-b349000e33ac", "id": 28670, "code": "33UUJG", "public_name": "Alexander Fabisch", "avatar": null, "biography": "Alexander Fabisch received his diploma degree in computer science from the University of Bremen in 2012. From 2012 to 2017 he worked as a researcher at the robotics research group of the University of Bremen and since 2017 he has worked at the Robotics Innovation Center of the German Research Center for Artificial Intelligence (DFKI). He obtained his doctoral degree from the University of Bremen in 2020. 
His scientific interests are in the fields of machine learning and black-box optimization with robotic applications and a focus on learning manipulation behaviors.", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/HWKYXA/", "id": 31348, "guid": "09a99286-d76f-5b3b-83cc-80c2eec2080e", "date": "2023-08-17T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-31348-incidents-management-using-hawkes-processes-and-other-tech-aiops-projects-in-ing", "title": "Incidents management using Hawkes processes and other Tech AIOps projects in ING", "subtitle": "", "track": "Data Science and Visualisation", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "In this talk, we will discuss incident management using Hawkes processes within an IT infrastructure. We show how a model previously applied to earthquake prediction can help answer the question \u2018what caused what\u2019 in a major European bank.", "description": "ING, a leading European bank, continuously keeps track of its digital assets such as servers, network devices, and software programs. Asset management is a challenging task because of its complex dependencies and hierarchical nature. Despite these difficulties, many monitoring tools were successfully implemented in ING, including metric anomaly detection, resource capacity forecasting, IT structure optimization, and incident management with natural language processing of logs and events.\r\n\r\nIn this talk, we will discuss incident management with a Hawkes process. This model has previously been applied successfully to earthquake prediction based on aftershocks and to capturing the dynamics of order books in finance. The Hawkes process model is well-defined mathematically and can process a large volume of data to uncover Granger causal structures in data if implemented appropriately. 
We show how Hawkes processes help answer the question \u2018what caused what\u2019 within the IT infrastructure of a major European bank.\r\n\r\nNOTE: This talk will focus mostly on explaining the mathematics behind Hawkes processes. Although we assume no prior knowledge, it might be nice to refresh on Poisson processes, which will be our starting point (although we recap them in our presentation as well).\r\n\r\nSee for instance: https://builtin.com/data-science/poisson-process", "recording_license": "", "do_not_record": false, "persons": [{"guid": "0e4b2ff5-ebc3-5441-b6c9-9dd5ff7d95a1", "id": 21836, "code": "FQUEVZ", "public_name": "Arkadiusz Trawi\u0144ski", "avatar": "https://pretalx.com/media/avatars/Avatar_Blur_a5QqPk4.png", "biography": "Product Lead of Data Scientist team in IT monitoring department of ING. PhD in Physics, BSc in Computer Science.", "answers": []}, {"guid": "7b0148be-0f65-5e13-addb-cf9038bac41f", "id": 30041, "code": "7MFHEK", "public_name": "Joost G\u00f6bbels", "avatar": "https://pretalx.com/media/avatars/Reshaped_profile_picture_iStfhLo.jpg", "biography": "Thesis intern at the AI4Fintech research lab at ING. Joint MSc student Mathematics & Computer Science. 
BSc in Mathematics", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/STXCKT/", "id": 31263, "guid": "3fc97976-d583-5686-96fe-340a074cc1d2", "date": "2023-08-17T16:05:00+02:00", "start": "16:05", "logo": null, "duration": "00:30", "room": "HS 120", "slug": "euroscipy-2023-31263-the-helmholtz-analytics-toolkit-heat-and-its-role-in-the-landscape-of-massively-parallel-scientific-python", "title": "The Helmholtz Analytics Toolkit (Heat) and its role in the landscape of massively-parallel scientific Python", "subtitle": "", "track": "High Performance Computing", "type": "Talk (25 mins + Q&A)", "language": "en", "abstract": "Handling and analyzing massive data sets is highly important for the vast majority of research communities, but it is also challenging, especially for those communities without a background in high-performance computing (HPC). The Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python, targeting the usage by non-experts in HPC.\r\n\r\nIn this presentation, we will provide an overview of Heat's current features and capabilities and discuss its role in the ecosystem of distributed array computing and machine learning in Python.", "description": "**co-authors:**  *C. Comito (FZJ), M. G\u00f6tz (KIT),  J. P. Guti\u00e9rrez Hermosillo Muriedas (KIT), B. Hagemeier (FZJ), P. Knechtges (DLR), K. Krajsek (FZJ), A. R\u00fcttgers (DLR), A. Streit (KIT), M. Tarnawa (FZJ)*\r\n\r\nWhen it comes to enhancing exploitation of massive data, machine learning methods are at the forefront of researchers\u2019 awareness. Much less so is the need for, and the complexity of, applying these techniques efficiently across large-scale, memory-distributed data volumes. 
In fact, these aspects typical for the handling of massive data sets pose major challenges to the vast majority of research communities, in particular to those without a background in high-performance computing. Often, the standard approach involves breaking up and analyzing data in smaller chunks; this can be inefficient and prone to errors, and sometimes it might be inappropriate at all because the context of the overall data set can get lost. \r\n\r\nThe Helmholtz Analytics Toolkit (Heat) library offers a solution to this problem by providing memory-distributed and hardware-accelerated array manipulation, data analytics, and machine learning algorithms in Python. The main objective is to make memory-intensive data analysis possible across various fields of research ---in particular for domain scientists being non-experts in traditional high-performance computing who nevertheless need to tackle data analytics problems going beyond the capabilities of a single workstation. The development of this interdisciplinary, general-purpose, and open-source scientific Python library started in 2018 and is based on collaboration of three institutions (German Aerospace Center DLR, Forschungszentrum J\u00fclich FZJ, Karlsruhe Institute of Technology KIT) of the Helmholtz Association. The pillars of its development are...\r\n\r\n* ...to enable memory distribution of n-dimensional arrays, \r\n* to adopt PyTorch as process-local compute engine (hence supporting GPU-acceleration),\r\n* to provide memory-distributed (i.e., multi-node, multi-GPU) array operations and algorithms, optimizing asynchronous MPI-communication (based on mpi4py) under the hood, and \r\n* to wrap functionalities in NumPy- or scikit-learn-like API to achieve porting of existing applications with minimal changes and to enable the usage by non-experts in HPC. \r\n\r\nIn this talk we will give an illustrative overview on the current features and capabilities of our library. 
Moreover, we will discuss its role in the existing ecosystem of distributed computing in Python, and we will address technical and operational challenges in further development.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "fc6175de-d43c-51c4-8837-589be3fe5281", "id": 29958, "code": "FU9BXF", "public_name": "Fabian Hoppe", "avatar": "https://pretalx.com/media/avatars/DSC_7193_-_Kopie_yHRGBo4.jpg", "biography": "I recently obtained a PhD in numerical mathematics from the University of Bonn. Currently, I am a postdoctoral researcher in the Scientific Machine Learning group at the Institute for Software Technology of the German Aerospace Center (DLR).", "answers": []}], "links": [], "attachments": [], "answers": []}], "HS 119 - Maintainer track": [{"url": "https://pretalx.com/euroscipy-2023/talk/CB9WMH/", "id": 35390, "guid": "4f3edd0c-327f-5866-9db3-b1d670459db7", "date": "2023-08-17T10:30:00+02:00", "start": "10:30", "logo": null, "duration": "01:30", "room": "HS 119 - Maintainer track", "slug": "euroscipy-2023-35390-interoperability-in-the-scientific-python-ecosystem", "title": "Interoperability in the Scientific Python Ecosystem", "subtitle": "", "track": "Scientific Applications", "type": "Maintainer track long", "language": "en", "abstract": "This slot will cover the effort regarding interoperability in the scientific Python ecosystem. Topics:\r\n\r\n- Using the Array API for array-producing and array-consuming libraries\r\n- DataFrame interchange and namespace APIs\r\n- Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem\r\n- Entry Points: Enabling backends and plugins for your libraries\r\n\r\n### Using the Array API for array-producing and array-consuming libraries\r\n\r\nAlready using the Array API or wondering if you should in a project you maintain? 
Join this maintainer track session to share your experience and exchange knowledge and tips around building array libraries that implement the standard or libraries that consume arrays.\r\n\r\n### DataFrame-agnostic code using the DataFrame API standard\r\n\r\nThe DataFrame Standard provides you with a minimal, strict, and predictable API, to write code that will work regardless of whether the caller uses pandas, polars, or some other library.\r\n\r\n### DataFrame Interchange protocol and Apache Arrow\r\n\r\nThe DataFrame interchange protocol and Arrow C Data interface are two ways to interchange data between dataframe libraries. What are the challenges and requirements that maintainers encounter when integrating this into consuming libraries?\r\n\r\n### Entry Points: Enabling backends and plugins for your libraries\r\n\r\nIn this talk, we will discuss how NetworkX used entry points to enable more efficient computation backends to plug into NetworkX", "description": "### Using the Array API for array-producing and array-consuming libraries\r\n\r\nThis session is for maintainers of projects that either implement the Array API (Numpy, cupy, pytorch, etc) or projects that use Array API inputs (scikit-learn, scipy, etc). Or maybe you are wondering if you should start investing in Array API for your project.\r\n\r\nThe Array API standard aims to specify a common API for multidimensional arrays. Solving the problem of subtle API differences between the many array libraries that exist. This means it provides a minimum set of functions and behaviours for array libraries to implement. As a result array consuming libraries do not need code to handle the slight differences, they can rely on these functions and behaviours to exist and be standard compliant in all array libraries.\r\n\r\nThe Array API standard is not yet in widespread use. 
Adoption across the ecosystem has only just started.\r\n\r\nThis session is a place to discuss and share your experiences in using the Array API standard for your array library or your library that consumes arrays.\r\n\r\n### Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem\r\n\r\nThe Apache Arrow (https://arrow.apache.org/) project specifies a standardized language-independent columnar memory format for tabular data. It enables shared computational libraries, zero-copy shared memory, efficient (inter-process) communication without serialization overhead, etc. Nowadays, Apache Arrow is supported by many programming languages and projects, and is becoming the de facto standard for tabular data.\r\n\r\nBut what does that mean in practice? There is a growing set of tools in the Python bindings, PyArrow, and a growing number of projects that use (Py)Arrow to accelerate data interchange and actual data processing. This talk will give an overview of the recent developments both in Apache Arrow itself and in how it is being adopted in the PyData ecosystem (and beyond), and of how it can improve your day-to-day data analytics workflows.\r\n\r\n### Entry Points: Enabling backends and plugins for your libraries\r\n\r\nAs a maintainer of an open source library, you always need to wrestle with the fact that there are new experimental things which could really help your users, but they may just be too experimental for now (especially for old projects).\r\n\r\nThere are always questions like:\r\n\r\n-   If we add this new requirement, we can use the new X feature, but now we depend on that library.\r\n-   A new change will help a lot of users, but this will utterly destroy all the code written using the library in the last 20 years.\r\n-    A new fork comes out and splits the community.\r\n-   You don\u2019t want to write and maintain C/Rust/FORTRAN and want to ship that bit to other packages.\r\n\r\nand many more!\r\n\r\nWith this talk, we will explore an 
option of providing your user community a workflow to develop plugins directly for your package.\r\n\r\nWe will look at an example case study from NetworkX, specifically using the entry points mechanism for plugin discovery. In NetworkX these plugins are currently used to swap in the computation bits.\r\n\r\n-    (2 minutes) Quick introduction about NetworkX and GraphBLAS\r\n-    (3 minutes) Quick Introduction about entry_points\r\n-    (5 minutes) Use NetworkX API - but get the speed of GraphBLAS\r\n-    (5 minutes) Demo-ing the plugin mechanism and implementation details", "recording_license": "", "do_not_record": false, "persons": [{"guid": "22b2e092-09d8-5036-b5e6-ea4cda0fff99", "id": 222, "code": "G9FDBT", "public_name": "Tim Head", "avatar": "https://pretalx.com/media/avatars/9e5755d13920a75666697d385b9be6b7_neSn2o4.jpg", "biography": "I contribute to scikit-learn. In the past I helped build mybinder.org and scikit-optimize. Way back in the history of time I was a particle physicist at CERN and Fermilab.", "answers": []}, {"guid": "18c9d15e-9959-521b-ba27-bc3c849b197e", "id": 30173, "code": "UAM73R", "public_name": "Mridul Seth", "avatar": "https://pretalx.com/media/avatars/picture_zEBWWdC.jpg", "biography": "I am currently working on the NetworkX open source project (work funded through a grant from Chan Zuckerberg Initiative!). Also collaborating with folks from the Scientific Python project (Berkeley Institute of Data Science), Anaconda Inc. Before this I used to work on the GESIS notebooks and gesis.mybinder.org.\r\nI am also interested in the development and maintenance of the open source data & science software ecosystem. I try to help around with the Scientific Open Source ecosystem wherever possible. 
To share my love of Python and Network Science, I have presented workshops at multiple conferences like PyCon, (Euro)SciPy, PyData London and many more!", "answers": []}, {"guid": "91114ee9-3e12-54e9-8119-9813674ba951", "id": 1530, "code": "NEUMLP", "public_name": "Olivier Grisel", "avatar": "https://pretalx.com/media/avatars/ogrisel_portrait_870x550_PMry4Oq.jpg", "biography": "Machine Learning software engineer at Inria and member of the maintainers' team of the scikit-learn open source project.", "answers": []}, {"guid": "2de17fba-7b7a-557b-9ef9-9ce34847c0b9", "id": 30110, "code": "LPYNCD", "public_name": "Franck Charras", "avatar": "https://pretalx.com/media/avatars/065b203af552cb548fa015aafe14b3ae_daJEz44.jpg", "biography": "I graduated as a machine learning research engineer in 2016, with a specialization in NLP. I co-founded Sancare, a start-up company that aims to bring NLP-based solutions for medical data analysis to hospitals, and that has made a place for itself in the market with a performant NLP-powered billing assistant for medical stays. I'm now working at INRIA, France as a Machine Learning Research Engineer, focused on performance computing.", "answers": []}, {"guid": "8c4f09e1-846b-5229-a564-fdb188f27fe0", "id": 30521, "code": "QKVYNA", "public_name": "Sebastian Berg", "avatar": null, "biography": "Sebastian Berg is a NumPy maintainer and steering council member working at NVIDIA. He started contributing to NumPy during his undergrad and PhD in Physics and continued working on NumPy at the Berkeley Institute for Data Science before continuing to contribute at NVIDIA.", "answers": []}, {"guid": "7e876587-827f-57eb-8ec2-ba1bbb58a7f3", "id": 75, "code": "7VUXWM", "public_name": "Joris Van den Bossche", "avatar": "https://pretalx.com/media/avatars/profile_Rc56sfi.png", "biography": "I am a core contributor to Pandas and Apache Arrow, and maintainer of GeoPandas. 
I did a PhD at Ghent University and VITO in air quality research and worked at the Paris-Saclay Center for Data Science. Currently, I work at Voltron Data, contributing to Apache Arrow, and am a freelance teacher of Python (pandas).", "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/73E7GM/", "id": 30959, "guid": "ec36543b-abd6-5e97-83d5-f07dab74d3c5", "date": "2023-08-17T15:30:00+02:00", "start": "15:30", "logo": null, "duration": "00:45", "room": "HS 119 - Maintainer track", "slug": "euroscipy-2023-30959-the-graphic-server-protocol-a-joint-effort-to-facilitate-the-interoperability-of-python-scientific-visualization-libraries", "title": "The Graphic Server Protocol, a joint effort to facilitate the interoperability of Python scientific visualization libraries", "subtitle": "", "track": "Data Science and Visualisation", "type": "Maintainer track", "language": "en", "abstract": "The graphic server protocol is a proposal to mutualize efforts across scientific visualization libraries, languages and platforms such as to provide a unified intermediate-level protocol to render graphical primitives independently of the specifics of the high-level visualization interfaces.", "description": "Scientific figures, as complex as they might be, always end up being made of a (large) set of elementary graphical components, namely points (or markers), lines, glyphs (text), polygons, and volumes. Glyphs, markers and polygons can be further decomposed into triangles such that, apart from the specific case of 3D volume rendering, the elementary components of most scientific figures are essentially points, lines and triangles.\r\n\r\nWhen designing a scientific figure, one rarely manipulates these components explicitly. Instead, one uses high-level plotting functions that ultimately produce these elementary components. 
What defines a scientific visualization library is the choice and the definition of these high-level functions. The library may provide high-level functions that allow users to quickly design a figure (e.g. [ggplot], [vega], [seaborn]), or lower-level functions that allow users to further tune the figure (e.g. [matplotlib], [vispy]). In the most extreme case, the user may even be given total freedom at the price of complexity ([TikZ]). There is no definitive one-size-fits-all API because some users prefer the simplicity of a high-level interface that makes most decisions transparently, while others prefer to have total control of all aspects of the figure.\r\n\r\nIn all cases, however, the rendering task is the same: drawing points, lines, markers, glyphs, polygons, meshes and volumes as efficiently as possible, possibly taking advantage of low-level libraries and hardware. Ultimately, rendering represents significantly redundant efforts across visualization libraries, and it becomes increasingly complex when considering low-level hardware-accelerated graphics interfaces such as OpenGL, Metal, DirectX, or Vulkan. Our experience with the development of hardware-accelerated scientific visualization libraries such as [glumpy], galry, and [vispy] has shown that this complexity is a significant obstacle to the development of an efficient and scalable scientific visualization library in Python.\r\n\r\nThe graphic server protocol is a proposal to mutualize efforts across scientific visualization libraries, languages and platforms such as to provide a unified intermediate-level protocol to render graphical primitives independently of the specifics of the high-level visualization interfaces. 
The goal is not to provide yet another graphical library, but a foundational protocol-based architecture that any scientific visualization library can reuse.\r\n\r\nDuring this talk, we will introduce the graphical server protocol we have designed and show a couple of early examples using two different rendering backends: matplotlib and [Datoviz] (which is based on Vulkan, a low-level interface for high-performance rendering and computing using GPUs). These two backends will render the same figure with the same visual quality using the same Python script, even though the Vulkan backend is expected to be significantly faster than the matplotlib one. Currently, the matplotlib instance can produce 3D graphics (meshes, surfaces and volumes) even though it is limited by the absence of a depth buffer that prevents proper sorting. The roadmap for the protocol is still largely open and we would like to gather feedback from developers of other scientific visualization libraries to potentially create an international steering group that could help define the roadmap.\r\n\r\n[ggplot]: https://ggplot2.tidyverse.org/reference/ggplot.html\r\n[vega]: https://vega.github.io/vega/\r\n[seaborn]: https://seaborn.pydata.org/\r\n[matplotlib]: https://matplotlib.org/\r\n[vispy]: https://vispy.org/\r\n[tikz]: https://en.wikibooks.org/wiki/LaTeX/PGF/TikZ\r\n[Datoviz]: https://datoviz.org/\r\n[glumpy]: http://glumpy.github.io/", "recording_license": "", "do_not_record": false, "persons": [{"guid": "1003d587-47d0-515c-ba6f-1b0a5b310060", "id": 28750, "code": "EGADXL", "public_name": "Nicolas Rougier", "avatar": "https://pretalx.com/media/avatars/Nicolas_Rougier-Gravatar_LpFB0XC.jpg", "biography": null, "answers": []}], "links": [], "attachments": [], "answers": []}, {"url": "https://pretalx.com/euroscipy-2023/talk/3D7SAQ/", "id": 31555, "guid": "cb41fe3a-a50a-52ac-92b8-dd5cd913fd91", "date": "2023-08-17T16:15:00+02:00", "start": "16:15", "logo": null, "duration": "00:20", "room": "HS 119 - 
Maintainer track", "slug": "euroscipy-2023-31555-model-documentation-the-keystone-towards-inclusivity-and-accessibility", "title": "Model Documentation: The Keystone towards Inclusivity and Accessibility", "subtitle": "", "track": "Community, Education, and Outreach", "type": "Talk (15 mins + Q&A)", "language": "en", "abstract": "The use of AI documentation such as repository cards (model and dataset cards), as a means of transparently discussing ethical and inclusive problems that could be found within the outputs and/or during the creation of AI artefacts, with the aim of inclusivity, fairness and accountability, has increasingly become part of the ML discourse. Documentation approaches centred on limitations and risks have become more standard and anticipated with launches of new developments, e.g. the ChatGPT/GPT-4 system card and other LLM model cards. \r\n\r\nThis talk highlights the inclusive approaches that the broader open source community could explore when thinking about their aims when creating documentation.", "description": "In this talk we will first cover some of the current literature and standard approaches to documentation found within the open source community and within the ethical AI space. Building on this overview, we will then detail how to build on the strengths of the open source community and its ability to bridge the gap between academia and research. From there, we will map to the more inclusive and ethics-focused components found in AI model documentation practices, and show how they could be incorporated within open source documentation methods. 
 \r\n\r\nBy the end of this talk we will have not only identified the limitations within current open source documentation practices, but also explored how centering fairness, ethics and inclusivity can create richer documentation that is more wholly inclusive of the open source community it represents.", "recording_license": "", "do_not_record": false, "persons": [{"guid": "276b930c-9892-5ff4-9b8d-265292a85aa9", "id": 30130, "code": "AFB7PG", "public_name": "Ezi Ozoani", "avatar": "https://pretalx.com/media/avatars/Screenshot_2023-04-14_at_18.11.00_CElGney.png", "biography": "Research engineer excited about and working on applied AI research and quantum ML and the intersections of ethical and inclusive practices.", "answers": []}], "links": [], "attachments": [], "answers": []}]}}, {"index": 5, "date": "2023-08-18", "day_start": "2023-08-18T04:00:00+02:00", "day_end": "2023-08-19T03:59:00+02:00", "rooms": {}}]}}}