PyData Paris 2025

My software development journey began with open source and the Apache Arrow project. In 2021, I made my first contribution to the Arrow R package, an experience that sparked my interest in software development and open-source collaboration. During my internship at Quansight, I was introduced to the Python DataFrame API standard, which deepened my understanding of interoperability challenges.

In 2022, after over a year of contributions, I became an Apache Arrow committer, primarily focusing on the Python implementation. I continued my work as a PyArrow maintainer at Voltron Data until mid-2024.

Apache Arrow remains the project I’m most passionate about, and I’m still actively involved in its development as a freelancer.

You Don’t Have to Be an Expert: Stories from the Open Source Frontlines

Alexander CS Hendorf

Alexander C. S. Hendorf is an AI strategist who helps organizations, particularly in legacy-heavy industries, turn data-driven strategies into sustainable change. His unique perspective is shaped by a 20-year journey that began not in a lab, but in the music industry, where as COO he led a start-up through its digital transformation. A Python Software Foundation Fellow and founder of the non-profit Pioneers Hub, he is dedicated to building the human-centric systems and inclusive communities needed to thrive in the era of autonomy.

Open-source Business

Alexandre Abraham

Alexandre Abraham is Lead AI Scientist at Neuralk-AI, where he works on building the first deep tabular foundation model for retail applications. Throughout his career, he has applied cutting-edge machine learning to real-world industrial challenges—modeling user behavior at Criteo, developing intelligent labeling workflows at Dataiku, conducting health data research at Inria, and working on causal inference using national health databases at Implicity. His expertise centers on unsupervised learning, human-in-the-loop systems, and tabular data in production environments.

An active contributor to the open-source community, Alexandre is the author of several widely used tools, including nilearn for neuroimaging, and CardinAL and OpenAL, benchmarks for evaluating active learning strategies. He is also committed to education—after a decade of teaching at EPITA, he now trains public-sector decision-makers at the Institut des Hautes Études du Ministère de l'Intérieur.

Move beyond academia: Introducing an industry-first tabular benchmark

Alexis Bondu

Alexis Bondu is a Machine Learning researcher at Orange Research. His fields of research are varied and cover machine learning (Auto ML), active learning, weakly supervised learning, time series, data streams and early decision making. He is also responsible for the research part of the Khiops project, which is an Auto ML solution developed over the last twenty years in-house at Orange, and which has now been distributed as Open Source for around two years. The aim of this research work is to prepare the new functionalities and algorithms that will appear in future versions of Khiops.

Unlock the full predictive power of your multi-table data

Alexis Placet

Alexis is a C++ Scientific Software Engineer at QuantStack.
He obtained a Master's degree in Computer Science from l'École Supérieure d'Électronique de l'Ouest in Angers in 2012.
Before joining QuantStack, Alexis worked at various companies across a wide range of domains, all dominated by performance constraints: signal processing, image processing, 3D meshes, and metadata processing.

Sparrow, Pirates of the Apache Arrow

Alix Tiran-Cappello

I work as a Data Scientist at Renault Digital. My missions encompass:
- Maintaining our MLOps pipeline
- Co-hosting weekly best-practices sessions for data scientists (40+ attendees)
- Acting as DevOps relay inside the data science team
- Working with industrial and manufacturing plants to reduce their cost of operation

Previously, I worked for 6 years in neuroscience research, in various public institutions.

How to do real TDD in data science? A journey from pandas to polars with pelage!

Anita Graser

Anita Graser is a spatial data scientist, open-source advocate, and author. Her background is in geoinformatics, and she is currently working as a Senior Scientist in the Data Science & Artificial Intelligence research group at AIT Austrian Institute of Technology in Vienna. Anita also serves on the QGIS project steering committee and teaches Python for QGIS at UNIGIS Salzburg. She is the lead developer of MovingPandas (a Python library for analysing movement data) and has developed tools such as the Time Manager plugin for QGIS. She has received multiple awards, including the international OSGeo Sol Katz award for her contributions to open-source geographic information systems. Anita has published several books about QGIS, including “Geocomputation with Python”, “Learning QGIS” and “QGIS Map Design”, and writes a popular spatial data science blog at https://anitagraser.com/.

Building Data Science Tools for Sustainable Transformation

Antoine Collas

Postdoctoral researcher at Inria Saclay

Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

Antoine Prouvost

Antoine is a Scientific Software Engineer at QuantStack, where he led development efforts on the Mamba package manager, as well as on the Xeus-Octave Jupyter kernel and Xtensor. He obtained a Ph.D. in combinatorial optimization and machine learning from École Polytechnique de Montréal in 2021, where he worked at the interplay of deep learning and operations research. During that time, he developed Ecole, a mixed Python/C++ library to ease research on the use of machine learning methods for decision making inside mathematical solvers.

Expanding Programming Language Support in JupyterLite

Anutosh Bhat

Open Source Contractor at QuantStack working on projects around the Jupyter, LLVM and WASM stack.

xeus-cpp, the new C++ kernel for Jupyter.

Arjun Verma

Arjun is a Scientific Software Development Intern at QuantStack.

Collaborative GIS editing in JupyterLab

Arnaud Miribel

Beyond Prototyping: Building Production-Level Apps with Streamlit

Chris Kucharczyk

I'm Chris Kucharczyk, a data scientist and data visualization designer. I live in Oxfordshire, UK.

I currently work at DrivenData, a social enterprise developing machine learning solutions to social impact problems. We host data science competitions and offer data science consulting services.

How to make public data more accessible with "baked" data and DuckDB

Christophe Dervieux

Christophe Dervieux is an open source software engineer at Posit PBC, where he has worked for five years as a core developer on Quarto. With over a decade of experience using data science tools for publication—including previous work with R Markdown in the R community—Christophe brings deep expertise in reproducible research and technical communication. He is an active contributor to the open source ecosystem and is passionate about helping data practitioners of all levels share their work more effectively and reproducibly.

From Jupyter Notebook to Publish-Ready Report: Effortless Sharing with Quarto

Cédric Couralet

Cédric Couralet, Data Scientist at Insee, is an open-source enthusiast, with expertise in software architecture and secure system design.

torchFastText: Modernizing Text Classification at Insee with PyTorch-based models

David Brochart

David Brochart is the main author of pycrdt, a Python library providing bindings to Yrs, the Rust port of Yjs, a popular implementation of CRDTs in JavaScript. While pycrdt is extensively used in Jupyter for real-time collaboration, it can also be used to implement distributed data structures, allowing data to be shared without the locks usually associated with multithreading.

Parallel processing using CRDTs

Davide De Marchi

Davide De Marchi is a researcher and software engineer specializing in geospatial big data. He has significant experience in Big Data, Cloud Computing, GIS, Remote Sensing, and Data Visualization. His career includes contributions to the design and implementation of interactive visualization tools, notably at the European Commission - Joint Research Centre where he has been a key developer of the BDAP platform. Earlier in his career, he gained substantial experience in the development of geospatial data processing software and served as an adjunct professor at the University of Urbino.

Meta-Dashboards: Accelerating Geospatial Web Apps Creation with Voilà

Domagoj Marić

Domagoj Marić graduated from the Faculty of Electrical Engineering and Computing in Zagreb, where he initially specialized in information security. Towards the end of his studies, he shifted more deeply into data science, with a focus on web content extraction (web scraping/crawling). He began his career at Megatrend poslovna rješenja, where he worked on the development of Python applications and data solutions with a focus on creating virtual assistants, and later served as the head of the data science department. That role continued through his career at Comping. Today, he works at Pontis Technology as the AI customer delivery manager, leading the delivery of projects in natural language processing, computer vision, predictive analytics and generative AI. In addition to his responsibilities in the data science domain, he is also an experienced lecturer in the field of programming.

Modern Web Data Extraction: Techniques, Tools, Legal and Ethical Considerations

Elizaveta Clouet

Data Scientist with a strong interest in NLP techniques. Elizaveta is currently working at Hellowork on projects including document analysis, named entity recognition and recommendation systems.

Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

Emanuele Fabbiani

Emanuele is an engineer, researcher, and entrepreneur with a passion for artificial intelligence.

He earned his PhD by exploring time series forecasting in the energy sector and spent time as a guest researcher at EPFL in Lausanne. Today, he is co-founder and Head of AI at xtream, a boutique company that applies cutting-edge technology to solve complex business challenges.

Emanuele is also a contract professor in AI at the Catholic University of Milan. He has published eight papers in international journals and contributed to over 30 international conferences worldwide. His engagements include AMLD Lausanne, ODSC London, WeAreDevelopers Berlin, PyData Berlin, PyData Paris, PyCon Florence, the Swiss Python Summit in Zurich, and Codemotion Milan.

Emanuele has been a guest lecturer at Italian, Swiss, and Polish universities.

Advanced Polars: Lazy Queries and Streaming Mode

Emilien SCHULTZ

Research engineer/data scientist at CREST (ENSAE, Palaiseau) and member of the Computational Social Science group at Institut Polytechnique de Paris (CSS@IPP).

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

Emmanuel FARHI

Engineer and Doctor in solid state physics, I have worked for 20 years in the neutron scattering community, and shifted to synchrotron radiation. With a strong expertise in scientific computing, my duty is now to help the synchrotron SOLEIL experimental beam-lines to handle huge data. Our group relies on open-source software deployed over data processing services.

Fighting against the instability : Debian Science at the synchrotron SOLEIL

Gravin Florent

CTO at Camptocamp Geospatial, I have been working for 20+ years in the open-source geospatial data ecosystem and I am passionate about the technologies around it.
Living in Chambéry, I love maps, very useful to prepare outdoor activities =)

A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

Guillaume Lemaitre

Skrub: machine learning for dataframes

Hans Fangohr

Hans Fangohr is a computational scientist and open source advocate. He is heading the scientific support unit Computational Science at the Max Planck Institute for Structure and Dynamics of Matter in Hamburg, Germany, and is Professor of Computational Modelling at the University of Southampton in the United Kingdom. He is working on research software engineering, including high performance computing, data analysis and appropriate software engineering methods in computational science. He has contributed to open source software through tools such as Nmag, Ubermag, Postopus and NBVAL.

Reproducible software provisioning for high performance computing (HPC) and research software engineering (RSE) using Spack

Ian Thomas

Expanding Programming Language Support in JupyterLite

Isabel Paredes

Isabel Paredes is a software developer at QuantStack.

Expanding Programming Language Support in JupyterLite

Jeremy Tuloup

Technical Director at QuantStack and Project Jupyter core developer and maintainer (JupyterLab, Jupyter Notebook, Voilà Dashboards).

Browser-based AI workflows in Jupyter

Johan Mabille

Johan Mabille is a Technical Director specialized in high-performance computing in C++. He holds a master's degree in computer science from Centrale-Supelec. As an open source developer, Johan coauthored xtensor, xeus, and xsimd.

He leads the C++ team at QuantStack, where he oversees the development and maintenance of mamba, sparrow, and the Jupyter Xeus project.

Johan has also made significant contributions to JupyterLab.

Prior to joining QuantStack, Johan worked as a quant developer at HSBC.

xeus-cpp, the new C++ kernel for Jupyter.
Sparrow, Pirates of the Apache Arrow

Johannes Rieke

I'm a product manager for Streamlit and responsible for all features in the open-source library. My background is in physics, neuroscience, and machine learning.

Beyond Prototyping: Building Production-Level Apps with Streamlit

Julien Boelaert

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

Justine BEL-LETOILE

Justine is a data scientist at Hellowork, the French leader in talent acquisition, job search and course search tech. She spent the last 10+ years enjoying machine learning, Python and other data science fun stuff in various fields. Her current work includes a good deal of natural language processing.

Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

Jérôme Dockès

I am a research engineer at Inria working on open-source Python packages for data-science.

Skrub: machine learning for dataframes

Laurent Direr

I'm a freelance web developer helping small teams ship reliable software. I've been working with Python for 10+ years and enjoy automating work for other developers.

These days I'm very interested in local-first software technologies.
I attended the Recurse Center (a programming retreat) in 2018.

Code as Data: A Practical Introduction to Python’s Abstract Syntax Tree

Lex Avstreikh

Former Creative Director with extended expertise in product and strategy, Lex Avstreikh now works as the Head of Strategy at Hopsworks, a Swedish startup at the forefront of machine learning infrastructure. He focuses on identifying pivotal market trends and executing strategic initiatives that secure and advance Hopsworks’ position as a global leader in the ML industry.

Building Resilient (ML) Pipelines for MLOps

Louis Le Dain

Move beyond academia: Introducing an industry-first tabular benchmark

Loïc Estève

Loïc has a Particle Physics background, which is how he discovered Python towards the end of his PhD.

He is a scikit-learn and joblib core contributor and has been involved in a number of Python open-source projects in the past 10 years, amongst which Pyodide, dask-jobqueue, sphinx-gallery and nilearn.

PyPI in the face: running jokes that PyPI download stats can play on you

Luc-Aurélien Gauthier

I’m a machine learning specialist with a background in both research and industry. After completing a PhD in machine learning, I applied my expertise in industrial settings at Safran and Orange, focusing on anomaly prediction, defect detection, and fraud analysis.

Unlock the full predictive power of your multi-table data

Lucas Colley

I am an undergraduate student studying Computer Science & Philosophy at the University of Oxford. Currently, I am a Mentee @ prefix.dev for European Summer of Code, working on Pixi and rattler-build.

I am also a maintainer of SciPy and array-api-extra, a member of the Consortium for Python Data API Standards, and a founding member of quantity-dev.

A Hitchhiker's Guide to the Array API Standard Ecosystem

Marie Sacksick

Currently Product Engineer at Probabl, Marie is also co-organizer of Women in Machine Learning and Data Science Paris.

Enhancing Machine Learning Workflows with skore

Martin Lang

Martin Lang is a computational scientist working at the Max Planck Institute for the Structure and Dynamics of Matter, Hamburg, Germany. He has a PhD in physics from the University of Southampton, UK.

Reproducible software provisioning for high performance computing (HPC) and research software engineering (RSE) using Spack

Martin Renou

Martin Renou is a Technical Director at QuantStack and a maintainer of Project Jupyter. Among other projects Martin is a core team member of the ipywidgets project and maintains many Jupyter widget packages such as ipyleaflet, ipydatagrid, ipygany, ipycanvas, and bqplot. He is a co-creator of the Voilà dashboarding system, and the xeus-python kernel.

Collaborative GIS editing in JupyterLab

Meilame Tayebjee

Data Scientist at the Innovation Lab, Insee

torchFastText: Modernizing Text Classification at Insee with PyTorch-based models

Miklos Erdelyi

Documents Meet LLMs: Tales from the Trenches

Nico Albers

Nico Albers is leading the data-related backend teams (recommendations, sorting, search) of the fashion online retailer ABOUT YOU. Before this, he worked as a Data Scientist on Business Intelligence Problems and Recommendations.

He holds a master's degree in Mathematics from the University of Hamburg, where he focused on Statistical Learning Theory and Inverse Problems.

The new lockfile format introduced in PEP 751

Nicolas Brichet

Browser-based AI workflows in Jupyter

Nicolas M. Thiéry

Nicolas M. Thiéry is a professor of computer science at the Laboratoire Interdisciplinaire des Sciences du Numérique of Université Paris-Saclay. An Open Science advocate since 1994, he has contributed to the SageMath computational mathematics system and has lately focused on the reasoned use of technologies such as Jupyter and software forges for teaching programming and computation at scale. He jointly leads an «AI and education» chair at Paris-Saclay.

Sharing computational course material at larger scale: a French multi-tenant attempt

Nour El Mawass

Nour leads the Generative AI technical group at Modus Create. She has a PhD in Machine Learning, and has worked on Machine Learning, Data Science and Data Engineering problems in various domains, both inside and outside Academia.

Documents Meet LLMs: Tales from the Trenches

Olivier Grisel

Olivier is an open source fellow at Probabl and a scikit-learn core contributor.

Probabilistic regression models: let's compare different modeling strategies and discuss how to evaluate them

Patrick Lee

Applying Causal Inference in Industry 4.0: A Case Study from Glasswool Production

Paul Girard

Paul (@paulanomalie) is a digital humanist: trained as an engineer and inspired by designers, he aims to develop the best human-data interfaces. After ten years as a research engineer in the humanities, he co-founded OuestWare, a software agency specialized in developing custom data analytics web applications.

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

Pietro Piccini

Architecting Scalable Multi-Modal Video Search

Ralf Gommers

Big ideas shaping scientific Python: the quest for performance and usability

Rania Talbi

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

Raúl Cumplido

Apache Arrow committer and PMC member

State of Parquet 2025: Structure, Optimizations, and Recent Innovations

Riccardo Cappuzzo

I am a research engineer at Inria, part of P16 and of the SODA research team. I am the lead developer of the skrub Python package and spend most of my time on that, but I am also interested in research on tabular learning and tabular foundational models.

Skrub: machine learning for dataframes

Rok Mihevc

Started as a physicist, worked as data scientist and engineer, got interested in data tooling and became an Apache Arrow and Parquet contributor, focusing on the C++ and lately Rust implementations. Would like to see numerical computation become more accessible in general purpose languages and frameworks.

State of Parquet 2025: Structure, Optimizations, and Recent Innovations

Romain Clement

Romain Clement is a software engineer with over a decade of experience spanning data engineering, applied mathematics, and machine learning. Since 2018, he’s worked as an independent consultant, helping data teams streamline and productionize their workflows—bringing software engineering best practices into data science, MLOps, and beyond.

He’s an active open-source contributor, with personal projects and community involvement in ecosystems like Datasette. A regular speaker since 2019 and organizer of the Grenoble Python Meetup, he enjoys sharing pragmatic tools and techniques that make data work actually work.

Find out more on romain-clement.net

Machine Learning in the Browser: Fast Iteration with ONNX & WebAssembly
You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows

Rémi Flamary

Rémi Flamary is Professor at École Polytechnique in the Centre de Mathématiques Appliquées (CMAP). He was previously Associate Professor at Université Côte d’Azur (UCA), 3IA Chair in Artificial Intelligence, and a member of Lagrange Laboratory, Observatoire de la Côte d’Azur. His current research interests include signal and image processing and machine learning, with a recent focus on applications of Optimal Transport theory to machine learning problems such as graph processing and domain adaptation. He is also the co-creator and maintainer of the Python Optimal Transport toolbox (POT).

Optimal Transport in Python: A Practical Introduction with POT

Sanjiban Sengupta

Sanjiban is a Doctoral Student at CERN, affiliated with the University of Manchester. He is researching optimization strategies for efficient Machine Learning Inference for the High-Luminosity phase of the Large Hadron Collider at CERN within the Next-Gen Triggers Project. Previously, he was a Summer Student at CERN in 2022, and also contributed at CERN-HSF via the Google Summer of Code Program in 2021. In the development of SOFIE, he was particularly involved in the development of the Keras and PyTorch parsers, storage functionalities, machine learning operators based on the ONNX standard, Graph Neural Networks support, etc. Moreover, he volunteered as a Mentor for the contributors of Google Summer of Code 2022, and again in 2023, 2024 and 2025, and for the CERN Summer Students of 2023 working on CERN’s ROOT Data Analysis Project.

Previously, Sanjiban spoke at PyCon India 2023 about Python interfaces for Meta’s Velox Engine. He also presented a talk on the Velox architecture at PyCon Thailand 2023. He has been contributing to open-source data science and engineering projects, including ROOT, Apache Arrow, and Substrait.

Advancements in optimizing ML Inference at CERN

Sebastiano Milardo

Sebastiano Milardo received his Bachelor’s and Master’s degrees in Computer Engineering from the University of Catania in 2011 and 2013, respectively, and earned a Ph.D. in Information and Communication Technologies from the University of Palermo in 2018. From 2014 to 2015, he was a Researcher at the Italian National Consortium of Telecommunications, contributing to the NEWCOM# and SIGMA projects. He served as a Postdoctoral Fellow at the MIT Senseable City Laboratory from 2018 to 2021, where he worked on interdisciplinary research at the intersection of urban science, networks, and data-driven technologies. Since 2021, he has been working as a freelance researcher and consultant, collaborating on projects involving advanced data analytics and artificial intelligence.

His research interests include software-defined networks, network protocols for the Internet of Things, and big data. More recently, his focus has expanded to artificial intelligence, with particular attention to large language models (LLMs), machine learning pipelines, and the practical application of AI technologies in complex, real-world scenarios.

Architecting Scalable Multi-Modal Video Search

Simeon Carstens

CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

Simona Bottani

I am a data science project leader at Saint-Gobain Research in Paris. From 2018 to 2022 I worked in the Aramis team (INRIA), where I obtained a PhD in computer science from Sorbonne University. My PhD was about machine learning for 3D neuroimaging using a large-scale data warehouse. I obtained bachelor's and master's degrees in biomedical engineering from Politecnico di Torino.

Applying Causal Inference in Industry 4.0: A Case Study from Glasswool Production

Sylvain Corlay

Sylvain Corlay is the founder and CEO of QuantStack. He holds a PhD in applied mathematics from University Paris VI.

As an open-source developer, Sylvain Corlay is active in the Jupyter ecosystem. He is the co-creator of the Voilà dashboarding system and the Xeus C++ implementation of the Jupyter kernel protocol. He maintains several other projects of the Jupyter stack.

He is also a core contributor to conda-forge and to several other scientific computing open-source projects, such as bqplot, xtensor, and ipyleaflet.

Beyond QuantStack, Sylvain does a lot of volunteer work for the community, as a member of the board of directors of NumFOCUS from 2018 to 2024, as co-organizer of JupyterCon 2020 and 2023, and organizer of the PyData Paris Meetup.

Open-source Business

Thorsten Beier

I am an open-source software developer working for QuantStack on the WebAssembly stack.

Expanding Programming Language Support in JupyterLite

Théo Gnassounou

I'm a third-year PhD student working on domain adaptation for biosignal applications.

Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

Tim Paine

Tim is a Quantitative Developer at Cubist Systematic Strategies and an adjunct professor in the Computer Science Department at Columbia University.

Build a data studio in your notebook with jupyter-fs

Uwe L. Korn

Uwe Korn is a CTO at the data science company QuantCo. His expertise is in building scalable architectures for machine learning services and the teams & culture around them. Nowadays, he focuses on the data engineering infrastructure that is needed to provide the building blocks to bring machine learning models into production. As part of his work to provide an efficient data interchange, he became a core committer to the Apache Parquet, Apache Arrow and conda-forge projects.

Navigating the security compliance maze of an ML service

Valeria Zuccoli

Statistician by education, Valeria is an AI Scientist specializing in real-time models for complex, challenging scenarios. Driven by a deep curiosity for the latest research, she applies advanced analytical techniques to build intelligent systems that deliver immediate and effective solutions in high-impact domains.

Repetita Non Iuvant: Why Generative AI Models Cannot Feed Themselves

Yann Lechelle

Yann Lechelle is a tech entrepreneur and executive, co-founder and CEO of Probabl, an INRIA spin-off focused on scikit-learn, whose mission is to globally distribute open source solutions in data science and machine learning.

Previously, he was CEO of Scaleway, a public cloud provider with unique European values, and a challenger to AWS, Azure, and Google Cloud. Over the past decades, he has founded and developed numerous tech companies, including Snips.ai, a leading company in voice processing and embedded AI, which he led from the seed stage to its acquisition in a commercial sale to Sonos, a pioneering smart speaker company listed on Nasdaq.

Yann is also co-founder of France Digitale, board member and VP ecosystem of HUB France AI, as well as board member of the One-o-One endowment fund and JEDI (Joint European Disruption Initiative). He contributes to the community as an angel investor, mentor, entrepreneur in residence at INSEAD, and Cloud and AI expert for the Collège Numérique France 2030 under the Secretary General for Investment under the authority of the Prime Minister.

Yann holds a Bachelor of Computer Science summa cum laude from the American University of Paris, as well as an MBA from INSEAD.

Open-source Business

Étienne Lac

CoSApp: an open-source library to design complex systems