
My software development journey began with open source and the Apache Arrow project. In 2021, I made my first contribution to the Arrow R package, an experience that sparked my interest in software development and open-source collaboration. During my internship at Quansight, I was introduced to the Python DataFrame API standard, which deepened my understanding of interoperability challenges.
In 2022, after over a year of contributions, I became an Apache Arrow committer, primarily focusing on the Python implementation. I continued my work as a PyArrow maintainer at Voltron Data until mid-2024.
Apache Arrow remains the project I’m most passionate about, and I’m still actively involved in its development as a freelancer.
- You Don’t Have to Be an Expert: Stories from the Open Source Frontlines
Alexandre Abraham is Lead AI Scientist at Neuralk-AI, where he works on building the first deep tabular foundation model for retail applications. Throughout his career, he has applied cutting-edge machine learning to real-world industrial challenges—modeling user behavior at Criteo, developing intelligent labeling workflows at Dataiku, conducting health data research at Inria, and working on causal inference using national health databases at Implicity. His expertise centers on unsupervised learning, human-in-the-loop systems, and tabular data in production environments.
An active contributor to the open-source community, Alexandre is the author of several widely used tools, including nilearn for neuroimaging, and CardinAL and OpenAL, benchmarks for evaluating active learning strategies. He is also committed to education—after a decade of teaching at EPITA, he now trains public-sector decision-makers at the Institut des Hautes Études du Ministère de l'Intérieur.
- Move beyond academia: Introducing an industry-first tabular benchmark

Alexis Bondu is a Machine Learning researcher at Orange Research. His research spans a wide range of fields: automated machine learning (AutoML), active learning, weakly supervised learning, time series, data streams, and early decision making. He also leads the research side of the Khiops project, an AutoML solution developed in-house at Orange over the last twenty years and distributed as open source for around two years. This research prepares the new features and algorithms that will appear in future versions of Khiops.
- Unlock the full predictive power of your multi-table data

Alexis is a C++ Scientific Software Engineer at QuantStack.
He obtained a master's degree in Computer Science from l'École Supérieure d'Électronique de l'Ouest in Angers in 2012.
Before joining QuantStack, Alexis worked at various companies across a wide range of domains, all dominated by performance constraints: signal processing, image processing, 3D meshes, and metadata processing.
- Sparrow, Pirates of the Apache Arrow
I work as a Data Scientist at Renault Digital. My responsibilities include:
- Maintaining our MLOps pipeline
- Co-hosting weekly best-practices sessions for data scientists (40+ attendees)
- Acting as the DevOps liaison within the data science team
- Working with industrial and manufacturing plants to reduce their operating costs
Previously, I spent six years in neuroscience research at various public institutions.
- How to do real TDD in data science? A journey from pandas to polars with pelage!

Anita Graser is a spatial data scientist, open-source advocate, and author. Her background is in geoinformatics, and she is currently working as a Senior Scientist in the Data Science & Artificial Intelligence research group at AIT Austrian Institute of Technology in Vienna. Anita also serves on the QGIS project steering committee and teaches Python for QGIS at UNIGIS Salzburg. She is the lead developer of MovingPandas (a Python library for analysing movement data) and has developed tools such as the Time Manager plugin for QGIS. She has received multiple awards, including the international OSGeo Sol Katz award for her contributions to open-source geographic information systems. Anita has published several books, including “Geocomputation with Python”, “Learning QGIS”, and “QGIS Map Design”, and writes a popular spatial data science blog at https://anitagraser.com/.
- Building Data Science Tools for Sustainable Transformation

Postdoctoral researcher at Inria Saclay
- Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

Antoine is a Scientific Software Engineer at QuantStack, where he led development efforts on the Mamba package manager, the Xeus-Octave Jupyter kernel, and Xtensor. He obtained a Ph.D. in combinatorial optimization and machine learning from École Polytechnique de Montréal in 2021, where he worked at the interplay of deep learning and operations research. During that time, he developed Ecole, a mixed Python/C++ library that eases research on using machine learning methods for decision making inside mathematical solvers.
- Expanding Programming Language Support in JupyterLite

I am a software engineer at Adobe, where I focus on product-led growth initiatives and the development of generative AI applications. Alongside this, I am pursuing a Master’s degree at the Georgia Institute of Technology, with a concentration in artificial intelligence, cognitive science, and human-computer interaction. My professional and academic interests lie in creating AI systems that are not only technically robust but also intuitive, explainable, and aligned with human cognitive processes. I am also a strong advocate for ethical AI, responsible technology, and diversity in the tech industry.
Through my work, research, and public speaking, I strive to advance the development of human-centered AI that is transparent, trustworthy, and accessible.
- From Language to Knowledge: How SpaCy Can Build Better AI Models

Open Source Contractor at QuantStack, working on projects across the Jupyter, LLVM, and WebAssembly (WASM) stack.
- xeus-cpp, the new C++ kernel for Jupyter.

Arjun is a Scientific Software Development Intern at QuantStack.
- Collaborative GIS editing in JupyterLab

I'm Chris Kucharczyk, a data scientist and data visualization designer. I live in Oxfordshire, UK.
I currently work at DrivenData, a social enterprise developing machine learning solutions to social impact problems. We host data science competitions and offer data science consulting services.
- How to make public data more accessible with "baked" data and DuckDB

Christophe Dervieux is an open source software engineer at Posit PBC, where he has worked for five years as a core developer on Quarto. With over a decade of experience using data science tools for publication—including previous work with R Markdown in the R community—Christophe brings deep expertise in reproducible research and technical communication. He is an active contributor to the open source ecosystem and is passionate about helping data practitioners of all levels share their work more effectively and reproducibly.
- From Jupyter Notebook to Publish-Ready Report: Effortless Sharing with Quarto

David Brochart is the main author of pycrdt, a Python library providing bindings to Yrs, the Rust port of Yjs, a popular JavaScript implementation of CRDTs (conflict-free replicated data types). While pycrdt is extensively used in Jupyter for real-time collaboration, it can also be used to implement distributed data structures, allowing data to be shared without the locks usually associated with multithreading.
- Parallel processing using CRDTs

Davide De Marchi is a researcher and software engineer specializing in geospatial big data. He has significant experience in big data, cloud computing, GIS, remote sensing, and data visualization. His career includes contributions to the design and implementation of interactive visualization tools, notably at the European Commission's Joint Research Centre (JRC), where he has been a key developer of the BDAP platform. Earlier in his career, he gained substantial experience in developing geospatial data processing software and served as an adjunct professor at the University of Urbino.
- Meta-Dashboards: Accelerating Geospatial Web Apps Creation with Voilà

Domagoj Marić graduated from the Faculty of Electrical Engineering and Computing in Zagreb, where he initially specialized in information security before shifting, towards the end of his studies, to data science with a focus on web content extraction (web scraping/crawling). He began his professional career at Megatrend poslovna rješenja, where he developed Python applications and data solutions focused on virtual assistants, and later headed the data science department. He continued in that role at Comping. Today, he works at Pontis Technology as AI customer delivery manager, leading the delivery of projects in natural language processing, computer vision, predictive analytics, and generative AI. Alongside his data science work, he also lectures on programming.
- Modern Web Data Extraction: Techniques, Tools, Legal and Ethical Considerations

Data Scientist with a strong interest in NLP techniques. Elizaveta currently works at Hellowork on projects including document analysis, named entity recognition, and recommendation systems.
- Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

Emanuele is an engineer, researcher, and entrepreneur with a passion for artificial intelligence.
He earned his PhD by exploring time series forecasting in the energy sector and spent time as a guest researcher at EPFL in Lausanne. Today, he is co-founder and Head of AI at xtream, a boutique company that applies cutting-edge technology to solve complex business challenges.
Emanuele is also a contract professor in AI at the Catholic University of Milan. He has published eight papers in international journals and contributed to over 30 international conferences worldwide. His engagements include AMLD Lausanne, ODSC London, WeAreDevelopers Berlin, PyData Berlin, PyData Paris, PyCon Florence, the Swiss Python Summit in Zurich, and Codemotion Milan.
Emanuele has been a guest lecturer at Italian, Swiss, and Polish universities.
- Advanced Polars: Lazy Queries and Streaming Mode
Research engineer/data scientist at CREST (ENSAE, Palaiseau) and member of the Computational Social Science group at Institut Polytechnique de Paris (CSS@IPP).
- ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences
An engineer with a PhD in solid-state physics, I worked for 20 years in the neutron scattering community before shifting to synchrotron radiation. With strong expertise in scientific computing, my role is now to help the experimental beamlines at the SOLEIL synchrotron handle very large volumes of data. Our group relies on open-source software deployed on data processing services.
- Fighting against the instability : Debian Science at the synchrotron SOLEIL
- CoSApp: an open-source library to design complex systems
Etienne is a Senior Research Scientist at CNRS. A specialist in political sociology, he is also interested in computational social sciences.
- ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

As CTO at Camptocamp Geospatial, I have been working in the open-source geospatial data ecosystem for more than 20 years, and I am passionate about the technologies surrounding it.
I live in Chambéry and love maps, which are very useful for planning outdoor activities =)
- A Journey Through a Geospatial Data Pipeline: From Raw Coordinates to Actionable Insights

- Skrub: machine learning for dataframes

Hans Fangohr is a computational scientist and open source advocate. He heads the Computational Science scientific support unit at the Max Planck Institute for the Structure and Dynamics of Matter in Hamburg, Germany, and is Professor of Computational Modelling at the University of Southampton in the United Kingdom. He works on research software engineering, including high performance computing, data analysis, and appropriate software engineering methods in computational science. He has contributed to open source software through tools such as Nmag, Ubermag, Postopus, and nbval.
- Reproducible software provisioning for high performance computing (HPC) and research software engineering (RSE) using Spack

Ian Thomas
- Expanding Programming Language Support in JupyterLite

Irene Donato is a Data Scientist at Agile Lab with a PhD in Mathematics and a background in Physics. She specializes in AI strategy. With experience across academia and industry, Irene focuses on applying data science to solve complex business problems.
- Architecting Scalable Multi-Modal Video Search

Isabel Paredes is a software developer at QuantStack.
- Expanding Programming Language Support in JupyterLite

Technical Director at QuantStack and Project Jupyter core developer and maintainer (JupyterLab, Jupyter Notebook, Voilà Dashboards).
- Browser-based AI workflows in Jupyter

I am a research engineer at Inria working on open-source Python packages for data science.
- Skrub: machine learning for dataframes

Johan Mabille is a Technical Director specialized in high-performance computing in C++. He holds a master's degree in computer science from CentraleSupélec. As an open source developer, Johan coauthored xtensor, xeus, and xsimd.
He leads the C++ team at QuantStack, where he oversees the development and maintenance of mamba, sparrow, and the Jupyter Xeus project.
Johan has also made significant contributions to JupyterLab.
Prior to joining QuantStack, Johan worked as a quant developer at HSBC.
- Sparrow, Pirates of the Apache Arrow
- xeus-cpp, the new C++ kernel for Jupyter.

I'm a product manager for Streamlit and responsible for all features in the open-source library. My background is in physics, neuroscience, and machine learning.
- Beyond Prototyping: Building Production-Level Apps with Streamlit
- ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences
Justine is a data scientist at Hellowork, the French leader in talent acquisition, job search, and course search tech. She has spent the last 10+ years enjoying machine learning, Python, and other fun data science work across various fields. Her current work includes a good deal of natural language processing.
- Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

I'm a freelance web developer helping small teams ship reliable software. I've been working with Python for 10+ years and enjoy automating work for other developers.
These days I'm very interested in local-first software technologies.
I attended the Recurse Center (a programming retreat) in 2018.
- Code as Data: A Practical Introduction to Python’s Abstract Syntax Tree
- Building Resilient (ML) Pipelines for MLOps

Loïc has a background in particle physics, which is how he discovered Python towards the end of his PhD.
He is a scikit-learn and joblib core contributor and has been involved in a number of Python open-source projects over the past 10 years, including Pyodide, dask-jobqueue, sphinx-gallery, and nilearn.
- PyPI in the face: running jokes that PyPI download stats can play on you

I am an undergraduate student studying Computer Science & Philosophy at the University of Oxford. I am currently a mentee at prefix.dev for the European Summer of Code, working on Pixi and rattler-build.
I am also a maintainer of SciPy and array-api-extra, a member of the Consortium for Python Data API Standards, and a founding member of quantity-dev.
- A Hitchhiker's Guide to the Array API Standard Ecosystem

I’m a machine learning specialist with a background in both research and industry. After completing a PhD in machine learning, I applied my expertise in industrial settings at Safran and Orange, focusing on anomaly prediction, defect detection, and fraud analysis.
- Unlock the full predictive power of your multi-table data

Marie is currently a Product Engineer at Probabl and a co-organizer of Women in Machine Learning and Data Science Paris.
- Enhancing Machine Learning Workflows with skore

Martin Lang is a computational scientist working at the Max Planck Institute for the Structure and Dynamics of Matter, Hamburg, Germany. He has a PhD in physics from the University of Southampton, UK.
- Reproducible software provisioning for high performance computing (HPC) and research software engineering (RSE) using Spack

Martin Renou is a Technical Director at QuantStack and a maintainer of Project Jupyter. Among other projects, Martin is a core team member of the ipywidgets project and maintains many Jupyter widget packages such as ipyleaflet, ipydatagrid, ipygany, ipycanvas, and bqplot. He is a co-creator of the Voilà dashboarding system and the xeus-python kernel.
- Collaborative GIS editing in JupyterLab
- Documents Meet LLMs: Tales from the Trenches

Nico Albers leads the data-related backend teams (recommendations, sorting, search) at the online fashion retailer ABOUT YOU. Before this, he worked as a Data Scientist on business intelligence problems and recommendations.
He holds a master's degree in Mathematics from the University of Hamburg, where he focused on Statistical Learning Theory and Inverse Problems.
- The new lockfile format introduced in PEP 751
- Browser-based AI workflows in Jupyter

Nicolas M. Thiéry is a professor of computer science at the Laboratoire Interdisciplinaire des Sciences du Numérique of Université Paris-Saclay. An Open Science advocate since 1994, he has contributed to the SageMath computational mathematics system and has lately focused on the thoughtful use of technologies such as Jupyter and software forges for teaching programming and computation at scale. He co-leads an "AI and Education" chair at Paris-Saclay.
- Sharing computational course material at larger scale: a French multi-tenant attempt

Nour leads the Generative AI technical group at Modus Create. She has a PhD in Machine Learning and has worked on machine learning, data science, and data engineering problems in various domains, both inside and outside academia.
- Documents Meet LLMs: Tales from the Trenches

Olivier is an open source fellow at Probabl and a scikit-learn core contributor.
- Probabilistic regression models: let's compare different modeling strategies and discuss how to evaluate them
- Applying Causal Inference in Industry 4.0: A Case Study from Glasswool Production

Paul (@paulanomalie) is a digital humanist: trained as an engineer and inspired by designers, he aims to develop the best human-data interfaces. After ten years as a research engineer in the humanities, he co-founded OuestWare, a software agency specializing in custom data analytics web applications.
- ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

- Big ideas shaping scientific Python: the quest for performance and usability

Apache Arrow committer and PMC member
- State of Parquet 2025: Structure, Optimizations, and Recent Innovations

Remi Flamary is a Professor at École Polytechnique in the Centre de Mathématiques Appliquées (CMAP). He was previously an Associate Professor at Université Côte d’Azur (UCA), a 3IA Chair in Artificial Intelligence, and a member of the Lagrange Laboratory, Observatoire de la Côte d’Azur. His current research interests include signal and image processing and machine learning, with a recent focus on applications of Optimal Transport theory to machine learning problems such as graph processing and domain adaptation. He is also the co-creator and maintainer of the Python Optimal Transport toolbox (POT).
- Optimal Transport in Python: A Practical Introduction with POT
- Skrub: machine learning for dataframes

Started as a physicist, worked as a data scientist and engineer, got interested in data tooling, and became an Apache Arrow and Parquet contributor, focusing on the C++ and, lately, Rust implementations. Would like to see numerical computation become more accessible in general-purpose languages and frameworks.
- State of Parquet 2025: Structure, Optimizations, and Recent Innovations

Romain Clement is a software engineer with over a decade of experience spanning data engineering, applied mathematics, and machine learning. Since 2018, he’s worked as an independent consultant, helping data teams streamline and productionize their workflows—bringing software engineering best practices into data science, MLOps, and beyond.
He’s an active open-source contributor, with personal projects and community involvement in ecosystems like Datasette. A regular speaker since 2019 and organizer of the Grenoble Python Meetup, he enjoys sharing pragmatic tools and techniques that make data work actually work.
Find out more at romain-clement.net
- You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows
- Machine Learning in the Browser: Fast Iteration with ONNX & WebAssembly

Sanjiban is a Doctoral Student at CERN, affiliated with the University of Manchester. He is researching optimization strategies for efficient Machine Learning Inference for the High-Luminosity phase of the Large Hadron Collider at CERN within the Next-Gen Triggers Project. Previously, he was a Summer Student at CERN in 2022, and contributed to CERN-HSF through the Google Summer of Code program in 2021. In the development of SOFIE, he was particularly involved in the Keras and PyTorch parsers, storage functionality, machine learning operators based on the ONNX standard, and Graph Neural Network support. He has also volunteered as a mentor for Google Summer of Code contributors in 2022, 2023, 2024, and 2025, and for the 2023 CERN Summer Students working on CERN’s ROOT Data Analysis Project.
Sanjiban previously spoke at PyCon India 2023 about Python interfaces for Meta’s Velox Engine, and presented a talk on the Velox architecture at PyCon Thailand 2023. He has contributed to open-source data science and engineering projects including ROOT, Apache Arrow, and Substrait.
- Advancements in optimizing ML Inference at CERN

- CodeCommons: Towards transparent, richer and sustainable datasets for code generation model training

I am a data science project leader at Saint-Gobain Research in Paris. From 2018 to 2022, I worked in the Aramis team (Inria), where I obtained a PhD in computer science from Sorbonne University. My PhD was about machine learning for 3D neuroimaging using a large-scale data warehouse. I hold a bachelor's and a master's degree in biomedical engineering from Politecnico di Torino.
- Applying Causal Inference in Industry 4.0: A Case Study from Glasswool Production

Sylvain Corlay is the founder and CEO of QuantStack. He holds a PhD in applied mathematics from University Paris VI.
As an open-source developer, Sylvain Corlay is active in the Jupyter ecosystem. He is the co-creator of the Voilà dashboarding system and the Xeus C++ implementation of the Jupyter kernel protocol. He maintains several other projects of the Jupyter stack.
He is also a core contributor to conda-forge and to several other scientific computing open-source projects, such as bqplot, xtensor, and ipyleaflet.
Beyond QuantStack, Sylvain does a lot of volunteer work for the community: he served on the NumFOCUS board of directors from 2018 to 2024, co-organized JupyterCon 2020 and 2023, and organizes the PyData Paris Meetup.
- Open-source Business

I'm a third-year PhD student working on domain adaptation for biosignal applications.
- Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation
I am an open-source software developer at QuantStack, working on the WebAssembly stack.
- Expanding Programming Language Support in JupyterLite

Tim is a Quantitative Developer at Cubist Systematic Strategies and an adjunct professor in the Computer Science Department at Columbia University.
- Build a data studio in your notebook with jupyter-fs

Uwe Korn is a CTO at the data science company QuantCo. His expertise is in building scalable architectures for machine learning services and the teams & culture around them. Nowadays, he focuses on the data engineering infrastructure that is needed to provide the building blocks to bring machine learning models into production. As part of his work to provide an efficient data interchange, he became a core committer to the Apache Parquet, Apache Arrow and conda-forge projects.
- Navigating the security compliance maze of an ML service

Statistician by education, Valeria is an AI Scientist specializing in real-time models for complex, challenging scenarios. Driven by a deep curiosity for the latest research, she applies advanced analytical techniques to build intelligent systems that deliver immediate and effective solutions in high-impact domains.
- Repetita Non Iuvant: Why Generative AI Models Cannot Feed Themselves

- Open-source Business