2026-07-20, Room 1.38 (Ground Floor, Turing)
This talk introduces scikit-learn users to the new metadata routing API, an experimental feature available in recent releases. As a scikit-learn core developer, I'll share insights from working directly on this feature.
We will explore what metadata is, how it is used in machine learning pipelines, and how the new API simplifies routing it through a workflow. Metadata routing is the internal mechanism that passes metadata between the components of a data science pipeline, ensuring it reaches the functions that consume it.
Using well-known metadata such as sample_weight and groups, which are supported by many scikit-learn metrics and evaluation tools, we will examine the restrictions on passing metadata before the new API was introduced. Then we will enable the new routing API and demonstrate how it lifts these restrictions, with examples that involve several layers of nesting through cross-validation, hyperparameter tuning, or pipelines. We will explain the core components of the API, including methods like set_fit_request(), and show how to actually pass metadata.
Attendees will leave with an understanding of how to enable and use the new routing API, including passing metadata through Pipeline objects and validation tools like cross_validate. References to the metadata routing user guide and developer guide will be provided for those interested in further exploration.
Stefanie is an open-source developer and a maintainer of scikit-learn who also contributes to related libraries. She trained and taught at LeWagon (2022–2023), interned with scikit-learn (2023), and worked at muffintech (2023) before joining probabl’s open-source team in 2024. She holds a PhD in History from the University of Potsdam (2021) and was active on Wikipedia as a writer and mentor (2011–2014).