EuroSciPy 2026

Riccardo Cappuzzo

I am a research engineer at Inria, part of P16 and of the SODA research team. I am one of the maintainers of the skrub Python package. I hold a PhD in Computer Science, and I am also interested in research on tabular learning and tabular foundation models.

Affiliation:

Inria

Position / Job:

Research engineer


Session

07-21
09:30
30min
How to use skrub Data Ops in practice
Riccardo Cappuzzo, Guillaume Lemaitre

Skrub is a package that simplifies preparing dataframes for machine-learning tasks. In practice, data can be spread over multiple tables, represent various types of information (tabular, textual, graphical), or be stored in external database systems rather than in dataframes.

Skrub Data Ops help construct versatile pipelines that can handle this variety of scenarios while avoiding data leakage, and they make it possible to build rich hyper-parameter grids that can be explored to maximize the performance of the final machine-learning model.

In this talk, we give a brief introduction to the Data Ops framework before presenting three separate use cases that highlight its versatility: a traditional machine-learning pipeline that uses Optuna to perform hyper-parameter tuning, a pipeline that trains on data stored in a relational database rather than a dataframe, and an image classification task with PyTorch.

By the end of the talk, attendees will have learned what skrub Data Ops are, what their main features are, and how to use them in different practical scenarios.

Computational Tools and Scientific Python Infrastructure
Room 1.38 (Ground Floor, Turing)