2025-10-01, Gaston Berger
Discover a new benchmark designed for real-world impact. Built on authentic private-company data and carefully chosen public datasets that reflect real industry challenges, like product categorization, basket prediction, and personalized recommendations, it offers a realistic testing ground for both classic baselines (e.g., gradient boosting) and the latest models such as CARTE, TabICL, and TabPFN. By bridging the gap between academic research and industrial needs, this benchmark brings model evaluation closer to the decisions and constraints faced in practice.
This shift has tangible consequences: models are tested on problems that matter to businesses, using metrics that reflect real-world priorities (e.g., Precision@K, Recall@K, MAP@K). It enables more informed model selection, exposes where lab-only approaches fall short, and fosters solutions that are not just novel but deployable, helping accelerate the journey from innovation to deployment.
Tabular benchmarks today mostly rely on traditional academic datasets, often detached from industrial realities. In this talk, we introduce a fresh approach: a benchmark built from authentic, real-life company data, augmented by publicly available datasets chosen specifically for their industrial relevance. The benchmark will feature classic baselines such as gradient-boosted decision trees, as well as cutting-edge models like CARTE, TabICL, and TabPFN, offering a comprehensive view of how state-of-the-art methods perform in practical, high-stakes scenarios.
Our goal is simple yet impactful: offer the machine learning community richer, more realistic challenges, directly applicable to real-world scenarios. We believe that meaningful progress in data science comes from addressing genuine industrial tasks, such as product categorization, Instacart basket prediction, and personalized fashion recommendations inspired by the H&M dataset, evaluated using industry-standard metrics (Precision@K, Recall@K, MAP@K).
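For readers less familiar with the ranking metrics named above, here is a minimal sketch of how Precision@K, Recall@K, and MAP@K are commonly computed. It is an illustration only and not the benchmark's own implementation: the function names are hypothetical, each ranked list is assumed to contain no duplicates, and the average precision is normalized by min(number of relevant items, K), which is one common convention among several.

```python
from typing import Hashable, Sequence, Set


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k recommendations."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Average of the precision values taken at each rank where a relevant item appears.

    Assumption: normalized by min(len(relevant), k); other conventions exist.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant), k)


def map_at_k(
    all_ranked: Sequence[Sequence[Hashable]],
    all_relevant: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Mean of average_precision_at_k over all users (or queries)."""
    scores = [
        average_precision_at_k(ranked, relevant, k)
        for ranked, relevant in zip(all_ranked, all_relevant)
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data: two users, each with a ranked recommendation list and a set of
    # items they actually purchased (hypothetical item identifiers).
    recommendations = [["a", "b", "c", "d"], ["x", "y"]]
    purchases = [{"a", "c", "e"}, {"y"}]
    print("Precision@3 (user 1):", precision_at_k(recommendations[0], purchases[0], 3))
    print("Recall@3 (user 1):", recall_at_k(recommendations[0], purchases[0], 3))
    print("MAP@3:", map_at_k(recommendations, purchases, 3))
```

Unlike Precision@K and Recall@K, MAP@K rewards placing relevant items near the top of the list, which is why it is often preferred for recommendation tasks where only the first few suggestions are seen.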
During this session, you will:
- Understand the limitations of traditional academic benchmarks and why industrial datasets matter (0-5 min).
- Explore specific case studies from our industry-backed benchmark, including detailed examples of product categorization, basket prediction, and personalized recommendations (5-15 min).
- Learn how our approach provides a pathway from theoretical methods to practical, actionable insights (15-25 min).
- Participate in an interactive Q&A to discuss how realistic benchmarking can enhance your own work (25-30 min).
The session targets data scientists, researchers, and ML engineers looking to evaluate research models on tangible industry outcomes. Basic familiarity with machine learning concepts and benchmarking practices is recommended.
Alexandre Abraham is Lead AI Scientist at Neuralk-AI, where he works on building the first deep tabular foundation model for retail applications. Throughout his career, he has applied cutting-edge machine learning to real-world industrial challenges—modeling user behavior at Criteo, developing intelligent labeling workflows at Dataiku, conducting health data research at Inria, and working on causal inference using national health databases at Implicity. His expertise centers on unsupervised learning, human-in-the-loop systems, and tabular data in production environments.
An active contributor to the open-source community, Alexandre is the author of several widely used tools, including nilearn for neuroimaging, and CardinAL and OpenAL, benchmarks for evaluating active learning strategies. He is also committed to education—after a decade of teaching at EPITA, he now trains public-sector decision-makers at the Institut des Hautes Études du Ministère de l'Intérieur.
