PyCon DE & PyData 2025

The Foundation Model Revolution for Tabular Data
2025-04-25, Platinum3

What if we could make the same revolutionary leap for tables that ChatGPT made for text? While foundation models have transformed how we work with text and images, tabular (structured) data such as spreadsheets and databases - the backbone of economic and scientific analysis - has been left behind. TabPFN changes this. It is a foundation model that achieves in 2.8 seconds what traditional methods need 4 hours of hyperparameter tuning to reach - while delivering better results. On datasets of up to 10,000 samples, it outperforms every existing Python library, from XGBoost to CatBoost to AutoGluon.

Beyond raw performance, TabPFN brings foundation model capabilities to tables: native handling of messy data without preprocessing, built-in uncertainty estimation, synthetic data generation, and transfer learning - all in a few lines of Python code. Whether you're building risk models, accelerating scientific research, or optimizing business decisions, TabPFN represents the next major transformation in how we analyze data. Join us to explore and learn how to leverage these new capabilities in your work.
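
As a rough illustration of the "few lines of Python" claim, the sketch below fits a classifier and scores it on held-out data. The helper names `holdout_accuracy` and `run_tabpfn_demo` are hypothetical and not part of TabPFN. The demo assumes that the `tabpfn` and `scikit-learn` packages are installed and that `TabPFNClassifier` follows the scikit-learn `fit` / `predict` interface, as the project README describes; the dataset and split are arbitrary choices for illustration.

```python
def holdout_accuracy(estimator, X_train, y_train, X_test, y_test):
    """Fit any scikit-learn-style classifier and return its held-out accuracy."""
    estimator.fit(X_train, y_train)
    predictions = estimator.predict(X_test)
    # Count exact label matches between predictions and ground truth.
    correct = sum(int(p == t) for p, t in zip(predictions, y_test))
    return correct / len(y_test)


def run_tabpfn_demo():
    """Hypothetical demo; assumes `pip install tabpfn scikit-learn`.

    Imports are local so the helper above stays usable without these packages.
    Model weights may be downloaded the first time the classifier is used.
    """
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from tabpfn import TabPFNClassifier

    # A small dataset, well below the 10,000-sample range named in the abstract.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42
    )
    # No hyperparameter tuning: the pretrained model is used with defaults.
    return holdout_accuracy(TabPFNClassifier(), X_train, y_train, X_test, y_test)
```

Calling `run_tabpfn_demo()` returns a single accuracy value. Because `holdout_accuracy` only relies on `fit` and `predict`, the same helper can score a tuned baseline such as a gradient-boosted model for a side-by-side comparison.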


TabPFN shows how foundation model concepts can advance tabular data analysis in Python. Published in Nature in January 2025, it has seen strong community adoption, with 3,000+ GitHub stars and 1,000,000+ downloads.

Detailed Outline:

  1. Motivation
    - Why tabular data: examples of tabular prediction tasks and time series forecasting
    - Why foundation models for tabular data
    - Learning from the foundation model revolution in text and vision

  2. Technical Insights
    - How we adapted transformers for tabular data
    - Making in-context learning work for structured data
    - Performance characteristics and resource requirements
    - How to apply TabPFN to time series

  3. Practical Applications
    - When to choose TabPFN vs traditional methods
    - Resource requirements and scalability limits
    - What's next for TabPFN

  4. Colab Demo

  5. Q&A

Key Takeaways:
- Practical understanding of TabPFN's capabilities and limitations
- Hands-on experience integrating with Python data science workflows
- Best practices for working with foundation models on tabular data
- Insight into emerging approaches for structured data analysis


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Intermediate

Public link to supporting material, e.g. videos, Github, etc.:

https://www.nature.com/articles/s41586-024-08328-6; https://github.com/PriorLabs/TabPFN

Frank is a Hector-Endowed Fellow and PI at the ELLIS Institute Tübingen and has been a full professor for Machine Learning at the University of Freiburg (Germany) since 2016. Before that, he led an Emmy Noether Research Group at the University of Freiburg from 2013. Earlier, he completed a PhD (2004-2009) and a postdoc (2009-2013) at the University of British Columbia (UBC) in Canada. He received the 2010 CAIAC doctoral dissertation award for the best thesis in AI in Canada, as well as several best paper awards and prizes in international ML competitions. He is a Fellow of ELLIS and EurAI, Director of the ELLIS unit Freiburg, and the recipient of 3 ERC grants. Frank is best known for his research on automated machine learning (AutoML), including neural architecture search, efficient hyperparameter optimization, and meta-learning. He co-authored the first book on AutoML and the prominent AutoML tools Auto-WEKA, Auto-sklearn and Auto-PyTorch, won the first two AutoML challenges with his team, is co-teaching the first MOOC on AutoML, co-organized 15 AutoML-related workshops at ICML, NeurIPS and ICLR, and founded the AutoML conference as general chair in 2022. In recent years, his focus has been on the intersection of foundation models and AutoML, prominently including the first foundation model for tabular data, TabPFN.