Polars - make the switch to lightning-fast dataframes
2023-04-17, Kuppelsaal

In this talk, we will report on our experiences switching from pandas to Polars in a real-world ML project. Polars is a new high-performance dataframe library for Python based on Apache Arrow and written in Rust. We will compare the performance of Polars with the popular pandas library, and show how Polars can provide significant speed improvements for data manipulation and analysis tasks. We will also discuss the unique features of Polars, such as its ability to handle large datasets that do not fit into memory, and how it feels in practice to make the switch from pandas. This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python.


The pandas library is one of the most widely used tools for working with data in the Python ecosystem. However, pandas can be slow on medium-sized and larger datasets, and many users have been looking for faster alternatives. In this talk, we introduce Polars, a new high-performance dataframe library for Python based on Apache Arrow and written in Rust. We will report on our experiences switching from pandas to Polars in a real-world ML project.

We will compare the performance of Polars with pandas across various use cases, and show how Polars can provide significant speed improvements for common data manipulation and analysis tasks. Because of its speed, it can even be an alternative in cases where people normally reach for distributed systems like Spark. For example, we will demonstrate how Polars can process large datasets with minimal overhead, and how its heavy use of parallelization can provide an additional speed boost.

We will also discuss how Polars compares to other popular options like DuckDB and cuDF.

This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python. Whether you are a pandas user looking for a faster alternative or a Spark user interested in a simpler one, this talk will provide valuable insights and practical examples.


Expected audience expertise: Python

Intermediate

Abstract as a tweet

Want to learn about a new Python library that can speed up your data science and analytics work? Join us at the conference to hear about Polars, a lightning-fast dataframe library based on Apache Arrow and written in Rust!

Expected audience expertise: Domain

Intermediate

Thomas's passion has been working with data for 25 years: from small databases for SMEs to large distributed systems for international enterprises and intelligent systems using machine learning. He graduated from KIT in Karlsruhe, Germany, and trained his first neural network while studying at UPC in Barcelona, Spain, in 2002. Today he leads the Data Science & AI practice of BettercallPaul in Stuttgart and supports his customers and teams on their journey to generate added value from data.