2023-08-15, Aula
Ibis provides a common dataframe-like interface to many popular databases and analytics tools (BigQuery, Snowflake, Spark, DuckDB, …). This lets users analyze data with one consistent API, regardless of which backend they’re using, and without having to learn SQL. No more painful rewrites of pandas code when you run into performance issues: write your code once using Ibis and run it on any supported backend. In this tutorial, participants will gain experience writing queries with Ibis on a number of local and remote database engines.
Tabular data is ubiquitous, and pandas has been the de facto tool in Python for analyzing it. However, as data grows, analysis with pandas can become untenable. Luckily, modern analytical databases (like DuckDB) can analyze the same tabular data, often orders of magnitude faster than pandas and with less memory. Many of these systems provide only a SQL interface, though, which is far removed from pandas’ dataframe interface and forces a rewrite of your analysis code.
This is where Ibis comes in. Ibis provides a common dataframe-like interface to many popular databases and analytics tools (BigQuery, Snowflake, Spark, DuckDB, …). This lets users analyze data with one consistent API, regardless of which backend they’re using, and without having to learn SQL. No more painful rewrites of pandas code when you run into performance issues: write your code once using Ibis and run it on any supported backend.
In this tutorial we’ll cover:
- The basic operations of Ibis (select, filter, group_by, join, and aggregate), and how these operations may be composed to form more complicated queries.
- How Ibis may be used on a number of different local and remote backend engines to execute the same queries on different systems.
- The tradeoffs of different database engines, and recommendations for how to choose the best tool for the job.
- How Ibis integrates into the larger Python data ecosystem, including tools like Scikit-Learn or Matplotlib.
This is a hands-on tutorial, with numerous examples to get your hands dirty. Participants should ideally have some experience using Python and pandas, but no SQL experience is necessary.
Ibis: A fast, flexible, and portable tool for data analytics.
Category [Data Science and Visualization] – Data Analysis and Data Engineering
Expected audience expertise: Domain – none
Expected audience expertise: Python – some
I'm fascinated by a variety of problems related to computers. I've solved hard problems in a variety of software engineering domains including digital video, Rust, systems programming, computer vision, and analytics. I'm currently helping build next-generation Python analytics tooling at Voltron Data.