2025-03-02, F-AVR
Organizations often face a "data divide": a gap between analysts comfortable with SQL and data scientists and engineers proficient in Python. This talk introduces DuckDB, an in-process SQL OLAP database, as a unifying solution. DuckDB combines the analytical power of SQL with Python's flexibility, making it a good fit for both groups. We'll explore its ease of use, minimal setup (zero dependencies), and performance benefits, particularly for cloud data and lazy loading. While not a silver bullet, DuckDB shines in specific use cases and complements tools like Pandas, Polars, and PySpark, as highlighted in recent community discussions and benchmarks. Discover how DuckDB can streamline your data workflows, empower diverse teams, and unlock insights, from prototyping to edge computing.
This talk explores the challenge of unifying data teams with diverse skill sets (SQL vs. Python), a common issue in many organizations. I will introduce DuckDB, an in-process SQL database, and show how it bridges this gap by integrating SQL with Python. The talk covers the following subtopics:
- The challenges of siloed data teams and the need for tools that unify SQL and Python.
- DuckDB's core features (in-process, columnar, SQL-focused) and its Python integration.
- DuckDB's ease of use, minimal setup (no dependencies), and performance benefits.
- Direct analysis of cloud data (S3, GCS) with DuckDB, highlighting its efficiency compared to traditional methods.
- Lazy loading techniques with Arrow and DuckDB for optimized data access.
- Positioning DuckDB alongside Pandas, Polars, and PySpark, emphasizing its complementary role.
- Use cases for DuckDB, including prototyping, ad-hoc analysis, edge computing, and ETL.
Intermediate
Category: Data Science/Analysis/Engineering
Sam Matuba combines domain expertise in chemical and energy engineering with experience across data science, software development, and AI. His passion lies at the intersection of energy and technology, where he leads innovation to drive sustainable solutions and transformative change. With a unique ability to merge technical and engineering knowledge, Sam bridges the gap between traditional industries and cutting-edge advancements.
At Mabuhay Energy, Sam leads digital transformation and technology innovation, shaping the strategy for data and AI to unlock new opportunities for growth and efficiency. He spearheaded an award-winning tech project recognized at the Asian Power Awards, demonstrating his ability to deliver impactful, scalable solutions. A strong advocate for open-source technologies, Sam brings innovative tools to enterprise applications, ensuring they are both robust and future-ready.