PyCon Lithuania 2024

Data Processing with Apache Spark and Apache Iceberg
2024-04-02, Tutorials 1

"Data Processing with Apache Spark and Apache Iceberg" is a dynamic workshop designed to equip data professionals with advanced skills in managing and processing large-scale data. Participants will be introduced to the essential table formats before delving into Apache Iceberg's integration with Apache Spark. This session focuses on practical applications, including schema evolution and efficient file management, to enhance data processing efficiency and scalability. Ideal for data engineers and scientists,


In this intensive workshop, we will dive into the practicalities of using Apache Spark and Apache Iceberg for advanced data processing and management. The session will start with an introduction to table formats, emphasizing their pivotal role in efficient data handling. We will explore Apache Iceberg's capabilities, focusing on how it enhances data processing when used with Apache Spark. Participants will learn to implement Apache Iceberg for tasks like schema evolution, hidden partitioning,
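To give a flavor of two topics named above, hidden partitioning and schema evolution, here is a minimal sketch of Spark SQL statements for an Iceberg table. It is not taken from the workshop material. The catalog `demo`, the namespace `db`, the table `events`, and its columns are hypothetical placeholders, and the helper assumes a SparkSession that has already been configured with an Iceberg catalog.

```python
# Hypothetical names: the catalog "demo", namespace "db", table "events",
# and all column names are placeholders, not taken from the workshop.
TABLE = "demo.db.events"

STATEMENTS = [
    # Hidden partitioning: partition by a transform of ts (one partition
    # per day) instead of maintaining a separate date column.
    f"CREATE TABLE IF NOT EXISTS {TABLE} ("
    "id BIGINT, ts TIMESTAMP, payload STRING) "
    "USING iceberg PARTITIONED BY (days(ts))",
    # Schema evolution: add and rename columns on the existing table.
    f"ALTER TABLE {TABLE} ADD COLUMN country STRING",
    f"ALTER TABLE {TABLE} RENAME COLUMN payload TO body",
]


def run_statements(spark, statements=STATEMENTS):
    """Run each statement on a SparkSession with an Iceberg catalog configured."""
    for statement in statements:
        spark.sql(statement)
```

With a configured session, `run_statements(spark)` would create the table and then apply both schema changes. Queries that filter on `ts` can benefit from the daily partitioning without referring to a partition column directly.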

This workshop is tailored for data engineers, data scientists, and technology professionals with a basic understanding of data processing concepts and an interest in modern data management techniques.

Participants must set up Docker on their machine before the training session, for example by installing Docker Desktop: https://www.docker.com/get-started/.

I'm a Data Engineer with a diverse background, transitioning from a Data Analyst to a Team Lead and Head of Data before returning to my roots. I have a knack for numbers and a passion for coding, constantly seeking optimal solutions and driving continuous improvement.

With expertise in data pipelines, orchestration, and SQL, along with strong communication skills, I excel in leading and mentoring teams. I've been fortunate to contribute to multiple data migrations and projects, including building some from scratch.

Outside of work, I thrive in fast-paced environments, embracing new challenges and staying updated with the latest technologies through side projects. I share my knowledge with the community through my podcast and blog, 'Uncle Data,' where I discuss all things data-related.
