Workshop-Days 2025

Smart Data Lake Builder Hands-On
2025-09-11, Room 7
Language: English

This interactive workshop introduces participants to Smart Data Lake Builder (SDLB), an open-source, metadata-driven Scala/Spark framework designed to help Data Engineers efficiently define, orchestrate, and optimize complex data pipelines. Through a combination of theory and hands-on exercises, attendees will learn how to design multi-layered data lake architectures, define data objects, actions, and connections, and implement transformations using SQL, Scala, or Python—all within a metadata-driven, configuration-first approach.

Participants will gain practical experience by building a real-world data pipeline, leveraging SDLB features such as automated DAG generation, fail-fast and checkpoint/restart mechanisms, schema validation, data quality enforcement, and performance optimizations like partitioning and parallelization.


Overview

Smart Data Lake Builder (SDLB) is an open-source, metadata-driven framework, built with Scala and Apache Spark, for creating robust, scalable data pipelines. SDLB enables Data Engineers to declaratively define data objects, actions, and connections in HOCON configuration files, and to implement transformations in SQL, Scala/Spark, or Python.
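
To make this concrete, here is a minimal HOCON sketch loosely following SDLB's public getting-started example. The types shown (WebserviceFileDataObject, CsvFileDataObject, FileTransferAction) are documented SDLB building blocks, while the object names and the source URL are illustrative assumptions that may differ by version:

    dataObjects {
      ext-airports {
        # external layer: raw airport data fetched over HTTP (illustrative URL)
        type = WebserviceFileDataObject
        url = "https://ourairports.com/data/airports.csv"
      }
      stg-airports {
        # staging layer: the downloaded file stored as CSV
        type = CsvFileDataObject
        path = "stg-airports"
      }
    }
    actions {
      download-airports {
        # transfers the file from the webservice into the staging layer
        type = FileTransferAction
        inputId = ext-airports
        outputId = stg-airports
        metadata.feed = download
      }
    }

SDLB derives the execution DAG automatically from the inputId/outputId references between actions; no explicit ordering is needed.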

What You’ll Learn

  • Core Concepts: Understand the multi-layered data lake architecture (external, staging, integration, business transformation layers) and the SDLB approach to pipeline design.
  • Data Objects & Actions: Learn to define data sources/targets and orchestrate transformations using SDLB’s modular building blocks.
  • Configuration-Driven Pipelines: Use HOCON files to describe data flows, making pipelines portable, maintainable, and versionable, and enabling reusable templates.
  • Transformations: Implement data transformations in multiple languages (SQL, Scala/Spark, Python); see the SQL transformer sketch after this list.
  • Advanced Features: Explore automated DAG generation, data quality checks, schema validation, incremental processing, partitioning, and parallelization.
  • DevOps Ready: See how SDLB supports automated testing, environment-specific configurations, and easy deployment across platforms.
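
As a hedged illustration of the transformation bullet above, the following sketch chains a SQL transformer into a CopyAction, a pattern from the SDLB documentation. The column names and the hypothetical int-airports output are invented for illustration; note that SDLB exposes input data objects to SQL with dashes replaced by underscores:

    actions {
      select-airport-cols {
        type = CopyAction
        inputId = stg-airports
        outputId = int-airports
        transformers = [{
          # the input data object stg-airports is referenced as stg_airports in SQL
          type = SQLDfTransformer
          code = "select ident, name, latitude_deg, longitude_deg from stg_airports"
        }]
        metadata.feed = compute
      }
    }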

Hands-On Exercise

Participants will collaboratively build a data pipeline that:

  • Downloads and historizes airport data.
  • Applies data cleaning, transformation, and enrichment.
  • Stores results in multiple formats and layers (e.g., Delta Lake, CSV).
  • Implements data quality constraints and schema validation.
  • Demonstrates partitioning, historization, and incremental loading (see the historization sketch after this list).
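
For instance, the historization step could be expressed with a HistorizeAction writing to a Delta Lake table. HistorizeAction and DeltaLakeTableDataObject are documented SDLB building blocks; the table and key names below are assumptions for illustration:

    dataObjects {
      btl-airports {
        # business transformation layer, stored as a Delta Lake table
        type = DeltaLakeTableDataObject
        path = "btl-airports"
        table {
          db = default
          name = btl_airports
          primaryKey = [ident]   # HistorizeAction needs a primary key to detect changes
        }
      }
    }
    actions {
      historize-airports {
        # keeps a full change history by maintaining technical validity columns
        type = HistorizeAction
        inputId = int-airports
        outputId = btl-airports
        metadata.feed = compute
      }
    }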

Who Should Attend

  • Data Engineers, Data Architects, and Developers interested in modern, metadata-driven data pipeline orchestration.
  • Anyone looking to leverage open-source tools for scalable, maintainable data engineering solutions.

Prerequisites

  • Basic familiarity with data engineering concepts.
  • Some experience with Spark, SQL, Python, and/or configuration files (e.g., JSON, YAML, HOCON) is helpful but not required.

Key Takeaways

  • Declarative Pipeline Design: Learn to build complex pipelines using simple, reusable configurations.
  • Metadata-Driven Orchestration: Understand how SDLB automates and validates the entire data flow.
  • Production-Ready Patterns: Gain practical skills in data quality, schema management, and pipeline optimization; a data quality sketch follows below.
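
As a taste of the data quality features, the sketch below attaches a row-level constraint and a dataset-level expectation to a data object. Constraints and expectations exist in recent SDLB versions, but the exact option names shown here are assumptions and should be checked against the documentation of the version used in the workshop:

    dataObjects {
      int-airports {
        type = DeltaLakeTableDataObject
        path = "int-airports"
        table = { db = default, name = int_airports }
        constraints = [{
          # row-level check: every record must carry an identifier
          name = identNotNull
          expression = "ident is not null"
        }]
        expectations = [{
          # dataset-level check: a load must not produce an empty result
          type = SQLExpectation
          name = countNotEmpty
          aggExpression = "count(*)"
          expectation = "> 0"
        }]
      }
    }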

Join us to master Smart Data Lake Builder and take your data engineering skills to the next level!

Dr. Mandes Schönherr is a data engineering specialist with years of experience in designing and implementing modern data solutions in various cloud environments. Originally from Germany and now based in Bern, he works as a cloud expert and architect at ELCA, one of Switzerland’s largest and most respected IT consulting companies. He has a strong background in high-performance computing as an application developer, analyst, and performance optimization consultant, and holds a PhD in natural sciences with a focus on computational physics and chemistry. He combines deep technical expertise with hands-on experience across cloud platforms, data architecture, and advanced analytics to deliver robust, scalable solutions for clients in diverse industries.