Mirano Tuk
Principal Software Engineer at ReversingLabs, working on large-scale distributed systems and data-intensive architectures.
I design and operate high-throughput, real-time data pipelines, focusing on reliability, observability, and performance under real-world conditions, with a practical approach to engineering trade-offs and system failures.
Session
Large-scale distributed systems rarely produce clean data streams. In practice, hundreds of services continuously emit overlapping updates, retries, corrections, and partial state. Turning that constant stream of noisy events into a reliable, searchable dataset in real time, while processing hundreds of billions of records per day, requires careful architectural choices.
This talk shares practical lessons from building a Kafka-based ETL pipeline that transforms massive volumes of events into a coherent dataset suitable for real-time search. After a brief overview of the system architecture, we focus on several key techniques: reducing redundant processing through key deduplication and short-lived buffers, defining when messages can be safely acknowledged without risking data loss, and keeping long-running ETL services healthy under heavy Kafka workloads.
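To make the first technique concrete, here is a minimal sketch of key deduplication with a short-lived buffer. The class name `DedupBuffer`, the TTL value, and the injectable clock are my own illustrative choices, not the speaker's actual implementation: the idea is simply that bursts of overlapping updates for the same key collapse into a single record before downstream processing.

```python
import time
from collections import OrderedDict

class DedupBuffer:
    """Short-lived, key-based deduplication buffer (illustrative sketch).

    Keeps only the most recent event per key for up to `ttl` seconds,
    so redundant updates for the same key are coalesced before they
    reach the rest of the pipeline.
    """

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self._buf = OrderedDict()  # key -> (timestamp, event)

    def offer(self, key, event):
        """Insert or overwrite the buffered event for `key`."""
        if key in self._buf:
            del self._buf[key]  # re-insert so ordering reflects freshness
        self._buf[key] = (self.clock(), event)

    def drain_expired(self):
        """Yield (key, event) pairs whose TTL has elapsed, oldest first."""
        now = self.clock()
        while self._buf:
            key, (ts, event) = next(iter(self._buf.items()))
            if now - ts < self.ttl:
                break  # oldest entry is still fresh; stop draining
            del self._buf[key]
            yield key, event
```

The trade-off this illustrates: a longer TTL absorbs more redundant updates but delays delivery; a shorter one favors latency over deduplication.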
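The second technique, deciding when messages can be safely acknowledged, can be sketched as tracking in-flight offsets per partition and only committing the highest contiguous offset that has been fully processed. The `OffsetTracker` class below is a hypothetical, broker-free illustration of that invariant, not the pipeline's real acknowledgement code.

```python
class OffsetTracker:
    """Illustrative tracker for safe offset commits (one partition).

    An offset is safe to commit only when every offset below it has
    been fully processed; committing earlier risks data loss on
    restart, committing later only causes benign reprocessing
    (at-least-once semantics).
    """

    def __init__(self):
        self._processed = set()  # processed but not yet committable
        self._committed = -1     # highest contiguously processed offset

    def start(self, offset):
        """Mark an offset as consumed and in flight (no-op bookkeeping here)."""
        pass

    def done(self, offset):
        """Mark an offset as fully processed."""
        self._processed.add(offset)

    def safe_commit_offset(self):
        """Return the next offset to commit (Kafka convention: the next
        offset to read), or -1 if nothing is safely committable yet."""
        nxt = self._committed + 1
        while nxt in self._processed:
            self._processed.remove(nxt)
            self._committed = nxt
            nxt += 1
        return self._committed + 1 if self._committed >= 0 else -1
```

With out-of-order completion (offset 2 finishing before offset 1), the tracker holds the commit back until the gap closes, which is exactly the property that makes acknowledgement safe.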
The session emphasizes concrete engineering trade-offs and operational realities rather than theory. Attendees will leave with practical patterns for building more reliable and efficient streaming pipelines.