Mirano Tuk
Principal Software Engineer at ReversingLabs, working on large-scale distributed systems and data-intensive architectures.
I design and operate high-throughput, real-time data pipelines, focusing on reliability, observability, and performance under real-world conditions, with a practical approach to engineering trade-offs and system failures.
Session
Large-scale distributed systems rarely produce clean data streams. In practice, hundreds of services continuously emit overlapping updates, retries, corrections, and partial state. Turning that constant stream of noisy events into a reliable, searchable dataset in real time, while processing hundreds of billions of records per day, requires careful architectural choices.
This talk shares practical lessons from building a Kafka-based ETL pipeline that transforms massive volumes of events into a coherent dataset suitable for real-time search. After a brief overview of the system architecture, we focus on several key techniques: reducing redundant processing through key deduplication and short-lived buffers, defining when messages can be safely acknowledged without risking data loss, and keeping long-running ETL services healthy under heavy Kafka workloads.
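To make the first technique concrete, here is a minimal sketch of key deduplication with a short-lived buffer. The class name `DedupBuffer`, the TTL value, and the injectable clock are my own illustrative choices, not the speaker's actual implementation: the idea is simply that bursts of overlapping updates for the same key collapse into a single record before downstream processing.

```python
import time
from collections import OrderedDict

class DedupBuffer:
    """Short-lived, key-based deduplication buffer (illustrative sketch).

    Keeps only the most recent event per key for up to `ttl` seconds,
    so redundant updates for the same key are coalesced before they
    reach the rest of the pipeline.
    """

    def __init__(self, ttl=5.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable for testing
        self._buf = OrderedDict()  # key -> (timestamp, event)

    def offer(self, key, event):
        """Insert or overwrite the buffered event for `key`."""
        if key in self._buf:
            del self._buf[key]  # re-insert so ordering reflects freshness
        self._buf[key] = (self.clock(), event)

    def drain_expired(self):
        """Yield (key, event) pairs whose TTL has elapsed, oldest first."""
        now = self.clock()
        while self._buf:
            key, (ts, event) = next(iter(self._buf.items()))
            if now - ts < self.ttl:
                break  # oldest entry is still fresh; stop draining
            del self._buf[key]
            yield key, event
```

The trade-off this illustrates: a longer TTL absorbs more redundant updates but delays delivery; a shorter one favors latency over deduplication.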
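The second technique, deciding when messages can be safely acknowledged, can be sketched as tracking in-flight offsets per partition and only committing the highest contiguous offset that has been fully processed. The `OffsetTracker` class below is a hypothetical, broker-free illustration of that invariant, not the pipeline's real acknowledgement code.

```python
class OffsetTracker:
    """Illustrative tracker for safe offset commits (one partition).

    An offset is safe to commit only when every offset below it has
    been fully processed; committing earlier risks data loss on
    restart, committing later only causes benign reprocessing
    (at-least-once semantics).
    """

    def __init__(self):
        self._processed = set()  # processed but not yet committable
        self._committed = -1     # highest contiguously processed offset

    def start(self, offset):
        """Mark an offset as consumed and in flight (no-op bookkeeping here)."""
        pass

    def done(self, offset):
        """Mark an offset as fully processed."""
        self._processed.add(offset)

    def safe_commit_offset(self):
        """Return the next offset to commit (Kafka convention: the next
        offset to read), or -1 if nothing is safely committable yet."""
        nxt = self._committed + 1
        while nxt in self._processed:
            self._processed.remove(nxt)
            self._committed = nxt
            nxt += 1
        return self._committed + 1 if self._committed >= 0 else -1
```

With out-of-order completion (offset 2 finishing before offset 1), the tracker holds the commit back until the gap closes, which is exactly the property that makes acknowledgement safe.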
The session emphasizes concrete engineering trade-offs and operational realities rather than theory. Attendees will leave with practical patterns for building more reliable and efficient streaming pipelines.