BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//speaker//P9UQXL
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-ZYUJH3@pretalx.com
DTSTART;TZID=CET:20260414T163000
DTEND;TZID=CET:20260414T170000
DESCRIPTION:Large-scale distributed systems rarely produce clean data strea
 ms. In practice\, hundreds of services continuously emit overlapping updat
 es\, retries\, corrections\, and partial state. Turning that constant stre
 am of noisy events into a reliable\, searchable dataset in real time\, whi
 le processing hundreds of billions of records per day\, requires careful a
 rchitectural choices. \n\nThis talk shares practical lessons from building
  a Kafka-based ETL pipeline that transforms massive volumes of events into
  a coherent dataset suitable for real-time search. After a brief overview 
 of the system architecture\, we focus on several key techniques: reducing 
 redundant processing through key deduplication and short-lived buffers\, d
 efining when messages can be safely acknowledged without risking data loss
 \, and keeping long-running ETL services healthy under heavy Kafka workloa
 ds.\n\nThe session emphasizes concrete engineering trade-offs and operatio
 nal realities rather than theory. Attendees will leave with practical patt
 erns for building more reliable and efficient streaming pipelines.
DTSTAMP:20260412T141855Z
LOCATION:Ferrum [2nd Floor]
SUMMARY:How to Search Through 800 Billion Records in Real Time - Mirano Tuk
 \, Filip Bacic
URL:https://pretalx.com/pyconde-pydata-2026/talk/ZYUJH3/
END:VEVENT
END:VCALENDAR
