2022-06-14, Kesselhaus
Kafka data pipeline maintenance can be painful.
It usually comes with complicated and lengthy recovery processes, scaling difficulties, traffic ‘moodiness’, and latency issues after downtimes and outages.
It doesn’t have to be that way!
We’ll examine one of our multi-petabyte-scale Kafka pipelines and go over some of the pitfalls we’ve encountered. We’ll offer solutions that alleviate those problems and compare the pipeline before and after. We’ll then explain why some common-sense solutions do not work well, and offer an improved, scalable, and resilient way of processing your stream.
We’ll cover:
- Costs of processing in stream compared to in batch
- Scaling out for bursts and reprocessing
- Making the tradeoff between wait times and costs
- Recovering from outages
- And much more…
I am a director of data engineering at Nielsen.
My group builds massive data pipelines that are cost-effective and scalable (~250 billion events/day). Our projects run on AWS, using Kafka, Spark, Aerospike, serverless Lambda functions, Airflow, OpenFaaS, Kubernetes, and more.
I am passionate about new technologies, data, algorithms and machine learning. I love to tackle difficult problems and come up with amazing solutions to them.
I have four patents in the area of security, and lots of ideas for more.
I am a big data team lead at Nielsen.
My team focuses on building massive data pipelines (~250 billion events/day) and infrastructure for running machine learning algorithms. Our projects run on AWS using a variety of technologies such as Kafka, Spark, Airflow, Kubernetes, and more.
I like to continuously experiment with new technologies, tackle challenging problems, and find better, more elegant, and more cost-effective solutions.