Logging Apache Spark - How we made it easy
2022-06-13, Maschinenhaus

Are you familiar with the following scenario?

You're running your Apache Spark app on EMR, and the log file gets pretty heavy. You try to open it through the AWS UI, or download it straight to your computer. You end up connecting to the server running your driver or any of your executors, relentlessly searching your logs while simultaneously looking at Ganglia and the Spark UI for additional logs and metrics.

If you are, this talk is exactly for you.

Let me tell you how we made it all easy with just some bootstrap actions, some bash scripts, Beats and Elastic. Customizable per-app logging, with less searching through big log files and more looking at useful Kibana dashboards. This architecture is not a nice-to-have; it's essential.

The Search track is presented by OpenSource Connections



Simona Meriam is a Senior Data Engineer at Aidoc, where she specializes in research and development of solutions for big data infrastructures. In her previous position as a Big Data Engineer at Nielsen, she researched and developed big data solutions using cutting-edge technologies such as Spark, Kafka, and Elasticsearch. In her spare time she enjoys talking, talking about music that you'll probably think is weird, Japan and data.