Simona Meriam is a Senior Data Engineer at Aidoc, where she specializes in research and development of solutions for big data infrastructures. In her previous position as a Big Data Engineer at Nielsen, she researched and developed big data solutions using cutting-edge technologies such as Spark, Kafka, and Elasticsearch. In her spare time she enjoys talking about music that you'll probably think is weird, Japan, and data.
Are you familiar with the following scenario?
You're running your Apache Spark app on EMR, and the log file gets pretty heavy. You try to open it through the AWS UI, or download it straight to your computer. You end up connecting to the server running your driver or one of your executors, relentlessly searching your logs while simultaneously looking at Ganglia and the Spark UI for additional logs and metrics.
If you are, this talk is exactly for you.
Let me tell you how we made it all easy with just some bootstrap actions, some bash scripts, Beats and Elastic. Customizable per-app logging, with less searching through big log files and more looking at useful Kibana dashboards. This architecture is not a nice-to-have, it's essential.