Running Apache Spark on K8s: From AWS EMR to K8s
06-14, 14:00–14:40 (Europe/Berlin), Kesselhaus

Spark is a trending technology used by many companies for large-scale data analytics. At first, companies usually adopt their cloud provider's managed solution to speed up time to market. But once Spark is embraced by more teams across the company and the solution needs to work with multiple cloud providers, Kubernetes adoption comes into play, and the journey to make it happen is worth sharing to inspire others in the same situation. In this talk, the audience will learn some benefits of migrating from AWS EMR to Spark on Kubernetes: from an operability point of view (reliability, portability, scalability), through observability, and finally efficiency and costs. The talk is based on a real use case that three teams at Empathy.co worked on for six months to make their solution more cloud-agnostic, with minimal cloud dependencies.



I’m a Senior DevOps Engineer currently working as Tech Lead of the Platform Engineering Team at Empathy.co. I mostly manage Kubernetes clusters, CI/CD orchestration, Elasticsearch and MongoDB, and try to break things on AWS, GCP and Azure. I'm a big fan of Anton Babenko, and if I'm not online, you can find me on PagerDuty.

Data Science & Search Product Owner and developer at Empathy. Over the last few years I have worked in all areas of search, from relevancy to data science, and from pure backend work to managing merchandiser and customer needs.
I have experience with high-availability systems using k8s across different cloud providers. When it comes to search and search intelligence, I usually have fun with technologies like Spark or Elasticsearch, but I also love multidisciplinary teams with knowledge of the whole development process (CI/CD, metrics, performance...).

As for the real me, I like travelling, staying (more or less) healthy, and plants, but... who doesn't?
If you see me around, please buy me a drink.

I am a software engineer working as a Data Engineer at Empathy.co. My work is focused on building and managing ETL pipelines that feed our search engine with contextual information, improving the search experience for end users and providing aggregated analytics to merchandisers, always keeping privacy in mind. I am passionate about data engineering, and in my daily work I use technologies like Apache Flink, Apache Spark and MongoDB, running on AWS or GCP.