Daniel Hernández Alfageme
I am a software engineer working as a Data Engineer at Empathy.co. My work focuses on building and managing ETL pipelines that feed our search engine with contextual information to improve the search experience for end users and provide aggregated analytics to merchandisers, always keeping privacy in mind. I am passionate about data engineering, and my daily work involves technologies like Apache Flink, Apache Spark, and MongoDB, running on AWS or GCP.
Session
Spark is a popular technology used by many companies for large-scale data analytics. At first, companies usually rely on their cloud provider's managed solution to speed up time to market. But once Spark is broadly embraced by more teams in the company and the solution needs to work across multiple cloud providers, Kubernetes adoption comes into play, and the journey to make it happen is worth sharing to inspire others in the same situation. In this talk, the audience will learn some of the benefits of migrating from AWS EMR to Spark on Kubernetes, from an operability point of view (reliability, portability, scalability), through observability, and finally efficiency and costs. The talk presents a real use case that three teams at Empathy.co worked on for 6 months to make their solution more cloud-agnostic, with minimal cloud dependencies.