Data Engineering Hierarchy of Needs
Thursday 07-16, 09:00–09:45 PDT (US/Pacific), Bay Area Meetup

Data Infrastructure looks different at small, mid, and large-sized companies. Yet most content out there is about large, sophisticated systems, and almost none of it covers migrating legacy, on-prem databases to the cloud.

We'll begin with the fundamentals of building a modern Data Infrastructure from the ground up through a hierarchy of needs. The hierarchy has seven (admittedly subjective) levels, ranging from Automation to Data Streaming.


Different Companies have Different Data Infra Needs

  1. Small-sized companies
  2. Mid-sized companies
  3. Large companies

The Hierarchy

[Diagram: the seven-level hierarchy]

  1. Automate - Move from scripts and manual processes to transparent ETL software, e.g. Airflow (see the DAG sketch after this list).

  2. Extract - Without Extraction, there is no data.

  3. Load - Storage is cheap. Data Loss is expensive. Load first.

  4. Transform - SQL only, for maintainability.

  5. Optimize - Spark, for workloads that outgrow SQL alone.

  6. Machine Learning - Integrate ML so the pipeline is automated end to end, from ingestion to modeling, again orchestrated with Airflow.

  7. Streaming - Near- and real-time data and transactions (see the consumer sketch below).
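
Below is a minimal sketch of levels 1 through 4 as a single Airflow DAG: extraction runs first, the raw data is loaded before any transformation, and the transform itself stays in SQL. This assumes Airflow 2.x; the dag_id, table names, and SQL are hypothetical placeholders, not a prescribed schema.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract():
        """Level 2: pull raw records out of the source system."""
        # Hypothetical source: the legacy on-prem database being migrated.
        return [{"id": 1, "amount": 42.0}]


    def load():
        """Level 3: land the raw extract in cheap storage, untouched."""
        # Storage is cheap, data loss is expensive: persist first, transform later.
        pass


    # Level 4: the transform is plain SQL so analysts can read and maintain it.
    TRANSFORM_SQL = """
    CREATE OR REPLACE TABLE analytics.orders AS
    SELECT id, amount
    FROM staging.raw_orders
    WHERE amount IS NOT NULL;
    """


    def transform():
        # Run TRANSFORM_SQL through your warehouse client or hook of choice.
        print(TRANSFORM_SQL)


    with DAG(
        dag_id="elt_pipeline",              # hypothetical name
        start_date=datetime(2020, 7, 16),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Level 1: the whole flow is automated and visible in one place.
        (
            PythonOperator(task_id="extract", python_callable=extract)
            >> PythonOperator(task_id="load", python_callable=load)
            >> PythonOperator(task_id="transform", python_callable=transform)
        )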
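
And for level 7, a sketch of near-real-time consumption using kafka-python; the "orders" topic and the local broker address are assumptions, so substitute whatever streaming system you actually run.

    import json

    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "orders",                              # hypothetical topic name
        bootstrap_servers=["localhost:9092"],  # assumed local broker
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    # Records arrive seconds after the transaction instead of on a daily schedule.
    for message in consumer:
        order = message.value
        print(order["id"], order["amount"])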

I am a first-generation white-collar worker and a Salvadoran immigrant. I am currently a Data Engineer automating ETL pipelines for Data Analysts and Scientists. My focus is reproducible data infrastructure as code that is easy to stand up and troubleshoot.