Natalia is a Data Engineer from Brazil with a passion for learning new things and eating cookies. She currently works at Bol.com. She started developing in Python in 2014 and has been solving big data problems in different industries since then.
Live Stream: https://youtu.be/9ZQxvhdOTlA
PySpark is the Python API for Apache Spark, a distributed data processing engine widely used in Data Engineering and Data Science. Another way to think of PySpark is as a library that lets you process large amounts of data on a single machine or a cluster of machines. We will go through the basic concepts and operations so that you leave the workshop ready to continue learning on your own.