Full-stack data scientist v.2021 (how much software engineering should a modern data scientist know?)
Live broadcast: https://www.youtube.com/watch?v=UujU3xOo038
What are the essential software engineering skills a data scientist needs to successfully bring their own work to production? We are Sergei Beilin, Ph.D., a software engineering consultant in AI/ML, and his wife Natalia Beylina, Ph.D., a data scientist. We will go through the most important things a modern data scientist needs to know about software engineering, from both the software engineer's and the data scientist's points of view, drawing on our own experience.
We will discuss:
* programming language(s): how much of the language should one know?
* execution models, orchestration, containerization: Kubernetes, Kubeflow, Airflow, Spark/Databricks, etc.
* storage, network protocols/APIs, file formats: from CSV to Delta, from JSON to Avro
* modern systems architecture concepts to understand
* how the overall system architecture and infrastructure landscape dictate the way you deploy and run your work
* tools and DevOps practices
* processes: integrating the data scientist's workflow into a typical agile process
* bad practices to avoid: a few examples we've seen ourselves