How many times have you developed a model or a data application, tested it locally or in a staging environment, only to find that it breaks in production? This is a common issue faced by thousands of data scientists around the globe. As the work of data scientists increasingly ends up in production systems, reliably delivering data applications becomes as important as building them.
I will provide a number of machine learning examples and use cases focusing on logging, debugging, diagnosis, automated testing, and continuous integration and delivery. In brief, this talk will lead you step by step through using Azure DevOps, together with Kubernetes and Docker containers, to automate the deployment of your data applications to a production environment.
Attendees will gain an understanding of DataOps and how its practices can improve data science workflows. Through the examples, you will learn to identify the many challenges faced during the productionization of data applications and how these can be mitigated through DataOps best practices. By the end of the talk, attendees will have the knowledge required to automate the delivery of their data products, increasing both their productivity and the quality of their work.
Outline:
Introduction: what is DataOps and why should data scientists care?
Introduction to the technologies used (Docker, Kubernetes, CI/CD, Helm, etc.): demystifying the terms, what each technology does, and what the fuss is about
Preparing your repository for continuous delivery (see the repository layout sketch after this outline)
Provisioning your resources in the cloud efficiently using Helm and Kubernetes (see the values.yaml sketch after this outline)
Setting up a basic deployment pipeline (see the pipeline sketch after this outline)
Putting it all together
Adding extra features (e.g. intermediate checks, sandboxing) to your pipeline so that it is tailored to your needs (see the test-stage sketch after this outline)
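Below is a minimal sketch of how such a repository could be laid out; every file and directory name here is hypothetical, chosen only to illustrate the pieces the pipeline examples below expect:

    my-data-app/                 # hypothetical project name
    ├── app/                     # model / application code
    ├── tests/                   # automated tests run by the pipeline
    ├── requirements.txt         # Python dependencies
    ├── Dockerfile               # builds the container image
    ├── chart/                   # Helm chart describing the Kubernetes deployment
    └── azure-pipelines.yml      # Azure DevOps pipeline definition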
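For the provisioning step, a Helm chart's values.yaml gathers the deployment knobs in one place. The sketch below assumes a hypothetical image in Azure Container Registry; the registry, application name, and resource sizes are placeholders, not a definitive configuration:

    # chart/values.yaml -- a minimal sketch; all names and sizes are assumptions
    replicaCount: 2
    image:
      repository: myregistry.azurecr.io/my-data-app   # hypothetical registry/image
      tag: latest                                     # overridden per build by the pipeline
      pullPolicy: IfNotPresent
    service:
      type: ClusterIP
      port: 8080
    resources:
      requests:
        cpu: 250m
        memory: 512Mi

A release can then be installed or upgraded in a single command, e.g. helm upgrade --install my-data-app ./chart --namespace models --set image.tag=42 (the release and namespace names are again placeholders).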
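A basic two-stage Azure DevOps pipeline can build and push the image and then roll it out with Helm. This is a sketch under the assumptions above, not a production-ready pipeline; the service connection, subscription, resource group, and cluster names (myRegistryConnection, my-subscription, my-rg, my-aks) are all hypothetical:

    # azure-pipelines.yml -- a minimal sketch of a build-and-deploy pipeline
    trigger:
      branches:
        include:
          - main

    stages:
      - stage: Build
        jobs:
          - job: BuildAndPush
            pool:
              vmImage: 'ubuntu-latest'
            steps:
              - task: Docker@2          # build the image and push it to the registry
                inputs:
                  containerRegistry: 'myRegistryConnection'  # assumed service connection
                  repository: 'my-data-app'
                  command: 'buildAndPush'
                  Dockerfile: 'Dockerfile'
                  tags: '$(Build.BuildId)'

      - stage: Deploy
        dependsOn: Build
        jobs:
          - deployment: DeployToCluster
            pool:
              vmImage: 'ubuntu-latest'
            environment: 'production'   # assumed Azure DevOps environment
            strategy:
              runOnce:
                deploy:
                  steps:
                    - task: HelmDeploy@0   # runs helm upgrade --install against the cluster
                      inputs:
                        connectionType: 'Azure Resource Manager'
                        azureSubscription: 'my-subscription'   # assumed
                        azureResourceGroup: 'my-rg'            # assumed
                        kubernetesCluster: 'my-aks'            # assumed
                        namespace: 'models'
                        command: 'upgrade'
                        chartType: 'FilePath'
                        chartPath: 'chart'
                        releaseName: 'my-data-app'
                        overrideValues: 'image.tag=$(Build.BuildId)'
                        install: true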
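As an example of the intermediate checks mentioned in the last outline item, a Test stage can be wedged between Build and Deploy so that a broken model never reaches the cluster (Deploy's dependsOn would then point at Test). The requirements file and test directory match the hypothetical repository layout sketched above:

    # an extra stage for azure-pipelines.yml -- a sketch of a pre-deployment gate
      - stage: Test
        dependsOn: Build
        jobs:
          - job: RunChecks
            pool:
              vmImage: 'ubuntu-latest'
            steps:
              - script: |
                  pip install -r requirements.txt   # assumed dependency file
                  pytest tests/                     # run the automated test suite
                displayName: 'Run automated checks before promoting the image'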
Algorithms, Big Data, Data Science, DevOps, Machine Learning
Domain Expertise: some
Python Skill Level: basic
Abstract as a tweet: DevOps for the busy data scientist: learn how to leverage these practices to improve your workflows