Now that you finally have your machine learning model trained, what's the next step for moving it to production?
Orchestrating, scheduling and monitoring ML inference pipelines is a big challenge.
Airflow can be your ally in handling this complexity.
After working hard to develop a machine learning model, you know that there is still one step left: moving it to production.
In a common scenario, what you probably want is a workflow that automates:
* gathering and preprocessing the data
* running inference on them
* storing the predictions
Ideally, you also want a tool that helps you:
* deal with big data
* guarantee robustness and resilience
* execute your workflows on a schedule or when certain preconditions are met
* resolve dependencies between tasks
If until now you have been using cron to schedule jobs, this could be the right time to adopt a well-established tool like Apache Airflow to address this complexity.
Apache Airflow is an open source project written in Python for programmatically authoring, scheduling and monitoring batch execution of tasks.
You can design your pipelines according to the logic you define: decide which actions to perform, retry them if errors occur, skip tasks if dependencies are not met, monitor execution status and inspect logs through a friendly and powerful web UI, and a lot more.
A very nice feature of Airflow is that all of the above is configured and defined in Python code.
Airflow pipelines can therefore benefit from standard software development practices such as peer review, automated testing and version control.
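To give an idea of what this looks like, here is a minimal sketch of a DAG that chains the three steps mentioned above (preprocess, inference, store). It is only an illustration, not the pipeline built in the workshop; the task names and callables are hypothetical placeholders, and the imports follow the classic Airflow 1.x style shipped with the puckel/docker-airflow image:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def preprocess_data():
    # Placeholder: gather and preprocess the input data
    pass


def run_inference():
    # Placeholder: load the trained model and compute predictions
    pass


def store_predictions():
    # Placeholder: persist the predictions (database, files, ...)
    pass


default_args = {
    "owner": "airflow",
    "retries": 1,                          # retry a failed task once
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="ml_inference_pipeline",
    default_args=default_args,
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",            # run once per day
)

preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_data, dag=dag)
inference = PythonOperator(task_id="inference", python_callable=run_inference, dag=dag)
store = PythonOperator(task_id="store", python_callable=store_predictions, dag=dag)

# Task dependencies: preprocess -> inference -> store
preprocess >> inference >> store

Because the DAG is plain Python, it can be reviewed, tested and versioned like any other piece of code.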
In this workshop we'll go over basic Airflow concepts and set up an instance to orchestrate an inference pipeline for a machine learning model.
Details for Audience
- It assumes no previous Airflow knowledge.
- The main goal is to create a basic training and inference pipeline with Airflow.
- It is not about a particular model / ML method.
- It's not an advanced Airflow workshop.
- It is not suitable for Python beginners.
Workshop Requirements
- Docker installed.
- Any editor (Sublime, PyCharm, Vim, Atom).
- Verify that Docker works properly.
- Ensure that you have allocated 4 GB of RAM to the Docker Engine (this can be done in the desktop app under Preferences; restart Docker after changing the setting).
- Download the Airflow Docker image (see the example run command after this list):
docker pull puckel/docker-airflow
- Download the repository under the $HOME directory:
git clone https://github.com/deliveryhero/pyconde2019-airflow-ml-workshop
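To check that the image works, one option (a sketch; the workshop repository may provide its own run instructions) is to start the Airflow webserver from the pulled image and open the UI at http://localhost:8080:

docker run -d -p 8080:8080 puckel/docker-airflow webserver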
Python Expertise: expert
Abstract as a tweet: Automate your machine learning and data pipelines with Apache Airflow
Domain Expertise: some
Domains: Big Data, Infrastructure, Machine Learning, Data Engineering
Public link to supporting material: https://github.com/deliveryhero/pyconde2019-airflow-ml-workshop