2020-07-10 –, Warsaw Meetup [Session starts: Friday 10.07 6pm (Friday 10.07 9am PDT)]
In this talk we will review how Airflow helped create a tool to detect data anomalies. Leveraging Airflow for process management, database interoperability, and authentication created an easy path to achieving scale, reducing development time, and passing security audits. While Airflow is generally seen as a solution for managing data pipelines, integrating tools with Airflow can also speed up the development of those tools.
The Data Anomaly Detector was created at One Medical to scan thousands of metrics per day for data anomalies. It's a complicated tool, and much of that complexity was outsourced to Airflow. Because the data infrastructure at One Medical was already built around Airflow, and Airflow had many desirable features, it made sense to build the tool to integrate closely with Airflow. As a result, more time could be spent on building features for statistical analysis, and less effort had to be spent on database authentication, interoperability, or process management. It's an interesting example of how Airflow can be leveraged to build data-intensive tools.
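The abstract does not say which statistics the Data Anomaly Detector uses, so the following is only an illustrative sketch of the kind of per-metric check such a tool might run. It assumes a simple z-score rule; the names `is_anomalous`, `scan_metrics`, and the `threshold` parameter are hypothetical and not taken from the talk. In an Airflow deployment, a function like this could be invoked from a task (for example a PythonOperator), with Airflow supplying the scheduling and database connections, so the tool itself only has to contain the statistics.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Return True if `latest` deviates strongly from `history`.

    Assumption: a plain z-score test, chosen for illustration only.
    The actual detector's statistical method is not described in the source.
    """
    if len(history) < 2:
        # Not enough data points to estimate a spread.
        return False
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # A perfectly flat history: any change at all counts as an anomaly.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


def scan_metrics(metrics, threshold=3.0):
    """Check many metrics at once.

    `metrics` maps a metric name to a (history, latest) pair.
    Returns the sorted names of the metrics flagged as anomalous.
    """
    return sorted(
        name
        for name, (history, latest) in metrics.items()
        if is_anomalous(history, latest, threshold)
    )


if __name__ == "__main__":
    # Hypothetical sample data, not real One Medical metrics.
    sample = {
        "daily_visits": ([100, 102, 98, 101, 99], 250),
        "signups": ([10, 12, 11, 9, 10], 11),
    }
    print(scan_metrics(sample))
```

Keeping the check as a plain function with no Airflow imports is a deliberate choice in this sketch: the statistical logic stays testable on its own, while authentication, connections, and process management remain the orchestrator's responsibility.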
Blaine is a Sr. Data Engineer at One Medical, where he builds solutions for data pipelines and infrastructure. Blaine has 15+ years of experience in data engineering at various companies, including MySpace, Chegg, LinkedIn, and Microsoft.