PyCon JP 2022

Effective data science teams with Jupyter and databooks
2022/10/15, pyconjp_3
Language: English

Jupyter notebooks have been around since 2015. Since then, they have been used in blogs, books and fields such as data science.
Nonetheless, some of the features that make notebooks great also discourage teams from using them. In this talk I'll explore these issues and some solutions, share my personal experience, and present a Python tool I've built to make notebooks friendlier for software teams.


Jupyter Notebook is a great tool for quick prototyping and exploration. For this reason, it's very popular in fields such as data science. However, its JSON-based structure makes it hard to work with notebooks in teams, as it does not play well with other software tools such as git.

Letting developers version notebooks properly, and giving them better tools to compare notebooks and resolve conflicts, can greatly improve their lives. I have used Jupyter notebooks on different data science projects, experimented with different tools that support working with notebooks and git, and built my own tool (databooks) for that purpose.
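To make the versioning problem concrete, here is a minimal sketch using only the Python standard library. It assumes the usual nbformat 4 notebook layout; the `strip_volatile` and `serialize` helpers are hypothetical names made up for this illustration and are not the databooks API. Re-running a cell changes the stored JSON even though the code is identical, and dropping outputs and execution counts removes that noise.

```python
import copy
import json

# A minimal notebook in the nbformat 4 JSON layout, with a single code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        }
    ],
}

# Re-running the same cell later only bumps the execution counter.
rerun = copy.deepcopy(notebook)
rerun["cells"][0]["execution_count"] = 7


def strip_volatile(nb: dict) -> dict:
    """Return a copy of the notebook without outputs and execution counts.

    Hypothetical helper for illustration; it is not the databooks API.
    """
    clean = copy.deepcopy(nb)
    for cell in clean["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return clean


def serialize(nb: dict) -> str:
    """Serialize with a stable key order, roughly as the file would be stored."""
    return json.dumps(nb, indent=1, sort_keys=True)


# The code did not change, yet the serialized files differ, so git sees a change.
raw_differs = serialize(notebook) != serialize(rerun)
# After stripping, both versions serialize to the same text.
clean_equal = serialize(strip_volatile(notebook)) == serialize(strip_volatile(rerun))
print(raw_differs, clean_equal)
```

Dedicated tools cover far more than this (cell and notebook metadata, merge conflicts inside the JSON), but the sketch shows why a plain text diff of a notebook is noisy to begin with.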

Program

  • Agenda
  • Introduction
  • Data science and notebooks
  • Jupyter notebooks
  • What it is
  • Where you can find them
  • Issues with notebooks
  • Solution to those issues
  • Databooks
  • Demo
  • What it is
  • How it works

This talk is not about

  • Why you should use notebooks
  • How Jupyter notebooks work
  • Data science
  • How to put notebooks in production
  • How git works

Murilo Cunha is an AI tech lead at Dataroots with a background in Mechanical Engineering and an advanced master’s degree in Artificial Intelligence from KU Leuven, whose main goal is to make AI both useful and accessible. To reach this goal, Murilo takes a pragmatic approach, which led him to move further toward data engineering. In line with his passion for enabling AI to make an impact, Murilo developed expertise in MLOps, meaning that he advocates for automation and monitoring at all steps of ML system construction, including integration and deployment. With his experience in getting ROI on AI initiatives and as an open source supporter, he decided to fill the existing gap in the tooling that supports data scientists by creating databooks, an open source package to make the life of data scientists easier.