What about tests in Machine Learning projects?
2019-09-04, Track 3 (Oteiza)

Good practice says you must write tests! But testing Machine Learning projects can be really complicated, and writing tests often seems inefficient. Which kinds of tests should be written? How should you write them? What are the benefits?


Once your machine learning POC looks promising and your development environment is set up, the next step is to refactor your code and write TESTS. We know that many people find tests too complicated and boring to write, doubt that they are useful, and believe that a few manual checks are enough.

That is not entirely wrong. Tests can be really boring and time-consuming to write when you don't have the right tools, the right APIs, the right environments or the right code structure.
But it is always a bad idea to ignore tests or to run them manually. If you want to stay involved throughout your project's life cycle, and if you want to take it from POC to production, you need to care about tests. After some years of tackling production bugs, you no longer feel safe delivering without tests, just as you would not start driving before fastening your seat belt.

There is more than one way to test. Tests can be split into several levels (unit, component, functional, performance, etc.) so that you can quickly identify the faulty code, data or parameter. Tests must also be automated in a Continuous Integration (CI) pipeline and run at least on each experiment before it is merged into the baseline pipeline, as is done in software engineering (where the CI is triggered on each feature branch).
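
To make the idea of test levels concrete, here is a minimal sketch in plain Python. The functions `min_max_scale` and `threshold_predict` are hypothetical and are not taken from the talk; they only stand in for a preprocessing step and a tiny model so that a unit-level test and a component-level test can be shown side by side.

```python
# Hypothetical preprocessing step and toy model, used only to illustrate test levels.

def min_max_scale(values):
    """Scale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: avoid division by zero and map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def threshold_predict(values, threshold=0.5):
    """Toy 'model': predict 1 when the scaled value exceeds the threshold."""
    return [1 if v > threshold else 0 for v in min_max_scale(values)]


# Unit level: one function, one behaviour; a failure points at a single place.
def test_min_max_scale_bounds():
    assert min_max_scale([2, 4, 6]) == [0.0, 0.5, 1.0]


def test_min_max_scale_constant_feature():
    assert min_max_scale([3, 3, 3]) == [0.0, 0.0, 0.0]


# Component level: several steps chained; checks that they fit together.
def test_threshold_predict_end_to_end():
    assert threshold_predict([10, 20, 30, 40]) == [0, 0, 1, 1]
```

If the component test fails while both unit tests pass, the fault is more likely in the thresholding logic or in how the steps are wired together than in the scaling itself. Test functions named `test_*` like these can be collected by a runner such as pytest, which is what a CI job would typically invoke.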

This talk is about how to easily write tests and testable code, how to avoid the most common traps, and what benefits tests on unrealistic data bring to your Machine Learning project.

(Tests on real data are also really important but they are not the main purpose of this talk.)
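
As an illustration of what a test on unrealistic data might look like, the sketch below fits a line to noise-free points whose true coefficients are known in advance. The helper `fit_line` is a hypothetical stand-in for a training routine, not code from the talk; the point is only that data built by hand makes the expected result exact and the test fast and deterministic.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("x values are all identical; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def test_fit_line_recovers_known_coefficients():
    # Unrealistic on purpose: noise-free points lying exactly on y = 2x + 1.
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2 * x + 1 for x in xs]
    slope, intercept = fit_line(xs, ys)
    assert math.isclose(slope, 2.0)
    assert math.isclose(intercept, 1.0)


def test_fit_line_rejects_degenerate_input():
    # Edge case that real data rarely exercises: a constant feature.
    try:
        fit_line([1.0, 1.0], [0.0, 5.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for constant x")
```

Because the inputs are constructed rather than sampled, a failure here points to the code and not to the data, which is harder to guarantee when testing on a real dataset.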

Slides are here: sdg.jlbl.net/slides/tests_for_datascientist/presentation.html


Python Skill Level

professional

Domain Expertise

none

Domains

General-purpose Python, Machine Learning