PyCon APAC 2023

Test with Confidence: A Deep Dive into Eliminating Flaky Tests in Python
2023-10-27, track 1

Flaky tests are a common problem in software development that can cause frustration, delays, and even damage to a codebase. These tests pass or fail inconsistently, which makes their results hard to trust and slows down development workflows. In this talk, we'll explore the root causes of flaky tests in Python applications and provide practical strategies for finding, resolving, and preventing them.


Overview

  • What is a flaky test? Learn how the following can cause flaky tests:
    • Asynchronous code,
    • Race conditions, and
    • Environmental factors
  • Are deterministic tests worth investing time into?
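To make the asynchronous-code and race-condition causes concrete, here is a minimal sketch built with the standard library's asyncio. The names `background_job`, `flaky_check`, and `deterministic_check` are hypothetical, and the random delay stands in for I/O whose duration the test does not control. The flaky version guesses that a fixed delay is "long enough". The deterministic version waits for the work itself.

```python
import asyncio
import random


async def background_job(results: list[str]) -> None:
    # Hypothetical job: simulates I/O whose duration varies from run to run.
    await asyncio.sleep(random.uniform(0, 0.02))
    results.append("done")


async def flaky_check() -> bool:
    results: list[str] = []
    task = asyncio.create_task(background_job(results))
    # Flaky: races a fixed timer against the job, so the outcome depends on timing.
    await asyncio.sleep(0.01)
    finished = results == ["done"]
    await task  # clean up so the task does not outlive the check
    return finished


async def deterministic_check() -> bool:
    results: list[str] = []
    # Deterministic: wait for the job itself instead of guessing a delay.
    await asyncio.create_task(background_job(results))
    return results == ["done"]


if __name__ == "__main__":
    print(asyncio.run(deterministic_check()))  # always True
    print(asyncio.run(flaky_check()))  # True or False, depending on timing
```

The same idea applies to threads and external services. Synchronize on the event you actually care about (a task, a future, a lock, a readiness signal) rather than on wall-clock time.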

Identifying Flaky Tests

  • The ideal goal is to be able to identify flaky tests using an automated solution.
  • If you have a test suite with some flaky tests that randomly fail, you will likely try to develop a solution that automatically retries the failing tests. In our solution, we also wanted to track the failures that were likely caused by flaky tests rather than by actual errors.
  • While flaky tests don’t identify an actual “error” in the code, they are themselves an error in the codebase: without reliable tests, we cannot confidently continue to grow. Sentry is a good choice for keeping tabs on all kinds of errors, so we can store these flaky test “errors” there.
  • Now every test failure that is “fixed” by retrying the test is reported to Sentry as a flaky test “error”, but still does not break CI since it isn’t an error in the actual application. Tests that consistently fail will not be reported to Sentry as a flaky test “error”, but instead will actually break CI because there is an actual error in the application.
  • With the right tests being tracked in Sentry as flaky test “errors”, we can query in Sentry and easily identify all of the flaky tests.
  • I will show an example of how to do this with pytest.

Best Practices

While building Sentry, we identified three best practices for eliminating flaky tests:
- If the flaky test is a low-value test, consider skipping or deleting the test.
- Keep tests isolated. Reset shared state before each test to prevent side effects, for example with a pytest fixture or a setup_method hook (pytest's counterparts to a beforeEach lifecycle hook).
- Keep tests small. Larger tests are more likely to be flaky because of their greater complexity and coverage area. Consider writing more tests, each covering a smaller area.
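The isolation practice can be illustrated with a small, hypothetical example: a module-level `CACHE` that tests mutate. It uses the standard library's `unittest` `setUp`, which runs before each test like a beforeEach hook, and pytest runs unittest-style tests too. In pure pytest style, an autouse fixture (`@pytest.fixture(autouse=True)`) would do the same job. `CACHE` and `remember` are illustrative names, not part of any real codebase.

```python
import unittest

# Hypothetical module-level state that leaks between tests unless it is reset.
CACHE: dict[str, int] = {}


def remember(key: str, value: int) -> int:
    # Stores the first value seen for a key and returns whatever is stored.
    CACHE.setdefault(key, value)
    return CACHE[key]


class RememberTests(unittest.TestCase):
    def setUp(self) -> None:
        # Runs before each test: every test starts from an empty cache.
        CACHE.clear()

    def test_stores_first_value(self) -> None:
        self.assertEqual(remember("a", 1), 1)

    def test_ignores_later_values(self) -> None:
        remember("a", 2)
        self.assertEqual(remember("a", 3), 2)


if __name__ == "__main__":
    unittest.main()
```

Without `setUp`, the outcome depends on test order. unittest runs methods alphabetically by default, so `test_ignores_later_values` would leave `"a": 2` behind and `test_stores_first_value` would fail. A runner that uses definition order would pass both. Each test is also kept small and checks one behavior.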

How can you be sure flaky tests are gone?

Another benefit of using Sentry to monitor flaky tests alongside your errors and performance issues is that your Sentry dashboard keeps monitoring any tests that remain or become flaky. As you improve your tests, Sentry will notify you if any flaky tests remain after each deployment.

By keeping all codebase monitoring, including tests, in one application, you reduce the developer time spent tracking down the various ways to improve the health of your applications. An alternative is to write your own pytest plugin that uploads test-run information to an internal service that can store and query the data.

We don't recommend this when existing off-the-shelf software already solves the problem, especially since Sentry is free and easy to self-host.

I like to write code and travel. Ask me about the time I caught piranhas in the Amazon jungle.