PyConDE & PyData Berlin 2024

Refactoring Large Programs
2024-04-22, A03-A04

One of the most challenging tasks in software engineering is cleaning up a complex piece of software with 10,000 to 100,000 lines of code. The problem gets worse if you are taking over legacy code. The fact that the Python language enforces neither strict typing nor encapsulation does not help either. What should you do if throwing away everything and rewriting the program from scratch is not an option?

In this tutorial, we will practice refactoring a larger program that is undocumented, unstructured and untested. We will take a messy example program and work through a list of procedures that may help you in your next big refactoring.



You can find the code and installation instructions for the tutorial at https://github.com/krother/space


The procedures we will work through include:

  • review the code
  • write a minimal test
  • add type annotations
  • extract core data structures
  • separate easily cleanable parts from very bad parts
  • remove excess dependencies
  • be very transparent about which features of the code you trust
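A few of these steps can be sketched in code. The example below is only an illustration under stated assumptions: `legacy_move`, `Ship` and the fuel rule are hypothetical and are not taken from the tutorial repository. It shows a minimal test that pins down the current behaviour, followed by the same data extracted into a type-annotated structure.

```python
from dataclasses import dataclass


# Hypothetical legacy function: untyped, state passed around as a loose dict.
def legacy_move(ship, dx, dy):
    ship["x"] = ship["x"] + dx
    ship["y"] = ship["y"] + dy
    ship["fuel"] = ship["fuel"] - abs(dx) - abs(dy)
    return ship


# Write a minimal test: record what the code does today, before changing it.
def test_legacy_move():
    ship = {"x": 0, "y": 0, "fuel": 10}
    assert legacy_move(ship, 2, -1) == {"x": 2, "y": -1, "fuel": 7}


# Add type annotations and extract a core data structure:
# the implicit dict layout becomes an explicit, typed class.
@dataclass
class Ship:
    x: int
    y: int
    fuel: int

    def move(self, dx: int, dy: int) -> None:
        self.x += dx
        self.y += dy
        self.fuel -= abs(dx) + abs(dy)


# The same expectation, now checked against the extracted structure.
def test_ship_move():
    ship = Ship(x=0, y=0, fuel=10)
    ship.move(2, -1)
    assert ship == Ship(x=2, y=-1, fuel=7)


test_legacy_move()
test_ship_move()
```

Keeping both tests for a while makes it visible which behaviour is covered by the old code path and which by the new one.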

The main takeaway of the tutorial is that large-scale refactoring is possible. Although a large refactoring is difficult and costly, you will learn that it can be approached systematically. You will walk away with ideas about where to start refactoring. You will also become more aware of how difficult a complex refactoring is. Looking at a messy codebase realistically is important not only for managing the expectations of clients and stakeholders, but also for managing the stress that comes with it.

This tutorial is aimed at people who are fluent in basic Python. You should know how a class in Python works and what a unit test is. It helps if you have done simple refactoring (extract variable, extract function) before. I encourage junior developers to attend the tutorial to learn and discuss what a potentially overwhelming situation looks like.
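For readers unsure whether they have done "simple refactoring" before, here is a small sketch of the two techniques named above. The functions, the `VAT_FACTOR` constant and the sample prices are hypothetical and exist only for illustration.

```python
# Before: one expression mixes a magic number, a filter rule and formatting.
def report_before(prices):
    return "Total: " + str(round(sum(p * 1.19 for p in prices if p > 0), 2))


VAT_FACTOR = 1.19  # hypothetical constant, named instead of a magic number


def gross_total(prices):
    """Extract function: the calculation gets a name and can be tested alone."""
    valid_prices = [p for p in prices if p > 0]  # extract variable
    return round(sum(p * VAT_FACTOR for p in valid_prices), 2)


def report_after(prices):
    total = gross_total(prices)  # extract variable
    return f"Total: {total}"


# The refactored version must behave exactly like the original.
sample = [10.0, -5.0, 20.0]
assert report_before(sample) == report_after(sample)
```

The final assertion is the important part: a refactoring changes the structure of the code, not its observable behaviour.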

The tutorial session is structured in the following way:

  • 0:00 Interactive Warm-up with the audience: Who is here?
  • 0:05 Download and inspect code
  • 0:10 Quick code review
  • 0:20 Refactoring I: create a minimal test
  • 0:40 Refactoring II: extract data structures
  • 1:00 Refactoring III: isolate code
  • 1:20 buffer time and Q & A

The messy code and refactoring recipes will be provided to participants through GitHub.


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Intermediate

Abstract as a tweet (X) or toot (Mastodon):

Refactor a large Python program that is undocumented, unstructured and untested

Public link to supporting material, e.g. videos, Github, etc.:

https://github.com/krother/space

Kristian is a freelance Python trainer who wrote his first lines of Python in the year 0b11111001111. In his early career, he wrote software for life science research. Since 2011, he has been teaching Python and Data Science in Europe. He has translated and written Python books and published teaching material. Kristian has collected 308 stars on Advent of Code. His knowledge about async is, unfortunately, miserable. His favorite Python module is 're'. Kristian believes everybody can learn programming.