Modernizing the development workflow for a 7-year-old, 74K LoC Python project using Pantsbuild
10/15, 13:50–14:20 (Asia/Tokyo), pyconjp_2
Language: English

Mono-repository or not? That is a baffling question for many medium-to-large development teams. As a growing company, we had to onboard new hires quickly while coping with a flood of customer requests and an increasingly complex codebase. We merged 7 repositories into a single one and migrated to Pantsbuild, a modern, Python-friendly build system. Here is our story!


  • Discussions about mono-repository vs. multi-repository
    • Both have their own pros and cons. First, let's review existing discussions about the choice.
    • How do we define the criteria for merging repositories?
  • Problems with our team's previous multi-repository setup
    • Making a single release takes several hours.
    • We had to create multiple PRs in different repositories for a single conceptual feature or bugfix. Both code authors and reviewers had difficulty with context switching.
    • We had to establish custom practices like synchronizing branch names between different repositories for CI.
    • Linking multiple PRs to a single GitHub issue did not work as we expected.
    • We often forgot to switch branches in the local clones of related component repositories while working on a single issue.
    • The development-setup script became too complex.
  • A short introduction to Pantsbuild
    • http://pantsbuild.org/
    • Why did we choose this? (compared to Bazel, etc.)
    • A first glance at basic usage
  • Migration process with Pantsbuild
    • The new mono-repo directory structure and importing each per-package repository
  • Customization for our codebase
    • I wrote a few custom Pants plugins for setup.py generation, towncrier tooling, and a dependency injector for platform-specific prebuilt binaries.
    • I wrote a custom package entry point scanner, since package metadata is not available in Pants-based execution environments.
  • Experience after migration
    • Now making a release takes less than 10 minutes.
    • The on-site engineering team is now confident about the version compatibility of all components, as they share a single unified version number.
    • A single issue now has a single unified PR, making GitHub Projects more useful.
    • Writing and reviewing a PR across multiple components is now a breeze. We can see all relevant changes including documentation at a single place.
  • Recap
    • It was a long and difficult journey that took more than a month.
    • But it was worth it, and I hope my experience and customizations can help others undertaking a mono-repo migration of complex Python projects.
    • Great community support was a tremendous help during the whole migration process.
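The outline notes that package metadata (and therefore `importlib.metadata` entry points) is unavailable when code runs from sources under Pants. As a rough illustration only, the sketch below assumes each package directory in the mono-repo keeps a hypothetical `entry_points.txt` declaration file in the same INI layout that setuptools writes into installed metadata (`[group]` sections of `name = module:attr`). The functions `scan_entry_points` and `load_entry_point` and that file convention are illustrative assumptions, not the talk's actual scanner.

```python
import configparser
import importlib
from pathlib import Path
from typing import Any, Dict


def scan_entry_points(src_root: Path, group: str,
                      filename: str = "entry_points.txt") -> Dict[str, str]:
    """Collect one entry-point group from declaration files under src_root.

    Hypothetical convention: every package directory may contain a file
    named `filename` using the INI layout of an installed package's
    entry_points.txt, i.e. [group] sections of `name = module:attr`.
    """
    found: Dict[str, str] = {}
    for path in sorted(src_root.rglob(filename)):
        # Only '=' separates names from targets, because targets contain ':'.
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.optionxform = str  # keep entry-point names case-sensitive
        parser.read(path, encoding="utf-8")
        if parser.has_section(group):
            for name, target in parser.items(group):
                found[name] = target.strip()
    return found


def load_entry_point(target: str) -> Any:
    """Resolve a 'package.module:attr.subattr' string to the referenced object."""
    module_name, _, attr_path = target.partition(":")
    obj: Any = importlib.import_module(module_name.strip())
    # Walk dotted attributes after the colon, if any.
    for attr in filter(None, attr_path.strip().split(".")):
        obj = getattr(obj, attr)
    return obj
```

A caller would run `scan_entry_points(Path("src"), "myapp.plugins")` (both arguments hypothetical) and pass each resulting target string to `load_entry_point`. Reusing the standard `entry_points.txt` layout keeps such declarations compatible with what a generated `setup.py` would emit for installed packages.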

Joongi is the creator of Backend.AI and the CTO of Lablup, where he oversees the development of MLOps pipelines and GPU-accelerated AI services. He earned his Ph.D. in Computer Science from KAIST by creating a GPU-accelerated packet processing framework with world-leading speed of 80 Gbps. His major areas of interest include scalable and automated backend systems, as well as their analysis and design. He's also a big fan of open source, having contributed to projects like Python, iPuTTY, Textcube, aiodocker, aiohttp, pyzmq, DPDK, and others.