The worst outage I never caused
2020-09-04, The One Obvious Room

In 2017 I came within one keypress of causing Google's main backbone to largely fall off the Internet. This is the story of how we used that incident as a learning opportunity, how a lack of buy-in hindered further improvements, and how an existing toolkit of Python libraries allowed testing and validation tools to be built quickly, preventing any chance of a recurrence.


Julien is a Senior Site Reliability Engineer at Google Sydney. From 2011 to 2018 he worked on Google's production networks, focusing on Internet routing & interconnection. When not at work, he does things like designing custom embedded Linux machines & modernising frequency distribution systems.

He is also the current Secretary of Linux Australia, the parent organisation for PyCon AU.