Noaa is a full-stack developer, community manager, and tech writer who encourages developers to think more deeply about the decisions we make during development, to research the technologies we use, and to share our knowledge. She started her journey in the 8200 Unit of the IDF Intelligence forces, where she took her first steps in software development. Over the last four years, her work has mainly involved Angular, .NET, VanillaJS, and TypeScript. She currently develops in React, NodeJS, and Golang.
When building our Kubernetes-native product, we wanted to find the most common sources of failure, anti-patterns, and root causes of Kubernetes outages. So we rolled up our sleeves and read 100+ Kubernetes post-mortems. This is what we discovered.
A smart person learns from their own mistakes, but a truly wise person learns from the mistakes of others.
When launching our product, we wanted to learn as much as possible about the typical pain points in our ecosystem. We did so by reviewing many post-mortems (100+!) to discover the recurring patterns, anti-patterns, and root causes of typical outages in Kubernetes-based systems.
In this talk, we share the insights we gathered. In particular, we will review the most obvious DON'Ts, along with some less obvious ones, which may help you prevent your next production outage by learning from others' real-world (horror) stories.