Remove references to "root cause(s)", as incidents should focus on all "contributing factors", including the trigger, that lead to the incident #95

Open
theckman opened this issue Apr 25, 2019 · 0 comments

One change we are seeing in our industry is wider adoption of the belief that distilling an incident down to a single root cause is a myth[1][2]. As the complexity of our systems grows, so does the complexity of our incidents, and trying to isolate an incident to one item doesn't produce the kinds of learnings we need from those incidents.

The truth is that each incident is unique because of the multiple factors that contributed to it, and if any one of those factors had been different, it would have been a completely different incident. Unless we give each of those factors the same care, we miss the opportunity to address each of them.

While pluralizing "root cause" to "root causes" can get you a good part of the way there, in my experience the wording change from "root causes" to "contributing factors" makes a much bigger difference in how people think about incidents, and it drives the learnings in the direction we want. I was initially skeptical that such a small language change would make a difference, but I can happily admit I was wrong.

At Netflix we've started to change our internal language around this, and have found a much richer set of learnings from teams after an incident. Since I was a responder at PagerDuty when we started to form these practices and at the inception of this documentation, I feel it would be a miss if we didn't iterate on these documents to keep up with learnings from our industry.

We, and others, have started to talk about contributing factors instead. We still identify what was traditionally called the "root cause", but we list it as one of the factors (often called out as the trigger).

What are your thoughts on updating the verbiage of this documentation to align with our industry shifting its way of thinking?

[1] https://medium.com/@jpaulreed/dev-ops-and-determinism-966a57e3a5cc
[2] https://en.wikipedia.org/wiki/Fallacy_of_the_single_cause
