If a tree falls in a forest and no one is around to hear it, does it make a sound?
In my article about incidents, I wrote:
“Raise an incident when you think a postmortem would be useful for what you are seeing.”
This also includes raising an incident for a near miss. A near miss is an event that could have caused an incident, outage, security breach, or service degradation, but ultimately did not. Near misses can be a great source of learning, and reducing near misses can reduce actual incidents. Making them visible can also be an integral part of making your organisation more resilient to failures.
Running near misses through your regular incident process works well. In my previous article about incident priorities, I left out a crucial incident priority: the one for near misses. As such, I usually recommend that organisations also include a priority 4 (for near misses) in their incident priority matrix. This is what the revamped matrix would look like:
| Impact \ Severity | None/Near Miss | Low | Medium | High |
|---|---|---|---|---|
| None/Near Miss | Priority 4 | Priority 4 | Priority 4 | Priority 4 |
| Low | Priority 4 | Priority 3 | Priority 3 | Priority 2 |
| Medium | Priority 4 | Priority 3 | Priority 2 | Priority 1 |
| High | Priority 4 | Priority 2 | Priority 1 | Priority 1 |
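If you want to encode the matrix in tooling, for example in a bot that opens incidents, a lookup table is usually enough. The following is a minimal sketch; the names `Level`, `PRIORITY_MATRIX`, and `incident_priority` are my own for illustration and do not come from any particular incident tool.

```python
from enum import IntEnum


class Level(IntEnum):
    """Scale shared by impact and severity (hypothetical naming)."""
    NONE = 0    # none / near miss
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Rows are impact, columns are severity, mirroring the table above.
PRIORITY_MATRIX = (
    (4, 4, 4, 4),  # impact: none / near miss
    (4, 3, 3, 2),  # impact: low
    (4, 3, 2, 1),  # impact: medium
    (4, 2, 1, 1),  # impact: high
)


def incident_priority(impact: Level, severity: Level) -> int:
    """Look up the incident priority (1 is most urgent, 4 is a near miss)."""
    return PRIORITY_MATRIX[impact][severity]
```

For example, `incident_priority(Level.NONE, Level.HIGH)` returns 4: whenever either dimension is "None/Near Miss", the event lands in priority 4.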
I usually recommend only paging/alerting when customers are actually impacted. As such, near misses are usually not paged for at all. Instead, they are discovered through dashboards, metrics, and logs, or simply brought up when someone discovers they were close to making a mistake. The latter obviously requires a large amount of psychological safety.
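As a rough illustration of that paging rule, a routing step might look like the sketch below. The `Signal` type, its `customers_impacted` field, and the `"page"`/`"review"` labels are assumptions made up for this example; your alerting tool will have its own concepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A detected event (hypothetical shape, for illustration only)."""
    name: str
    customers_impacted: bool


def route(signal: Signal) -> str:
    """Page only when customers are impacted; otherwise queue for review."""
    return "page" if signal.customers_impacted else "review"
```

A near miss, such as a disk that almost filled up but was cleaned in time, would be routed to `"review"` and surface on a dashboard or in a priority 4 incident rather than waking someone up.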