Recording Outages Before Retrying

by Jan 21, 2011

I am seeing outages recorded even though the service in question has not gone through all of its retries. This causes problems because the outages report then does not match the actual alerts we receive.

A service must fail all of the rechecks defined in the service monitor's alerting settings before the failure registers as an outage. If none of the rechecks fail, or only some of them fail (one of three, for example), an outage will not be recorded and an alert will not be sent. You can verify this using the Service Outages report: it must show consecutive WARN or CRIT outages before you can assume that an email alert should have been sent.
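
To make the expected behaviour concrete, here is a minimal Python sketch of the recheck rule as I understand it. It is not up.time code: the function name `outage_recorded`, the `max_rechecks` parameter, and the assumption that any non-failing result resets the count are all hypothetical and only for illustration.

```python
from typing import Iterable

# Statuses treated as failures, per the WARN/CRIT wording above.
FAILING = {"WARN", "CRIT"}


def outage_recorded(check_results: Iterable[str], max_rechecks: int) -> bool:
    """Return True only if max_rechecks consecutive failing results occur.

    Hypothetical model: a non-failing result resets the counter, so a
    partial run of failures (one of three, say) never becomes an outage.
    """
    consecutive = 0
    for status in check_results:
        if status in FAILING:
            consecutive += 1
            if consecutive >= max_rechecks:
                return True
        else:
            consecutive = 0  # service recovered before exhausting rechecks
    return False
```

Under this model, outage_recorded(["CRIT", "OK", "OK"], 3) is False (only one of three checks failed), while outage_recorded(["WARN", "CRIT", "CRIT"], 3) is True.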

When this happens on my up.time server I see that an outage is recorded, but an alert isn't sent.

How do I configure the service not to record an outage unless all retries fail?