Default Service Checks For New Elements

by Jan 29, 2009

When a new Agent-monitored element is added, four service monitors are automatically created for it: “Configuration Update Gatherer”, “Platform Performance Gatherer”, “UPTIME-agent”, and “PING”.

Consider a scenario in which both critical and non-critical servers are monitored.

Non-critical servers are used in a development environment and may be rebooted without warning; this is acceptable and should not raise an alert. However, if the same server is unavailable for, say, two hours, that needs to be investigated.

The solution would be to change the PING service monitor to perform four rechecks at 30-minute intervals and only then raise the alert, so that the alert fires two hours (4 × 30 minutes) after the first failed check.
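To make the timing concrete, here is a minimal sketch of a "recheck before alerting" policy. It is a hypothetical model written for illustration, not up.time's own implementation; the names `MAX_RECHECKS`, `RECHECK_INTERVAL_MIN` and `minutes_until_alert` are assumptions.

```python
# Hypothetical model of a "recheck before alerting" policy; this is not
# up.time's implementation, just the arithmetic behind the suggestion.
MAX_RECHECKS = 4           # assumed: number of rechecks after the first failure
RECHECK_INTERVAL_MIN = 30  # assumed: minutes between rechecks


def minutes_until_alert(results, max_rechecks=MAX_RECHECKS,
                        interval=RECHECK_INTERVAL_MIN):
    """Return minutes from the first failed check to the alert, or None.

    `results` holds one boolean per check (True = host answered PING),
    starting with the first failed check and spaced `interval` minutes apart.
    An alert fires only if the first check and all `max_rechecks` rechecks fail.
    """
    needed = max_rechecks + 1  # the initial failure plus every recheck
    if len(results) < needed:
        return None  # not enough checks yet to decide
    if any(results[:needed]):
        return None  # host came back (e.g. after a reboot): no alert
    return max_rechecks * interval


# A dev server that answers again on the second recheck never alerts.
print(minutes_until_alert([False, False, True, True, True]))  # None
# A server that stays down through every recheck alerts after two hours.
print(minutes_until_alert([False] * 5))  # 120
```

A short reboot is absorbed by the rechecks, while a genuine outage still surfaces within the two-hour window.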

However, to do this, the default PING monitor installed with each element has to be changed on every monitored server (let's say 500 of them).

Clearly this would be a substantial amount of work to accomplish manually (I am currently considering it!).

Critical servers, on the other hand, should raise an alert every 30 minutes while the problem is outstanding and has not been acknowledged.
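The repeat-until-acknowledged behaviour wanted for critical servers can be sketched as follows. This is an assumed model for illustration only; `alert_times` and `REALERT_INTERVAL_MIN` are hypothetical names, not settings of an up.time alert profile.

```python
from typing import List, Optional

REALERT_INTERVAL_MIN = 30  # assumed: repeat interval for critical servers


def alert_times(outage_minutes: int,
                acknowledged_at: Optional[int] = None,
                interval: int = REALERT_INTERVAL_MIN) -> List[int]:
    """Minutes (from the first alert) at which a critical server alerts.

    Alerts repeat every `interval` minutes while the problem is outstanding
    (t < outage_minutes) and has not been acknowledged (t < acknowledged_at).
    """
    times = []
    t = 0
    while t < outage_minutes:
        if acknowledged_at is not None and t >= acknowledged_at:
            break  # an operator has acknowledged the problem: stop repeating
        times.append(t)
        t += interval
    return times


print(alert_times(100))                      # [0, 30, 60, 90]
print(alert_times(100, acknowledged_at=45))  # [0, 30]
```

Acknowledging the problem stops the repeats, but an unacknowledged outage keeps generating alerts for as long as it lasts.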

Again, this means changing the alert profile used by the critical servers' PING service monitors so that it differs from the default installed when the element was added.

Suggestion: allow a default set of monitors (possibly an empty set) to be defined for when a host is added to uptime. The “Service Group” drop-down could then be used to add a predefined set of services, one of which would be designated the “Host Check”.
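One way to picture the suggestion is a set of named service-group templates, each listing the monitors to create and flagging one of them as the Host Check. The sketch below is purely hypothetical: the group names, settings, `MonitorTemplate` class and `monitors_for_new_host` function are invented for illustration and are not part of uptime.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonitorTemplate:
    name: str                 # e.g. "PING"
    settings: Dict[str, int] = field(default_factory=dict)
    host_check: bool = False  # True for the one monitor used as the "Host Check"


# Hypothetical service groups an administrator might define; the names and
# settings are illustrative, not built-in uptime groups.
SERVICE_GROUPS: Dict[str, List[MonitorTemplate]] = {
    "none": [],  # add the host with no monitors at all
    "dev": [MonitorTemplate("PING", {"rechecks": 4, "recheck_interval_min": 30},
                            host_check=True)],
    "critical": [MonitorTemplate("PING", {"realert_interval_min": 30},
                                 host_check=True),
                 MonitorTemplate("UPTIME-agent")],
}


def monitors_for_new_host(hostname: str, group: str) -> List[str]:
    """Describe the monitors that would be created when `hostname` is added."""
    created = []
    for tmpl in SERVICE_GROUPS[group]:
        label = f"{hostname}: {tmpl.name}"
        if tmpl.host_check:
            label += " (Host Check)"
        created.append(label)
    return created


print(monitors_for_new_host("devbox01", "dev"))  # ['devbox01: PING (Host Check)']
print(monitors_for_new_host("db01", "none"))     # []
```

Choosing a group when the host is added would replace the fixed set of four default monitors with whatever the administrator has defined, including nothing at all.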

It would also be very useful to be able to bulk-change a defined list of already monitored hosts, so that their “Host Check” service can be replaced with a customised service check.
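The requested bulk change amounts to a single pass over a list of hosts. In this minimal sketch the inventory is just a dictionary from hostname to the name of its Host Check service; `bulk_replace_host_check` and the check name `"PING-dev-4x30"` are hypothetical and do not correspond to any existing uptime API.

```python
from typing import Dict, Iterable


def bulk_replace_host_check(host_checks: Dict[str, str],
                            hosts: Iterable[str],
                            new_check: str) -> int:
    """Point the "Host Check" of every listed host at `new_check`.

    `host_checks` maps hostname -> name of the service used as its Host Check.
    Hosts that are not currently monitored are skipped.
    Returns the number of hosts actually changed.
    """
    changed = 0
    for host in hosts:
        if host in host_checks and host_checks[host] != new_check:
            host_checks[host] = new_check
            changed += 1
    return changed


inventory = {f"dev{n:03d}": "PING" for n in range(1, 501)}  # 500 dev servers
inventory["db01"] = "PING"                                  # one critical server
dev_hosts = [h for h in inventory if h.startswith("dev")]
print(bulk_replace_host_check(inventory, dev_hosts, "PING-dev-4x30"))  # 500
print(inventory["db01"])  # PING
```

Only the hosts on the supplied list are touched, so the critical servers keep their own Host Check.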

The automatic inclusion of the predefined service monitors, with no control over the defaults, raises several other issues that could easily be resolved by giving control back to the uptime administrator.