Ping Topological Dependency Monitor

by Sep 15, 2006

Hi
We had a big problem monitoring hosts through VPN tunnels. We monitor two or three Oracle DB servers on each of 27 VPN tunnels.

The problem was that whenever a VPN tunnel went down, we got a flood of false alarms, because we monitor about 10 or 12 services on each host. There were so many that we missed a few real emergencies: the people receiving the notifications dismissed the genuine alarms as more false ones.

We needed to set up topological dependencies, but you cannot set up a topological dependency on the VPN tunnel itself. Often the tunnel goes down even though the VPN device doesn't, so a dependency on the device alone would not catch the outage.

Here is what we did:

We added an entry to the hosts file on the monitoring box for each DB server, in this form:
10.3.1.66 abc-db-01 abc-db-01.domain.tld

We use the first name (abc-db-01) for the server itself.
We then set up a separate node based on the second name (abc-db-01.domain.tld).
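The hosts entry is what lets the check find the remote network: the script takes the node name, finds its line in the hosts file and keeps the first three octets of the address. A minimal sketch of that step, assuming a /24 on the remote side and using the sample line above (the helper name subnet_for is hypothetical):

```perl
use strict;
use warnings;

# Sample /etc/hosts line in the format described above (illustrative only).
my $line = "10.3.1.66 abc-db-01 abc-db-01.domain.tld\n";

# Hypothetical helper: return the first three octets of the entry if $name
# appears on the line as a whole word, otherwise undef.
# \Q...\E makes the dots in the host name match literally.
sub subnet_for {
    my ($entry, $name) = @_;
    return $1 if $entry =~ /^(\d+\.\d+\.\d+)\.\d+\s(?:.*\s)?\Q$name\E(?!\S)/;
    return;
}

my $subnet = subnet_for($line, 'abc-db-01.domain.tld');

# Candidate addresses on the far side of the tunnel, built from a few
# last octets (the same idea as @ips in the script below).
my @targets = map { "$subnet.$_" } qw(1 13 254);

# Prints 10.3.1.1, 10.3.1.13 and 10.3.1.254, one per line.
print "$_\n" for @targets;
```

Requiring whitespace on both sides of the name keeps a partial name such as b-db-01.domain.tld from matching the wrong entry.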

We removed all the services from that node and added the script below as a custom monitor, which also serves as the node's host check:

Script Name: /usr/local/uptime4/bin/pingdep
Arguments: (left empty)
Warning: (left empty)
Critical: contains CRIT

Then we set up the servers at each site to use that node as a topological dependency, so that when the tunnel node is down, the alarms for the servers behind it are suppressed.
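To make the effect of the dependency concrete, here is a toy model of the suppression rule: a host raises an alert only if it is down and its parent node is up. This is a hypothetical sketch of the idea, not how up.time implements dependencies internally; the host names and the %parent and %up tables are illustrative.

```perl
use strict;
use warnings;

# Hypothetical dependency layout: the DB servers at one site depend on
# that site's tunnel node. Names are illustrative only.
my %parent = (
    'abc-db-01' => 'abc-db-01.domain.tld',
    'abc-db-02' => 'abc-db-01.domain.tld',
);

# Current check results (1 = up, 0 = down); here the tunnel itself is down,
# so every host behind it also fails its checks.
my %up = (
    'abc-db-01.domain.tld' => 0,
    'abc-db-01'            => 0,
    'abc-db-02'            => 0,
);

# Alert only if the host is down AND its parent (if any) is up. A down
# parent is assumed to explain the child's failure, so the child stays quiet.
sub should_notify {
    my ($host) = @_;
    return 0 if $up{$host};
    my $p = $parent{$host};
    return 0 if defined $p && !$up{$p};
    return 1;
}

# Prints ALERT only for the tunnel node; both DB servers stay quiet.
for my $host (sort keys %up) {
    printf "%-22s %s\n", $host, should_notify($host) ? 'ALERT' : 'quiet';
}
```

With one tunnel node per site, a tunnel outage produces a single alarm instead of one per service on every server behind it.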

Now we rarely get false alarms, and my credibility with our clients has gone way up, since I now notify them of problems proactively.

Here is the script. Place it in UPTIMEDIR/bin and name it pingdep. Make sure the first line has the correct path to perl, and make the file executable (for example, chmod 755 pingdep).

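#!/usr/bin/perl

############################################################
# ATS Tunnel check Script For Uptime monitoring System
# Written by Rocky Allen for ATS 2006
# Great Idea !!! Thanks Cap Hayes ATS
# This script is distributed under the Perl Artistic License
#
############################################################
use strict;
use warnings;
use Net::Ping;

# Name of the tunnel node being checked, e.g. abc-db-01.domain.tld
my $hostname = shift;
if (!defined $hostname) {
    print "CRIT - no host name given\n";
    exit 2;
}

##### These should be the last octet of some random hosts on the remote VPN side ###
my @ips = qw(1 13 20 21 241 242 240 254 247 248 11 14 32 64 6 16);
my @foundone;
my $subnet;

# Find this node's entry in /etc/hosts and keep the first three octets
# of its address (10.3.1 from 10.3.1.66).
open(my $fh, '<', '/etc/hosts') or do {
    print "CRIT - can't open hosts file\n";
    exit 2;
};
while (my $line = <$fh>) {
    # \Q...\E makes the dots in the host name match literally
    if ($line =~ /^(\d+\.\d+\.\d+)\.\d+\s(?:.*\s)?\Q$hostname\E(?!\S)/) {
        $subnet = $1;
        last;
    }
}
close $fh;

if (!defined $subnet) {
    print "CRIT - $hostname not found in hosts file\n";
    exit 2;
}

# Net::Ping defaults to the "tcp" protocol (a connect to the echo port),
# which needs no root privileges; a refused connection still proves
# that the remote host is reachable through the tunnel.
my $p = Net::Ping->new();
foreach my $octet (@ips) {
    my $host = "$subnet.$octet";
    if ($p->ping($host, 1)) {
        push(@foundone, "$host is alive.\n");
        last;    # one reply is enough to show the tunnel is up
    }
}
$p->close();

if (!@foundone) {
    print "CRIT - Cannot connect to any host\n";
    exit 2;
}

print "OK - Tunnel is up\n";
exit 0;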
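The check's contract is simple: if any sampled host answers, print an OK line and exit 0; otherwise print a line containing CRIT and exit 2, which the "Critical: contains CRIT" setting picks up. A minimal sketch for trying that rule without a live VPN, assuming the ping is passed in as a code reference (tunnel_status and the stubs are hypothetical names, not part of pingdep):

```perl
use strict;
use warnings;

# Same decision rule as pingdep, with the network call supplied as a code
# reference so it can be exercised offline. Returns (message, exit code).
sub tunnel_status {
    my ($subnet, $octets, $pinger) = @_;
    for my $octet (@$octets) {
        return ("OK - Tunnel is up\n", 0) if $pinger->("$subnet.$octet");
    }
    return ("CRIT - Cannot connect to any host\n", 2);
}

my @octets = qw(1 13 254);

# Hypothetical stubs: one pretends only 10.3.1.254 answers, one that nothing does.
my $stub_up   = sub { $_[0] eq '10.3.1.254' };
my $stub_down = sub { 0 };

my ($msg, $code) = tunnel_status('10.3.1', \@octets, $stub_up);
print $msg;    # OK - Tunnel is up
```

In the real script the pinger would be something like sub { $p->ping($_[0], 1) } using the Net::Ping object created there.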