Agent Communications Errors

by Nov 10, 2007

Hi All,

I'm currently working with Uptime support on an issue and thought I'd ask you all if you've seen this as well.

The uptime agent on my monitoring station is currently timing out 10 – 12 times an hour, and random service monitors are failing to connect. I also see the same issues on just about every other server, though at a much lower frequency, about 1 – 2 times a day.

If you look in your outage logs, do any of you see this issue as well?

Thanks,
Eric

Sample outage log below:
Fri Nov 09 11:51:05 PST 2007 Prod Web Server Stats CRIT retry OK IIS Stats – Current Connections: 0, GET Requests/sec: 0

Fri Nov 09 11:50:46 PST 2007 UPTIME-ProdWeb CRIT retry OK up.time agent running on Pro2ProdWeb, up.time Windows-MS-agent 4.1.0.1511

Fri Nov 09 11:50:06 PST 2007 Prod Web Server Stats OK CRIT retry Unable to contact Agent (Pro2ProdWeb on port 9998)

Fri Nov 09 11:49:47 PST 2007 UPTIME-ProdWeb OK CRIT retry Unable to contact Agent (Pro2ProdWeb on port 9998)

Fri Nov 09 11:02:42 PST 2007 Performance Check (member) OK recovery OK All checks are within bounds – time: 235 ms

Fri Nov 09 11:00:42 PST 2007 Performance Check (member) WARN OK recovery All checks are within bounds – time: 281 ms

Fri Nov 09 10:58:43 PST 2007 Performance Check (member) WARN retry WARN Limited performance data available, no check completed.

Fri Nov 09 10:56:44 PST 2007 Performance Check (member) OK WARN retry Limited performance data available, no check completed.

OR
Fri Nov 09 12:21:14 PST 2007 UPTIME-DB2 CRIT retry CRIT Monitor failed: Read timed out

Fri Nov 09 12:19:15 PST 2007 UPTIME-DB2 OK CRIT retry Monitor failed: Read timed out

Fri Nov 09 12:18:38 PST 2007 File System Check (member) OK CRIT retry Could not contact agent

Fri Nov 09 12:17:14 PST 2007 UPTIME-DB2 CRIT retry OK up.time agent running on ETDB2, up.time Windows-MS-agent 4.1.0.1511

Fri Nov 09 12:15:27 PST 2007 File System Check (member) CRIT retry OK All filesystems are within acceptable levels

Fri Nov 09 12:15:18 PST 2007 UPTIME-DB2 OK CRIT retry Monitor failed: Read timed out

Fri Nov 09 12:13:57 PST 2007 Performance Check (member) WARN retry WARN Limited performance data available, no check completed.

Fri Nov 09 12:13:39 PST 2007 File System Check (member) OK CRIT retry Could not contact agent
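
In case it helps anyone compare numbers, here is a rough Python sketch for counting these failures per hour. It is not an up.time tool: the function name `failures_per_hour` is my own, and it assumes the outage log has been saved as plain text with one entry per line, laid out like the samples above. The three failure messages it looks for are simply the ones that appear in my log.

```python
import re
from collections import Counter
from datetime import datetime

# Failure messages taken from the sample log above; extend as needed.
FAILURE_PATTERNS = (
    "Unable to contact Agent",
    "Could not contact agent",
    "Read timed out",
)

# Layout assumed from the samples: "Fri Nov 09 11:50:06 PST 2007 <rest of entry>".
# The weekday and time zone tokens are matched but not parsed.
LINE_RE = re.compile(r"^\w{3} (\w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4}) (.*)$")


def failures_per_hour(lines):
    """Count agent communication failures per clock hour."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not a log entry
        stamp, year, rest = match.groups()
        if any(pattern in rest for pattern in FAILURE_PATTERNS):
            when = datetime.strptime(f"{stamp} {year}", "%b %d %H:%M:%S %Y")
            counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Passing an open file (or any list of lines) to `failures_per_hour` returns a `Counter` keyed by hour, so sorting its items shows which hours were worst. Month names are parsed with `%b`, so this assumes an English locale.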