Periodic Messages "Output Monitor failed: Read timed out" and "Output Could not retrieve filesystems"

by Oct 4, 2014

We periodically receive the messages “Output Monitor failed: Read timed out” and “Output Could not retrieve filesystems”.

These occur on various servers at various times. Most of the time, the servers get an OK status from up.time.

These messages are generally followed by an OK recovery. Besides today, this particular server reported the same CRIT error 16 days ago, at a different time of day. Results in between were OK.

How do I stop these CRITICAL messages when the server is up? What causes them?

 


Output: Could not retrieve filesystems
Service: Prod Disk Space (member)
DateTime: 3/Oct/2014 11:20
Server: Bsrvr
Status: CRIT

Output: Monitor failed: Read timed out
Service: UPTIME-Bsrvr
DateTime: 3/Oct/2014 11:19
Server: Bsrvr
Status: CRIT

Output: up.time agent running on Bsrvr, up.time Windows Agent 6.0.0 (build 50)
Service: UPTIME-Bsrvr
DateTime: 3/Oct/2014 11:24
Server: Bsrvr
Status: OK/recovery

Output: All file systems are within acceptable levels
Service: Prod Disk Space (member)
DateTime: 3/Oct/2014 11:30
Server: Bsrvr
Status: OK/recovery


Here is the current system status: Bsrvr [Windows 7/Server 2008 R2]

CPU Performance Check (member)  OK  2014-10-03 12:38:41+  6 days 8h  All checks are within bounds
Is_Mirror_Synchronized SecureNet  OK  2014-10-03 12:46:31+  6 days 8h  Process returned with valid status – d:jobsuptime>ech
Ping All Servers (member)  OK  2014-10-03 12:43:43+  6 days 8h  Ping completed: 5 sent, 0.0% loss, 0.21ms average round trip
PING-Bsrvr  OK  2014-10-03 12:44:56+  6 days 8h  Ping completed: 5 sent, 0.0% loss, 0.25ms average round trip
Prod Disk Space (member)  OK  2014-10-03 12:40:54+  1h 17m  All file systems are within acceptable levels
SQLJobLastRun_SecureNet_Robot  OK  2014-10-03 12:44:43+  6 days 8h  Process returned with valid status – d:jobsUpTimeSQLJob
UPTIME-Bsrvr  OK  2014-10-03 12:44:19+  1h 23m  up.time agent running on Bsrvr, up.time Windows Agent 6.0.0

 


 

We also get this one, which is similar: different servers, different times, spread out, while the rest of the services respond OK.

___

Server: SLOT8
Status: CRIT
Service: DNS_SLOT8
Output: Monitor failed: ICMP Port Unreachable
DateTime: 3/Oct/2014 07:24

___

Server: SLOT8
Status: OK/recovery
Service: DNS_SLOT8
Output: Address slot08b.bswa.local resolves to 192.xx.xx.xx authoritatively
DateTime: 3/Oct/2014 07:44

___
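
To show how short these outages are, the gap between each CRIT and its recovery can be worked out from the DateTime values quoted above. This is only a rough sketch in Python; the helper name outage_minutes is my own and not part of up.time, and it assumes English month abbreviations.

```python
from datetime import datetime

# Matches the DateTime values in the alerts, e.g. "3/Oct/2014 11:19".
# Assumes an English locale for the month abbreviation.
FMT = "%d/%b/%Y %H:%M"

def outage_minutes(crit: str, recovery: str) -> int:
    """Minutes between a CRIT alert and the matching OK/recovery alert."""
    delta = datetime.strptime(recovery, FMT) - datetime.strptime(crit, FMT)
    return int(delta.total_seconds() // 60)

# (CRIT time, recovery time) pairs copied from the alerts in this post.
alerts = {
    "UPTIME-Bsrvr": ("3/Oct/2014 11:19", "3/Oct/2014 11:24"),
    "Prod Disk Space (member)": ("3/Oct/2014 11:20", "3/Oct/2014 11:30"),
    "DNS_SLOT8": ("3/Oct/2014 07:24", "3/Oct/2014 07:44"),
}

for service, (crit, recovery) in alerts.items():
    print(f"{service}: {outage_minutes(crit, recovery)} min")
```

Each gap is only one check interval or so, which is why these look like transient timeouts rather than real outages.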

 

 

The monitors do report correctly when disk space is full or CPU is maxed out. However, the server above with the failed Output is not the one reported as maxed; it has plenty of space.


Output: WARN: C: 87% used is greater than 85%
Service: Prod Disk Space (member)
DateTime: 3/Oct/2014 00:56
Server: Bsrvr2
Status: WARN

CPU Usage Bsrvr2
Status: CRIT
Service: CPU Performance Check (member)
Output: CPU Check: 99.7% >= 90%
DateTime: 3/Oct/2014 09:43