MONyog Repeatedly Killed by SIGSEGV

by Jan 8, 2016

After running fairly stably for months, we upgraded to MONyog 6.5 before Christmas and have since had the process die 3 times in 4 weeks. Each time, the process logged a segmentation fault. We're running on CentOS 6.5:


[root@(server) MONyog]# cat /etc/centos-release 
CentOS release 6.5 (Final)
[root@(server)  MONyog]# uname -a
Linux (server.fqdn) 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@(server) MONyog]# rpm -qa | grep MONyog
MONyog-6.5.0-0.x86_64

The segfault was logged about 3 minutes after a failed call to the AWS API to get logs from an RDS server, and the stack trace suggests a curl timeout (Curl_resolv_timeout) is involved in the crime. MONyog had been logging a similar failure every 5 minutes; something about that RDS DB isn't happy, but that shouldn't be crashing my entire database monitoring application!


[6.5] [2016-01-07 18:55:59] [Server: (endpointname)] populatemysql.cpp(471) ErrCode:-1 ErrMsg:RDS Error log not present
[6.5] [2016-01-07 18:58:56] linservicemgr.cpp(106) ErrCode:11 ErrMsg:Stopping MONyog: Received signal SIGSEGV -- Segmentation fault!
/usr/local/MONyog/bin/MONyog-bin(_ZN13LinServiceMgr17PrintProgramStackEv+0x9f)[0x4d955f]
/usr/local/MONyog/bin/MONyog-bin(_ZN13LinServiceMgr12SignalActionEi+0x17e)[0x4d948e]
/lib64/tls/libc.so.6[0x7f2a05110a20]
/usr/src/redhat/BUILD/monyog/out/lib/linux-glibc-2.3.2-95.50/release64/lib/libcurl.so.4[0x7f2a058f98d1]
/usr/src/redhat/BUILD/monyog/out/lib/linux-glibc-2.3.2-95.50/release64/lib/libcurl.so.4[0x7f2a058f8d8b]
/usr/src/redhat/BUILD/monyog/out/lib/linux-glibc-2.3.2-95.50/release64/lib/libcurl.so.4(curl_mvsnprintf+0x25)[0x7f2a058f8825]
/usr/src/redhat/BUILD/monyog/out/lib/linux-glibc-2.3.2-95.50/release64/lib/libcurl.so.4(Curl_failf+0xa0)[0x7f2a058ec260]
/usr/src/redhat/BUILD/monyog/out/lib/linux-glibc-2.3.2-95.50/release64/lib/libcurl.so.4(Curl_resolv_timeout+0x144)[0x7f2a058e3554]
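
For anyone who wants to check whether their own crashes follow the same pattern, here is a rough Python sketch that reports how long after the previous logged error each SIGSEGV line appears. It assumes the log line layout shown in the excerpt above; the regex, the function name seconds_before_crash and the SAMPLE string are my own illustration, not anything shipped with MONyog.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above:
# [version] [YYYY-MM-DD HH:MM:SS] ... ErrCode:N ErrMsg:text
LINE_RE = re.compile(
    r"^\[[^\]]+\] \[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r".*?ErrMsg:(?P<msg>.*)$"
)


def seconds_before_crash(log_text):
    """Return (seconds, message) for the last error logged before each SIGSEGV line."""
    results = []
    previous = None  # (timestamp, message) of the most recent non-crash error
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # stack frames and anything else that does not parse
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        msg = match.group("msg")
        if "SIGSEGV" in msg:
            if previous is not None:
                gap = (ts - previous[0]).total_seconds()
                results.append((gap, previous[1]))
        else:
            previous = (ts, msg)
    return results


# The two timestamped lines from the excerpt above, used as sample input.
SAMPLE = """\
[6.5] [2016-01-07 18:55:59] [Server: (endpointname)] populatemysql.cpp(471) ErrCode:-1 ErrMsg:RDS Error log not present
[6.5] [2016-01-07 18:58:56] linservicemgr.cpp(106) ErrCode:11 ErrMsg:Stopping MONyog: Received signal SIGSEGV -- Segmentation fault!
"""

for gap, msg in seconds_before_crash(SAMPLE):
    print(f"{gap:.0f}s after: {msg}")  # prints "177s after: RDS Error log not present"
```

Run against a full log file, a consistent gap after the same RDS error would support the timeout theory; it would not prove it.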

We're running the enterprise version, if that's relevant. I'll try to hunt down the customer login details and send over the core dump, but I wanted to post something publicly so that anyone else with the same issue can see it's not a one-off!