Category Archives: nagios

nagios problems 4.4.3

qh: Failed to init socket ‘/usr/local/nagios/var/rw/nagios.qh’

For this, I had to edit my nagios.cfg (/etc/nagios/nagios.cfg) and change query_socket:

# QUERY HANDLER INTERFACE
# This is the socket that is created for the Query Handler interface
#query_socket=/var/lib/nagios/rw/nagios.qh 
query_socket=/usr/local/nagios/var/rw/query.sh

Another problem occurred while trying to start nagios.

The solution I found was to create the directory for the socket and hand it to the nagios user:

mkdir /usr/local/nagios/var/rw
chown nagios:nagios /usr/local/nagios/var/rw
service nagios restart
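
Before restarting, it can help to check which directory nagios expects for the socket. This is only a rough sketch: the function names and the demo path are made up for illustration, and it assumes query_socket is written as a plain key=value line.

```shell
# Print the value of query_socket from a nagios.cfg-style file.
# Commented-out lines are skipped; the last assignment wins in this sketch.
query_socket_path() {
    sed -n -e 's/[[:space:]]*$//' -e 's/^[[:space:]]*query_socket=//p' "$1" | tail -n 1
}

# Report whether the directory that should hold the socket exists.
check_socket_dir() {
    socket=$(query_socket_path "$1")
    if [ -z "$socket" ]; then
        echo "query_socket is not set in $1"
        return 0
    fi
    dir=$(dirname "$socket")
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
    else
        echo "missing: $dir"
    fi
}

# Demo against a throwaway config (hypothetical path);
# on a real system pass /etc/nagios/nagios.cfg instead.
demo_cfg=$(mktemp)
printf '%s\n' '#query_socket=/var/lib/nagios/rw/nagios.qh' \
              'query_socket=/nonexistent-demo/rw/nagios.qh' > "$demo_cfg"
check_socket_dir "$demo_cfg"
rm -f "$demo_cfg"
```

If it prints "missing", the mkdir and chown above are the fix.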

(No output on stdout) stderr: connect to address XXX.XXX.XXX.XXX port 5666: Connection refused

This was tested on a CentOS 7.

This might have TWO possible causes:

  • the nrpe service is down
    Use service nrpe status to test it.
    See the ‘Add NRPE to service’ section below to run nrpe as a service.
  • the firewall on the nrpe machine is blocking it

 

Some important files/directories

nrpe.cfg – /etc/nagios/nrpe.cfg
nagios/nrpe plugins folder – /usr/lib64/nagios/plugins/
logs – /var/log/messages

 

nrpe.cfg

My nrpe.cfg, on CentOS 7, is located at /etc/nagios/nrpe.cfg

nano /etc/nagios/nrpe.cfg

nrpe log

In a default nrpe installation, logging is disabled!
You might want to enable it to make debugging easier. Go to nrpe.cfg and enable it.

log_file=/var/run/nrpe.log

nrpe debug mode

Yes, the default installation also comes with debug disabled.
We want it enabled to get more information, for example while checking service nrpe status.

debug=1

server_address

Server address… it might confuse you!
server_address ISN’T the NAGIOS server’s IP address. It’s the IP address of the current (nrpe) machine that the daemon listens on! The Nagios server’s IP address goes in allowed_hosts instead.
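
To double-check which address is which, the two settings can be read back from nrpe.cfg. A small sketch follows. The nrpe_setting helper is my own invention, and the IP addresses are documentation-range placeholders, not real hosts.

```shell
# Read one "key=value" setting from an nrpe.cfg-style file, ignoring comments.
# If the key appears more than once, the last one is printed.
nrpe_setting() {
    sed -n "s/^[[:space:]]*$1=//p" "$2" | tail -n 1
}

# Demo file with placeholder addresses; on a real host use /etc/nagios/nrpe.cfg.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
# address the nrpe daemon binds to on THIS machine
server_address=192.0.2.10
# address(es) of the Nagios server allowed to connect
allowed_hosts=127.0.0.1,198.51.100.5
EOF

echo "nrpe listens on:   $(nrpe_setting server_address "$demo_cfg")"
echo "nagios allowed in: $(nrpe_setting allowed_hosts "$demo_cfg")"
rm -f "$demo_cfg"
```

The same helper can be pointed at debug or log_file to confirm the changes above were saved.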

Let’s check the status of port 5666.

lsof -i:5666
netstat -an | grep 5666

If you don’t see any result… this probably means that nrpe is down!
See the ‘Add NRPE to service’ section below.
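
If you want that check in a script, the netstat output can be filtered for a listener on the port. This is a sketch under assumptions: listening_on is a made-up helper, and the sample line only imitates the usual netstat -an column layout.

```shell
# Succeed if the netstat-style lines on stdin contain a TCP listener on port $1.
listening_on() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Illustrative input only; on a real host use: netstat -an | listening_on 5666
sample='tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN'
if printf '%s\n' "$sample" | listening_on 5666; then
    echo "something is listening on 5666"
else
    echo "nothing is listening on 5666, nrpe is probably down"
fi
```

Matching on ":5666" at the end of the local-address column avoids false hits on ports such as 15666.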

Open PORT on the firewall

sudo firewall-cmd --permanent --zone=public --add-port=5666/tcp
sudo firewall-cmd --reload

Add NRPE to service (this will launch nrpe on reboots)

sudo systemctl enable nrpe.service
sudo systemctl start nrpe.service


dashboard for VMware, SNMP, REST API and more

Simple dashboard system for sysadmins with modules for VMware, SNMP, REST API and more

SysAdminBoard is a simple dashboard system written in Python, HTML and JavaScript and served on a simple CherryPy web server (included). It was originally written to reformat SNMP data for the Panic Statusboard iPad App, but has since become a fully stand-alone project that can grab data from a variety of sources and render charts and graphs in a web browser.

Failed to obtain lock on file /var/run/nagios/nagios.pid: No such file or directory

Another nagios update, another issue…

[1490254976] Event broker module ‘NERD’ deinitialized successfully.
[1490254991] Failed to obtain lock on file /var/run/nagios/nagios.pid: No such file or directory
[1490254991] Bailing out due to errors encountered while attempting to daemonize… (PID=792)

SHIT!

I had to create the missing directory and give it to the nagios user:

mkdir /var/run/nagios
chown nagios:nagios /var/run/nagios

 

And restart nagios.
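
One possible reason the directory goes missing is that /var/run sits on a tmpfs on systemd-based systems, so it is wiped at every reboot. I have not confirmed that this was the cause here. Assuming systemd-tmpfiles is available, a rule like the one printed below should recreate the directory at boot. The helper name, the 0755 mode and the nagios.conf file name are my assumptions.

```shell
# Print a systemd-tmpfiles rule that recreates a directory at boot.
# Fields: type path mode user group age ("d" = create directory, "-" = no cleanup age).
tmpfiles_rule() {
    printf 'd %s 0755 %s %s -\n' "$1" "$2" "$3"
}

tmpfiles_rule /var/run/nagios nagios nagios
# On a real host the rule could be installed with something like:
#   tmpfiles_rule /var/run/nagios nagios nagios | sudo tee /etc/tmpfiles.d/nagios.conf
```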

check_yum – YUM output signature is larger than current known format

If you are using check_yum you might face the following error:

YUM output signature is larger than current known format, please make sure you have upgraded to the latest version of this plugin. If the problem persists, please contact the author for a fix

nano /etc/nagios/nrpe.cfg

Set the correct command for check_yum:

command[check_yum]=/usr/lib64/nagios/plugins/check_yum --all-updates

and then restart nrpe

/bin/systemctl restart nrpe.service

CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

 


check_nrpe-received-0

Testing check_nrpe from the nagios server against one of our nrpe hosts:

[18:36][root@ops /etc/nagios/hosts]# /usr/lib64/nagios/plugins/check_nrpe -H 192.XXX.XX.XX -c check_disk -a 60 80 /
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages.

As we can see (first image), the nagios server can see/ping the nrpe host…

Taking a look at the logs on the nrpe host, located at

  • ubuntu – /var/log/syslog
  • centos – /var/log/messages

Oct 20 14:51:56 ubuntu-512mb-nyc1-01 nrpe[17097]: Error: Request contained command arguments!
Oct 20 14:51:56 ubuntu-512mb-nyc1-01 nrpe[17097]: Client request was invalid, bailing out...

This means that nrpe isn’t configured to allow command arguments.
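
To confirm quickly that this is what is happening, the log can be searched for that exact message. A minimal sketch follows. count_rejected_args is a made-up helper name, and the demo file only holds the two log lines quoted above.

```shell
# Count how many requests nrpe rejected because they carried arguments.
count_rejected_args() {
    grep -c 'Request contained command arguments' "$1"
}

# Demo with the two log lines from this post; on a real host point it at
# /var/log/syslog (ubuntu) or /var/log/messages (centos).
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
Oct 20 14:51:56 ubuntu-512mb-nyc1-01 nrpe[17097]: Error: Request contained command arguments!
Oct 20 14:51:56 ubuntu-512mb-nyc1-01 nrpe[17097]: Client request was invalid, bailing out...
EOF
echo "rejected requests: $(count_rejected_args "$demo_log")"
rm -f "$demo_log"
```

A count above zero means the host is refusing arguments, which is the dont_blame_nrpe=0 behaviour.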

Some possible ways to fix it:

POSSIBLE SOLUTION Nº 1

Matt Yakel said that he had to add the following line to /etc/services

nrpe 5666/tcp # nrpe

Mine already had it so…. this wasn’t my problem.

POSSIBLE SOLUTION Nº 2

WELL, since this was taking too long,
I decided to make the request without passing arguments from the server to the host.
This way, we can keep dont_blame_nrpe=0.

Like *everyone* else does, I’ve defined a check_nrpe_1arg command on my nagios server in the file /etc/nagios/objects/commands.cfg.

Edit

nano /etc/nagios/objects/commands.cfg

Add the following lines to it

define command{
   command_name        check_nrpe_1arg
   command_line        $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Now, in the configuration file of the desired host (under /etc/nagios/hosts/), let’s replace check_nrpe with check_nrpe_1arg and remove the arguments.

Example

define service {
 use generic-service
 host_name sdxxx.host.com
 service_description DISK: root partition
 check_command check_nrpe_1arg!check_disk
}
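
To see what check_nrpe_1arg!check_disk ends up running, here is a toy imitation of the macro substitution. It is not Nagios code. expand_command is a made-up helper, the host address and plugin directory are placeholders, and it assumes the remote command name is passed with check_nrpe's lowercase -c option.

```shell
# Illustration only: mimic how "command!arg1" fills $ARG1$ in a command_line.
expand_command() {
    check_command=$1   # e.g. check_nrpe_1arg!check_disk
    host_address=$2    # stands in for $HOSTADDRESS$
    plugin_dir=$3      # stands in for $USER1$
    arg1=${check_command#*!}   # everything after the first "!"
    echo "$plugin_dir/check_nrpe -H $host_address -c $arg1"
}

expand_command 'check_nrpe_1arg!check_disk' 192.0.2.10 /usr/lib64/nagios/plugins
```

Only the command name travels to the nrpe host, so the thresholds have to live in nrpe.cfg on that host.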

On the nrpe host, in nrpe.cfg, we need to hardcode the command arguments.

Example

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 300 -c 500 -s RSZDT
command[check_apt]=/usr/lib/nagios/plugins/check_apt
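
Since the server can now only call commands by name, it is handy to list which names the nrpe host actually defines. A small sketch follows. list_nrpe_commands is a hypothetical helper, and the demo file is a shortened copy of the example above with one line commented out on purpose.

```shell
# Print the names of all commands defined as command[name]=... in an nrpe.cfg-style file.
# Commented-out definitions are skipped.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)\]=.*/\1/p' "$1"
}

# Demo file; on a real host use /etc/nagios/nrpe.cfg.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20 -c 10 -p /
#command[check_apt]=/usr/lib/nagios/plugins/check_apt
EOF
list_nrpe_commands "$demo_cfg"
rm -f "$demo_cfg"
```

A name that does not show up in this list cannot be called from the nagios server.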

This possible solution worked for me. 🙂

 

nagios check_yum

Okay!, here we go.

Download the check_yum plugin from https://github.com/calestyo/check_yum/blob/master/check_yum to /usr/lib64/nagios/plugins/ on your CentOS nrpe server.

Add the command to your nrpe.cfg

command[check_yum]=/usr/lib64/nagios/plugins/check_yum

 

Restart NRPE

/bin/systemctl restart  nrpe.service

And on your nagios server, in your host/s5.domain.com.cfg

define service {
    use                    generic-service
    host_name              s5.domain.com
    service_description    SYS: system updates
    check_command          check_nrpe!check_yum!1
}

 

Restart Nagios

/bin/systemctl restart  nagios.service

 

nagios check_apt

 

Return code of 127 is out of bounds – plugin may be missing


In the check_command, I had to prepend check_nrpe! since we are using the nrpe client on the server to fetch the value from the remote host.
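
For context on the number itself: Nagios only understands plugin exit codes 0 to 3, and 127 is what a shell returns when the command cannot be found at all. A quick sketch of that mapping, with nagios_state as a made-up helper name:

```shell
# Map a plugin exit code to the state Nagios understands.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of bounds" ;;
    esac
}

# Run a command that does not exist and capture the shell's exit code.
code=0
sh -c 'this_command_does_not_exist_demo' 2>/dev/null || code=$?
echo "exit code $code -> $(nagios_state "$code")"
```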

No output returned from plugin


In the check_command, I had to append an argument, for example !1, since we need to pass $ARG1$.

NRPE: Command ‘check_apt’ not defined


Check your /etc/nagios/nrpe.cfg; the command needs to be defined there.
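
As a recap, the errors on this page and the fixes that worked for me can be put into one small lookup. This is just a convenience sketch. suggest_fix is a made-up helper and the suggestions are only the ones described in these posts, so other causes are entirely possible.

```shell
# Map an error message from this page to the fix that worked for me.
suggest_fix() {
    case "$1" in
        *"Connection refused"*)  echo "start nrpe and open port 5666 on the firewall" ;;
        *"Received 0 bytes"*)    echo "check the nrpe host log; arguments may be rejected" ;;
        *"Return code of 127"*)  echo "prepend check_nrpe! to the check_command" ;;
        *"No output returned"*)  echo "append an argument such as !1 to the check_command" ;;
        *"not defined"*)         echo "define command[...] in nrpe.cfg and restart nrpe" ;;
        *)                       echo "no known fix on this page" ;;
    esac
}

suggest_fix "NRPE: Command 'check_apt' not defined"
```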

 

Well, I got it working!

On the nrpe server, I added the following line to my /etc/nagios/nrpe.cfg

command[check_apt]=/usr/lib/nagios/plugins/check_apt

Restart nagios-nrpe

service nagios-nrpe-server restart

 

On the nagios server, this is what I have in my /etc/nagios/hosts/s4.domain.com.cfg

define service {
   use                     generic-service
   host_name               s4.domain.com
   service_description     SYS: system updates
   check_command           check_nrpe!check_apt!1
}

Restart nagios server

service nagios restart

Hooray!
