simplemonitor – who watches the Nagios watchman?
I rely on Nagios to monitor my smart home devices. I use it to get details on specific services and hosts and to notify me if any of them stop working or responding. One thing that’s always bothered me: if Nagios stops running, or if the Raspberry Pi it’s installed on stops working, how will I know that it’s down? I don’t want to be left in the dark, thinking that everything is fine.
simplemonitor is the solution! Thanks to @jamesoff and his work on simplemonitor, I now have a reliable way to watch Nagios to make sure it doesn’t go belly up. The premise of simplemonitor is very much like that of Nagios: you first define a host, then layer custom monitors on top of that host.
I installed simplemonitor on my AWS EC2 instance so that I can monitor Nagios from outside my house. I chose to do this because I wanted to know if I had a power outage or internet outage at home. The first line of defense in knowing whether Nagios is working properly is knowing whether it can even reach the internet.
Since I’m monitoring from outside my house, I first opened a port on my firewall and forwarded it to Nagios. I verified that I could reach Nagios using my Dynamic DNS hostname plus the port I opened.
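For a quick reachability test from the EC2 side, a small Python sketch like the one below could stand in for checking by hand. The function name port_is_open and the port number in the usage comment are placeholders made up for illustration; they are not part of simplemonitor or Nagios.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


# Example usage (hostname and port are placeholders, replace with your own):
#     port_is_open("home.myddns", 8080)
```

This only confirms that the forwarded port accepts TCP connections. It says nothing about whether the Nagios web page itself loads, which is what the http monitor later in the post is for.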
Then, on my EC2 instance, I installed simplemonitor with:
git clone https://github.com/jamesoff/simplemonitor
Open the monitor.ini file and specify some system settings:
[monitor]
monitors=/home/pat/simplemonitor/monitors.ini
interval=60
[reporting]
loggers=db
alerters=gmail
[db]
type=db
db_path=/home/pat/simplemonitor/monitor.db
only_failures=1
[gmail]
type=execute
fail_command=/bin/bash -c "/usr/bin/printf \"The simplemonitor for {name} has failed on {hostname}.\n\nTime: {failed_at}\nInfo: {info}\n\" | /usr/bin/mailx -A gmail -s \"PROBLEM: simplemonitor {name} has failed from {hostname}.\" your@email"
success_command=/bin/bash -c "/usr/bin/printf \"The simplemonitor for {name} is now successful on {hostname}.\n\" | /usr/bin/mailx -A gmail -s \"RECOVERY: simplemonitor {name} is successful from {hostname}.\" your@email"
A few things are going on here. I am monitoring every 60 seconds, and when there’s an alert I execute a bash command that uses mailx to send me an email on failure and on recovery. You’ll need mailx set up with a Gmail account profile for this to work. Modify to your needs if you end up going this route.
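To get a feel for what the failure email body will look like before wiring up mailx, the template from fail_command can be previewed in Python. This is only a sketch: it uses Python's str.format to fill the same {name}, {hostname}, {failed_at} and {info} placeholders, simplemonitor does its own substitution internally, and the sample values below are invented.

```python
# Body text copied from the fail_command above, written with real newlines
# instead of the printf escapes used in monitor.ini.
FAIL_BODY = (
    "The simplemonitor for {name} has failed on {hostname}.\n\n"
    "Time: {failed_at}\nInfo: {info}\n"
)


def preview_alert(template: str, **fields: str) -> str:
    """Fill the placeholders in template with the given field values."""
    return template.format(**fields)


# Hypothetical sample values, purely for previewing the layout.
sample = preview_alert(
    FAIL_BODY,
    name="home-nagios",
    hostname="my-ec2-host",
    failed_at="2020-01-01 12:00:00",
    info="example failure text",
)
print(sample)
```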
Then open monitors.ini and define your monitors:
[home-ping]
type=host
host=home.myddns
tolerance=2
[home-nagios]
type=http
url=http://home.myddns:port
gap=300
depend=home-ping
If you’re using this as an example, replace home.myddns with your DDNS hostname, and replace port with the port you opened for Nagios.
In the host type definition, tolerance=2 means that after 2 failed ping checks, the host goes into a failure state (and simplemonitor sends me an email based on the monitor.ini config).
In the http type definition, gap=300 means that it’ll only check the Nagios website every 5 minutes. The depend=home-ping setting means that if the home-ping check is in a failure state, this check is skipped, because something more fundamental is wrong.
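Because depend refers to another monitor by its section name, a typo there is easy to miss. The sketch below is one way to sanity-check monitors.ini before starting simplemonitor. It assumes the file parses as standard INI and that depend may hold a comma-separated list of monitor names; the function name missing_dependencies is made up for this example and is not part of simplemonitor.

```python
import configparser


def missing_dependencies(ini_text: str) -> dict:
    """Map each monitor name to the depend targets that have no section."""
    # interpolation=None so a literal % in a value is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    problems = {}
    for section in parser.sections():
        raw = parser.get(section, "depend", fallback="")
        # Assumption: several dependencies would be separated by commas.
        deps = [d.strip() for d in raw.split(",") if d.strip()]
        unknown = [d for d in deps if not parser.has_section(d)]
        if unknown:
            problems[section] = unknown
    return problems


# Example usage (path taken from the post, adjust for your own install):
#     with open("/home/pat/simplemonitor/monitors.ini") as f:
#         print(missing_dependencies(f.read()))
```

An empty dictionary means every depend entry points at a monitor that is actually defined.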
I set simplemonitor to run on reboot by running crontab -e and adding:
# Start simplemonitor at reboot
@reboot /usr/bin/python2 /home/pat/simplemonitor/monitor.py -f /home/pat/simplemonitor/monitor.ini >/dev/null 2>&1
However, if you want to run it now without a reboot, you can run this command, which sends it to the background and starts monitoring right away:
/usr/bin/python2 /home/pat/simplemonitor/monitor.py -f /home/pat/simplemonitor/monitor.ini >/dev/null 2>&1 &
That’s it! A reliable way to determine if Nagios on my Raspberry Pi is running as it should. To complete this setup, the next step would be to add a Nagios service check that makes sure /home/pat/simplemonitor/monitor.py shows up as a running process in ps aux on the AWS server.
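One possible shape for that last check is a tiny Nagios-style plugin in Python. This is a rough sketch, not something from the Nagios or simplemonitor projects: the function names are invented, and it relies only on the standard Nagios plugin convention that exit code 0 means OK and 2 means CRITICAL. How it gets run on the AWS server (NRPE, SSH, or something else) is left open.

```python
import subprocess
import sys

# Path from the post; adjust for your own install.
TARGET = "/home/pat/simplemonitor/monitor.py"


def check_process(ps_output: str, target: str = TARGET) -> tuple:
    """Return (exit_code, message) based on whether target appears in ps output."""
    running = any(target in line for line in ps_output.splitlines())
    if running:
        return 0, "OK - simplemonitor is running"  # Nagios OK
    return 2, "CRITICAL - simplemonitor is not running"  # Nagios CRITICAL


def main() -> int:
    """Run `ps aux`, print the status line, and return the Nagios exit code."""
    result = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    )
    code, message = check_process(result.stdout)
    print(message)
    return code


# To use this file as a plugin, finish it with:
#     sys.exit(main())
```

Keeping check_process separate from the ps call makes it easy to try the matching logic against saved ps output before trusting it in Nagios.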