1
High Availability For Nagios
Mike Weber
mweber@spidertools.com
2
2012
Alternatives
Daily Image Creation for Restore (VMware, etc.)
  - loses parts of history
  - creates gaps in monitoring during image creation
rsync to Synchronize Servers
  - requires IP address and hostname changes
  - requires modification of nagios.cfg
  - assumes the Master will never be misconfigured
  - rsync can use a lot of resources
Clustered Nagios Server
3
Alternatives: Redundant Monitoring
4
Alternatives: Redundant Monitoring
5
Alternatives: Failover
6
Alternatives: Failover
7
2011 Nagios World Conference
Perfect Solution: Does Not Exist
8
High Availability: Outline of Goals
Create Master/Slave Relationship
Master Sends History to the Slave
Slave Does Not Run Host Checks, Service Checks, or Notifications
Slave Monitors Master via Script
Slave Enables Host Checks, Service Checks, and Notifications When Master Is Down
Slave Disables All Checks When Master Is Up
Simplicity
9
Failover and Performance Enhancement
10
Test Server: Puppet Master
11
Step #1: Clone Master to Slave
Back Up Master Databases and Files
  - MySQL databases
  - Postgres database
Back Up Files
  - /usr/local/nagios
  - /usr/local/nagiosxi
Install all dependencies for plugins
Enable access from the Slave on all devices
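The file backup in Step #1 could be scripted roughly as follows. This is a minimal sketch, not the presenter's procedure: the function name backup_dirs and the destination directory are hypothetical, and the database dump commands in the comments assume local administrative access to MySQL and Postgres.

```bash
#!/bin/bash
# Archive each given directory into the destination ($1) as <name>.tar.gz.
# backup_dirs is a hypothetical helper name used for illustration.
backup_dirs() {
    local dest=$1
    shift
    mkdir -p "$dest" || return 1
    local dir
    for dir in "$@"; do
        # -C keeps the archive paths relative to the parent directory.
        tar -czf "$dest/$(basename "$dir").tar.gz" \
            -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    done
}

# Usage on the Master (directories from the slide, destination assumed):
#   backup_dirs /tmp/nagios-backup /usr/local/nagios /usr/local/nagiosxi
# Database dumps (assumes credentials are already configured):
#   mysqldump --all-databases > /tmp/nagios-backup/mysql-all.sql
#   pg_dumpall > /tmp/nagios-backup/postgres-all.sql
```

The archives and dumps can then be copied to the Slave and unpacked into the same paths.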
12
Step #2: Disable Slave
Edit nagios.cfg
  execute_host_checks=0
  execute_service_checks=0
  enable_notifications=0
Save and Restart Nagios
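After editing nagios.cfg, it can be useful to confirm that the Slave really is passive before restarting. The sketch below is illustrative only; slave_is_passive is a hypothetical helper, and it assumes each option appears on its own line with no spaces around the equals sign.

```bash
#!/bin/bash
# Succeeds only when all three options are explicitly set to 0 in the file.
# slave_is_passive is a hypothetical name, not part of Nagios.
slave_is_passive() {
    local cfg=$1 opt
    for opt in execute_host_checks execute_service_checks enable_notifications; do
        grep -q "^${opt}=0[[:space:]]*$" "$cfg" || return 1
    done
}

# Usage on the Slave:
#   slave_is_passive /usr/local/nagios/etc/nagios.cfg && echo "slave is passive"
# A syntax check before the restart (binary path is an assumption):
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```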
13
Step #3: Enable NSCA
Master Sends History via NSCA
  - edit nagios.cfg (save and restart Nagios)
      obsess_over_hosts=1
      obsess_over_services=1
Slave Maintains History via NSCA
  - install NSCA daemon on slave
  - allow connections from Master
14
Master: Allow Outbound Transfers
15
Master: Outbound Config
File Found in /usr/local/nagios/etc
send_nsca-192.168.5.211.cfg
  # CONFIGURED BY NAGIOS XI
  password=LMb674FcsswP
  encryption_method=3
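Nagios XI generates the outbound transfer setup itself. On a plain Nagios Core master, the obsess_over_* options only take effect once an ocsp_command (and ochp_command for hosts) is defined, and that command typically pipes each result to send_nsca. The following is a sketch under assumptions: the function names are hypothetical, the send_nsca path is taken from the Problems slide, and the target address is inferred from the config file name above.

```bash
#!/bin/bash
# Hypothetical OCSP helper. Paths and the target address are assumptions.
SEND_NSCA=${SEND_NSCA:-/usr/local/nagios/libexec/send_nsca}
SEND_NSCA_CFG=${SEND_NSCA_CFG:-/usr/local/nagios/etc/send_nsca-192.168.5.211.cfg}
NSCA_TARGET=${NSCA_TARGET:-192.168.5.211}

# Map a Nagios service state name to the numeric code NSCA expects.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unexpected
    esac
}

# Build one passive service result: host<TAB>service<TAB>code<TAB>output
format_service_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Send the result to the NSCA daemon on the target server.
submit_service_result() {
    format_service_result "$@" | "$SEND_NSCA" -H "$NSCA_TARGET" -c "$SEND_NSCA_CFG"
}

# Only submit when called with the four expected arguments, e.g. from an
# ocsp_command passing $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ $SERVICEOUTPUT$.
if [ "$#" -eq 4 ]; then
    submit_service_result "$@"
fi
```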
16
Slave: NSCA Config
# default: on
# description: NSCA (Nagios Service Check Acceptor)
service nsca
{
    flags           = REUSE
    socket_type     = stream
    wait            = no
    user            = nagios
    group           = nagios
    server          = /usr/local/nagios/bin/nsca
    server_args     = -c /usr/local/nagios/etc/nsca.cfg --inetd
    log_on_failure  += USERID
    disable         = no
    only_from       = 127.0.0.1 192.168.5.211
}
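A frequent cause of rejected NSCA transfers is a sender address missing from only_from. The check below is a rough sketch: only_from_allows is a hypothetical helper, the xinetd file path in the usage comment is an assumption, and it only matches literal addresses, not the network or hostname forms that xinetd also accepts.

```bash
#!/bin/bash
# Succeeds if the given IP appears literally on an only_from line.
# only_from_allows is a hypothetical name; network ranges are not handled.
only_from_allows() {
    local file=$1 ip=$2
    awk -v ip="$ip" '
        $1 == "only_from" {
            # Field 2 is the operator (= or +=); addresses start at field 3.
            for (i = 3; i <= NF; i++) if ($i == ip) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Usage on the Slave (file path and address are placeholders):
#   only_from_allows /etc/xinetd.d/nsca "$sender_ip" || echo "sender not allowed"
```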
17
Slave: Allow Inbound Transfers
18
Step #4: Slave Monitors Master via SSH
Create SSH Keys on Slave
  - push public key to master
Create authorized_keys file on Master
Implement SSH script to check Master
  - passwordless login
  - run from a cron job (check every minute)
  - script detects status of Master
  - script turns checks and notifications on/off
19
Create Key Pair
su - nagios
mkdir .ssh
cd .ssh
ssh-keygen -b 1024 -f id_dsa -t dsa -N ''
Generating public/private dsa key pair.
Your identification has been saved in id_dsa.
Your public key has been saved in id_dsa.pub.
The key fingerprint is:
61:23:17:2d:83:d8:d9:f9:87:2d:e1:6d:e6:3d:cb:5c nagios@slxi
The key's randomart image is:
+--[ DSA 1024]----+
| o +.o |
|. + =.o |
|. == = |
| + o= * |
| S *. |
|. o E|
| o + |
| + |
|
+-----------------+
20
Push Public Key to nagios user on Master
scp id_dsa.pub nagios@192.168.5.211:/home/nagios/.ssh/slave
This means that the nagios user must have a /home/nagios/.ssh directory. The public key is renamed to “slave” to avoid overwriting any existing keys.
On the master (as the nagios user):
cat slave >> authorized_keys
chmod 644 authorized_keys
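Before relying on the cron job, it helps to confirm that the key-based login really works without a prompt. The sketch below is an illustration with hypothetical function names. It uses stricter permissions (700 on the directory, 600 on the file) than the 644 shown on the slide, which is a choice made here and not something the presentation specifies.

```bash
#!/bin/bash
# SSH_CMD can be overridden (for example in tests); it defaults to ssh.
SSH_CMD=${SSH_CMD:-ssh}

# Restrict the .ssh directory ($1) and its authorized_keys file to the owner.
secure_ssh_dir() {
    chmod 700 "$1" && chmod 600 "$1/authorized_keys"
}

# Succeeds only if a non-interactive, key-based login to host $1 works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_login_without_password() {
    "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "nagios@$1" true
}

# Usage:
#   on the Master:  secure_ssh_dir /home/nagios/.ssh
#   on the Slave:   can_login_without_password 192.168.5.211 && echo "login ok"
```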
21
Slave: Cron Job
# /etc/cron.d/nagiosxi: crontab fragment for nagiosxi
* * * * * nagios /bin/sh /usr/local/nagios/libexec/eventhandlers/check_master.sh
22
Slave: check_master.sh
#!/bin/bash
# Runs from cron on the Slave. Turns checks and notifications on while
# Nagios on the Master is down, and off again while it is running.
masterip=192.168.5.210
cfg=/usr/local/nagios/etc/nagios.cfg

# Stop active checks and notifications on the Slave.
disable() {
    sed -i 's/execute_host_checks=1/execute_host_checks=0/' "$cfg"
    sed -i 's/execute_service_checks=1/execute_service_checks=0/' "$cfg"
    sed -i 's/enable_notifications=1/enable_notifications=0/' "$cfg"
    /sbin/service nagios reload
}

# Start active checks and notifications on the Slave.
enable() {
    sed -i 's/execute_host_checks=0/execute_host_checks=1/' "$cfg"
    sed -i 's/execute_service_checks=0/execute_service_checks=1/' "$cfg"
    sed -i 's/enable_notifications=0/enable_notifications=1/' "$cfg"
    /sbin/service nagios reload
}

# Count the "running" lines in the Master's init script status output.
# The count is also 0 when the SSH connection itself fails.
nagpid=$(ssh nagios@$masterip /etc/init.d/nagios status | grep running | wc -l)

if [ "$nagpid" -eq 0 ]; then
    echo "Starting Checks"
    enable
fi
if [ "$nagpid" -eq 1 ]; then
    echo "Stopping Checks"
    disable
fi
exit 0
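Two details of the script are worth guarding against: it reloads Nagios every minute even when nothing changed, and a status line such as "nagios is not running" would also match grep running. The variant below is a sketch, not the presenter's script. The function names are hypothetical, and it assumes the init script prints text containing "running" or "not running".

```bash
#!/bin/bash
# Sketch only: helper names are illustrative, not from the original script.
cfg=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Succeeds only if the status text reports Nagios as running.
# "not running" is tested first because it also contains "running".
master_is_running() {
    case "$1" in
        *"not running"*) return 1 ;;
        *running*)       return 0 ;;
        *)               return 1 ;;
    esac
}

# Set the three options to $1 (0 or 1).
# Returns 0 if the file changed, 1 if it already had the wanted values.
set_checks() {
    local want=$1 before after
    before=$(cksum < "$cfg")
    sed -i -E "s/^(execute_host_checks|execute_service_checks|enable_notifications)=[01]/\1=$want/" "$cfg"
    after=$(cksum < "$cfg")
    [ "$before" != "$after" ]
}

# Example wiring (commented out so the functions can be sourced safely):
#   status=$(ssh nagios@192.168.5.210 /etc/init.d/nagios status)
#   if master_is_running "$status"; then want=0; else want=1; fi
#   set_checks "$want" && /sbin/service nagios reload
```

With this wiring, the reload only happens in the minute when the Master's state actually flips.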
23
Assumptions: Based on Simplicity
Mature Implementation
  - set up once implementation of the network is primarily complete
Master Down Only a Short Amount of Time
  - slave does not send history back to the Master on its return
Master and Slave Independent of Updates
  - no rsync
  - guarantees integrity of one system
24
Master
25
Slave
26
Master: Service States
27
Slave: Service States
28
Problems
29
NSCA: Version 2.9.1
Plugin Buffer is Larger
  * NSCA Server Receives OK
  * NSCA Sending Adds Wrong Information
Replace with Version 2.7.2 on Master
  * send_nsca
  * Located in /usr/local/nagios/libexec
30
Questions?