Nagios – Our Open Source Network Management Solution
Presenter: Ling Zhang
LBLnet Services Group, Information Technologies and Services Division, LBNL
Contributors
- Nagios software design and development: Ethan Galstad (www.nagios.org)
- System integration, configuration, testing: Ling Zhang, Greg Bell, Harper Mann, Cedric Hui, Clark Wood, Mike Bennett
18 September 2018 ITSD/LBNL
Goals for this talk
To explain:
- LBLnet's point of view of a Network Management System (NMS)
- The network monitoring problems we encountered
- The design of our Nagios network monitoring system
To discuss:
- The benefits of the Nagios system
- Our future development goals
Our point of view of an NMS
- Proactive network management
- Alarm panel
- Connectivity
- Performance
- Fault isolation
- Trend analysis
- Capacity planning
- Notification: precise, fast
Background Information
Network monitoring tools we have tested and/or used before:
- Sun Net Manager
- Spectrum
- WhatsUp Gold
- Netmon
- SNMPc
- IPmonitor
- HP OpenView
- OpenNMS
- InCharge
- Home-grown scripts
- MRTG/RRDtool
- etc.
Background Information
Our fair share of problems with NMS:
- Notification storms: 65 notifications were received during a single router up/down event. The router has 20 active interfaces and 32 downstream monitored devices.
- False alarms
- Integration with existing systems (MRTG, trouble ticket system)
- Tech support: our longest outstanding ticket has been open 2 years and counting
- Budget
In Search of a Better NMS
- Accurate and efficient fault detection
- Good performance
- Extensible
- Can be integrated with our existing system
- Low maintenance
- Fits our budget
Features of Nagios
- Open source; runs on most Unix systems
- Highly extensible
- Reliable dependency monitoring
- Excellent service monitoring capabilities
- Ability to schedule maintenance periods
- Flexible notification
Our Nagios Topology
[Figure: LBLnet NMS diagram]
Nagios Extensibility
- Plugins
- Event handlers
- External commands
Nagios Extensibility - Plugins
- Compiled executables or scripts (Perl, shell, etc.)
- Run by the Nagios process
- Check device or service status
Example:
    define host {
        host_name      switch1
        address        1.2.3.4
        check_command  ping_switch
    }
    define service {
        host_name            switch1
        service_description  CPU Util
        check_command        get_cpu_util
    }
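A plugin reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of text on standard output. The sketch below shows how a check such as get_cpu_util might map a reading to that convention. It is a minimal illustration, not the plugin LBLnet used: the function names and the warn/crit thresholds are assumptions, and the reading is passed in directly rather than fetched over SNMP.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_cpu(util_percent, warn=80.0, crit=95.0):
    """Map a CPU utilization reading to an (exit_code, status_line) pair.

    The warn/crit defaults are illustrative assumptions, not LBLnet values.
    """
    if util_percent is None or not 0.0 <= util_percent <= 100.0:
        return UNKNOWN, "CPU UNKNOWN - no valid reading"
    if util_percent >= crit:
        code = CRITICAL
    elif util_percent >= warn:
        code = WARNING
    else:
        code = OK
    return code, f"CPU {STATE_NAMES[code]} - utilization {util_percent:.0f}%"


def run_plugin(util_percent):
    """Print the status line and return the exit code, as a plugin would."""
    code, line = evaluate_cpu(util_percent)
    print(line)
    return code
```

A real plugin would end with something like sys.exit(run_plugin(reading)), so that Nagios sees the state in the process exit status.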
Services Monitored by Nagios
Nagios uses plugins to check service status:
DHCP, DNS, FTP, HTTP, HTTPS, IMAP, NTP, Radius, SMTP, SQL, TFTP, WINS, etc.
Nagios Extensibility – Event Handlers
- Compiled executables or scripts
- Run by the Nagios process
- Triggered by a host or service status change
Example:
    define service {
        host_name            somehost
        service_description  HTTP
        max_check_attempts   4
        check_command        check_http
        event_handler        restart-httpd
        ...other service variables...
    }
Nagios Extensibility – External Commands
- A predefined set of commands issued externally to control the behavior of Nagios
- Control notification, monitor scheduling, and program start/stop
- Issued by external applications (CGI, snmptrapd, etc.)
- Read in by the Nagios core process at run time
Example:
1. A user disables monitoring of switch1 from the web interface
2. The CGI writes the command “disable monitor switch1” to the command file
3. The Nagios process reads this command and stops scheduling monitoring for switch1
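The “disable monitor switch1” wording on the slide is a paraphrase. In the command file, each command is one line of the form [timestamp] COMMAND_NAME;arg1;arg2. The sketch below formats such a line for DISABLE_HOST_CHECK, which is the Nagios external command that stops active checks of a host. The helper names and the fixed timestamp are assumptions made for illustration.

```python
import time


def format_external_command(name, *args, now=None):
    """Return one line in the Nagios external command file format."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] " + ";".join([name, *map(str, args)]) + "\n"


def submit_command(command_file, line):
    """Write a formatted command to the command file.

    In Nagios the command file is a named pipe, so opening it for
    writing hands the line to the running Nagios process.
    """
    with open(command_file, "w") as handle:
        handle.write(line)


# Example: the command a CGI might issue to stop checking switch1.
line = format_external_command("DISABLE_HOST_CHECK", "switch1", now=1089925417)
print(line, end="")
```

The command file location depends on the installation; /usr/local/nagios/var/rw/nagios.cmd is a common default and is only an assumption here.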
Monitoring Network Devices
- Ping: measures system responsiveness via average RTT
- SNMP get: CPU, temperature, interface/port status, system uptime, power supply status, throughput, packet discard rate, etc.
- SNMP trap
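Throughput is not read directly over SNMP. It is usually derived from two samples of an interface octet counter (for example ifInOctets, a 32-bit counter) taken a known interval apart. The sketch below shows that arithmetic, including a single counter wrap. The function names are hypothetical, and it assumes the counter wraps at most once between samples.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def throughput_bps(prev_octets, curr_octets, interval_seconds):
    """Average bits per second between two octet-counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(prev_octets, curr_octets) * 8 / interval_seconds


# 75,000 octets in 60 seconds is 10,000 bits per second.
print(throughput_bps(1000, 76000, 60))
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case the 64-bit counters (such as ifHCInOctets) are the safer source.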
Nagios Trap Handling
- Requires Net-SNMP or another trap receiver daemon
- The trap receiver notifies Nagios of received traps via external commands
- Nagios calls event handlers and/or notifies the user
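One common way for a trap receiver to hand a trap to Nagios is to submit a passive check result with the PROCESS_SERVICE_CHECK_RESULT external command. The sketch below builds that command line for a link up/down trap. The trap-to-state mapping, the function name, and the "Link Status" service name are assumptions made for illustration; the source does not say which commands LBLnet's trap handler issued.

```python
import time

# Nagios service state codes used in passive check results.
OK, CRITICAL = 0, 2

# Hypothetical mapping from generic trap names to a service state.
TRAP_STATES = {"linkDown": CRITICAL, "linkUp": OK}


def passive_result_for_trap(host, service, trap_name, now=None):
    """Return the external command line for a trap, or None if ignored."""
    state = TRAP_STATES.get(trap_name)
    if state is None:
        return None  # trap not of interest; nothing to submit
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{state};trap {trap_name} received\n"
    )


print(passive_result_for_trap("switch1", "Link Status", "linkDown", now=1089925417), end="")
```

The resulting line would be written to the Nagios command file by a script that the trap daemon invokes for each received trap.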
Dependency Configuration
    define host {
        use        switch-tmpl
        host_name  switch1
        address    1.2.3.10
        parents    router1
    }
    define host {
        host_name  switch2
        address    1.2.3.20
        parents    switch1
    }
    define host {
        host_name  switch3
        address    1.2.3.30
    }
    define host {
        host_name  switch4
        address    1.2.3.40
        parents    switch2
    }
[Figure: diagram]
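The parents directive is what lets Nagios tell a failed device apart from one that is merely cut off by an upstream failure: a host that fails its check is reported DOWN only if at least one parent is still up, and UNREACHABLE otherwise. The sketch below illustrates that rule on the hosts listed above. It is a simplified model, not Nagios's actual implementation; the function name is hypothetical, switch3 is given no parent because the slide lists none, and the parent graph is assumed to be free of cycles.

```python
def classify(parents, failing):
    """Return {host: "UP" | "DOWN" | "UNREACHABLE"} for every known host."""
    states = {}

    def state_of(host):
        if host in states:
            return states[host]
        if host not in failing:
            states[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # A failing host is blamed (DOWN) only if some parent is still
            # UP, or if it has no parents; otherwise the fault is upstream.
            if not host_parents or any(state_of(p) == "UP" for p in host_parents):
                states[host] = "DOWN"
            else:
                states[host] = "UNREACHABLE"
        return states[host]

    for host in parents:
        state_of(host)
    return states


# Parent relationships as listed on the slide.
PARENTS = {
    "router1": [],
    "switch1": ["router1"],
    "switch2": ["switch1"],
    "switch3": [],
    "switch4": ["switch2"],
}

# If switch1 fails, switch2 and switch4 stop answering as well.
print(classify(PARENTS, failing={"switch1", "switch2", "switch4"}))
```

Sending notifications only for the DOWN host, and not for the UNREACHABLE ones behind it, is how a dependency-aware monitor can avoid a notification storm.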
Nagios Notification
- Similar to event handlers
- Triggered by a host/service status change
- Calls third-party notification tools (sendmail, qpage, etc.)
- Supports email, paging, instant messaging, etc.
Nagios Notification Format
Email:
    Subject: switch3 (1.2.3.30) DOWN
    Host: switch3
    Address: 1.2.3.30
    Date/Time: Thu Jul 15 14:03:37 PDT 2004
    Additional Info: (No Information Returned From Host Check)
Page:
    DOWN switch3(1.2.3.30)
Maintenance Scheduling
- Schedule a maintenance window via the Nagios web interface
- Uses external commands
- Fixed window
- Flexible (floating) window
- Dependency aware
Monitoring Subnets with Redundant Network Connections
Challenge:
- Monitoring interface status and standby status at the same time
Solution:
- Monitor interface up/down status via ping
- Monitor HSRP status via the HSRP MIB
Performance of Nagios
- False alarms: false positive, false negative, unnecessary
- Notification delay: before 303 sec, after 221 sec
Money and Time Saved
- Software package cost: InCharge ($$$), IPmonitor ($1500), Nagios ($0)
- Software maintenance contract cost: InCharge (>$15,000), IPmonitor ($500)
- Time saved from fewer unnecessary alarms (compared to IPmonitor): 20 man-hours/month
Future Development of Nagios
- Performance monitoring
  - Network element out of resources
  - Interface buffer drops
  - Duplex mismatch
  - Has to be done by inference
    - Assume heterogeneous network equipment
    - No use of host SNMP
    - Derive from a combination of interface error types and rates
- Integrating with other NMS elements
  - Syslog
  - MRTG/RRDtool
  - Trouble ticket system
  - Database
- Topology discovery
Conclusion
Nagios fits our network management needs because it offers:
- Accurate and efficient fault detection
- Extensibility
- Easy integration with our existing system
- Low maintenance
- A fit with our budget
Thanks! We are happy to share.
Send questions and comments to lblnet@lbl.gov