Published by Alexia Taylor. Modified over 9 years ago.
1
Introduction To Nagios: A Linux-based Monitoring System
2
What Is Nagios?
Nagios is a system that monitors the availability of network resources, such as hosts and services.
It enables you to identify and resolve IT infrastructure problems before they affect critical processes.
3
Brief History
Originally created under the name NetSaint, Nagios was written and is maintained by Ethan Galstad along with a group of plugin developers.
4
History cont.
Launched in March of 1999 under the GNU General Public License.
In March 2002, due to trademark issues with the name “NetSaint”, Ethan renamed the project to Nagios, a recursive acronym that stands for “Nagios Ain’t Gonna Insist On Sainthood”.
5
Requirements
* A machine running Linux or a Unix variant
* C compiler, e.g. gcc
  TCP/IP configured
  CGIs (optional): Apache web server and Thomas Boutell’s gd library version 1.6.3 or higher (used by the statusmap and trends CGIs)
* Must have
6
What Can Nagios Monitor?
Applications (Tomcat servers)
Host resources (CPU load, disk space)
Infrastructure components (routers, switches)
Database servers (MySQL, Oracle)
Network services (HTTP, SSH, ping)
Web servers
Mail servers
7
Nagios Configuration
nagios.cfg
cgi.cfg
resource.cfg
Object Definition Files
Commands
Hosts and Services
Contacts and contact groups
Plugins
Homemade Plugins
8
Commands and Plugins
A plugin is an executable or script that can be run from the command line and returns an exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
A command consists of a plugin plus macros and is used to perform the host or service check. The check_command in a host or service definition must match the command_name of a command definition.

define command {
    command_name    check-host-alive
    command_line    $USER1$/check_ping $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}

define host {
    host_name       glastlnx19.slac.stanford.edu
    check_command   check-host-alive
}
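The exit-code convention above can be sketched as a small homemade plugin. This is a minimal illustration in Python, assuming a hypothetical disk-usage check; the names `classify` and `main`, the argument order, and the wording of the output line are assumptions made for this sketch and do not come from the slides or from any official Nagios plugin.

```python
"""Sketch of a homemade Nagios plugin (hypothetical disk-usage check)."""
import shutil
import sys

# Exit codes Nagios expects from any plugin.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_percent, warn, crit):
    """Map a measured value to a plugin exit code using two thresholds."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def main(argv):
    """Hypothetical usage: PLUGIN PATH WARN_PERCENT CRIT_PERCENT."""
    try:
        path, warn, crit = argv[1], float(argv[2]), float(argv[3])
        usage = shutil.disk_usage(path)
        used_percent = 100.0 * usage.used / usage.total
    except (IndexError, ValueError, OSError, ZeroDivisionError) as exc:
        # Anything that prevents a measurement is reported as UNKNOWN.
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code = classify(used_percent, warn, crit)
    # The first line of output is the status text shown next to the state.
    print(f"DISK {LABELS[code]} - {used_percent:.1f}% used on {path}")
    return code


# When saved as an executable plugin file, finish with:
#     sys.exit(main(sys.argv))
```

The important part is the return value: Nagios decides the state from the exit code, while the printed line is only the human-readable status text.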
9
Homemade Plugins
10
Host and Service Definitions
define host {
    use                     generic-host    ; Name of host template to use
    host_name               glastlnx19.slac.stanford.edu
    alias                   glastlnx19
    address                 134.79.200.39
    check_command           check-host-alive
    max_check_attempts      10
    check_period            24x7
    notification_interval   120
    notification_period     24x7
    contact_groups          core
}
11
define service {
    use                     generic-service
    host_name               glastlnx19.slac.stanford.edu
    service_description     Web App Telemetry Trending – tomcat 12
    is_volatile             0
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          core
    notification_options    w,u,c,r
    notification_interval   960
    notification_period     24x7
    check_command           check_jmx!-uservice:jmx:rmi://jndi/rmi://glast-tomcat12.slac.stanford.edu:8081/jmxrmi!-mCatalina:j2eeType=WebModule,name=//localhost/TelemetryTrending,J2EEApplication=none,J2EEServer=none!-astate!-e1
}
12
Nagios File Structure For Fermi
13
Monitoring
14
Nagios Remote Plugin Executor
15
Chronological Progression Of Service State
16
Notifications
17
Nagios Web Interface
http://glast-nagios.slac.stanford.edu/nagios/
18
Contacts and Contact Groups
define contact {
    contact_name                    Brian
    alias                           Brian Van Klaveren
    service_notification_options    w,u,c,r
    service_notification_period     24x7
    service_notification_commands   notify_by_email
    host_notification_commands      notify_by_email
    email                           bvan@slac.stanford.edu
}

define contactgroup {
    contactgroup_name   oracle_load_group
    alias               Oracle Load Group
    members             Brian, Tony
}
19
Host and Service Definition
define host {
    use                     generic-host    ; Name of host template to use
    host_name               glast-astro-db1.slac.stanford.edu
    alias                   glast-astro-db1
    address                 134.79.200.16
    check_command           check-host-alive
    max_check_attempts      10
    check_period            24x7
    notification_interval   120
    notification_period     24x7
    notification_options    d,r
    contact_groups          oracle_load_group
}

define service {
    use                     generic-service ; Name of service template to use
    host_name               glast-astro-db1.slac.stanford.edu
    service_description     Oracle Astro Pass 7
    is_volatile             0
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          oracle_load_group
    notification_options    w,u,c,r
    notification_interval   1800
    notification_period     24x7
    check_command           check_oracle2!/@astro_pass7
}
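The option letters and interval values in these definitions are terse, so a short decoding sketch may help. It assumes the Nagios default `interval_length` of 60 seconds (a site can change this in nagios.cfg) and covers only the option letters that appear in these slides; the helper names `decode_options` and `interval_to_timedelta` are hypothetical.

```python
from datetime import timedelta

# Assumption: interval_length is left at its default of 60 seconds,
# so check and notification intervals are effectively in minutes.
INTERVAL_LENGTH_SECONDS = 60

# Only the letters used in the slides; Nagios defines further options.
SERVICE_OPTIONS = {"w": "warning", "u": "unknown", "c": "critical", "r": "recovery"}
HOST_OPTIONS = {"d": "down", "u": "unreachable", "r": "recovery"}


def decode_options(value, table):
    """Turn a value such as 'w,u,c,r' into a list of state names."""
    return [table[flag.strip()] for flag in value.split(",")]


def interval_to_timedelta(units, interval_length=INTERVAL_LENGTH_SECONDS):
    """Convert an interval expressed in Nagios time units to a timedelta."""
    return timedelta(seconds=units * interval_length)


print(decode_options("d,r", HOST_OPTIONS))         # host notification_options
print(decode_options("w,u,c,r", SERVICE_OPTIONS))  # service notification_options
print(interval_to_timedelta(5))                    # normal_check_interval 5
print(interval_to_timedelta(1800))                 # notification_interval 1800
```

Under that assumption, the Oracle service is checked every 5 minutes, rechecked every minute while a problem is being confirmed, and re-notified every 1800 minutes (30 hours) while the problem persists.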
20
Plugins
Plugins are executables or scripts that can be run from a command line and return an exit code.
Homemade plugins (aka commands) are built from plugins and macros; Nagios can call external programs using these commands.
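To make "calling an external program" concrete, here is a rough sketch of running a plugin command line and turning its exit code into a state. It is not how Nagios itself is implemented (Nagios is written in C); the function name `run_check`, the 10-second timeout, and the choice to report launch failures and out-of-range exit codes as UNKNOWN are simplifying assumptions made for this illustration.

```python
import shlex
import subprocess
import sys

# Plugin exit codes and the states they stand for.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command_line, timeout=10):
    """Run a plugin command line; return (state, first line of its output)."""
    try:
        result = subprocess.run(
            shlex.split(command_line),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Assumption for this sketch: a plugin that cannot run is UNKNOWN.
        return "UNKNOWN", str(exc)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    return state, (lines[0] if lines else "")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python run_check_sketch.py "/path/to/plugin ARGS"
    print(run_check(sys.argv[1]))
```

Because the contract is just "exit code plus a line of text", any executable that follows it can serve as a plugin, whatever language it is written in.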
21
define service {
    use                     generic-service ; Name of service template
    host_name               glastlnx19.slac.stanford.edu
    service_description     Ping
    is_volatile             0
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          core
    notification_options    w,u,c,r
    notification_interval   960
    notification_period     24x7
    check_command           check_ping!100.0,20%!500.0,60%
}
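In a `check_command` value, the `!` characters separate the command name from its arguments, which become `$ARG1$`, `$ARG2$`, and so on in the command's `command_line`. The sketch below shows that substitution in Python. The slides do not show the `check_ping` command definition or the value of `$USER1$`, so the template and plugin directory used here are assumptions modeled on the sample configuration shipped with Nagios; the helper names are hypothetical, and details such as `$$` escaping are ignored.

```python
import re


def split_check_command(check_command):
    """Split 'name!arg1!arg2' into the command name and its argument list."""
    name, *args = check_command.split("!")
    return name, args


def expand_macros(command_line, macros):
    """Replace $NAME$ macros; unknown macros are left untouched."""
    def replace(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", replace, command_line)


# Assumed command_line for check_ping (not shown in the slides).
TEMPLATE = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5"

name, args = split_check_command("check_ping!100.0,20%!500.0,60%")
macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
    "HOSTADDRESS": "134.79.200.39",        # address of glastlnx19 in the slides
}
macros.update({f"ARG{i}": arg for i, arg in enumerate(args, start=1)})

print(name)
# check_ping
print(expand_macros(TEMPLATE, macros))
# /usr/local/nagios/libexec/check_ping -H 134.79.200.39 -w 100.0,20% -c 500.0,60% -p 5
```

Read this way, the Ping service warns at a round-trip time of 100 ms or 20% packet loss and goes critical at 500 ms or 60% packet loss.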