Nagios on Tier1 farm. Jonathan Wheeler, RAL Tier1 Fabric Team, 20th June 2008.


1 Nagios on Tier1 farm
Jonathan Wheeler, RAL Tier1 Fabric Team, 20th June 2008

2 Overview
- What we had before (Sure)
- Introduction to Nagios and how it is configured for the farm
- What we might do next

3 Sure monitoring - 1
- Consists of a server and clients
- Communication via the sysreq command
- Requires scripts to be set up on each client to run checks and report results to the server

4 Sure monitoring - 2
Three main tasks:
a) check that the host is alive
   - active: using ping
   - passive: accepting heartbeat messages
b) receive alarm messages
c) receive "backup started" and "backup finished" messages

5 Sure monitoring - 3
Problems:
- configuration not directly under Tier1 control
- requires locally written and locally maintained scripts
- limited view of farm alarms and state
- alarms only visible on the server screen

6 Introduction to Nagios
- highly configurable
- under active development (Nagios 2.11 legacy, Nagios 3.0.2 latest stable)
- active user community (mailing list)
- some commercial offerings
- extensive documentation, included as part of the installation
- allows local extensions

7 Introduction to Nagios – basics - 1
Nagios:
- schedules test commands, for example: is the space used in the /var filesystem larger than the permitted limit?
- accepts results as a return code (0 = OK, 1 = warning, 2 = critical, 3 or -1 = unknown) and a single-line message
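The return-code convention above can be illustrated with a minimal plugin sketch in Python. The 80% and 90% thresholds, the function names and the message wording are assumptions made for illustration; this is not one of the Tier1 plugins.

```python
import shutil

# Nagios plugin return codes, as listed on the slide.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_usage(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to (return code, single-line message).

    The warn/crit defaults are illustrative assumptions.
    """
    if used_percent >= crit:
        return CRITICAL, f"DISK CRITICAL - {used_percent:.1f}% used"
    if used_percent >= warn:
        return WARNING, f"DISK WARNING - {used_percent:.1f}% used"
    return OK, f"DISK OK - {used_percent:.1f}% used"


def check_filesystem(path="/var", warn=80.0, crit=90.0):
    """Check one filesystem; failure to read it is reported as UNKNOWN."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - cannot read {path}: {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    used_percent = 100.0 * usage.used / usage.total
    return evaluate_usage(used_percent, warn, crit)
```

A plugin script built on this would print the message and exit with the code, for example `code, message = check_filesystem("/var")` followed by `print(message)` and `sys.exit(code)`.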

8 Introduction to Nagios – basics - 2
Nagios (continued):
- displays results via a Web interface to authorised users
- sends notifications via e-mail, SMS, RSS, Morse code, jungle drums etc.
- may run an event handler, e.g. if a test fails, put this batch node offline
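The event-handler idea can be sketched as a small decision function. Nagios can pass macros such as $SERVICESTATE$, $SERVICESTATETYPE$ and $HOSTNAME$ to a handler as arguments; everything else here is an assumption. In particular `set_node_offline` is a hypothetical placeholder for whatever command the batch system provides, and acting only on HARD states is a design choice, not something stated in the talk.

```python
# Hypothetical site command that marks a batch node offline; not from the slides.
OFFLINE_COMMAND = "set_node_offline"


def decide_action(state, state_type, host):
    """Return the command (argument list) the handler would run, or None.

    state      -- value of $SERVICESTATE$: OK, WARNING, UNKNOWN or CRITICAL
    state_type -- value of $SERVICESTATETYPE$: SOFT or HARD
    host       -- value of $HOSTNAME$
    """
    # Assumption: act only once Nagios has confirmed the failure (HARD state),
    # so that a single failed attempt does not take a node out of service.
    if state == "CRITICAL" and state_type == "HARD":
        return [OFFLINE_COMMAND, host]
    return None
```

A handler script would read the three values from its command line and, when the result is not `None`, run it with `subprocess.run(action)`.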

9 Introduction to Nagios – networked clients
- Nagios server can use the check_nrpe command to run a test on a networked client
- client must be running the nrpe client process to
  - accept and run check requests
  - accept results and return them to the server
- Nagios server can also use ssh or smtp to perform checks (little experience on Tier1)
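As a rough illustration of the server side of an NRPE check, the sketch below builds and runs a check_nrpe command line. The `-H`, `-c` and `-t` options are standard check_nrpe options; the install path and the helper names are assumptions, and any command name passed in (such as `check_var`) would have to be defined in the client's NRPE configuration.

```python
import subprocess

# Assumed install location; it varies between distributions.
CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"


def build_nrpe_command(host, command, timeout=10):
    """Build the argument list asking `host` to run the named NRPE command."""
    return [CHECK_NRPE, "-H", host, "-c", command, "-t", str(timeout)]


def run_nrpe_check(host, command, timeout=10):
    """Run the check and return (return code, first line of output)."""
    try:
        proc = subprocess.run(
            build_nrpe_command(host, command, timeout),
            capture_output=True,
            text=True,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN - could not run check_nrpe: {exc}"
    lines = proc.stdout.strip().splitlines()
    message = lines[0] if lines else ""
    # Anything outside 0..2 is treated as unknown.
    code = proc.returncode if proc.returncode in (0, 1, 2) else 3
    return code, message
```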

10 Single server, many clients
[Diagram: one Nagios server and four Nagios clients]

11 Introduction to Nagios – slave servers
- Running scheduled checks and the web server puts a heavy load on a Nagios server
- Tier1 uses master and slave servers:
  - master keeps all results, runs the web server and sends notifications
  - slaves schedule tests, run them and return results to the master (using the send_nsca command to send to the nsca daemon)
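The slave-to-master step can be sketched as follows. send_nsca reads service results from standard input, one per line, as host, service description, return code and plugin output separated by tabs. The binary and configuration paths, the function names and the service name used in the usage note are assumptions for illustration.

```python
import subprocess


def format_nsca_result(host, service, code, output):
    """Format one service result as a tab-delimited line for send_nsca."""
    # Collapse tabs and newlines in the plugin output so they cannot
    # be mistaken for field or record separators.
    clean = " ".join(output.split())
    return f"{host}\t{service}\t{code}\t{clean}\n"


def submit_to_master(master, results,
                     send_nsca="/usr/sbin/send_nsca",          # assumed path
                     config="/etc/nagios/send_nsca.cfg"):      # assumed path
    """Send (host, service, code, output) tuples to the nsca daemon on master.

    Returns the exit status of send_nsca.
    """
    payload = "".join(format_nsca_result(*result) for result in results)
    proc = subprocess.run(
        [send_nsca, "-H", master, "-c", config],
        input=payload,
        text=True,
        capture_output=True,
    )
    return proc.returncode
```

For example, `format_nsca_result("ganglia0430", "disk_var", 2, "DISK CRITICAL - 95.0% used")` produces the single line a slave would pipe to send_nsca for that (hypothetical) service.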

12 Introduction to Nagios – freshness
If a slave server has crashed:
- master server checks whether tests have been run to schedule (freshness checking)
- if a test is stale (results not returned to schedule), the master will run the test itself (force check)
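The freshness idea reduces to comparing the age of the last result with a threshold. The sketch below shows only that comparison; Nagios itself drives it from the `check_freshness` and `freshness_threshold` directives and applies further allowances, so the function names and the simple rule here are assumptions.

```python
import time


def is_stale(last_result_time, freshness_threshold, now=None):
    """True if the last result is older than the threshold (in seconds)."""
    if now is None:
        now = time.time()
    return (now - last_result_time) > freshness_threshold


def stale_services(last_results, freshness_threshold, now=None):
    """Return the (host, service) pairs whose results are stale.

    last_results maps (host, service) to the Unix time of the last result
    received from a slave. These are the checks the master would run itself.
    """
    return sorted(
        key for key, received in last_results.items()
        if is_stale(received, freshness_threshold, now)
    )
```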

13 Master and slave servers; many clients
[Diagram legend: master server, slave server, client]

14 Introduction to Nagios – clearing alarms
If the check condition has been corrected and you want to clear the alarm before the next scheduled test:
- can force a check (from master or slave) by issuing an appropriately formatted command to the server
- scripts are available to do this
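One way to issue such a command is through the Nagios external command file, where `SCHEDULE_FORCED_SVC_CHECK` requests an immediate service check. The sketch below is not the Tier1 script: the function names and the command file path (the default of a source install) are assumptions.

```python
import time


def forced_check_command(host, service, when=None):
    """Build an external command line that forces a service check.

    Format: [time] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    """
    if when is None:
        when = int(time.time())
    return f"[{when}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{when}\n"


def submit_command(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command to the Nagios command file (a named pipe).

    The default path is an assumption; it is set by `command_file`
    in the main Nagios configuration.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Calling `submit_command(forced_check_command("ganglia0430", "disk_var"))` on the server would ask Nagios to rerun that (hypothetical) service check straight away.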

15 Introduction to Nagios - configuration
In our configuration Nagios knows about:
- hosts
- host groups
- services (for checking)
- contacts and contact groups
- time periods (when tests are valid, when to send contact messages)

16 Introduction to Nagios - configuration
Configuration is made simpler by extensive use of templates, for example:
- define a template for a generic host
- use it to define many other hosts, changing only the parameters that differ (e.g. host name, address, group to which it belongs)
- templates can be recursive (a template can itself use another template)

17
# Generic host definition template
define host{
        name                            generic-host    ; name of host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information
        retain_nonstatus_information    1               ; Retain non-status information
        register                        0               ; Template definition
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           720
        notification_period             24x7
        notification_options            d,u,r
        }

18
define host{
        use                     generic-host
        host_name               ganglia0430
        parents                 swt-5530-0
        alias                   Ganglia Host
        hostgroups              aux-services
        contact_groups          thorne
        address                 130.246.183.173
        }
define host{
        use                     generic-host
        host_name               shelob
        parents                 swt-4400-1
        alias                   CSF Webserver
        ……………
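How a host picks up values from its template can be sketched as a recursive merge. This is a simplified model, not Nagios's own resolver: it handles a single template per `use`, and the function name and dictionary layout are assumptions. The values are taken from the two definitions shown on the slides.

```python
# A subset of the definitions from the slides, keyed by template or host name.
DEFINITIONS = {
    "generic-host": {
        "name": "generic-host",
        "register": "0",
        "check_command": "check-host-alive",
        "max_check_attempts": "10",
        "notification_period": "24x7",
    },
    "ganglia0430": {
        "use": "generic-host",
        "host_name": "ganglia0430",
        "alias": "Ganglia Host",
        "address": "130.246.183.173",
    },
}


def resolve(name, definitions, _seen=()):
    """Return the effective directives for `name` after applying templates."""
    if name in _seen:
        raise ValueError(f"circular template reference at {name!r}")
    obj = definitions[name]
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        # Recursion lets a template itself use another template.
        merged.update(resolve(parent, definitions, _seen + (name,)))
        # The template's own name and register flag are not passed on;
        # otherwise every host using it would be unregistered too.
        merged.pop("name", None)
        merged.pop("register", None)
    # Directives set locally override inherited ones.
    merged.update({key: value for key, value in obj.items() if key != "use"})
    return merged
```

Here `resolve("ganglia0430", DEFINITIONS)` combines the host's own name, alias and address with the check settings inherited from `generic-host`.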

19 Introduction to Nagios - plugins
- Test scripts are known as plugins
- Can be written in any suitable language: shell script, Perl, C, Pascal
- About 60 standard plugins (available by RPM from the Dag Wieers repository)
- 30+ locally written plugins, plus 14+ specially written for Castor

20 Nagios links
- Nagios home page: http://www.nagios.org/
- Locally written plugins: http://cvs.gridpp.rl.ac.uk/viewcvs/viewcvs.cgi/nagios/plugins/
- GridPP information about Nagios: http://www.gridpp.ac.uk/wiki/Nagios

