
Slide 1: Grid Site Monitoring with Nagios
E. Imamagic, SRCE
EGEE’08
EGEE-II INFSO-RI-031688 · Enabling Grids for E-sciencE · www.eu-egee.org
EGEE and gLite are registered trademarks

Slide 2: Overview
Footer: Enabling Grids for E-sciencE · EGEE-II INFSO-RI-031688 · WLCG Collaboration Workshop / Nagios for Grid Services

Introduction
Architecture
Standard grid probes
Credential management
Remote gatherers
MSG bridge
Nagios Config Generator
Remote gLite UI
Future work
Conclusions

Slide 3: Introduction
Provide site admin-centric monitoring
  – simplify the operation of grid resources
Enable better resource availability and reliability
  – issue notifications as soon as a problem appears
Support dependencies between sensors
  – only relevant notifications are issued
  – enables problem isolation
Report generation
  – availability, problem history
Use a well-known, widely accepted system

Slide 4: Nagios-based Grid Monitoring
Monitoring CRO-GRID Infrastructure (2004-2006)
  – Globus Toolkit Pre-WS & WS, UNICORE, other services
  – active recovery of services
  – http://www.cro-ngi.hr
Monitoring EGEE resources in Central Europe (CE)
  – core services since mid-2006
  – all CE sites for 1st-line support since September 2006
  – http://nagios.ce-egee.org

Slide 5: Nagios-based Grid Monitoring
Grid Services Monitoring (GSM) WG
  – site monitoring prototype, mid-2007
  – packaging, standard probe format, …
  – http://crnjak.srce.hr/nagios (egee.srce.hr)
Operations Automation Team
  – part of the high-level strategy
  – continued development and integration

Slide 6: Site Monitoring Prototype
[Figure: architecture of the site monitoring prototype. Components: monitoring server, site BDII, site nodes (CE, SE, LFC, …), MyProxy, site admins. Labelled flows: get site's & nodes information, get nodes information, refresh proxy, get VOMS proxy, service checks, live node checks, get remote results, probe descriptions, get site status, issue alarms, publish results.]

Slide 7: Standard Grid Probes
Probes for monitoring grid services
  – reusable in any monitoring framework
  – Grid Monitoring Probes Specification
  – https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification

Example run:

    $ MyProxy-probe -u se1-egee.srce.hr \
        -m hr.srce.MyProxy-CertLifetime
    serviceType: MyProxy
    metricName: hr.srce.MyProxy-CertLifetime
    metricStatus: OK
    timestamp: 2008-04-23T22:48:08Z
    summaryData: Certificate will expire in 113.87 days (Aug 15 19:34:35 2008 GMT).
    serviceURI: se1-egee.srce.hr
    gatheredAt: crnjak.srce.hr
    EOT
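The probe output above is a line-oriented record format: "key: value" lines closed by a bare EOT line. A minimal Python sketch of a consumer, assuming exactly that layout (one field per line, records terminated by EOT), could look like this; it is an illustration, not code taken from the Grid Monitoring Probes Specification.

```python
from typing import Dict, List


def parse_probe_output(text: str) -> List[Dict[str, str]]:
    """Split probe output into records of metric fields.

    Assumes each field is a "key: value" line and each record ends with a
    line containing only "EOT". Lines without a colon (such as the shell
    command itself) are skipped; an unterminated final record is dropped.
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "EOT":
            if current:
                records.append(current)
            current = {}
        elif line:
            # Split on the first colon only: timestamps and summaries
            # contain further colons.
            key, sep, value = line.partition(":")
            if sep:
                current[key.strip()] = value.strip()
    return records
```

Applied to the MyProxy-probe output shown above, it yields one record whose metricStatus is OK.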

Slide 8: Standard Grid Probes
Run by the Nagios server
  – through the WLCG probe wrapper (check_wlcg)
Atomic checks of grid services
  – e.g. transfer a file via SRM, store a proxy in MyProxy, check certificate lifetime
Three sets of standard probes
  – SRCE
  – CERN
  – OSG
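Nagios derives a service state from a plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and displays the plugin's text output, so a wrapper such as check_wlcg has to translate a probe's metricStatus and summaryData into that convention. The Python sketch below shows only the idea; it is not the actual check_wlcg code, and falling back to UNKNOWN for unrecognised statuses is an assumption.

```python
from typing import Dict, Tuple

# Standard Nagios plugin exit codes.
NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_nagios(record: Dict[str, str]) -> Tuple[int, str]:
    """Map one parsed probe record to (exit code, plugin output line)."""
    status = record.get("metricStatus", "UNKNOWN").upper()
    if status not in NAGIOS_CODES:
        # Assumption: anything outside the four Nagios states becomes UNKNOWN.
        status = "UNKNOWN"
    summary = record.get("summaryData", "no summaryData reported")
    return NAGIOS_CODES[status], f"{status} - {summary}"
```

A wrapper script would print the returned line and then exit with the returned code.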

Slide 9: Credential Management
Provides credentials for the standard probes
Based on a MyProxy certificate
  – MyProxy certificates based on the host certificate
  – must be renewed periodically by the site admin
Run by a cron job
  – result reported as a passive check
  – hr.srce.GridProxy-Get
MyProxy certificate lifetime check
  – sends an expiration warning
  – hr.srce.MyProxy-ProxyLifetime
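A result produced outside Nagios, such as the cron job's hr.srce.GridProxy-Get result, reaches Nagios through its external command file as a PROCESS_SERVICE_CHECK_RESULT command. A minimal Python sketch, assuming a hypothetical monitoring host name; the command-file path is site-specific and is set by the command_file directive in nagios.cfg.

```python
import time
from typing import Optional


def passive_result(host: str, service: str, code: int, output: str,
                   now: Optional[int] = None) -> str:
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time()) if now is None else now
    # External commands are newline-terminated, so keep the output on one line.
    one_line = " ".join(output.splitlines())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{one_line}\n")


def submit(command_file: str, line: str) -> None:
    """Write one command to the Nagios command file (usually a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, after fetching a proxy the cron job could call passive_result("nagios.example.org", "hr.srce.GridProxy-Get", 0, "OK - proxy retrieved") and pass the line to submit; the host name is hypothetical.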

Slide 10: Remote Gatherers
Gather results from other monitoring systems
  – Grid Monitoring Data Exchange Standard
  – https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard

    OK 2008-04-24T01:44:03Z......

Slide 11: Remote Gatherers
Run by the Nagios server
  – each gatherer check collects all results and imports them into Nagios
  – SAM-Gather (check_sam), NPM-Gather (check_npm)
Results are imported as passive checks
Two external monitoring systems
  – SAM
  – ENOC DownCollector
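A gatherer such as SAM-Gather is itself an ordinary Nagios check: it fetches remote results, turns each into a passive check result, and reports its own status. The Python sketch below assumes the remote results have already been fetched into a hypothetical RemoteResult tuple (the real SAM and ENOC interfaces are not modelled) and uses made-up host names in its example.

```python
from typing import Iterable, List, NamedTuple, Tuple

NAGIOS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


class RemoteResult(NamedTuple):
    """One result from an external system (hypothetical shape)."""
    host: str
    metric: str
    status: str
    summary: str


def gather(results: Iterable[RemoteResult],
           now: int) -> Tuple[List[str], int, str]:
    """Return passive-check commands plus the gatherer's own exit code and output."""
    commands: List[str] = []
    for r in results:
        code = NAGIOS_CODES.get(r.status.upper(), NAGIOS_CODES["UNKNOWN"])
        summary = " ".join(r.summary.splitlines())
        commands.append(f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
                        f"{r.host};{r.metric};{code};{summary}")
    if not commands:
        # Nothing imported: the gatherer itself goes UNKNOWN, and services
        # that depend on it can have their notifications masked.
        return commands, NAGIOS_CODES["UNKNOWN"], "UNKNOWN - no results gathered"
    return commands, NAGIOS_CODES["OK"], f"OK - imported {len(commands)} result(s)"
```

Note that the gatherer reports OK even when the imported results are CRITICAL: its own state only says whether importing worked.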

Slide 12: MSG Bridge
Publish results from Nagios
  – enables integration with external systems
  – e.g. Nagios-to-Nagios communication
Messaging System for Grids (MSG)
  – based on Apache ActiveMQ
  – https://twiki.cern.ch/twiki/bin/view/LCG/MessagingSystemforGrid
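ActiveMQ accepts messages over several protocols, one of which is STOMP, a text protocol whose frames consist of a command line, header lines, a blank line, the body and a terminating NUL byte. Assuming the bridge publishes over STOMP (the slide names only ActiveMQ), a result could be packaged as a SEND frame as sketched below; the destination name is hypothetical, and the CONNECT/CONNECTED handshake and socket handling are omitted.

```python
def stomp_send_frame(destination: str, body: str) -> bytes:
    """Build a STOMP SEND frame carrying one probe result as plain text."""
    payload = body.encode("utf-8")
    headers = [
        f"destination:{destination}",
        "content-type:text/plain",
        # content-length counts body bytes, not characters.
        f"content-length:{len(payload)}",
    ]
    head = "SEND\n" + "\n".join(headers) + "\n\n"
    # Every STOMP frame is terminated by a NUL octet.
    return head.encode("utf-8") + payload + b"\x00"
```

For example, stomp_send_frame("/topic/example.nagios.results", record_text) would be written to an already connected broker socket, with record_text in the probe output format shown for the standard grid probes.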

Slide 13: Nagios Config Generator
Generates the Nagios configuration
Uses multiple information sources
  – SAM, BDII, active heuristic checks, admin's rules
  – special logic for aliases and load-balanced nodes
Modular approach
  – additional information sources can be plugged in
  – integration with other monitoring systems (e.g. LEMON)
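With a modular design, each information source only has to report which hosts run which services; the generator merges the answers and then applies the admin's rules. NCG's own modules are Perl, so the following is a hedged Python analogue: from_bdii and from_sam are hypothetical stand-ins for real queries, and admin rules are reduced to a host exclusion list.

```python
from typing import Callable, Dict, Iterable, Set

# A source returns {host name: set of service types}.
Source = Callable[[], Dict[str, Set[str]]]


def merge_sources(sources: Iterable[Source],
                  excluded_hosts: Iterable[str] = ()) -> Dict[str, Set[str]]:
    """Union the services reported by every source, then apply admin exclusions."""
    merged: Dict[str, Set[str]] = {}
    for source in sources:
        for host, services in source().items():
            merged.setdefault(host, set()).update(services)
    # Admin rules are applied last so they override anything discovered.
    for host in excluded_hosts:
        merged.pop(host, None)
    return merged


def from_bdii() -> Dict[str, Set[str]]:
    """Hypothetical BDII source with fixed example data."""
    return {"ce.example.org": {"CE"}, "se.example.org": {"SE"}}


def from_sam() -> Dict[str, Set[str]]:
    """Hypothetical SAM source with fixed example data."""
    return {"ce.example.org": {"CE", "BDII"}}
```

merge_sources([from_bdii, from_sam], excluded_hosts=["se.example.org"]) returns {"ce.example.org": {"CE", "BDII"}}; plugging in another source, such as LEMON, means adding one more function to the list.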

Slide 14: Nagios Config Generator
Probe description database
  – frequencies, timeouts, required arguments, dependencies
  – part of NCG
Service dependencies
  – alarm masking
  – hierarchy of probes
      – simple probes run more often (e.g. every 5 min)
      – heavyweight probes run less often (e.g. every 30 min)
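A probe-description entry carries enough information to emit both the Nagios service definition (with its check frequency) and the dependency that masks alarms when a prerequisite check fails. The Python sketch below assumes a hypothetical ProbeDescription record, a check command that takes the timeout as its first argument, the generic-service template from the Nagios sample configuration, and the default interval_length of 60 s (so check_interval is in minutes). notification_failure_criteria w,u,c suppresses the dependent service's notifications while the master service is WARNING, UNKNOWN or CRITICAL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeDescription:
    """Hypothetical probe-description entry."""
    metric: str
    command: str
    interval_min: int                 # e.g. 5 for simple probes, 30 for heavyweight ones
    timeout_s: int
    depends_on: Optional[str] = None  # master service whose failure masks this one


def render(host: str, p: ProbeDescription, master_host: Optional[str] = None) -> str:
    """Emit a Nagios service definition and, if needed, a servicedependency."""
    out = [
        "define service {",
        "    use                  generic-service",
        f"    host_name            {host}",
        f"    service_description  {p.metric}",
        f"    check_command        {p.command}!{p.timeout_s}",
        f"    check_interval       {p.interval_min}",
        "}",
    ]
    if p.depends_on:
        out += [
            "define servicedependency {",
            f"    host_name                      {master_host or host}",
            f"    service_description            {p.depends_on}",
            f"    dependent_host_name            {host}",
            f"    dependent_service_description  {p.metric}",
            "    notification_failure_criteria  w,u,c",
            "}",
        ]
    return "\n".join(out)
```

For example, a heavyweight job-submission probe (hypothetical name org.example.CE-JobSubmit, every 30 minutes) on ce.example.org could depend on hr.srce.GridProxy-Valid running on the monitoring host, so a missing proxy raises one alarm instead of one per CE.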

Slide 15: NCG Objects
Services
  – high-level Grid services
  – e.g. CE, SE, VOMS, LFC
  – mapped to Nagios hostgroups
Metric sets
  – concrete low-level services
  – e.g. BDII, GRAM Gatekeeper, GridFTP, SRMv1, DPNS
  – mapped to Nagios servicegroups
Metrics
  – metrics from standard probes, remote results
  – e.g. hr.srce.GRAM-CertLifetime, hr.srce.GridProxy-Get, CE-sft-job-OPS
  – mapped to Nagios services
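In Nagios object terms, this mapping amounts to one hostgroup per high-level service and one servicegroup per metric set, where servicegroup members are written as host,service pairs. A Python sketch using hypothetical host names:

```python
from typing import Iterable, List, Tuple


def hostgroup(service: str, hosts: Iterable[str]) -> str:
    """One hostgroup per high-level grid service (CE, SE, ...)."""
    return "\n".join([
        "define hostgroup {",
        f"    hostgroup_name  {service}",
        f"    members         {','.join(sorted(hosts))}",
        "}",
    ])


def servicegroup(metric_set: str, members: List[Tuple[str, str]]) -> str:
    """One servicegroup per metric set; members are host,service pairs."""
    pairs = ",".join(f"{host},{metric}" for host, metric in members)
    return "\n".join([
        "define servicegroup {",
        f"    servicegroup_name  {metric_set}",
        f"    members            {pairs}",
        "}",
    ])
```

For example, hostgroup("CE", ["ce1.example.org", "ce2.example.org"]) groups the site's CEs, and servicegroup("GRAM", [("ce1.example.org", "hr.srce.GRAM-CertLifetime")]) collects that metric under its metric set.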

Slide 16: Service Types
Local
  – standard grid probes (active checks)
Remote
  – results collected by the remote gatherers (passive checks)
  – links to the external interfaces and documentation are provided
Native
  – native Nagios checks (active checks)
  – links to documentation are provided
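The three service types differ mainly in how results arrive and which links accompany them. One hedged way to express this with Nagios service directives is sketched below, assuming notes_url for documentation links and action_url for a remote system's external interface; both are standard Nagios directives, but this particular assignment is an illustration.

```python
from typing import Dict, Optional

# Active vs passive checking per service type.
CHECK_MODE = {
    "local":  {"active_checks_enabled": "1", "passive_checks_enabled": "0"},
    "remote": {"active_checks_enabled": "0", "passive_checks_enabled": "1"},
    "native": {"active_checks_enabled": "1", "passive_checks_enabled": "0"},
}


def type_directives(kind: str, doc_url: Optional[str] = None,
                    external_url: Optional[str] = None) -> Dict[str, str]:
    """Return service directives for a local, remote or native service."""
    directives = dict(CHECK_MODE[kind])
    # Documentation links are provided for remote and native services.
    if doc_url and kind in ("remote", "native"):
        directives["notes_url"] = doc_url
    # Only remote services link to an external system's interface.
    if external_url and kind == "remote":
        directives["action_url"] = external_url
    return directives
```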

Slide 17: Service Types
Mapped to Nagios servicegroups
  – local, remote (sam, npm), native
Service dependencies
  – e.g. services from standard grid probes depend on hr.srce.GridProxy-Valid; remote services depend on the gatherers SAM-Gather and NPM-Gather
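These grouping and dependency rules fit in a small lookup. The sketch below reads "remote (sam, npm)" as a general remote servicegroup plus one servicegroup per gatherer, and gives native checks no parent dependency; both readings are assumptions.

```python
from typing import List, Optional, Tuple

PROXY_CHECK = "hr.srce.GridProxy-Valid"
GATHERERS = {"sam": "SAM-Gather", "npm": "NPM-Gather"}


def classify(kind: str,
             source: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return (servicegroups, master service) for a metric of the given type."""
    if kind == "local":
        # Standard probes need a valid proxy, so they depend on the proxy check.
        return ["local"], PROXY_CHECK
    if kind == "remote":
        if source not in GATHERERS:
            raise ValueError(f"unknown remote source: {source!r}")
        # Remote results are only as current as the gatherer that imported them.
        return ["remote", source], GATHERERS[source]
    if kind == "native":
        return ["native"], None
    raise ValueError(f"unknown service type: {kind!r}")
```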

Slide 18: NCG Configuration
Apache httpd-style configuration structure
Detailed configuration for each module
Each module's documentation is embedded in the module itself (perldoc)

    # global variables, values loaded from environment variables
    SITENAME = ${SITE_NAME}
    MYPROXY_SERVER=${MYPROXY_SERVER}
    GLITE_VERSION=$GLITE_VERSION
    PROBES_TYPE=all
    # NRPE_UI=nrpe.srce.hr
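The global section substitutes environment variables in both the ${NAME} and $NAME forms. NCG itself is Perl; as a hedged illustration of the same substitution, Python's string.Template understands both forms, and safe_substitute leaves unset variables untouched. The line format assumed here (KEY = value lines, # comments) covers only the snippet above, not nested Apache-style blocks.

```python
import os
from string import Template
from typing import Dict, Mapping, Optional


def load_globals(text: str,
                 env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Parse KEY=value lines, skipping comments, and expand $VAR / ${VAR}."""
    env = os.environ if env is None else env
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (e.g. the disabled NRPE_UI)
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = Template(value.strip()).safe_substitute(env)
    return values
```

With SITE_NAME, MYPROXY_SERVER and GLITE_VERSION set in the environment, the snippet above yields four entries; NRPE_UI stays absent because its line is commented out.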

Slide 19: Remote gLite UI
Avoid installing grid middleware on the Nagios server

Slide 20: Future Work
NCG cache
  – keep old information when nodes are down
GOCDB integration
  – region, site and service information
  – site personnel information
  – scheduled downtimes
NRPE on service nodes
  – monitoring of local components (logs, processes, etc.)
Bidirectional MSG bridge
  – receive information from external systems
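Once scheduled downtimes come from GOCDB, they could be passed to Nagios with its SCHEDULE_HOST_DOWNTIME external command so that alarms are suppressed during planned maintenance. A Python sketch, assuming downtime records have already been fetched into a hypothetical Downtime structure with Unix timestamps (the GOCDB query is not shown) and a hypothetical author name:

```python
from dataclasses import dataclass


@dataclass
class Downtime:
    """Hypothetical downtime record taken from GOCDB."""
    host: str
    start: int  # Unix timestamp
    end: int    # Unix timestamp
    description: str


def downtime_command(d: Downtime, now: int, author: str = "ncg") -> str:
    """Format a fixed SCHEDULE_HOST_DOWNTIME external command for Nagios."""
    if d.end <= d.start:
        raise ValueError("downtime must end after it starts")
    fixed, trigger_id = 1, 0       # fixed window, not triggered by another downtime
    duration = d.end - d.start     # Nagios uses this only for flexible downtime
    return (f"[{now}] SCHEDULE_HOST_DOWNTIME;{d.host};{d.start};{d.end};"
            f"{fixed};{trigger_id};{duration};{author};{d.description}")
```

The resulting line is written to the Nagios command file in the same way as a passive check result.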

Slide 21: Conclusions
Nagios
  – highly configurable monitoring framework with notifications, service dependencies, …
  – widely used by site admins
Grid extensions
  – integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
  – probes for key grid services
  – GSM WG specifications are key to integration
Nagios @ grid
  – enables better site availability
  – admins receive only relevant notifications

Slide 22: Thank You!
Questions?
https://twiki.cern.ch/twiki/bin/view/LCG/GridServiceMonitoringInfo

