EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Nagios for Grid Services E. Imamagic, SRCE.

Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Nagios for Grid Services E. Imamagic, SRCE WLCG Collaboration Workshop

Overview
Introduction
Architecture
Standard grid probes
Remote gatherers
Credential management
Nagios Config Generator
Web interface
Installation procedure
Future work
Conclusions

Introduction
Provide site admin-centric monitoring
  – simplify grid resource operations
Enable better resource availability
  – issue notifications as soon as a problem appears
Achieve complex sensor dependencies
  – enables problem isolation
  – only relevant notifications are issued
Visualization & management interface
  – grid resources status
Report generation
  – availability, problem history

Nagios-based Grid Monitoring
Monitoring CRO-GRID Infrastructure
  – Globus Toolkit Pre-WS & WS, UNICORE, other services
  – active recovery of services
Monitoring EGEE resources in Central Europe (CE)
  – core services since mid 2006
  – all CE sites for 1st line support since September 2006
Grid Services Monitoring (GSM) WG
  – site monitoring prototype, mid 2007
  – (egee.srce.hr)
  – (CERN-PPS)

Site Monitoring Prototype
[Architecture diagram. Components: monitoring server, site BDII, site nodes (CE, SE, LFC, …), MyProxy, site admins. Labelled flows: refresh proxy; get VOMS proxy; service checks; get remote results; probe descriptions; get site's & nodes information; get nodes information; live node checks; get site status; issue alarms.]

Standard Grid Probes
Probes for monitoring grid services
  – reusable in any monitoring framework
  – can provide multiple metrics
  – Grid Monitoring Probes Specification
  – https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringProbeSpecification

    $ MyProxy-probe -u se1-egee.srce.hr \
        -m hr.srce.MyProxy-CertLifetime
    serviceType: MyProxy
    metricName: hr.srce.MyProxy-CertLifetime
    metricStatus: OK
    timestamp: T22:48:08Z
    summaryData: Certificate will expire in days (Aug 15 19:34: GMT).
    serviceURI: se1-egee.srce.hr
    gatheredAt: crnjak.srce.hr
    EOT
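
The probe output shown on this slide is a sequence of "key: value" lines closed by an EOT marker. A minimal Python sketch of how a monitoring framework might read one such record follows. The function name parse_probe_output and the SAMPLE text are illustrative assumptions, not part of the probe specification, and only the fields visible on the slide are handled.

```python
def parse_probe_output(text):
    """Parse 'key: value' lines up to the EOT marker into a dict."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if line == "EOT":
            break  # end of this record
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        record[key.strip()] = value.strip()
    return record


# Hypothetical sample, modelled on the fields shown on the slide.
SAMPLE = """\
serviceType: MyProxy
metricName: hr.srce.MyProxy-CertLifetime
metricStatus: OK
serviceURI: se1-egee.srce.hr
gatheredAt: crnjak.srce.hr
EOT
"""

if __name__ == "__main__":
    rec = parse_probe_output(SAMPLE)
    print(rec["metricName"], rec["metricStatus"])
```

Splitting on the first colon only keeps values such as timestamps intact.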

Standard Grid Probes
Run by Nagios server
  – WLCG probe wrapper (check_wlcg)
  – local probes
Simple atomic checks of grid services
  – e.g. transfer file via SRM, store MyProxy, check certificate lifetime
Three sets of standard probes
  – SRCE, CERN, OSG
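
A wrapper that lets Nagios run a standard probe has to turn the probe's record into a Nagios plugin result: an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of text. The sketch below shows that translation in Python. It is not the check_wlcg implementation; the function name to_nagios_result is hypothetical, and it assumes that metricStatus values use the same names as Nagios states.

```python
# Standard Nagios plugin exit codes.
NAGIOS_EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_nagios_result(record):
    """Turn a parsed probe record (dict) into (exit_code, status_line)."""
    status = str(record.get("metricStatus", "UNKNOWN")).upper()
    if status not in NAGIOS_EXIT_CODES:
        # Assumption: anything unrecognised is reported as UNKNOWN.
        status = "UNKNOWN"
    summary = record.get("summaryData", "no summary provided")
    return NAGIOS_EXIT_CODES[status], f"{status}: {summary}"


if __name__ == "__main__":
    code, line = to_nagios_result(
        {"metricStatus": "OK", "summaryData": "certificate is valid"}
    )
    print(line)
    raise SystemExit(code)
```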

Remote Gatherers
Gather results from other monitoring systems
  – Grid Monitoring Data Exchange Standard
  – https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringDataExchangeStandard

    OK T01:44:03Z......

Remote Gatherers
Run by Nagios server
  – the check gathers all results and imports them into Nagios
  – SAM-Gather (check_sam), NPM-Gather (check_npm)
Results are imported as passive checks
  – remote probes
Two external monitoring systems
  – SAM
  – ENOC DownCollector
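
One common way to hand a passive result to Nagios is the PROCESS_SERVICE_CHECK_RESULT external command. The slides do not say which mechanism the gatherers use, so the Python sketch below only illustrates the command format. The function name passive_check_command and the host name in the usage line are hypothetical.

```python
import time


def passive_check_command(host, service, return_code, output, timestamp=None):
    """Format one Nagios PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    # A newline would end the command early, so flatten the plugin output.
    output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}"
    )


if __name__ == "__main__":
    # Hypothetical host; the metric name is taken from the NCG Objects slide.
    print(passive_check_command("ce1.example.org", "CE-sft-job-OPS", 0, "OK"))
```

In a typical setup such a line would be written to the Nagios external command file, whose path depends on the local installation.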

Credential Management
Provides credentials for standard probes
Based on MyProxy certificate
  – password-based MyProxy certificates
  – must be renewed periodically
Run by Nagios
  – like a standard grid probe
  – hr.srce.GridProxy-Get (refresh_proxy)
MyProxy certificate lifetime check
  – issues expiration warning
  – hr.srce.MyProxy-ProxyLifetime
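
A lifetime check of this kind compares the time left on the credential with warning and critical thresholds. The following Python sketch illustrates the idea only. The function name classify_lifetime and the default thresholds of 7 days and 1 day are assumptions; the slides do not state the values used by hr.srce.MyProxy-ProxyLifetime.

```python
from datetime import datetime, timedelta, timezone


def classify_lifetime(expires_at, now,
                      warn=timedelta(days=7), crit=timedelta(days=1)):
    """Map the remaining credential lifetime to a Nagios-style status."""
    remaining = expires_at - now
    if remaining <= crit:
        return "CRITICAL"  # expired or about to expire
    if remaining <= warn:
        return "WARNING"   # time to renew the MyProxy credential
    return "OK"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(classify_lifetime(now + timedelta(days=3), now))
```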

Nagios Config Generator
Generates Nagios configuration
Uses multiple information sources
  – SAM, BDII, active heuristic checks, user definitions
  – special logic for aliases and load balancing nodes

NCG Objects
Services
  – high-level grid services
  – e.g. CE, SE, VOMS, LFC
  – mapped to Nagios hostgroups
Metric sets
  – concrete low-level services
  – e.g. BDII, GRAM Gatekeeper, GridFTP, SRMv1, DPNS
  – mapped to Nagios servicegroups
Metrics
  – metrics from standard probes, remote results
  – e.g. hr.srce.GRAM-CertLifetime, hr.srce.GridProxy-Get, CE-sft-job-OPS
  – mapped to Nagios services
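
The mapping above can be pictured as three kinds of Nagios object definitions. The Python sketch below renders one of each, using standard Nagios directives. It is not NCG's actual output: the helper define, the host name, the "generic-service" template, and the bare check_wlcg command line are assumptions for illustration.

```python
def define(kind, **directives):
    """Render one Nagios object definition block as text."""
    width = max(len(name) for name in directives)
    body = "\n".join(
        f"    {name.ljust(width)}  {value}" for name, value in directives.items()
    )
    return f"define {kind} {{\n{body}\n}}"


if __name__ == "__main__":
    # Service (CE) -> hostgroup
    print(define("hostgroup", hostgroup_name="CE", members="ce1.example.org"))
    # Metric set (GRAM Gatekeeper) -> servicegroup
    print(define("servicegroup", servicegroup_name="GRAM",
                 alias="GRAM Gatekeeper"))
    # Metric -> service
    print(define("service", use="generic-service",
                 host_name="ce1.example.org",
                 service_description="hr.srce.GRAM-CertLifetime",
                 servicegroups="GRAM",
                 check_command="check_wlcg"))
```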

Metric Types
Local
  – standard grid probes (active)
Remote
  – results gathered by remote gatherers (passive)
  – links to external interfaces and documentation are provided
Native
  – native Nagios checks (active)

Metric Types
Mapped to Nagios servicegroups
  – local, remote (sam, npm), native
Metric dependencies
  – e.g. metrics from standard grid probes depend on hr.srce.GridProxy-Get; remote probes depend on the gatherers SAM-Gather and NPM-Gather
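
In Nagios terms, a metric dependency like this can be written as a servicedependency object, so that notifications for a probe are suppressed while the proxy refresh itself is failing. The Python sketch below renders such an object. The function name, the host names, and the chosen failure criteria are assumptions; the slides do not show the definitions NCG generates.

```python
def service_dependency(master_host, master_service,
                       dependent_host, dependent_service):
    """Render a Nagios servicedependency definition as text."""
    return "\n".join([
        "define servicedependency {",
        f"    host_name                      {master_host}",
        f"    service_description            {master_service}",
        f"    dependent_host_name            {dependent_host}",
        f"    dependent_service_description  {dependent_service}",
        # Assumption: stay quiet while the master is WARNING/UNKNOWN/CRITICAL,
        # but keep executing the dependent check.
        "    notification_failure_criteria  w,u,c",
        "    execution_failure_criteria     n",
        "}",
    ])


if __name__ == "__main__":
    print(service_dependency("nagios.example.org", "hr.srce.GridProxy-Get",
                             "ce1.example.org", "hr.srce.GRAM-CertLifetime"))
```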

NCG Modules
Modular approach
  – plugging in additional information sources
  – integration with other monitoring systems (e.g. LEMON)
Three phases
  – site information (host-service mappings)
  – metrics information (host-metric sets/metrics mappings)
  – configuration generation

NCG Modules
Module                  | Available modules   | Description
NCG::SiteInfo           | SAM, BDII, File     | Gets list of tuples (host, service, VO)
NCG::LocalMetricSets    | Hash, File          | Gets list of tuples (host, metric set)
NCG::LocalMetricsAttrs  | Active, File, LDAP  | Gets data needed for local probes (e.g. ports, SE paths, service URIs) and filters out metric sets

NCG Modules
Module              | Available modules | Description
NCG::LocalMetrics   | Hash, File        | Gets list of tuples (host, local & native metric)
NCG::RemoteMetrics  | SAM, NPM          | Gets list of tuples (host, remote metric)
NCG::LocalRules     | File              | Module applies local rules on gathered data
NCG::ConfigGen      | Nagios            | Generates configuration
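
The phased, pluggable design can be sketched as a small pipeline in which each phase runs a list of interchangeable modules over the data gathered so far. The Python code below is an illustration only: NCG itself is documented through perldoc, and every function, host name, and table entry here is a hypothetical stand-in rather than a real NCG module.

```python
def run_phase(modules, data):
    """Feed the data through every module of one phase, in order."""
    for module in modules:
        data = module(data)
    return data


def static_site_info(entries):
    # Phase 1 stand-in: contribute (host, service, VO) tuples.
    return entries + [("ce1.example.org", "CE", "ops"),
                      ("test1.example.org", "CE", "ops")]


def hash_metrics(entries):
    # Phase 2 stand-in: expand each service into (host, metric) tuples.
    table = {"CE": ["hr.srce.GRAM-CertLifetime"]}
    return [(host, metric)
            for host, service, _vo in entries
            for metric in table.get(service, [])]


def drop_test_hosts(entries):
    # Local-rule stand-in: filter the gathered data before generation.
    return [(host, metric) for host, metric in entries
            if not host.startswith("test")]


def render(entries):
    # Phase 3 stand-in: emit one line per (host, metric) pair.
    return "\n".join(f"{host} {metric}" for host, metric in sorted(entries))


def generate():
    site = run_phase([static_site_info], [])
    metrics = run_phase([hash_metrics, drop_test_hosts], site)
    return render(metrics)


if __name__ == "__main__":
    print(generate())
```

Adding another information source then means appending one more callable to the relevant phase's list.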

NCG Modules
Configuration
  – Apache HTTP Server-style structure
  – detailed configuration for each module
  – modules' documentation is provided in the modules (perldoc)

    # global variable, loads env variable
    SITENAME = ${SITE_NAME}
    MYPROXY_SERVER=${MYPROXY_SERVER}
    GLITE_VERSION=$GLITE_VERSION
    PROBES_TYPE=all
    # NRPE_UI=nrpe.srce.hr
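
The global variables in this fragment are plain KEY=VALUE pairs whose values may refer to environment variables. The Python sketch below shows one way such lines could be read. It is not NCG's parser: parse_variables is a hypothetical helper, and it relies on os.path.expandvars, which substitutes both $NAME and ${NAME} and leaves unknown variables unchanged.

```python
import os


def parse_variables(text):
    """Read KEY=VALUE lines, skipping comments, and expand env variables."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = os.path.expandvars(value.strip())
    return result


if __name__ == "__main__":
    os.environ["SITE_NAME"] = "egee.srce.hr"  # example value only
    config = parse_variables(
        "# global variable, loads env variable\n"
        "SITENAME = ${SITE_NAME}\n"
        "PROBES_TYPE=all\n"
        "# NRPE_UI=nrpe.srce.hr\n"
    )
    print(config)
```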

User-defined Rules
Static file definitions
  – possible in each phase
  – useful for nodes outside of information systems (e.g. test nodes)
  – NCG::*::File modules

    # NCG::SiteInfo::File
    # add non-VO-dependent service to host
    #HOST_SERVICE!myhost.srce.hr!CE
    # add service for defined VO to host
    #HOST_SERVICE_VO!myhost.srce.hr!CE!ops

    # NCG::LocalMetricSets::File
    SERVICE_METRICSET!CE!TestMetricSet

    # NCG::LocalMetrics::File
    # association between metric and metricset
    METRICSET_METRIC!TestMetricSet!TestMetric
    ...
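
The static definitions use one directive per line with fields separated by "!", and "#" marks a comment or a disabled entry. A minimal Python sketch of reading that format follows. The function name parse_definitions is hypothetical, and the sketch does not check which directives each NCG::*::File module actually accepts.

```python
def parse_definitions(text):
    """Return a list of (directive, args) tuples from '!'-separated lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments (including disabled entries) and elisions.
        if not line or line.startswith("#") or line == "...":
            continue
        directive, *args = line.split("!")
        entries.append((directive, tuple(args)))
    return entries


if __name__ == "__main__":
    sample = (
        "# NCG::LocalMetricSets::File\n"
        "SERVICE_METRICSET!CE!TestMetricSet\n"
        "# NCG::LocalMetrics::File\n"
        "METRICSET_METRIC!TestMetricSet!TestMetric\n"
    )
    for directive, args in parse_definitions(sample):
        print(directive, args)
```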

User-defined Rules
Rules
  – rules are applied before generation
  – module NCG::LocalRules

    # remove the host
    REMOVE_HOST!host
    # remove the host/service
    REMOVE_SERVICE!host!service
    # add load balancing node
    ADD_LB!host!node
    # remove load balancing node
    REMOVE_LB!host!node
    # add contact to all hosts
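
Applying such rules before generation amounts to editing the gathered host and service data in place of the information sources. The Python sketch below handles only REMOVE_HOST and REMOVE_SERVICE. The data model (a dict from host to a set of service names) and the function name apply_rules are assumptions, and the load-balancing rules are left out because their exact semantics are not given on the slide.

```python
def apply_rules(services, rules):
    """Apply removal rules to {host: set(services)} and return a new dict."""
    result = {host: set(names) for host, names in services.items()}
    for rule in rules:
        directive, *args = rule.split("!")
        if directive == "REMOVE_HOST":
            result.pop(args[0], None)
        elif directive == "REMOVE_SERVICE":
            host, service = args
            result.get(host, set()).discard(service)
        # Other directives (e.g. ADD_LB, REMOVE_LB) are ignored in this sketch.
    return result


if __name__ == "__main__":
    gathered = {"ce1.example.org": {"CE", "BDII"}, "old.example.org": {"SE"}}
    rules = ["REMOVE_HOST!old.example.org",
             "REMOVE_SERVICE!ce1.example.org!BDII"]
    print(apply_rules(gathered, rules))
```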

Remote gLite UI
Execute standard grid probes on existing gLite UI
  – WLCG_UI
  – avoid installation of grid middleware on Nagios server
  – use Nagios Remote Plugin Executor (NRPE)
[Diagram labels: site nodes, site BDII, CE, SE, LFC, …; service checks]


[Figure labels: SAM, Standard probes, NPM, Native probes]


Installation Procedure
https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringNcg
1. Request access to SAM DB
  – "WLCG Grid Services Monitoring Profile"
  – if your site is in SAM and you want to use remote probes
2. Install NCG package
  – grid-monitoring-config-gen
  – https://twiki.cern.ch/twiki/bin/view/LCG/GridMonitoringYumRepository
3. For remote probes
  – install package grid-monitoring-fm-nagios-remote

Installation Procedure
4. For local probes
  – find suitable MyProxy server & generate MyProxy certificate
  1. for Nagios server
    – install gLite UI on Nagios server
    – install packages on Nagios server: grid-monitoring-fm-nagios-local, grid-monitoring-probes-*
  2. for remote gLite UI
    – install NRPE on remote gLite UI
    – install packages on remote gLite UI: grid-monitoring-fm-nagios-local, grid-monitoring-probes-*
5. Generate configuration with NCG
6. Start Nagios

Future Work
NCG development
  – providing configuration for multiple sites (regional monitoring)
  – providing configuration for multiple VOs
Integration with global monitoring systems
  – ActiveMQ messaging system
  – Operations Automation Team mandate
Credential management alternatives
  – passwordless MyProxy certificates (-Z)
  – certificate-based MyProxy certificates (-R)
  – custom credential management

Conclusions
Nagios
  – highly configurable monitoring framework with notifications, service dependencies, …
  – widely used by site admins
Grid extensions
  – integration with existing infrastructure (user certificates, VOMS, GOCDB, SAM)
  – probes for key grid services
  – GSM WG specifications are key for integration
Grid (EGEE)
  – enables better site availability
  – admins get only relevant notifications

Thank You! Questions?