March Availability Report for EGEE Sites based on Nagios

March Availability Report for EGEE Sites based on Nagios
James Casey, David Collados

SAM and Nagios in WLCG
At present:
- Site availability is calculated in SAM
March availability computations:
- Parallel computations based on SAM and Nagios probes
- Equivalent metrics for CE, SRMv2 and sBDII
- Using the same algorithm (GridView)

March - Nagios Based Report
March availability reports for EGEE sites:
- Official SAM report: https://edms.cern.ch/file/963325/1/EGEE_Mar2010.pdf
- Unofficial Nagios report: https://edms.cern.ch/file/963325/1/Unofficial_Nagios_EGEE_Mar2010.pdf
315 EGEE sites in the report:
- Sites whose availability changed > 10%: 40 = 12.7%
- Sites whose availability increased > 10%: 19 = 6.0%
- Sites whose availability decreased > 10%: 21 = 6.7%
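The percentages on this slide follow directly from the site counts. A minimal Python sketch that reproduces them is given here as a cross-check; the variable and function names are illustrative and are not taken from the SAM or Nagios tooling.

```python
# Site counts as quoted on the slide; names are illustrative only.
TOTAL_SITES = 315
counts = {"changed": 40, "increased": 19, "decreased": 21}


def share(count, total=TOTAL_SITES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}

# Sites that increased plus sites that decreased should account for all
# sites whose availability changed by more than 10%.
consistent = counts["increased"] + counts["decreased"] == counts["changed"]

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {TOTAL_SITES} sites = {pct}%")
print("increased + decreased == changed:", consistent)
```

The increased and decreased groups sum to the 40 changed sites, so the three figures are mutually consistent.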

March - Nagios Based Report
Sites whose availability decreased > 10%: 21 = 6.7%
- 12 failed due to messaging brokers discovery, now OK
- 5 due to timeouts in job submit or missing libraries on WNs; now 4 OK, 1 fails
- 4 failing sBDII tests; now 2 OK, 2 fail
Differences understood and corrected in Nagios
Sites can/should check their current Nagios status:
- In the GridView Nagios portal: http://gvdev.cern.ch/NAGIOS/same_index.php
- Or in their corresponding ROC Nagios or MyEGEE instances: https://twiki.cern.ch/twiki/bin/view/EGEE/NagiosROCURL#Production_installations_Nagios
And report any issue through the Nagios Support Unit in GGUS
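The breakdown of the 21 sites with decreased availability can be tallied as a quick sanity check. The sketch below uses only the numbers quoted on the slide; the tuple layout and names are assumptions made for illustration.

```python
# (cause, sites affected, now OK, still failing), as quoted on the slide.
causes = [
    ("messaging brokers discovery", 12, 12, 0),
    ("job submit timeouts or missing libraries on WNs", 5, 4, 1),
    ("failing sBDII tests", 4, 2, 2),
]

total_affected = sum(affected for _, affected, _, _ in causes)
now_ok = sum(ok for _, _, ok, _ in causes)
still_failing = sum(failing for _, _, _, failing in causes)

# Each row should be internally consistent: affected == OK + failing.
rows_consistent = all(
    affected == ok + failing for _, affected, ok, failing in causes
)

print(f"affected: {total_affected}, now OK: {now_ok}, still failing: {still_failing}")
```

By this tally the three causes account for all 21 sites, of which 18 are now OK and 3 still fail.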

April - Availability
- Official availability reports for April will not change
  - Still calculated based on SAM probe results
- We will also generate availability reports based on Nagios results
- Continue the validation of Nagios availability by sites and project
  - Sites should check their current Nagios status and report any issue