CERN IT Department, CH-1211 Geneva 23, Switzerland (www.cern.ch/it)
CCRC’08: Tools for measuring our progress
CCRC’08 F2F, 5th February 2008
James Casey, IT-GS-MND



Overview
Tracking the challenge
  – ‘Observations elog’
Measuring MoU response times
  – ‘Logbook elog’
Reconciling the experiment and infrastructure views
  – CCRC’08 ServiceMap
Things to come…
  – Reporting MoU metrics to the sites

SC4 Twiki

Problems with the twiki
Hard to generate reports from a twiki
Statistics extraction is manual
  – Messages/incidents per day, per site, …
Everyone has to poll
  – No feeds
No categorization
No threading
Want it to be write-once, read-many
  – No changing history!
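The point about manual statistics extraction is easier to see with structured entries: once each entry carries a timestamp and a site, "messages/incidents per day, per site" is a one-line grouping. A minimal sketch, assuming entries with hypothetical fields (`timestamp`, `site`, `category`, `text`); the sample entries are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One structured log entry (hypothetical fields, for illustration only)."""
    timestamp: datetime
    site: str
    category: str
    text: str


def incidents_per_day_per_site(entries):
    """Count entries grouped by (calendar day, site)."""
    return Counter((e.timestamp.date(), e.site) for e in entries)


if __name__ == "__main__":
    # Invented sample data, not taken from any CCRC'08 logbook.
    entries = [
        LogEntry(datetime(2008, 2, 4, 9, 30), "CERN-PROD", "SRM", "SRM not working"),
        LogEntry(datetime(2008, 2, 4, 14, 5), "CERN-PROD", "FTS", "Transfers stuck"),
        LogEntry(datetime(2008, 2, 5, 8, 0), "SITE-B", "SE", "Disk full"),
    ]
    for (day, site), n in sorted(incidents_per_day_per_site(entries).items()):
        print(day, site, n)
```

With free-form twiki text the same count requires someone to read and tally the page by hand.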

Solution
We believe elog gives us these features
Let’s use CCRC’08 to test it
  – Fallback solution could be a standard blog
I’d encourage everyone to use it
  – The secretary of the CCRC’08 daily meeting will also add items of interest that arise
…Demo…
RSS feed:
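Feeds remove the need for everyone to poll. A minimal sketch of reading the item titles and links from a logbook feed, assuming the elog `elog.rdf` output follows RSS 1.0 (RDF Site Summary); the `.rdf` suffix suggests this, but the exact elog format is an assumption here. The sample document and its example.org URLs are invented, and in practice the text could be fetched with `urllib.request.urlopen`.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RSS 1.0 (RDF Site Summary) documents.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Invented sample feed for illustration; not real elog output.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="https://example.org/elog">
    <title>Example logbook</title>
    <link>https://example.org/elog</link>
    <description>Hypothetical feed</description>
  </channel>
  <item rdf:about="https://example.org/elog/1">
    <title>SRM not working at CERN-PROD</title>
    <link>https://example.org/elog/1</link>
  </item>
</rdf:RDF>"""


def parse_rss1_items(xml_text):
    """Return (title, link) pairs for every <item> directly under rdf:RDF."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("rss:item", NS):
        title = item.findtext("rss:title", default="", namespaces=NS)
        link = item.findtext("rss:link", default="", namespaces=NS)
        items.append((title.strip(), link.strip()))
    return items


if __name__ == "__main__":
    for title, link in parse_rss1_items(SAMPLE):
        print(title, link)
```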

MoU response times
We’ve agreed to try to measure MoU metrics during CCRC’08
  – To evaluate whether we can actually do it!
Columns 2 to 4: maximum delay in responding to operational problems (service interruption; degradation of the capacity of the service by more than 50%; by more than 20%). Columns 5 and 6: average availability measured on an annual basis (during accelerator operation; at all other times).

Service | Interruption | Degradation > 50% | Degradation > 20% | Availability (accelerator operation) | Availability (other times)
Acceptance of data from the Tier-0 Centre | 12 hours | 12 hours | 24 hours | 99% | n/a
Networking service to the Tier-0 Centre during accelerator operation | 12 hours | 24 hours | 48 hours | 98% | n/a
Data-intensive analysis services, including networking to Tier-0, Tier-1 Centres | 24 hours | 48 hours | 48 hours | 98% | 98%
All other services – prime service hours | 2 hours | 2 hours | 4 hours | 98% | 98%
All other services – other times | 24 hours | 48 hours | 48 hours | 97% | 97%
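To measure these metrics automatically, the response-delay part of the table has to exist in machine-readable form. A minimal sketch, assuming merged cells on the slide's table span adjacent columns; the shortened row keys and the severity names are hypothetical shorthand, not identifiers from any CCRC’08 tool.

```python
# Maximum response delays in hours, read from the MoU table on this slide.
# Each tuple is (interruption, degradation > 50%, degradation > 20%).
MAX_DELAY_HOURS = {
    "Acceptance of data from the Tier-0 Centre": (12, 12, 24),
    "Networking service to the Tier-0 Centre": (12, 24, 48),
    "Data-intensive analysis services": (24, 48, 48),
    "All other services (prime service hours)": (2, 2, 4),
    "All other services (other times)": (24, 48, 48),
}

# Hypothetical severity labels mapped to tuple positions.
SEVERITY_INDEX = {"interruption": 0, "degradation>50%": 1, "degradation>20%": 2}


def within_mou(service, severity, response_hours):
    """True if a response delay meets the MoU maximum for this service and severity."""
    limit = MAX_DELAY_HOURS[service][SEVERITY_INDEX[severity]]
    return response_hours <= limit


if __name__ == "__main__":
    print(within_mou("All other services (prime service hours)", "interruption", 1.0))
```

Keeping the limits as data rather than code means the table can be updated without touching the check.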

Response time reporting workflow
[Workflow diagram: New Problem! → Site Acknowledge → Site Fixed → VO Confirmed → Problem Solved!]
Example logbook entries:
  :30 – New Problem. VO: Atlas, MoU Area: Distribution of data to Tier-1 centres, Site: CERN-PROD – SRM not working
  :30 – Site Acknowledged – working on it!
  :49 – Site Fixed – We’ve found the problem in the endpoint, restarted
  :43 – VO Confirmed – All working again, thanks!
Problem Report: Issue ID #42 : :30 : MoU Area: CERN-PROD/ Distribution of data to Tier-1 Centres
  Time to first response: 1:00
  Time to problem resolved: 1:29
  Time to VO confirmation: 2:23
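The three durations on the problem report can be derived from the timestamps of the workflow states. A minimal sketch, assuming every duration is measured from the "New Problem" entry; the state names and the timestamps below are hypothetical and do not reproduce the slide's example.

```python
from datetime import datetime


def fmt(delta):
    """Format a (non-negative) timedelta as H:MM, the style used on the problem report."""
    minutes = int(delta.total_seconds() // 60)
    return f"{minutes // 60}:{minutes % 60:02d}"


def response_report(events):
    """Compute report durations from workflow timestamps.

    `events` maps hypothetical state names ("new", "acknowledged", "fixed",
    "confirmed") to datetimes; all durations are measured from "new".
    """
    start = events["new"]
    return {
        "first_response": fmt(events["acknowledged"] - start),
        "resolved": fmt(events["fixed"] - start),
        "vo_confirmation": fmt(events["confirmed"] - start),
    }


if __name__ == "__main__":
    # Hypothetical timestamps for illustration only.
    events = {
        "new": datetime(2008, 2, 5, 10, 0),
        "acknowledged": datetime(2008, 2, 5, 10, 45),
        "fixed": datetime(2008, 2, 5, 11, 20),
        "confirmed": datetime(2008, 2, 5, 12, 5),
    }
    print(response_report(events))
    # → {'first_response': '0:45', 'resolved': '1:20', 'vo_confirmation': '2:05'}
```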

Measuring MoU availability
  – Experiment framework/Dashboard view
  – Operational testing (SAM/SLS) view
  – “Human” view: the control

Mapping to MoU Services
[Table: Tier-1 grid services (ArcCE, BDII, CE, FTS, LFC, MYPX, OSGCE, RB, RGMA, SE, SRM, SRMv2, VOBOX, gCE, gRB, sBDII) mapped to MoU categories: Acceptance of data from Tier-0 *; Networking services to Tier-0 *; Data-intensive analysis service, including networking to Tier-0; All other services]
Map grid service status (from SAM) to MoU categories
  – These are “custom” service availability calculations
Use the CMS SAM portal framework as the basis for implementing this
  – And send the results directly to Tier-1 Nagios
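One possible shape for such a “custom” availability calculation, as a sketch only. It assumes a category counts as available in a sampling period (for example one round of SAM tests) only if every service mapped to it passed, which is a logical AND chosen here for illustration rather than the documented algorithm. The service-to-category mapping is hypothetical, since the slide's table does not survive in readable form.

```python
# Hypothetical mapping from MoU category to grid services (illustrative only).
MOU_CATEGORY_SERVICES = {
    "Acceptance of data from Tier-0": ["SRMv2", "FTS"],
    "All Other Services": ["BDII", "CE", "LFC"],
}


def category_availability(status_samples, services):
    """Fraction of sampling periods in which every listed service was up.

    `status_samples` is a list of dicts mapping service name -> bool (passed),
    one dict per sampling period. A service missing from a sample counts as down.
    No samples at all is treated as 0.0 availability (an assumption).
    """
    if not status_samples:
        return 0.0
    up = sum(all(sample.get(s, False) for s in services) for sample in status_samples)
    return up / len(status_samples)


if __name__ == "__main__":
    samples = [
        {"SRMv2": True, "FTS": True, "BDII": True, "CE": True, "LFC": True},
        {"SRMv2": False, "FTS": True, "BDII": True, "CE": True, "LFC": True},
        {"SRMv2": True, "FTS": True, "BDII": True, "CE": False, "LFC": True},
    ]
    for category, services in MOU_CATEGORY_SERVICES.items():
        print(category, round(category_availability(samples, services), 3))
```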

CMS SAM Portal

ServiceMap
What’s a ServiceMap?
  – It’s a gridmap with many different maps, showing different aspects of the WLCG infrastructure
What’s the CCRC’08 ServiceMap?
  – Service ‘readiness’
  – Service availability for VO-critical services
  – Experiment metrics
A single place to see both the VO and the infrastructure view of the grid

CCRC’08 ServiceMap
…Demo…

Service Readiness
A measure of how ‘production-ready’ a service is
  – In terms of software, service and deployment
Manually edited (under SVN control) by those responsible
  – EIS team, service managers, deployment team
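Because readiness is hand-edited in a version-controlled file, a small parser is enough to turn it into map colours. A sketch under loudly stated assumptions: the one-line-per-service `key=value` format, the three-level scale and the rule that a service is only as ready as its weakest dimension are all hypothetical, not the actual ServiceMap input format.

```python
# Hypothetical readiness scale, worst first.
LEVELS = ["not-ready", "partial", "ready"]
DIMENSIONS = ("software", "service", "deployment")


def parse_readiness(text):
    """Parse 'SERVICE key=value ...' lines into {service: {dimension: level}}.

    Blank lines and lines starting with '#' are ignored.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *pairs = line.split()
        table[name] = dict(pair.split("=", 1) for pair in pairs)
    return table


def overall_readiness(dims):
    """Least-ready dimension wins; a missing dimension counts as not-ready."""
    return min((dims.get(d, "not-ready") for d in DIMENSIONS), key=LEVELS.index)


if __name__ == "__main__":
    text = """
    # service  software  service  deployment   (hypothetical file)
    FTS software=ready service=ready deployment=partial
    LFC software=ready service=ready deployment=ready
    """
    for name, dims in parse_readiness(text).items():
        print(name, overall_readiness(dims))
```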

Experiment metrics
Show the VO view of the infrastructure
Two extra ‘maps’
  – Reliability (e.g. successful data transfers, jobs, …)
  – Metrics (MB/s, events/s, …)
Need interaction with the experiments to create these two views
Note that this is a very similar structure to the MoU view
  – Perhaps we merge the two, and report to sites on this structure?
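The two extra maps boil down to simple per-site arithmetic: reliability as successful attempts over all attempts, and a rate such as MB/s over a time window. A minimal sketch, assuming transfer records shaped as hypothetical `(site, succeeded, bytes)` tuples and taking 1 MB = 10^6 bytes; the site names are invented.

```python
from collections import defaultdict


def site_metrics(transfers, window_seconds):
    """Per-site reliability and average throughput over a time window.

    `transfers` is an iterable of (site, succeeded, nbytes) tuples
    (a hypothetical record shape). Only successful transfers add bytes.
    """
    stats = defaultdict(lambda: {"ok": 0, "failed": 0, "bytes": 0})
    for site, succeeded, nbytes in transfers:
        s = stats[site]
        if succeeded:
            s["ok"] += 1
            s["bytes"] += nbytes
        else:
            s["failed"] += 1
    result = {}
    for site, s in stats.items():
        attempts = s["ok"] + s["failed"]  # always >= 1 for a site that appears
        result[site] = {
            "reliability": s["ok"] / attempts,
            "MB/s": s["bytes"] / 1e6 / window_seconds,
        }
    return result


if __name__ == "__main__":
    transfers = [
        ("SITE-A", True, 1_800_000_000),
        ("SITE-A", True, 1_800_000_000),
        ("SITE-A", False, 0),
        ("SITE-A", True, 0),
        ("SITE-B", False, 500),
    ]
    print(site_metrics(transfers, window_seconds=60))
```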

Summary
CCRC’08 is a good opportunity to try some new operational tools
  – And evaluate them in a ‘real-world’ mode
The CCRC’08 ServiceMap seems to give a useful view of the grid
  – Need to iterate on what is useful to show
  – And fill in the white spaces…
Next steps
  – MoU calculation and reporting to sites
Feedback on all the tools welcome!

Links to tools
CCRC’08 ServiceMap
CCRC’08 Observations logbook
  – RSS feed: https://prod-grid-logger.cern.ch/elog/CCRC'08+Observations/elog.rdf
Response tracking logbook
  – RSS feed: https://prod-grid-logger.cern.ch/elog/CCRC'08+Logbook/elog.rdf