Criteria for Deploying gLite WMS and CE
Ian Bird, CERN IT
LCG MB, 6th March 2007

October 7, LCG MB Meeting; 6th March 2007
Introduction
- gLite WMS:
  - The effort to push it into production readiness started in July 2006.
  - Performance-wise it was ~OK for CSA06.
  - But many ongoing reliability and manageability issues prevented it from becoming the production version and replacing the LCG-RB.
  - Work on gLite porting and on simplifying dependencies now means that the CERN team cannot also take responsibility for driving the WMS improvements.
  - INFN have agreed that this should be their responsibility, and that we will agree criteria for taking the WMS back into certification.
- gLite CE:
  - Assume a similar process and define CE criteria.

WMS performance
- Performance, 2007 dress rehearsals:
  - CMS: not specified, but was 50K jobs/day in CSA06
  - ATLAS: 20K successful jobs/day + analysis load
- Performance, 2008:
  - CMS: …K jobs/day through WMS, <10 WMS
  - ATLAS: 100K jobs/day, 100K, <10 WMS
- Stability:
  - CMS: not specified
  - ATLAS: <1 restart of WMS or LB every month (== LCG RB)
These figures come from discussions with CMS and ATLAS. The numbers for ALICE and LHCb are understood to be within these requirements.

LCG requirement
- Based on these numbers, we propose the following as the LCG requirements on the WMS:
  - Performance:
    - 2007 dress rehearsals: 50K successful jobs/day
    - 2008: 200K successful jobs/day using <10 WMS entry points
  - Stability:
    - <1 restart of WMS or LB every month under this load
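
The 2008 figure implies a minimum load on each entry point. The Python sketch below divides the aggregate daily target evenly across the WMS entry points and converts the result to jobs per second. The even split and the helper name `per_entry_point_rate` are assumptions for illustration only; they are not part of any gLite tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_entry_point_rate(jobs_per_day: float, entry_points: int) -> tuple[float, float]:
    """Hypothetical helper: (jobs/day, jobs/second) each entry point must sustain.

    ASSUMPTION: the aggregate load is spread evenly over all entry points.
    """
    if entry_points <= 0:
        raise ValueError("entry_points must be positive")
    per_day = jobs_per_day / entry_points
    return per_day, per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 2008 target: 200K successful jobs/day using <10 WMS entry points.
    # Read strictly, "<10" allows at most 9 entry points.
    for n in (9, 5):
        day, sec = per_entry_point_rate(200_000, n)
        print(f"{n} entry points: {day:,.0f} jobs/day each ({sec:.2f} jobs/s)")
    # 2007 dress rehearsals: 50K successful jobs/day in aggregate.
    print(f"2007 aggregate: {per_entry_point_rate(50_000, 1)[1]:.2f} jobs/s")
```

With 9 entry points, each one has to handle about 22K successful jobs/day (roughly 0.26 jobs/s); with fewer entry points the per-machine rate rises proportionally.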

gLite WMS criteria
- A single WMS machine should demonstrate submission rates of at least 10K jobs/day sustained over 5 days, during which the WMS services, including the L&B, should not need to be restarted. This performance level should be reachable with both bulk and single job submission.
- During this 5-day test the performance must not degrade significantly due to filling of internal queues, memory consumption, etc., i.e. the submission rate on day 5 should be the same as that on day 1.
- Proxy renewal must work at the 98% level, i.e. <2% of jobs should fail due to proxy renewal problems (the real failure rate should be lower, because jobs may be retried).
- The number of stale jobs after 5 days must be <1%.
- The L&B data and job states must be verified:
  - A reasonable time after submission has ended, there should be no jobs in "transient" or "cancelled" states.
  - If jobs are very short, no jobs should stay in the "running" state for more than a few hours.
  - After the proxy expires, all jobs must be in a final state (Done-Success or Aborted).
- For verifying these criteria, the test suite written by Andrea and currently used by Simone and Andrea will be taken as the baseline.
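
These criteria lend themselves to a pass/fail check over the summary of one 5-day test. The Python sketch below is hypothetical: `WmsTestRun` and `check_wms_run` are illustrative names, not part of the test suite mentioned on the slide. It assumes that proxy-renewal failures and stale jobs are measured as fractions of all jobs submitted during the test, and it reads "the same rate on day 5 as on day 1" as day 5 reaching at least 95% of day 1, a tolerance the slide does not give. The "running for more than a few hours" check is left out because it needs per-job timestamps.

```python
from dataclasses import dataclass, field

MIN_JOBS_PER_DAY = 10_000           # single WMS machine, every day of the test
TEST_DAYS = 5
MAX_PROXY_FAILURE_FRACTION = 0.02   # proxy renewal must work at the 98% level
MAX_STALE_FRACTION = 0.01           # stale jobs after 5 days must be < 1%
MIN_DAY5_OVER_DAY1 = 0.95           # ASSUMED tolerance for "same rate on day 5 as day 1"
FINAL_STATES = {"Done-Success", "Aborted"}


@dataclass
class WmsTestRun:
    """Hypothetical summary of one 5-day test on a single WMS machine."""
    daily_submitted: list[int]      # jobs submitted on each day of the test
    service_restarts: int           # WMS or L&B restarts during the test
    proxy_renewal_failures: int     # jobs that failed because of proxy renewal
    stale_jobs: int                 # jobs still stale after 5 days
    state_counts: dict[str, int] = field(default_factory=dict)  # L&B state -> jobs, after proxy expiry


def check_wms_run(run: WmsTestRun) -> list[str]:
    """Return the criteria the run violates; an empty list means it passes."""
    problems = []
    days = run.daily_submitted[:TEST_DAYS]
    total = sum(days)
    if len(days) < TEST_DAYS:
        problems.append(f"only {len(days)} days of data, need {TEST_DAYS}")
    else:
        if min(days) < MIN_JOBS_PER_DAY:
            problems.append("submission rate dropped below 10K jobs/day")
        if days[-1] < MIN_DAY5_OVER_DAY1 * days[0]:
            problems.append("day 5 rate degraded relative to day 1")
    if run.service_restarts > 0:
        problems.append("WMS or L&B needed a restart")
    if total > 0 and run.proxy_renewal_failures / total >= MAX_PROXY_FAILURE_FRACTION:
        problems.append("proxy renewal failures >= 2%")
    if total > 0 and run.stale_jobs / total >= MAX_STALE_FRACTION:
        problems.append("stale jobs >= 1%")
    # Anything outside Done-Success/Aborted (including Cancelled) counts as non-final here.
    non_final = sum(n for state, n in run.state_counts.items() if state not in FINAL_STATES)
    if non_final:
        problems.append(f"{non_final} jobs not in a final state after proxy expiry")
    return problems


if __name__ == "__main__":
    run = WmsTestRun(
        daily_submitted=[12_000, 11_800, 12_100, 11_900, 11_950],
        service_restarts=0,
        proxy_renewal_failures=300,
        stale_jobs=100,
        state_counts={"Done-Success": 59_000, "Aborted": 750},
    )
    print(check_wms_run(run) or "all criteria met")
```

In the sample run 59,750 jobs are submitted; 300 proxy-renewal failures are about 0.5% and 100 stale jobs about 0.17%, so the check returns an empty list.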

gLite CE criteria
- Performance:
  - 2007 dress rehearsals:
    - 5000 simultaneous jobs per CE node.
    - 50 user/role/submission node combinations (Condor_C instances) per CE node.
  - End 2007:
    - 5000 simultaneous jobs per CE node (assuming the same machine as in 2007, but this is expected to improve).
    - 1 CE node should support an unlimited number of user/role/submission node combinations, from at least 10 VOs, up to the limit on the number of jobs (this might be achieved with 1 Condor_C instance per VO, with user switching done by glexec in blah).
- Reliability:
  - Job failure rate due to the CE in normal operation: <0.5%; job failures due to restart of CE services or CE reboot: <0.5%.
  - 2007 dress rehearsals:
    - 5 days of unattended running, with performance on day 5 equivalent to that on day 1.
  - End 2007:
    - 1 month of unattended running without performance degradation.
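
The CE targets can likewise be captured as data, so that one measurement can be checked against either the 2007 dress-rehearsal or the end-2007 targets. In this hypothetical Python sketch, `CeTargets`, `CeMeasurement` and `check_ce` are illustrative names. Several readings are assumptions: "1 month" is taken as 30 days, "equivalent performance" as a last-day rate of at least 95% of the first-day rate, the "unlimited" number of user/role/submission node combinations at end 2007 is modelled as no fixed minimum (only the 10-VO requirement is checked), and the 2007 targets, which state no VO count, use a minimum of zero.

```python
from dataclasses import dataclass
from typing import Optional

MAX_NORMAL_FAILURE_FRACTION = 0.005   # CE-caused job failures in normal operation < 0.5%
MAX_RESTART_FAILURE_FRACTION = 0.005  # failures from CE service restart or reboot < 0.5%
MIN_LAST_OVER_FIRST = 0.95            # ASSUMED tolerance for "equivalent" performance


@dataclass(frozen=True)
class CeTargets:
    simultaneous_jobs: int            # per CE node
    min_combinations: Optional[int]   # user/role/submission node combinations; None = no fixed number
    min_vos: int
    unattended_days: int


DRESS_REHEARSALS_2007 = CeTargets(5000, 50, 0, 5)
END_2007 = CeTargets(5000, None, 10, 30)  # "1 month" ASSUMED to mean 30 days


@dataclass
class CeMeasurement:
    """Hypothetical summary of one unattended test run on a single CE node."""
    peak_simultaneous_jobs: int
    combinations: int      # distinct user/role/submission node combinations served
    vos: int
    unattended_days: int
    jobs_total: int
    failures_normal: int   # job failures attributed to the CE in normal operation
    failures_restart: int  # job failures caused by CE service restarts or reboots
    first_day_rate: float  # jobs/day handled on the first day
    last_day_rate: float   # jobs/day handled on the last day


def check_ce(m: CeMeasurement, t: CeTargets) -> list[str]:
    """Return the targets the measurement misses; an empty list means it passes."""
    problems = []
    if m.peak_simultaneous_jobs < t.simultaneous_jobs:
        problems.append("too few simultaneous jobs")
    if t.min_combinations is not None and m.combinations < t.min_combinations:
        problems.append("too few user/role/submission node combinations")
    if m.vos < t.min_vos:
        problems.append("too few VOs")
    if m.unattended_days < t.unattended_days:
        problems.append("unattended period too short")
    if m.last_day_rate < MIN_LAST_OVER_FIRST * m.first_day_rate:
        problems.append("performance degraded during the run")
    if m.jobs_total > 0:
        if m.failures_normal / m.jobs_total >= MAX_NORMAL_FAILURE_FRACTION:
            problems.append("CE failure rate in normal operation >= 0.5%")
        if m.failures_restart / m.jobs_total >= MAX_RESTART_FAILURE_FRACTION:
            problems.append("restart/reboot failure rate >= 0.5%")
    return problems


if __name__ == "__main__":
    m = CeMeasurement(5200, 60, 4, 5, 100_000, 300, 100, 1000.0, 980.0)
    print("2007 dress rehearsals:", check_ce(m, DRESS_REHEARSALS_2007) or "pass")
    print("end 2007:", check_ce(m, END_2007) or "pass")
```

The same 5-day measurement passes the 2007 dress-rehearsal targets but not the end-2007 ones, because it covers too few VOs and too short an unattended period.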

Summary
- WMS:
  - Propose as LCG requirements: clear statement from CMS, but not from ATLAS (yet …)
  - Discussed with the certification team, deployment testers, EIS testers and developers
- CE:
  - Propose these requirements as LCG requirements, based on LCG-CE and deployment experience
  - Discussed with the certification team, deployment testers and developers
- Expect to write a similar document for LFC to clarify performance goals