Project Status David Britton, 15/Dec/08.

2 Outline
– Programmatic Review Outcome
– CCRC08
– LHC Schedule Changes
– Service Resilience
– CASTOR
– Current Status
– Project Management
– Feedback from the last Oversight Committee
– Forward Look

3 Programmatic Review
The programmatic review recommended a 5% cut to GridPP: although ALICE and LHCb were ultimately rescued, the cut was still imposed. However, there was a silver lining:
Bottom line: GridPP3 reduced by £1.24m on top of the £1.20m removed from GridPP2 noted at the last OC.

4 Funding Cut
Savings of £1.24m achieved by:
– Planned and unplanned late starts to a number of GridPP3 posts.
– Reduction in Tier-1 hardware to reflect changes imposed by the programmatic review (LHCb and BaBar).
– Re-costing of hardware based on the 2007 procurement.
– A reduction in the budget line for the second tranche of Tier-2 hardware, consistent with the reduction in Tier-1 hardware.
– Reduction in travel and miscellaneous spending.
New plan presented to STFC in July 08; updated in GridPP-PMB-133-Resources.doc

5 CCRC08
The Combined Computing Readiness Challenge took place in two phases, February and May 2008. Largely successful for all experiments.

6 LHC Schedule
Current indications are:
- Machine cold in June.
- First beams in July.
- Collisions at some point later.
- Plans may change!
Consequences for GridPP:
- Capacity and services need to be ready in June.
- Meanwhile, many exercises (MC productions, cosmics re-processing, analysis challenges) to keep things busy and stress the system.
- Prudent to maintain procurement schedule for April 2009 (little downside to this and helps reduce risks).
- Opportunity to build on the service quality and resilience.

7 Service Resilience
Emphasis over the last year on making the Grid resilient:
– Much work on monitoring and alarms.
– 24 x 7 service initiated.
– Extensive work on making the component services more resilient at many levels (see document).
Future work on resilience:
– Create project-manager overview to keep this active at the PMB level.
– Provision a back-up link for the OPN (significant cost).
– Link to the (evolving) experiment disaster planning (UCL meeting).

8 CASTOR
CASTOR proved unreliable in early 2007 but performed well after an upgrade for CCRC08. In time for first collisions, a further upgrade was required in order to maintain a version supported by CERN. This coincided with a move to a resilient Oracle RAC system; the combination of upgrades led to instability in August and September. The system is now stabilising and the problems have led to improved communications and management processes:
– High load-testing identified as a critical missing step for new releases.
– Oracle problems raised to a higher level of awareness in wLCG.
– Storage Review at RAL in November.
Other Tier-1s have had similar or worse problems with mass storage – a difficult area where effort is underestimated.

9 Status: Resources 2008 (2007)

              Tier-1         Tier-2
CPU [kSI2k]   4590 (1500)    (8588)
Disk [TB]     2222 (750)     1365 (743)
Tape [TB]     2195 (~800)    –

MOU commitments for 2008 met. Combined effort from all Institutions.

10 Global Resource
Status in Oct 2007: 245 sites, 40,518 CPUs, 24,135 TB storage
Status in Dec 2008: 263 sites, 81,953 CPUs, xx,xxx TB storage

11 Current Performance
(Reliability plots for the Tier-1 and Tier-2s)
Good and improving reliability at the Tier-1 and Tier-2s (but need to move to experiment-specific SAM tests).
MOU resources at Tier-1 and Tier-2s delivered in full.
Following CCRC08 successes, other exercises continue: e.g. CMS Cosmic Reprocessing at the end of November, which inadvertently ran (successfully) at 10x the I/O rate (Tier-1 LAN and CASTOR service) for 3.5 days!
Although there were some problems, RAL was ~the best Tier-1 for LHCb globally. CMS needs were also ~met.

12 Current Performance
Disk failure rate ~1/working day, or a ~6% failure rate (twice our assumption). ATLAS hit by two multiple-disk failures within a RAID array, resulting in data loss.
CASTOR and the Oracle RAC upgrade caused considerable instability, and ATLAS lost 2 weeks of UK simulated production when the Tier-1 became unavailable to receive data.
Database loads are running several times higher than at CERN; this is partly a cost issue and partly due to the higher than average number of transactions generated by some ATLAS jobs.
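For context, the ~1 failure per working day and ~6% figures are consistent if the quoted rate is annual and the disk fleet is of order a few thousand drives. The sketch below illustrates the arithmetic; the fleet size and working-day count are illustrative assumptions, not figures from the slides.

```python
# Back-of-envelope sketch relating disk failures per working day to an
# annual failure rate. The fleet size and working-day count below are
# illustrative assumptions, not figures taken from the slides.

ASSUMED_DISK_COUNT = 4000        # hypothetical Tier-1 disk fleet size
WORKING_DAYS_PER_YEAR = 250      # assumed working days per year
failures_per_working_day = 1.0   # quoted on the slide: ~1 per working day

failures_per_year = failures_per_working_day * WORKING_DAYS_PER_YEAR
annual_failure_rate = failures_per_year / ASSUMED_DISK_COUNT

print(f"Annual failure rate: {annual_failure_rate:.1%}")  # ~6.2% under these assumptions
```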

13 Project Map

14 Project Plan

15 Feedback from last Oversight Committee
8.1 (Disaster recovery) – GridPP-PMB-135-Resilience.doc
8.2 (CASTOR) – GridPP-PMB-136-CASTOR.doc
8.3 (Documentation)
8.4 (Certificates)
8.5 (24x7 Cover) – Now fully operational.
8.6 (Experiment Support Posts) – Despite all the cuts we have managed to fund 1 FTE for each of ATLAS, CMS, and LHCb.

16 Forward Look
– Move to the new building at RAL.
– Concentrate on further improving service resilience and engage ATLAS, CMS, LHCb in developing coherent disaster management strategies.
– Investigate (even more) rigorous certification of CASTOR releases.
– Recognise global conclusion that mass data storage requires more effort than anticipated.
– Preparations for GridPP3 took ~20 months: need to start considering now what happens after GridPP3.

17 Backup Slides

18 Job Success Rate
ATLAS data analysis site tests – Nov 2008

19 Job Efficiencies
Efficiency for RAL Tier-1: CPU-Time / Wall-Clock
Nov 2008 – Overall efficiency 58%; LHC experiments 83%
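The efficiency quoted here is total CPU time divided by total wall-clock time over the jobs in the period. A minimal sketch of that calculation is below; the job records are hypothetical examples, not real accounting data.

```python
# Minimal sketch of the CPU-time / Wall-Clock efficiency figure quoted on
# this slide. The job records below are hypothetical examples; real values
# would come from batch-system or Grid accounting data.

jobs = [
    {"cpu_seconds": 3000, "wall_seconds": 3200},   # CPU-bound job, high efficiency
    {"cpu_seconds": 1000, "wall_seconds": 4000},   # I/O-bound or stalled job, low efficiency
]

total_cpu = sum(job["cpu_seconds"] for job in jobs)
total_wall = sum(job["wall_seconds"] for job in jobs)
efficiency = total_cpu / total_wall

print(f"Overall efficiency: {efficiency:.0%}")  # ~56% for these example jobs
```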

20 Error Messages