WLCG Service Schedule June 2007.

Agenda
- The machine
- The experiments
- The service

LHC Schedule (LHC commissioning presentation to CMS, June 2007)
[Chart: accelerator schedule by month through 2007-2008 – interconnection of the continuous cryostat, leak tests of the last sub-sectors, inner triplet repairs & interconnections, global pressure test & consolidation, flushing, cool-down, warm-up, powering tests, operational testing of available sectors, machine checkout, beam commissioning to 7 TeV, consolidation]

[Two further chart slides from the same presentation: the LHC accelerator schedule]

Machine Summary
- No engineering run in 2007
- Startup in May 2008
- "…we aim to be seeing high energy collisions by the summer."

Experiments
- Continue preparations for the 'Full Dress Rehearsals'
- The schedule from CMS is very clear:
  - CSA07 runs from September 10 for 30 days
  - Ready for a cosmics run in November
  - Another such run in March
- ALICE have stated an FDR from November
- Bottom line: continuous activity; the period after CHEP is likely to be (very) busy

Event sizes (Software & Computing Workshop, CERN, 26 June 2007)
- We already needed more hardware in the T0 because:
  - In the TDR there was no full ESD copy to BNL included
  - Transfers require more disk servers than expected -> 10% less disk space in the CAF
- From the TDR: RAW = 1.6 MB, ESD = 0.5 MB, AOD = 0.1 MB
  - 5-day buffer at CERN -> 127 TB
  - Currently 50 disk servers -> 300 TB, OK for the buffer
- For Release 13: RAW = 1.6 MB, ESD = 1.5 MB, AOD = 0.23 MB (incl. trigger & truth)
  - 2.2 -> 3.3 MB per event = 50% more at the T0
  - 3 ESD and 10 AOD copies: 4.1 -> 8.4 MB per event = a factor of 2 more for exports
  - More disk servers needed for T0-internal traffic and exports -> 40% less disk in the CAF
  - Extra tapes and drives -> 25% cost increase, which again has to be taken away from the CAF
- Also implications for Tier-1/2 sites: they can store 50% less data
- Goal: run this summer for 2 weeks uninterrupted at nominal rates with all Tier-1 sites
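
The per-event arithmetic behind those figures can be checked directly. The sketch below reproduces it; the only assumption beyond the slide is that the export figure counts one RAW copy plus the 3 ESD and 10 AOD replicas, which is what reproduces the quoted 4.1 -> 8.4 MB.

```python
# Per-event bookkeeping behind the slide's numbers (all sizes in MB).
# Assumption (not spelled out on the slide): exports = 1 x RAW + 3 x ESD + 10 x AOD.

def t0_size(raw, esd, aod):
    """Size kept per event at the Tier-0: one copy of each format."""
    return raw + esd + aod

def export_size(raw, esd, aod, n_esd=3, n_aod=10):
    """Volume exported per event: one RAW plus the replicated ESD/AOD copies."""
    return raw + n_esd * esd + n_aod * aod

tdr = dict(raw=1.6, esd=0.5, aod=0.1)      # event sizes assumed in the Computing TDR
rel13 = dict(raw=1.6, esd=1.5, aod=0.23)   # event sizes measured with Release 13

print(f"T0:      {t0_size(**tdr):.1f} -> {t0_size(**rel13):.1f} MB per event")         # 2.2 -> 3.3
print(f"Exports: {export_size(**tdr):.1f} -> {export_size(**rel13):.1f} MB per event")  # 4.1 -> 8.4
print(f"Export growth: x{export_size(**rel13) / export_size(**tdr):.1f}")               # ~x2
```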

ATLAS T0 -> T1 exports, situation as of May 28 (Software & Computing Workshop, CERN, 26 June 2007)
[Table: per Tier-1 site – efficiency (%), average throughput (MB/s), nominal rate (MB/s), and whether 50% / 100% / 150% / 200% of nominal was achieved – for ASGC, BNL, CNAF, FZK, Lyon, NDGF, PIC, RAL, SARA and TRIUMF]
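
To make the four threshold columns concrete, the sketch below shows how they can be derived from a site's measured average throughput and its nominal rate. The helper function and the example figures are invented placeholders, not the values from the slide.

```python
# Derive the "X% of nominal achieved" flags from measured vs. nominal rates.
# The site figures below are invented placeholders, purely for illustration.

THRESHOLDS = (0.5, 1.0, 1.5, 2.0)   # 50%, 100%, 150%, 200% of nominal

def thresholds_achieved(average_mb_s, nominal_mb_s):
    """Return which fractions of the nominal rate the measured average reached."""
    ratio = average_mb_s / nominal_mb_s if nominal_mb_s else 0.0
    return {f"{int(t * 100)}%": ratio >= t for t in THRESHOLDS}

example_sites = {            # site: (average throughput MB/s, nominal rate MB/s)
    "SiteA": (0.0, 60.0),    # no data flowing yet
    "SiteB": (250.0, 200.0), # running above nominal
}

for site, (avg, nominal) in example_sites.items():
    print(site, thresholds_achieved(avg, nominal))
```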

Services
- Q: What do you (CMS) need for CSA07?
  A: Nothing – we would like FTS 2.0 at the Tier-1s (and not too late), but it is not required for CSA07 to succeed
- Other major 'residual service': SRM v2.2
- Windows of opportunity: post-CSA07 and early 2008 -> there is no long shutdown at the end of 2008

S.W.O.T. Analysis of WLCG Services
- Strengths: We do have a service that is used, albeit with a small number of well-known and documented deficiencies (with work-arounds).
- Weaknesses: Continued service instabilities; holes in operational tools & procedures; the ramp-up will take at least several (many?) months more…
- Threats: Hints of possible delays could re-ignite discussions on new features.
- Opportunities: Maximise the time remaining until high-energy running to 1) ensure all remaining residual services are deployed as rapidly as possible, but only when sufficiently tested & robust, and 2) focus on smooth service delivery, with emphasis on improving all operation, service and support activities.
All services (including the 'residual' ones) should be in place no later than Q1 2008, by which time a marked improvement in the measurable service level should also be achievable.

LCG: a steep ramp-up is still needed before the first physics run (factors of 4x and 6x are indicated on the chart)
[Chart: evolution of installed capacity from April 2006 to June 2007, against the target capacity from the MoU pledges for 2007 (due July 2007) and 2008 (due April 2008)]

WLCG Service: S / M / L vision
- Short-term: ready for the Full Dress Rehearsals, now expected to ramp up fully around mid-September (after CHEP)
  - The only thing I see as realistic on this time-frame is FTS 2.0 services at the WLCG Tier-0 & Tier-1s
  - Schedule: June 18th at CERN; available mid-July for the Tier-1s
- Medium-term: what is needed & possible for 2008 LHC data taking & processing
  - The remaining 'residual services' must be in full production mode early in Q1 2008 at all WLCG sites!
  - Significant improvements in monitoring, reporting and logging -> more timely error response -> service improvements
- Long-term: anything else
  - The famous 'sustainable e-Infrastructure'…?

Types of Intervention
0. (Transparent) – load-balanced servers / services
1. Infrastructure: power, cooling, network
2. Storage services: CASTOR, dCache
3. Interaction with a backend DB: LFC, FTS, VOMS, SAM, etc.

Transparent Interventions – Definition
- We have reached agreement with the LCG VOs that the combination of hardware / middleware / experiment-ware should be resilient to service "glitches"
- A glitch is defined as a short interruption of (one component of) the service that can be hidden – at least from batch work – behind some retry mechanism(s)
- How long is a glitch?
  - All central CERN services are covered for power 'glitches' of up to 10 minutes
  - Some are also covered for longer by diesel UPS, but any non-trivial service seen by the users is only covered for 10 minutes
- Can we implement the services so that ~all interventions are 'transparent'? YES – with some provisos; to be continued…
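
As an illustration of the retry idea (generic, not any specific WLCG client), the sketch below masks any outage shorter than its total retry budget, set here to the 10-minute 'glitch' bound mentioned above; the function names and the commented-out storage call are invented for the example.

```python
import random
import time

def call_with_retry(operation, max_wait=600, base_delay=5):
    """Retry `operation` until it succeeds or roughly `max_wait` seconds
    (10 minutes here, matching the 'glitch' bound) have been spent waiting;
    after that, re-raise the last error."""
    waited = 0.0
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if waited >= max_wait:
                raise                       # longer than a 'glitch': give up
            # exponential back-off with jitter, capped so we keep probing
            delay = min(base_delay * 2 ** attempt, 60) * (0.5 + random.random())
            delay = min(delay, max_wait - waited)
            time.sleep(delay)
            waited += delay
            attempt += 1

# Hypothetical usage: a transfer that fails while one service component restarts.
# call_with_retry(lambda: storage_client.put(local_file, remote_surl))
```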

More Transparent Interventions
- "I am preparing to restart our SRM server here at IN2P3-CC, so I have closed the IN2P3 channel on prod-fts-ws in order to drain the current transfer queues. I will open them in 1 hour or 2." Is this a transparent intervention or an unscheduled one?
- A: Technically unscheduled, since it is SRM downtime. An EGEE broadcast was made, but this is just an example…
- However, if the channel had first been paused – which would mean that no files fail – it would instead have been transparent, at least to the FTS, which is explicitly listed as a separate service in the WLCG MoU, both for T0 & T1!
- i.e. if we can trivially limit the impact of an intervention, we should (c.f. the WLCG MoU services at Tier-0 / Tier-1s / Tier-2s)
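
To make the 'pause rather than close' point concrete, here is a hypothetical orchestration sketch; the FTSAdminStub class, its method and the state names are invented stand-ins, not the real FTS administration interface.

```python
# Hypothetical sketch: hold the FTS channel during an SRM restart so that queued
# transfers wait rather than fail, then reactivate it afterwards.
from contextlib import contextmanager

class FTSAdminStub:
    """Invented stand-in for an FTS channel-administration interface."""
    def set_channel_state(self, channel, state):
        print(f"channel {channel} -> {state}")

fts = FTSAdminStub()

@contextmanager
def paused_channel(channel):
    """Pause the channel for the duration of an intervention, then resume it."""
    fts.set_channel_state(channel, "Paused")      # illustrative state name
    try:
        yield
    finally:
        fts.set_channel_state(channel, "Active")  # illustrative state name

def restart_srm():
    """Placeholder for the site's actual SRM restart procedure."""
    print("restarting SRM...")

if __name__ == "__main__":
    with paused_channel("CERN-IN2P3"):
        restart_srm()   # transfers on the channel are held, not failed, meanwhile
```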

Summary
- 2008 / 2009 LHC running will be at lower than design luminosity (but the same data rate?)
- Work has (re-)started with CMS to jointly address the 'critical services'
- Realistically, it will take quite some effort and time to get the services up to 'design luminosity'

Service Progress Summary (updates presented at the June GDB)
- LFC: Bulk queries deployed in February; secondary groups deployed in April. ATLAS and LHCb are currently giving new specifications for other bulk operations, scheduled for deployment this autumn, with matching GFAL and lcg-utils changes.
- DPM: SRM 2.2 support released in November; secondary groups deployed in April. Support for ACLs on disk pools has just passed certification. SL4 32- and 64-bit versions are certified apart from the vdt (gridftp) dependencies.
- FTS 2.0: Has been through integration and testing, including certificate delegation, SRM v2.2 support and service enhancements; now being validated in the PPS and the pilot service (already completed by ATLAS and LHCb); will then be used in CERN production for 1 month (from June 18th) before release to the Tier-1s. Ongoing (less critical) developments to improve monitoring piece by piece continue.
- 3D: All Tier-1 sites are in production mode and validated with respect to the ATLAS conditions DB requirements. 3D monitoring is integrated into the GGUS problem-reporting system. Testing to confirm the Streams failover procedures will take place in the next few weeks, followed by an exercise of coordinated DB recovery with all sites. Also starting Tier-1 scalability tests with many ATLAS and LHCb clients, to have the correct DB server resources in place by the autumn.
- VOMS roles: Mapping to job scheduling priorities has been implemented at the Tier-0 and most Tier-1s, but the behaviour is not as expected (ATLAS report that production-role jobs map to both production and normal queues), so this is being re-discussed.

Service Progress Summary (continued)
- gLite 3.1 WMS: The WMS passed certification and is now in integration. It is being used for validation work at CERN by ATLAS and CMS, with LHCb to follow. The developers at CNAF fix any bugs and then run 2 weeks of local testing before giving patches back to CERN.
- gLite 3.1 CE: The CE is still under test, with no clear date for 'completion'. The backup solution is to keep the existing 3.0 CE, which will require SLC3 systems. Alternative solutions are also being discussed.
- SL4: An SL3-built, SL4 compatibility-mode UI and WN have been released, but the decision to deploy is left to the sites. The native SL4 32-bit WN is in the PPS now and the UI is ready to go in; they will not be released to production until the experiment testing is completed. An SL4 DPM (which needs vdt) is important for sites that buy new hardware.
- SRM 2.2: The CASTOR2 work is coupled to the ongoing performance enhancements; dCache 1.8 beta has test installations at FNAL, DESY, BNL, FZK, Edinburgh, IN2P3 and NDGF, most of which are also in the PPS.
- DAQ-Tier-0 integration: Integration of ALICE with the Tier-0 has been tested at a throughput of 1 GByte/s. LHCb testing is planned for June, then ATLAS and CMS from September.
- Operations: Many improvements are under way to increase the reliability of all services. See this workshop & also the WLCG Collaboration… N.B. it's not all dials & dashboards!