LCG LHC Grid Deployment Board: Regional Centers, Phase II Resource Planning, Service Challenges. LHCC Comprehensive Review, 22-23 November 2004. Kors Bos, GDB.


LCG LHC Grid Deployment Board
Regional Centers, Phase II Resource Planning, Service Challenges
LHCC Comprehensive Review, November 2004
Kors Bos, GDB Chair, NIKHEF, Amsterdam
CERN-LHCC-CR

LCG Regional Centers

LCG / EGEE in Europe
- OMC: CERN only
- CIC: CERN, RAL, CCIN2P3, CNAF
  - Provide central grid services such as VO, Monitoring, Accounting, RB, BDII, etc.
- ROC: RAL, CCIN2P3, CNAF, Stockholm & SARA Amsterdam, FZK Karlsruhe, Athens, PIC Barcelona, Moscow
  - Responsible for a region
- EGEE is hierarchically organised: CERN & 4 countries & 4 federations
- For HEP and non-HEP applications

LCG / EGEE
EGEE:
- OMC: CERN only
- CIC (4): CERN, RAL, CCIN2P3, CNAF
  - Provide central grid services such as VO, Monitoring, Accounting, RB, BDII, etc.
- ROC (9): Stockholm, Amsterdam, Karlsruhe, Athens, Barcelona, Lyon, Bologna, Moscow, Didcot
  - Responsible for a region
LCG:
- Tier-0: CERN only
- Tier-1 (~10): RAL, CCIN2P3, CNAF, GridKa, NL, Nordic, PIC, BNL, FNAL, Triumf, ASCC
  - Provide central grid services such as VO, Monitoring, Accounting, RB, BDII, etc.
  - Data archive, re-processing
- Tier-2 (~100): no data archive; Monte Carlo, analysis
  - Depending on a Tier-1

Regional Centers & LCG Tier-1 Sites

     Site          Location        Country          ALICE  ATLAS  CMS  LHCb  Total
  1  GridKa        Karlsruhe       Germany            X      X     X    X      4
  2  CCIN2P3       Lyon            France             X      X     X    X      4
  3  CNAF          Bologna         Italy              X      X     X    X      4
  4  NIKHEF/SARA   Amsterdam       Netherlands        X      X          X      3
  5  Nordic        Distributed     Dk, No, Fi, Se            X                 1
  6  PIC           Barcelona       Spain                     X     X    X      3
  7  RAL           Didcot          UK                 X      X     X    X      4
  8  Triumf        Vancouver       Canada                    X                 1
  9  BNL           Brookhaven      US                        X                 1
 10  FNAL          Batavia, Ill.   US                              X           1
 11  ASCC          Taipei          Taiwan                    X     X           2

Grid Deployment Board
- National representation of countries in LCG
  - Does not follow the T0/1/2 or EGEE hierarchy
  - Reps from all countries with T1 centers
  - Reps from countries with T2 centers but no T1s
  - Reps from the LHC experiments (computing coordinators)
- Meets every month
  - Normally at CERN (twice a year outside CERN)
  - Reports to the LCG Project Execution Board
- Standing working groups for
  - Security (the same group also serves LCG and EGEE)
  - Software installation tools (Quattor)
  - Network coordination (not yet)
- Only official way for centers to influence LCG
- Plays an important role in the Data and Service Challenges

Phase 2 Resources in Regional Centers

Phase 2 Planning Group
- Ad hoc group to discuss LCG resources (set up 3/04)
  - Expanded to include representatives from the major T1 and T2 centres, the experiments, and project management
- Not quite clear what Phase 2 is: up to start of LHC (?)
- Collected resource planning data from most T1 centres for Phase 2
  - But very little information available yet for T2 resources
  - Probable/possible breakdown of resources between experiments is not yet available from all sites – essential to complete the planning round
  - Fair uncertainty in all numbers
- Regular meetings and reports

Preliminary Tier-1 planning data
- Experiment requirements and models still under development
- Two potential Tier-1 centres missing
[Summary table of preliminary Tier-1 planning figures not reproduced.]

Some conclusions
- The T1 centers will have the CPU resources for their tasks
  - Variation of a factor 5 in CPU power across T1 centers
  - Not clear how much CPU resource will be in the T2 centers
- Shortage of disk space at T1s compared to what the experiments expect
- Enough resources for data archiving, but not clear what archive speed is needed and whether it will be met
- Network technology is there, but coordination is needed
  - End-to-end service is more important than just bandwidth

Further Planning for Phase 2
- End November 2004: complete the Tier-1 resource plan
- First quarter 2005:
  - assemble resource planning data for the major Tier-2s
  - understand the probable Tier-2/Tier-1 relationships
  - initial plan for Tier-0/1/2 networking
- Develop a plan for ramping up the services for Phase 2
  - set milestones for the Service Challenges (see the slides on Service Challenges)
- Develop a plan for Experiment Computing Challenges
  - checking out the computing model and the software readiness
  - not linked to experiment data challenges – which should use the regular, permanent grid service!
- TDR editorial board established; TDR due July 2005

Service Challenge for Robust File Transfer

23 November 2004 LCG Kors Bos  Expts  Tier-0  Tier-1   Tier-2 is a complex engine  Experiments DCs mainly driven by production of many MC events  Distributed computing better tested than data distribution  Not well tested:  Tier 0/1/2 model  Data distribution  Security  Operations and support  Specific service Challenge for  Robust file transfer  Security  Operations  User Support Service Challenges I will now only talk about this

Example: ATLAS
- Trigger rate = 200 Hz, event size = 1.6 MByte, 10 T1 centers
  - To each T1: 32 MByte/s = 256 Mbit/s
- The result of the first-pass reconstruction is called ESD data
  - ESD = 0.5 MByte, trigger rate = 200 Hz, 2 copies at the T1s
  - To each T1: 20 MByte/s = 160 Mbit/s
- More refined datasets (AOD and TAG) add another few %
- Total for ATLAS: to each T1 ~500 Mbit/s
- NB1: other experiments have fewer T1s
- NB2: not all T1s support 4 experiments
- NB3: ALICE events are much bigger, but runs are shorter
- NB4: Monte Carlo, re-processing and analysis not included
- Conclusion 1: ~10 Gbit/s network needed between CERN and all T1s
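The per-Tier-1 numbers above follow from straightforward arithmetic. The sketch below is illustrative only: it simply recomputes the slide's own figures (trigger rate, event sizes, two ESD copies, ten Tier-1s); none of the constants are new inputs.

```python
# Back-of-the-envelope reproduction of the ATLAS rates quoted on this slide.
# All inputs are the slide's own numbers; nothing here is an official figure.

TRIGGER_RATE_HZ = 200   # events per second
RAW_EVENT_MB    = 1.6   # MByte per raw event
ESD_EVENT_MB    = 0.5   # MByte per ESD event
ESD_COPIES      = 2     # copies of the ESD kept at Tier-1s
N_TIER1         = 10    # Tier-1 centres taking ATLAS data

raw_per_t1_MBps = TRIGGER_RATE_HZ * RAW_EVENT_MB / N_TIER1               # 32 MB/s
esd_per_t1_MBps = TRIGGER_RATE_HZ * ESD_EVENT_MB * ESD_COPIES / N_TIER1  # 20 MB/s

print(f"raw per T1: {raw_per_t1_MBps:.0f} MB/s = {raw_per_t1_MBps*8:.0f} Mbit/s")
print(f"ESD per T1: {esd_per_t1_MBps:.0f} MB/s = {esd_per_t1_MBps*8:.0f} Mbit/s")

total_Mbps = (raw_per_t1_MBps + esd_per_t1_MBps) * 8
print(f"total per T1 (before AOD/TAG and overheads): ~{total_Mbps:.0f} Mbit/s")
```

AOD/TAG data and protocol overheads then bring the per-Tier-1 total to the ~500 Mbit/s quoted above.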

[Diagram: ALICE, ATLAS, CMS and LHCb data flowing from CERN (reco farms) to the Tier-1s SARA, RAL, CNAF, SNIC, PIC, FZK, ASCC, FNAL, BNL, CCIN2P3 and Triumf.]
- 24 hours/day, 7 months/year, uninterrupted
- 10 streams of ~4 Gbit/s from disk to tape, over 100,000 km
- With buffers for fall-over capability
- Needs to be built and tested!
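The "buffers for fall-over capability" can be sized from the stream rate and the outage one wants to ride out. A minimal sketch, assuming one ~4 Gbit/s stream (the slide's figure) and a hypothetical 12-hour outage window:

```python
# Illustrative buffer sizing for non-stop operation (assumed outage length, not from the slide):
# if a Tier-1 (or the link to it) is down for `outage_hours`, the sending side must
# buffer the data it could not ship, accumulating at the nominal stream rate.

stream_rate_gbit_s = 4.0   # one of the ~4 Gbit/s streams mentioned on the slide
outage_hours       = 12.0  # hypothetical outage to be survived without stopping the DAQ

buffer_tb = stream_rate_gbit_s / 8 * outage_hours * 3600 / 1000  # GB/s * s -> GB -> TB
print(f"disk buffer needed per stream: ~{buffer_tb:.0f} TB")     # ~22 TB for 12 hours
```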

Principles for Service Challenges
- Not a network bandwidth race
  - 10 Gbit/s has already been proven to be possible
- International network topology is important
  - All T0–T1 dedicated links, dark fibers, redundancy, coordination
- End-to-end application: from the experiment DAQ to the remote tape robot
  - Progress to be made in steps, adding more components at each step
- Sustainability is a challenge
  - 24 hours/day for 7 months in a row
- Redundancy and fall-over capability
  - Data buffers for non-stop operation
  - If one site fails, other sites must be able to take more
- Performance must include the grid software
  - Not only bare GridFTP, but SRM and catalogs
- Performance must include experiment-specific hardware/software/people-ware
  - Concentrate on generic issues first
- Schedule/synchronise related service and computing challenges
  - Must all be able to run concurrently
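To make the sustainability point concrete, here is a back-of-the-envelope estimate of the total volume moved in one uninterrupted run, assuming the ten ~4 Gbit/s streams from the preceding data-flow slide and 30-day months (the month length is my assumption):

```python
# Rough total volume for a 7-month, 24 h/day run of ten ~4 Gbit/s streams
# (stream count and rate from the preceding data-flow slide; 30-day months assumed).

n_streams  = 10
gbit_per_s = 4.0
months     = 7
seconds    = months * 30 * 86400

total_pb = n_streams * gbit_per_s / 8 * seconds / 1e6  # GB/s * s -> GB; /1e6 -> PB
print(f"~{total_pb:.0f} PB moved end to end over the run")  # roughly 90 PB
```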

Experiments Milestones 1-4 (disk → disk at 500 MByte/s is hard to achieve with the current infrastructure)
- Dec. 2004: disk → disk, 500 MByte/s, 2 weeks, GridFTP; CCIN2P3, SARA, BNL, FNAL
- March 2005: disk → disk, 500 MByte/s, 4 weeks, Radiant software; CCIN2P3, SARA, BNL, FNAL, FZK, RAL
- July 2005: disk → tape, 300 MByte/s, 4 weeks, reco farm, SRM; CCIN2P3, SARA, BNL, FNAL, FZK, RAL; start using dedicated 10 Gbit/s links
- Nov. 2005: disk → tape, 50% of nominal rate, 4 weeks, reco farm, SRM, with experiment involvement; CCIN2P3, SARA, BNL, FNAL, FZK, RAL, plus T2s (Prague?, Wuppertal?, Manchester?)

Experiments Milestones 5-8
- Apr. 2006: expt DAQ → disk, to tape at T1s, reco farm, ESD to T1s; full rate; 1 month; all T1s & many T2s
- Aug. 2006: all expts; expt DAQ → disk, to tape at T1s, reco farm; full T0/1/2 test; full rate; 1 month; all T1s & more T2s
- Nov. 2006: all expts; expt DAQ → disk, to tape at T1s, reco farm; full T0/1/2 test; twice the rate; 1 month; all T1s & selected T2s
- Feb. 2007: ready; all expts; full T0/1/2; nominal rate; all T1s & T2s

Planning for Service Challenges
Role of GDB:
- Planning and coordination
- Monthly reporting
- Network coordination: dedicated GDB working group
Service challenge meetings:
- Oct. 2004 – Amsterdam
- Dec. 2004 – GridKa, Karlsruhe
- Jan. 2005 – RAL, Abingdon
- March 2005 – CCIN2P3, Lyon
- April 2005 – ASTW, Taipei
Dedicated network coordination meeting:
- Jan. 2005 – CERN: T1 reps + NRENs
Milestones document: end January 2005

END

Role of Tier-1 Centers
- Archive of raw data
  - 1/n fraction of the data for experiment A, where n is the number of T1 centers supporting experiment A (rough volume estimate below)
  - Or an otherwise agreed fraction (MoU)
- Archive of reconstructed data (ESD, etc.)
- Large disk space for keeping raw and derived data
- Regular re-processing of the raw data and storing of new versions of the derived data
- Operations coordination for a region (T2 centers)
- Support coordination for a region
- Archiving of data from the T2 centers in its region
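For a rough feel of what the raw-data archive role implies per Tier-1, the sketch below combines the ATLAS numbers from the earlier example slide with an assumed 10^7 effective seconds of data taking per year; the run time per year is my assumption, not a figure from this talk.

```python
# Rough per-Tier-1 raw archive volume for one experiment (illustrative only).
# Rate numbers come from the ATLAS example slide earlier in this talk; the
# 1e7 seconds of effective data taking per year is an assumption.

raw_rate_MB_s    = 200 * 1.6  # trigger rate (Hz) x raw event size (MB) = 320 MB/s
seconds_per_year = 1.0e7      # assumed effective data-taking time per year
n_tier1          = 10         # Tier-1s sharing the archive (the 1/n above)

total_PB_per_year  = raw_rate_MB_s * seconds_per_year / 1e9
per_t1_TB_per_year = total_PB_per_year * 1000 / n_tier1
print(f"raw data per year: ~{total_PB_per_year:.1f} PB; per Tier-1: ~{per_t1_TB_per_year:.0f} TB")
```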

Tier-2 Centers
- Unclear how many there will be
  - Fewer than 100; depends on the definition
- Role for T2 centers:
  - Data analysis
  - Monte Carlo simulation
- In principle no data archiving
  - No raw data archiving
  - Possibly archiving of derived data or MC data
- Resides in a region with a T1 center
  - Not clear to what extent this picture holds
  - A well-working grid doesn't have much hierarchy

2004 Achievements for T0 → T1 Services
- Introduced at the May 2004 GDB and HEPiX meetings
- Oct. 2004 – the PEB concluded:
  - Must be ready 6 months before data arrives: early 2007
  - Close relationship between service & experiment challenges
  - Include experiment people in the service challenge team
  - Use, as much as possible, real applications – even if in canned form
  - Experiment challenges are computing challenges – treat data challenges that physics groups depend on separately
- Oct. 2004 – service challenge meeting in Amsterdam
- Planned service challenge meetings:
  - Dec. 2004 – GridKa, Karlsruhe
  - Jan. 2005 – RAL, Abingdon
  - March 2005 – CCIN2P3, Lyon
  - April 2005 – ASTW, Taipei
- First-generation hardware and software in place at CERN
- Data transfers have started to Lyon, Amsterdam, Brookhaven and Chicago

Milestone I & II Proposal – Service Challenges 2004/2005
Dec. 04 – Service Challenge I complete
- Mass store (disk) → mass store (disk)
- 3 T1s (Lyon, Amsterdam, Chicago)
- 500 MB/s (individually and aggregate) – difficult!
- 2 weeks sustained – 18 December shutdown!
- Software: GridFTP plus some macros
Mar. 05 – Service Challenge II complete
- Software: reliable file transfer service
- Mass store (disk) → mass store (disk)
- 5 T1s (also Karlsruhe, RAL, ...)
- 500 MB/s T0–T1, but also between T1s
- 1 month sustained – start mid-February!
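For scale, sustaining these milestone rates for the stated periods corresponds to the following total volumes; this is plain arithmetic on the rates and durations quoted above, nothing more.

```python
# Total data moved if the milestone rates are actually sustained
# (simple arithmetic on the rates and durations quoted on this slide).

def volume_pb(rate_mb_s: float, days: float) -> float:
    """Data volume in PB for a sustained rate (MB/s) over a number of days."""
    return rate_mb_s * days * 86400 / 1e9

print(f"SC1: 500 MB/s for 2 weeks -> ~{volume_pb(500, 14):.2f} PB")  # ~0.6 PB
print(f"SC2: 500 MB/s for 1 month -> ~{volume_pb(500, 30):.2f} PB")  # ~1.3 PB
```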

Milestone III & IV Proposal – Service Challenges 2005
July 05 – Service Challenge III complete
- Data acquisition → disk pool → tape at T0 and T1s
- Reconstruction farm at CERN; ESD also to T1s
- Experiment involvement: DAQ, reconstruction software
- Software: real system software (SRM)
- 5 T1s
- 300 MB/s including mass storage (disk and tape)
- 1 month sustained: July!
Nov. 05 – Service Challenge IV complete
- ATLAS and/or CMS T1/T2 model verification
- At 50% of the data rate: T0 → T1 ↔ T2 ↔ T2
- Reconstruction scaled down to 2005 CPU capacity
- 5 T1s and 5 T2s
- 1 month sustained: November!

Milestone V & VI Proposal – Service Challenges 2006
Apr. 06 – Service Challenge V complete
- Data acquisition → disk pool → tape at T0 and T1s
- Reconstruction farm at CERN; ESD also to T1s
- ESD skimming, distribution to T1s and T2s
- Full target data rate
- Simulated traffic patterns
- To all T1s and T2s
- 1 month sustained
Aug. 06 – Service Challenge VI complete
- All experiments (ALICE in proton mode)
- Full T0/1/2 model test
- 100% of nominal rate
- Reconstruction scaled down to 2006 CPU capacity
- 1 month sustained

Milestone VII & VIII Proposal – Service Challenges 2006/2007
Nov. 06 – Service Challenge VII complete
- Infrastructure ready at T0 and all T1s and selected T2s
- Twice the target data rates, simulated traffic patterns
- 1 month sustained T0/1/2 operation
Feb. 07 – Service Challenge VIII complete
- Ready for data taking
- All experiments
- Full T0/1/2 model test
- 100% of nominal rate

Resources for Service Challenges
Cannot be achieved without significant investments in (initially):
- Manpower: a few FTE per T1 and at CERN
- Hardware: dedicated data servers, disk space, network interfaces
- Software: SRM implementations
- Network: dedicated 10 Gbit/s T0–T1 links
Role of GDB:
- Planning and coordination
- Monthly reporting
- Network coordination: dedicated GDB working group
Concerns:
- T1 centers have not yet invested very much in this
- The experiments also have to take this into their planning
- The dedicated network needs to be realised (coordination, finances, politics)