CMS Model and the Network (Ian Fisk, June 2011)


Slide 1: CMS Model and the Network (Ian Fisk, June 2011)

Slide 2: CMS Data Distribution Model
Diagram of the tiered topology: the Tier-0 and the CAF at CERN feed the Tier-1 centres, which in turn serve the Tier-2 sites.

Slide 3: Starting point: constants (CRB, 18 June 09, M.-C. Sawley)
- Trigger rate: 300 Hz
- RAW size: 0.5 MB
- SimRAW size: 2 MB
- RECO size: 0.5 MB
- AOD size: 0.2 MB
- Total number of events: 2360 MEvents
- Total number of simulated events: 2076 MEvents
- Overlap between primary datasets: 40%, then 20%
- Total size of RAW: 3538 TB
- Total size of RECO: 1180 TB
- Total primary AOD: 472 TB
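The per-event sizes and event counts above are enough to cross-check the derived totals. A minimal sketch (not part of the original slides), assuming the decimal MB/TB conversion used on the slide:

```python
# Hedged sanity check of the slide-3 constants (not from the original slides).
# Derives single-copy RECO/AOD/RAW volumes from event counts and per-event sizes.

N_EVENTS = 2360e6          # total number of events
RAW_SIZE_MB = 0.5          # RAW size per event [MB]
RECO_SIZE_MB = 0.5         # RECO size per event [MB]
AOD_SIZE_MB = 0.2          # AOD size per event [MB]

MB_PER_TB = 1e6            # decimal units, as used on the slide

print("RECO total      : %.0f TB" % (N_EVENTS * RECO_SIZE_MB / MB_PER_TB))  # ~1180 TB
print("AOD total       : %.0f TB" % (N_EVENTS * AOD_SIZE_MB / MB_PER_TB))   # ~472 TB
print("RAW, single copy: %.0f TB" % (N_EVENTS * RAW_SIZE_MB / MB_PER_TB))   # ~1180 TB
```

The RECO and AOD totals reproduce the slide's 1180 TB and 472 TB; the quoted RAW total of 3538 TB is roughly three times the single-copy volume, presumably reflecting multiple custodial copies, although the slide does not give the breakdown.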

Slide 4: Tier-0 to Tier-1
CERN to Tier-1 transfers have averaged 600 MB/s since the beginning of the 2010 run. Transfer quality is excellent.

Slide 5: Recovery, Tier-0 to Tier-1

Slide 6: CERN to Tier-1
The rate is defined by the accelerator, the detector, and the data distribution policy:
- Livetime of the machine is lower than we expect for the future (↓): the system is specified to recover between fills
- Data is oversubscribed (↑): this will continue as resources allow
- RAW event size is smaller than our estimates (↓)
- Event rate is defined by the physics program (→)
We expect the average rate from CERN to the Tier-1s to increase, but the rate should remain predictable: until there is a fundamental change in the inputs it should match the planning. Peaks are roughly what we would expect.

Slide 7: Exporting custodial data: methodology
T0 to T1s: exporting FEVT (RAW + RECO).
- BW = (RAW + RECO) x trigger rate x (1 + overlap factor)
- For the chosen parameters this yields BW = 1.0 MB x 300 Hz x 1.4 = 420 MB/s, or about 3.3 Gb/s
- We expect at least 2 copies of the data from CERN in 2010, so about 6.6 Gb/s
- Each Tier-1 receives a share according to its relative size in CPUs, i.e. proportional to the trigger rate, the event size, and the Tier-1 size
- In 2010 we will continue to oversubscribe the data as resources allow
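The bandwidth estimate can be reproduced directly from the slide's own parameters; the short sketch below (not from the original material) simply evaluates the stated formula:

```python
# Evaluate the slide-7 export-bandwidth formula with the quoted parameters.
raw_mb, reco_mb = 0.5, 0.5       # per-event RAW and RECO sizes [MB]
trigger_hz = 300.0               # trigger rate [Hz]
overlap = 0.4                    # overlap between primary datasets
n_copies = 2                     # copies of the data exported from CERN in 2010

bw_mb_s = (raw_mb + reco_mb) * trigger_hz * (1 + overlap)   # 420 MB/s
print("Single copy: %.0f MB/s = %.1f Gb/s" % (bw_mb_s, bw_mb_s * 8 / 1000))
print("Two copies : %.1f Gb/s" % (n_copies * bw_mb_s * 8 / 1000))
# The slide quotes 3.3 and 6.6 Gb/s; the small difference is only unit rounding.
```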

Slide 8: Somewhat more interesting: Tier-1 to Tier-1
Tier-1 to Tier-1 transfers are used to replicate RAW, RECO and AOD data and to recover from losses and failures at Tier-1 sites.

Slide 9: Tier-1 to Tier-1
Example from 2009: repopulation of IN2P3. 24-hour averages of Tier-1 to Tier-1 traffic since the start of 2010 data taking.

Slide 10: Tier-1 to Tier-1
The CMS plan is currently ~3.5 copies of the AOD.
- After a refresh of the full sample of a year's running, this is 1.6 PB of disk to update
- At 10 Gb/s that takes about 20 days; achieving 30 Gb/s brings it down to about a week (the Computing TDR assumed 2 weeks)
In 2010 we will also be replicating large samples of RECO. Recovering from a data-loss event at a Tier-1 is more challenging because the data might have to come from a single place, and could take longer given the usual risk of a double failure.
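As a check on the quoted timescales, the sketch below (ours, not the slide's) converts 1.6 PB into an ideal transfer time at the two rates mentioned; the slide's 20-day figure presumably also allows for less-than-perfect link utilization.

```python
# Ideal time to redistribute the refreshed AOD sample at a given aggregate rate.

def days_to_transfer(volume_pb, rate_gb_s):
    """Ideal transfer time in days for volume_pb petabytes at rate_gb_s gigabits/s."""
    bits = volume_pb * 1e15 * 8
    return bits / (rate_gb_s * 1e9) / 86400

for rate in (10, 30):
    print("1.6 PB at %2d Gb/s: %.1f days" % (rate, days_to_transfer(1.6, rate)))
# -> ~14.8 days at 10 Gb/s and ~4.9 days at 30 Gb/s (slide: ~20 days and ~1 week)
```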

Slide 11: Tier-1 to Tier-2: getting more interesting
CMS is very close to completing the commissioning of the full mesh of Tier-1 to Tier-2 transfer links at a low rate.
- Working on demonstrating more links at 100 MB/s
- Daily averages already exceed 1 GB/s

Slide 12: Tier-1 to Tier-2
Looking at hourly averages is more interesting: we already see several examples of 500 MB/s bursts, with CMS totals reaching 2 GB/s.

Slide 13: This week: a case study
CMS produced a 35 TB skim of the data sample after a reprocessing pass.
- The skim took about 36 hours to produce
- The data is then subscribed to by analysis users

Slide 14: Total export rate
The source site is exporting data at more than 1.5 GB/s (12 Gb/s), higher than CERN's export rate for all 4 VOs.

Slide 15: Tier-1 to Tier-2
In CMS, Tier-1 to Tier-2 transfers are:
- Driven by group and user requests: already we have a 35 TB sample that users are trying to replicate and access
- Somewhat unwieldy in general: the full-mesh topology is challenging because there are oceans and heterogeneous environments in the way
Data is refreshed frequently, and even large samples may need to be refreshed; 500 MB/s has already been demonstrated. The hardest use case will be refreshing data after reprocessing, which typically comes from one place and needs to go to many.

Slide 16: Tier-1 to Tier-2
Data flow from Tier-1 to Tier-2 is driven by event selection efficiency, the frequency of reprocessing, and the level of activity; all of these are harder to predict, but they translate into physics potential. The connections between data production sites and analysis tiers need to allow prompt replication:
- CMS is currently replicating 35 TB of data that took 36 hours to produce to 3 sites (~100 TB in total)
- These bursts are not atypical (see the sketch below)
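To put the burst in perspective, the illustrative calculation below (our numbers, not the slide's) estimates the aggregate rate needed to replicate the ~100 TB to the three sites within the same ~36 hours it took to produce the skim:

```python
# Aggregate rate needed to keep replication "prompt" relative to production.
volume_tb = 100.0   # ~35 TB skim replicated to 3 sites
window_h = 36.0     # production time of the skim, used as the replication window

rate_gb_s = volume_tb * 1e12 * 8 / (window_h * 3600) / 1e9
print("Required aggregate rate: %.1f Gb/s" % rate_gb_s)   # ~6.2 Gb/s
```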

Slide 17: Tier-2 to Tier-2
Tier-2 to Tier-2 transfers are relatively new, but a growing issue in CMS.
- They started with attempts to replicate group-produced data between supporting Tier-2 sites
- Rates of 0.5 GB/s have already been seen between Tier-2s

Slide 18: Tier-2 to Tier-2
Making sure the transfer links work has taken a lot of effort, since there are many permutations between Tier-2 sites. On the other hand:
- Tier-2 data is always on disk
- Many Tier-2s have good WAN connections
- They are sometimes geographically close

Slide 19: Looking forward
Transfers between Tier-2s are already a big step toward flattening the hierarchy: they should decrease latency and reduce the load on the Tier-1s, but they make more unstructured use of the networks. CMS has been working to optimize the number of objects we read from the files and the order in which they are requested.
- Initially this was intended to make better use of local storage
- Initial indications are that one can get reasonable CPU efficiency for applications reading data directly over long distances
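The slide shows no code, but the kind of I/O tuning it describes can be sketched with PyROOT: read only the needed branches and let the TTree cache batch the requests, which is what makes remote (e.g. xrootd) reads over high-latency links tolerable. The URL, tree name, and branch names below are hypothetical placeholders.

```python
# Minimal PyROOT sketch of selective, cached reading over the WAN (illustrative).
import ROOT

f = ROOT.TFile.Open("root://some.site.example//store/user/sample.root")
tree = f.Get("Events")

tree.SetCacheSize(20 * 1024 * 1024)       # 20 MB TTreeCache: fewer, larger reads
tree.SetBranchStatus("*", 0)              # disable everything ...
for b in ("recoMuons", "recoElectrons"):  # ... then enable only what is needed
    tree.SetBranchStatus(b + "*", 1)
    tree.AddBranchToCache(b + "*", True)

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                      # only the selected branches are fetched
```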

Slide 20: Looking forward
How data access over the WAN will evolve is under discussion.
- ALICE has similar functionality now; it is not clear whether a location-unaware solution would be efficient or desirable in CMS
- It might be a reasonable backup channel for data, or it might be more
It is also not completely clear that all applications of reading over the WAN necessarily increase the network load: moving whole files when you only need a fraction of the objects may not be efficient utilization.
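A rough, illustrative comparison of the two access modes (all numbers below are assumptions, not CMS measurements):

```python
# Bytes moved: copying a whole file vs. reading a fraction of its objects remotely.
file_size_gb = 4.0        # hypothetical file size
fraction_needed = 0.10    # fraction of the objects actually read
read_overhead = 1.5       # assumed inflation from scattered/vectored remote reads

whole_copy = file_size_gb
remote_read = file_size_gb * fraction_needed * read_overhead
print("Whole-file copy: %.1f GB" % whole_copy)    # 4.0 GB
print("Remote read    : %.1f GB" % remote_read)   # 0.6 GB
```

Under these assumptions the remote read moves far fewer bytes; the break-even point depends on how small the needed fraction really is and on the read overhead of the protocol.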

Slide 21: Outlook
- CERN to Tier-1s is driven by the detector and the accelerator: somewhat predictable, as the networking scales with instantaneous luminosity
- Tier-1 to Tier-1 is driven by the need to replicate samples and to recover from problems: we see sizeable bursts that will grow with the datasets; bursts scale with integrated luminosity
- Tier-1 to Tier-2 is driven by activity and physics choices: large bursts already; scales with activity level and integrated luminosity
- Tier-2 to Tier-2 is ramping up

Slide 22: Generic Tier-1 data flow diagram (template for the per-site slides that follow). Labelled flows:
- 1. Custodial data from the Tier-0 (proportional to the relative site size)
- 2. Data taking: total sample (tape system)
- 3. During re-reprocessing
- 4. AOD sync with the other Tier-1s (two directions)
- 5. MC production
- 6. Daily export for data analysis: from affiliated Tier-2s for the whole AOD sample (re-RECO), and from all Tier-2s for the custodial AOD
Each per-site slide also gives the pledge in kSP06, the number of affiliated Tier-2s, the percentage of users served, and the total imports and exports.

Slide 23: The results
One slide per regional Tier-1. Pledged cores are for 2010. Remember: these are really raw values. Links:
- Solid line: sustained bandwidth (data taking and re-processing periods only)
- Broken line: peak bandwidth (may happen at any time; the number shown is the total if everything happens at the same time)
For each Tier-1, the fraction of users served for analysis is a combination based on:
- The relative size of the Tier-2s analyzing that Tier-1's share of the first AOD, with the number of users based on the number of supported physics groups
- The relative size of the Tier-1 for analyzing the full AOD
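A hedged sketch of the scaling behind the per-site numbers that follow: each Tier-1's custodial import is taken as proportional to its relative size (slide 7). The 0.4 share used here is an example value, not a real pledge fraction.

```python
# Share-based scaling of the custodial (data-taking) stream to a Tier-1.
T0_EXPORT_MB_S = 420.0          # single-copy FEVT export rate from slide 7

def custodial_rate(share):
    """Custodial import rate (MB/s) for a Tier-1 holding `share` of the data."""
    return T0_EXPORT_MB_S * share

print("Example Tier-1 with a 0.4 share: %.0f MB/s" % custodial_rate(0.4))
# The per-site slides quote data-taking rates of 20-165 MB/s, summing to
# about 400 MB/s, close to the 420 MB/s single-copy total.
```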

Slide 24: France, IN2P3 (6 Tier-2s; 13% of users)
- 1. Data taking (custodial data from the Tier-0): 0.40 Gb/s
- 2. Data taking: total sample 300 TB (tape system: 45 MB/s, 20 MB/s)
- 3. During re-reprocessing: 50 MB/s
- 4. AOD sync: 2.5 Gb/s and 3.5 Gb/s (two directions)
- 5. MC production: 700 TB (0.20 Gb/s)
- 6. Daily export for data analysis: 70 TB (7 Gb/s)
- Totals: imports 520 TB, exports 370 TB

Slide 25: Germany, FZK (3 Tier-2s; 8% of users)
- 1. Data taking (custodial data from the Tier-0): 0.3 Gb/s
- 2. Data taking: total sample 280 TB (tape system: 40 MB/s, 20 MB/s)
- 3. During re-reprocessing: 40 MB/s
- 4. AOD sync: 2.3 Gb/s and 3.6 Gb/s (two directions)
- 5. MC production: 580 TB (0.15 Gb/s)
- 6. Daily export for data analysis: 36 TB (3.5 Gb/s)
- Totals: imports 522 TB, exports 367 TB

Slide 26: Italy, CNAF (4 Tier-2s; 10% of users)
- 1. Data taking (custodial data from the Tier-0): 0.4 Gb/s
- 2. Data taking: total sample 345 TB (tape system: 50 MB/s, 20 MB/s)
- 3. During re-reprocessing: 50 MB/s
- 4. AOD sync: 2.8 Gb/s and 3.5 Gb/s (two directions)
- 5. MC production: 440 TB (0.12 Gb/s)
- 6. Daily export for data analysis: 40 TB (3.6 Gb/s)
- Totals: imports 514 TB, exports 415 TB

Slide 27: Spain, PIC (5.46 kSP06; 3 Tier-2s; 8% of users)
- 1. Data taking (custodial data from the Tier-0): 0.2 Gb/s
- 2. Data taking: total sample 151 TB (tape system: 20 MB/s, 10 MB/s)
- 3. During re-reprocessing: 20 MB/s
- 4. AOD sync: 1.2 Gb/s and 3.7 Gb/s (two directions)
- 5. MC production: 420 TB (0.08 Gb/s)
- 6. Daily export for data analysis: 32 TB (3.0 Gb/s)
- Totals: imports 553 TB, exports 181 TB

Slide 28: Taiwan, ASGC (14 kSP06; 3 Tier-2s; 5% of users)
- 1. Data taking (custodial data from the Tier-0): 0.4 Gb/s
- 2. Data taking: total sample 400 TB (tape system: 50 MB/s, 25 MB/s)
- 3. During re-reprocessing: 50 MB/s
- 4. AOD sync: 3.10 Gb/s and 3.40 Gb/s (two directions)
- 5. MC production: 250 TB (0.08 Gb/s)
- 6. Daily export for data analysis: 5 TB (0.4 Gb/s)
- Totals: imports 506 TB, exports 464 TB

Slide 29: UK, RAL (8.04 kSP06; 4 Tier-2s; 15% of users)
- 1. Data taking (custodial data from the Tier-0): 0.25 Gb/s
- 2. Data taking: total sample 225 TB (tape system: 30 MB/s, 20 MB/s)
- 3. During re-reprocessing: 30 MB/s
- 4. AOD sync: 1.8 Gb/s and 3.6 Gb/s (two directions)
- 5. MC production: 700 TB (0.2 Gb/s)
- 6. Daily export for data analysis: 39 TB (3.7 Gb/s)
- Totals: imports 530 TB, exports 267 TB

Slide 30: USA, FNAL (20.4 kSP06; 9 Tier-2s; 24% of users)
- 1. Data taking (custodial data from the Tier-0): 1.30 Gb/s
- 2. Data taking: total sample 1230 TB (tape system: 165 MB/s, 80 MB/s)
- 3. During re-reprocessing: 160 MB/s
- 4. AOD sync: 10 Gb/s and 2.3 Gb/s (two directions)
- 5. MC production: 1274 TB (0.34 Gb/s)
- 6. Daily export for data analysis: 120 TB (11 Gb/s)
- Totals: imports 340 TB, exports 1.5 PB
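As a consistency check on the per-site slides above, converting the quoted daily analysis-export volumes into average rates reproduces the Gb/s figures shown on each slide (a sketch of ours, not part of the original presentation):

```python
# Convert "daily export for data analysis" volumes (TB/day) into average rates.
daily_export_tb = {           # TB exported per day for analysis (from the slides)
    "IN2P3": 70, "FZK": 36, "CNAF": 40, "PIC": 32,
    "ASGC": 5, "RAL": 39, "FNAL": 120,
}

for site, tb in daily_export_tb.items():
    gb_s = tb * 1e12 * 8 / 86400 / 1e9
    print("%-5s %3d TB/day -> %4.1f Gb/s" % (site, tb, gb_s))
# -> IN2P3 6.5, FZK 3.3, CNAF 3.7, PIC 3.0, ASGC 0.5, RAL 3.6, FNAL 11.1
#    (the slides quote 7, 3.5, 3.6, 3.0, 0.4, 3.7 and 11 Gb/s respectively)
```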