PSM, 16.06.03 Database requirements for POOL (File catalog performance requirements) Maria Girone, IT-DB Strongly based on input from experiments: subject.


1 PSM, 16.06.03
Database requirements for POOL (File catalog performance requirements)
Maria Girone, IT-DB
Strongly based on input from experiments: subject to modifications!
Disclaimer: To be considered as VERY PRELIMINARY!

2 Introduction
POOL project: common persistency framework for physics applications at LHC
Provides persistency for C++ transient objects and transparent navigation to single objects, integrated with a Grid-aware File Catalog
POOL has chosen the Local Replica Catalog (LRC) and Replica Metadata Catalog (RMC) as the basis of the Grid catalog implementation
Pre-production service based on Oracle (from IT/DB), RLSTEST, already in use for POOL V1.0 (May 13th)
POOL will provide a production release in June to be used for LCG-1

3 Inputs and assumptions
Calculate the registration and lookup frequency for the EDG-RLS service for
–LCG-1, based on CMS PCP and DC04 documents
–2008
Assumptions:
–LCG-1: CPU power: PIII, 1 GHz = 400 SI2k
–2008: CMS CPU power: T1 ~2M SI2k, x5 T1 ~10M SI2k, All T2 = All T1, World Wide 20M SI2k [4]

4 Processing time
LCG-1
–Simul: 1 job reads 1 kine file (100 events) and writes 1 output file. Takes 12 hours (400 s/event, or 160k SI2k s/event)
–Digi: 1 job reads 10 simul files (1000 events) and writes 1 output file. Takes 6 hours (20 s/event, or 8k SI2k s/event)
–Reco: 1 job reads 1 digi file (1000 events) and writes 1 output file. Takes 9 hours (30 s/event, or 12k SI2k s/event)
2008
–Analysis: 100 SI2k s/event [5]
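The per-event costs on this slide follow from multiplying the seconds per event by the 400 SI2k rating assumed for a 1 GHz PIII. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative and do not come from the presentation. The pure processing times it computes (about 11.1, 5.6 and 8.3 hours) fall somewhat below the quoted 12, 6 and 9 hours, which presumably include rounding or job overhead.

```python
# Assumption taken from the slides: a 1 GHz PIII is rated at 400 SI2k.
CPU_RATING_SI2K = 400

# Per-step figures quoted on the slide: (events per job, seconds per event).
STEPS = {
    "Simul": (100, 400),
    "Digi": (1000, 20),
    "Reco": (1000, 30),
}


def cost_si2k_seconds(seconds_per_event, cpu_rating=CPU_RATING_SI2K):
    """CPU cost of one event in SI2k seconds on a CPU of the given rating."""
    return seconds_per_event * cpu_rating


def job_hours(events_per_job, seconds_per_event):
    """Pure processing time of one job in hours, ignoring any overhead."""
    return events_per_job * seconds_per_event / 3600.0


if __name__ == "__main__":
    for step, (events, seconds) in STEPS.items():
        cost = cost_si2k_seconds(seconds)
        hours = job_hours(events, seconds)
        print(f"{step}: {cost / 1000:.0f}k SI2k s/event, {hours:.1f} h/job")
```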

5 Requirements for LCG-1
Take CMS PCP as example:
Total number of events to produce (kine, Simul, Digi and half Reco): 50M [1]

                                              July      November
–Fraction of expected LCG-1 production [2]    10%       75%
–Number of input files                        50k       375k
–Number of output files                       50k       450k
–Input file size                              20 MB     0.02 + 10*0.2 + 1.5 GB
–Output file size                             200 MB    0.2 + 1.5 + 0.5 GB
–Total file size                              10 TB     140 TB

6 Database requirements for LCG-1
Assuming 25% loss

                                    July      November
Average number of jobs/day [2]      4500      8000
  in LCG-1                          2500      6000
If all would be in LCG-1:
–File lookup frequency:             0.05 Hz   0.2 Hz
–File registration frequency:       0.05 Hz   0.1 Hz
–Total interaction rate:            0.1 Hz    0.3 Hz
  1 interaction every               10 s      3 s

Current tests show performance well in excess of these requirements!
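The registration frequencies above are consistent with spreading the daily job count evenly over 24 hours, assuming each job registers one output file (as on the processing-time slide). The sketch below shows that conversion and the interval between catalog interactions; the helper names are hypothetical. It does not attempt to reproduce the November lookup figure, which depends on the mix of job types.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day_to_hz(operations_per_day):
    """Average rate in Hz for operations spread evenly over one day."""
    return operations_per_day / SECONDS_PER_DAY


def seconds_between(rate_hz):
    """Average interval in seconds between operations at the given rate."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    # Assumption: one file registration per job.
    for month, jobs_per_day in (("July", 4500), ("November", 8000)):
        print(f"{month}: {per_day_to_hz(jobs_per_day):.3f} Hz registrations")
    # Total interaction rates quoted on the slide.
    for rate in (0.1, 0.3):
        print(f"{rate} Hz -> 1 interaction every {seconds_between(rate):.1f} s")
```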

7 Predictions for 2008
In CMS one event is 100 kB and is analyzed in ~100 SI2k s [5]
The rate capability world wide is 20M SI2k / 100 SI2k s = 200k events/s
Data rate capability world wide is 200k events/s * 100 kB = 20 GB/s
The size of useful data in 1 file is guessed to be
File size * fraction of data used = 2 GB * 0.1 = 0.2 GB
Max number of file openings in CMS world wide = 20 GB/s / 0.2 GB = 100 Hz
If the data rate at CERN is 50 GB/s [6], the lookup frequency is 250 Hz
The lookup frequency decreases as the useful data size in 1 file increases.
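The chain of estimates on this slide (CPU capacity to event rate, event rate to data rate, data rate to file openings per second) can be sketched as a single function. The function name and parameters are illustrative only; the input values are the ones quoted on the slide, with decimal units (1 GB = 10^9 bytes) assumed.

```python
def lookup_frequency_hz(cpu_si2k, cost_si2k_s_per_event, event_size_bytes,
                        file_size_bytes, useful_fraction):
    """Estimate file lookups per second from CPU capacity and data sizes."""
    events_per_second = cpu_si2k / cost_si2k_s_per_event
    bytes_per_second = events_per_second * event_size_bytes
    useful_bytes_per_file = file_size_bytes * useful_fraction
    return bytes_per_second / useful_bytes_per_file


if __name__ == "__main__":
    # CMS world wide: 20M SI2k, 100 SI2k s/event, 100 kB/event, 2 GB files.
    cms = lookup_frequency_hz(20e6, 100, 100e3, 2e9, 0.1)
    print(f"10% useful data per file: {cms:.0f} Hz")
    # Doubling the useful fraction halves the lookup frequency.
    cms_20 = lookup_frequency_hz(20e6, 100, 100e3, 2e9, 0.2)
    print(f"20% useful data per file: {cms_20:.0f} Hz")
```

Because the useful data per file is in the denominator, packing more useful data into each file lowers the load on the catalog proportionally.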

8 Predictions for 2008
Considering analysis, simulation, reprocessing and reconstruction [8] at CERN
  Alice needs   7.4M SI2k
  Atlas needs   6.3M SI2k
  CMS needs     7.4M SI2k
  LHCb needs    2.0M SI2k
  ---------------------------
                 23M SI2k
Assuming that all tasks require on average a single process time as for CMS analysis (0.1k SI2k s/event) and an event size of 100 kB, a throughput of 23 GB/s is needed
Assuming a useful file size of 2 GB * 10% = 0.2 GB, lookup frequency = 120 Hz at CERN
1 lookup every 8 ms
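As a rough check of the totals on this slide, the per-experiment CPU needs can be summed and pushed through the same assumptions (100 SI2k s/event, 100 kB/event, 0.2 GB of useful data per file). This is only a sketch with hypothetical names; unrounded, it gives about 23.1M SI2k, 23.1 GB/s and roughly 116 Hz (one lookup every 8.7 ms), which the slide rounds to 23M SI2k, 23 GB/s, 120 Hz and 8 ms.

```python
# CPU needs at CERN in SI2k, as listed on the slide.
CPU_NEEDS_SI2K = {
    "Alice": 7.4e6,
    "Atlas": 6.3e6,
    "CMS": 7.4e6,
    "LHCb": 2.0e6,
}

COST_SI2K_S_PER_EVENT = 100   # assumed equal to CMS analysis for all tasks
EVENT_SIZE_BYTES = 100e3      # 100 kB per event
USEFUL_BYTES_PER_FILE = 0.2e9 # 2 GB file, 10% of it useful


def total_cpu_si2k(needs):
    """Sum the CPU needs of all experiments."""
    return sum(needs.values())


def throughput_bytes_per_s(cpu_si2k):
    """Data throughput implied by the CPU capacity."""
    return cpu_si2k / COST_SI2K_S_PER_EVENT * EVENT_SIZE_BYTES


def lookup_hz(cpu_si2k):
    """File lookups per second implied by the throughput."""
    return throughput_bytes_per_s(cpu_si2k) / USEFUL_BYTES_PER_FILE


if __name__ == "__main__":
    cpu = total_cpu_si2k(CPU_NEEDS_SI2K)
    rate = lookup_hz(cpu)
    print(f"Total CPU: {cpu / 1e6:.1f}M SI2k")
    print(f"Throughput: {throughput_bytes_per_s(cpu) / 1e9:.1f} GB/s")
    print(f"Lookups: {rate:.0f} Hz, one every {1000 / rate:.1f} ms")
```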

9 Conclusions
The estimated lookup and registration frequencies for LCG-1 are based on the CMS PCP and DC04 figures. The unknown is the fraction of data production via LCG-1
–If all goes via LCG-1, the average rate of registrations and lookups is 300 mHz (1 every 3 s)
The projection of these numbers to 2008 is based on the following assumptions
–Available CPU power to the experiments: 23M SI2k [8]
–Average process time: 0.1k SI2k s/event [5] (230k evts/s)
–Average event size in analysis: 0.1 MB [5] (throughput 23 GB/s)
–Useful file size: 2 GB * 10% = 200 MB
From above, the maximum peak lookup frequency (throughput / useful file size) is ~120 Hz

10 Consistency check
In 2008, 1 dual-CPU box will correspond to an average of 5k SI2k [7]
1 CPU will be able to process 2.5 MB/s
To reach 25 GB/s, 10k CPUs or 5k dual-CPU boxes are needed and will be available [9]
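The consistency check can be reproduced in a few lines, assuming the same per-event figures used for the 2008 predictions (100 SI2k s/event and 100 kB/event). The names below are illustrative and not part of the original slides.

```python
DUAL_BOX_SI2K = 5000           # one dual-CPU box in 2008 [7]
COST_SI2K_S_PER_EVENT = 100    # assumed processing cost per event
EVENT_SIZE_KB = 100            # assumed event size
TARGET_KB_PER_S = 25e6         # 25 GB/s expressed in kB/s


def cpu_throughput_kb_per_s():
    """Data rate one CPU can process, in kB/s."""
    cpu_si2k = DUAL_BOX_SI2K / 2
    events_per_second = cpu_si2k / COST_SI2K_S_PER_EVENT
    return events_per_second * EVENT_SIZE_KB


def cpus_needed(target_kb_per_s=TARGET_KB_PER_S):
    """Number of single CPUs needed to sustain the target data rate."""
    return target_kb_per_s / cpu_throughput_kb_per_s()


if __name__ == "__main__":
    print(f"Per CPU: {cpu_throughput_kb_per_s() / 1000:.1f} MB/s")
    print(f"CPUs needed: {cpus_needed():.0f}")
    print(f"Dual-CPU boxes needed: {cpus_needed() / 2:.0f}")
```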

11 References
CMS Pre-DC04
–[1] DC04.xml (Claudio Grandi)
–[2] Performance Metrics (Tony Wildish)
–[3] Derived from [1-2] considering the average data read and written / job and the number of jobs/day
CMS Analysis
–[4] Table A3.6 of the CERN/LHCC/2001-004 CERN/RRB-D 2001-3 1SI95=9SI200
–[5] P. Capiluppi "World Wide Computing" LHCC Comprehensive Review of CMS SW and Computing
–[6] Bernd Panzer-Steindel "LAN and disk I/O predictions" Draft 1.0
–[7] Table A3.13 of the CERN/LHCC/2001-004 CERN/RRB-D 2001-3 1SI95=9SI200
–[8] Table A3.9 of the CERN/LHCC/2001-004 CERN/RRB-D 2001-3 1SI95=9SI200
–[9] End of Section 3.5.3.1 of the CERN/LHCC/2001-004 CERN/RRB-D 2001-3

