Notes on offline data handling M. Moulson Frascati, 29 March 2006.


1 Notes on offline data handling M. Moulson Frascati, 29 March 2006

2 Data flow in reprocessing

- Jobs: 1 per raw file and run (datarec_reproc_ibm.csh)
- Prefetch: All raw files for run
    - Script: start_recall_files.csh, recall_one_dr.tcl
    - Method: “recall raw (PROD areas)”
    - List of PROD areas specified explicitly (from DB2 query)
    - Files recalled one at a time
    - Jobs started when files on disk
    - Explore option of recalling all files for run at once?
- Input: raw files (1 per job)
    - Script: runXreproc_ibm.csh
    - Method: KID URL dbraw: GROUP_ID=PROD
- Output: datarec files (1 per stream and job)
    - Written to /datarec and archived
    - Advance recall to DSTPROD area: “recall datarec dstprod”

3 Data flow in DST production

- Jobs: 1 per stream and run
- Disk mode (i.e. for output from reprocessing: datarec_dstfd_ibm.csh)
    - Prefetch: None
    - Input: All datarec files for stream and run
        - Script: dstXprocfd_ibm.csh
        - Method: KID URL dbdatarec GROUP_ID DSTPROD
- Tape mode (datarec_dst_ibm.csh)
    - Prefetch: All datarec files for stream and run
        - Script: start_recall_files.csh, recall_one_dr.tcl
        - Method: “recall datarec (PROD areas)”
        - Change to DSTPROD area to avoid inconsistency?
    - Input: datarec files
        - Script: dstXprod_ibm.csh
        - Method: KID URL dbdatarec GROUP_ID DSTPROD
- Output: DST files (1 per stream and run)
    - Written to /datarec and archived

4 Data flow in MC production (1/2)

- Jobs: 1 per run and card type (mcprod.pl)
- Processes:
    - 1 or more GEANFI processes, each followed by a datarec process
    - 1 DST job per requested DST stream at end
- GEANFI output: 1 mco file per GEANFI process
    - Written to /datarec, not archived
- Reconstruction prefetch: All bgg/lsb (datarec) files for run
    - Method: “recall datarec all”
    - Currently, all files are prefetched at the start of each reconstruction job
    - Instead, prefetch once before the first GEANFI process
- Reconstruction input: mco file
    - Method: KID URL “ybos:” (files on /datarec)
- Reconstruction input: Subset of bgg/lsb files for run
    - Method: KID URL “dbdatarec:”

5 Data flow in MC production (2/2)

- Reconstruction output: 1 mcr file per mco file (GEANFI process)
    - Written to /datarec and archived
    - Advance recall to DSTPROD area: “recall mc dstprod”
        - Is this really a good idea?
        - DSTs start right away from same directory
        - See notes below
- DST input: All mcr files for job, for each requested DST stream (process)
    - Method: KID URL dbmc DSTPROD
- DST output: 1 MC DST file per process
    - Written to /datarec and archived

6 Data flow for standalone MC DSTs

- Jobs: 1 per run and card type (mcprod_dst.pl)
- Processes: 1 per requested DST stream
- Prefetch: None
- Input: All mcr files for run and card type
    - Method: KID URL dbmc DSTPROD
- Output: 1 MC DST per process
    - Written to /datarec and archived

7 Standard offline file types

Type                               Description                             Produced by              DSRV group
kpm, ksl, rpi, rad, clb, ufo, bha  Reconstructed data                      datarec                  USER
csm, tmu, filfo, afilfo, selcos    Non-reconstructed and calibration data  No longer produced       USER
bgg                                2001-2002 background events             datarec (bgg jobs)       USER?
lsb                                2004-2005 background events             datarec                  USER?
mcr                                Reconstructed MC events                 GEANFI                   USER
dkc, dk0, d3p, drn, drc            DST files                               datarec (DST jobs)       DST
mkc, mk0, m3p, mrn, mrc            MC DST files                            datarec (MC/MCDST jobs)  DST

- MC files (mcr) are technically datarec streams of type ALL (stream_id = 0)
- DESCRIPT.STREAM_OFFLINE contains separate DSRV groups for data and MC
- DSRV groups shown are for data, except for mcr files, for which the MC DSRV group is shown

8 DSRV groups for background files

- All datarec types already have DSRV group dir = USER
- All DST types (data and MC) already have DSRV group dir = DST
    - By default these are recalled to “DST cache”
- Exception is background files (bgg, lsb):
    1. Change to DST group?
    2. Leave as USER and recall to PROD?
    3. Leave as USER and recall with kcp?

9 Summary of recall areas

Volume     GB    DSRV group
recalled   470   DST, USER_RO
recalled1  470   DSTPROD, PROD
recalled2  160   DSTPROD, PROD
recalled3  1640  DST, USER_RO
recalled4  160   USER, USER_RO
recalled5  1640  USER, USER_RO
recalled6  3280  DST, USER_RO
recalled7  3280  DST, USER_RO
recalled8  3280  DST, USER_RO

DSRV group      Before (GB)  After (GB)
DST             11950
PROD + DSTPROD  630          470
USER            1800         1640

- /datarec currently 470 GB SSA disk
- Must add more disks to string:
    - Access bandwidth: All MC output to /datarec
        - 84 MB/s with 300 B80 for MC
        - 138 MB/s if input to MCDST also from /datarec
        - Adding disks to string helps parallelism
    - Size: Archiver maintains /datarec at 40% full
        - MC requires <90% full to start
        - /datarec filled to 50% within 1 hour
- Archiving bandwidth: Saturated with 150 B80 for MC
    - Must increase by system tuning:
        - Any amount of /datarec space “immediately” filled if archiving bandwidth insufficient
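The per-group totals in the "Before" column can be cross-checked against the volume list. The following is a minimal sketch, not part of the original slides; it assumes that the first DSRV group listed for each volume is the one the volume should be counted under, and the names `VOLUMES` and `totals_by_group` are illustrative.

```python
from collections import defaultdict

# (volume, size in GB, DSRV groups) as read from the volume list on this slide.
VOLUMES = [
    ("recalled", 470, "DST, USER_RO"),
    ("recalled1", 470, "DSTPROD, PROD"),
    ("recalled2", 160, "DSTPROD, PROD"),
    ("recalled3", 1640, "DST, USER_RO"),
    ("recalled4", 160, "USER, USER_RO"),
    ("recalled5", 1640, "USER, USER_RO"),
    ("recalled6", 3280, "DST, USER_RO"),
    ("recalled7", 3280, "DST, USER_RO"),
    ("recalled8", 3280, "DST, USER_RO"),
]


def totals_by_group(volumes):
    """Sum volume sizes (GB) under the first DSRV group listed for each volume.

    Assumption: the first group named is the one the volume counts toward.
    """
    totals = defaultdict(int)
    for _name, size_gb, groups in volumes:
        primary = groups.split(",")[0].strip()
        totals[primary] += size_gb
    return dict(totals)


totals = totals_by_group(VOLUMES)
for group, size_gb in sorted(totals.items()):
    print(f"{group:8s} {size_gb:6d} GB")
```

Under that assumption the sums are 11950 GB for DST, 630 GB for DSTPROD/PROD, and 1800 GB for USER, matching the "Before" figures. The "After" figures of 470 GB and 1640 GB are consistent with taking the two 160 GB volumes out of those groups, although the slide does not state this explicitly.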

10 Transfers to and from /datarec

Transfer type                 Event size (KB)  Rate/B80 (KB/s)  Rate 300 B80 (MB/s)
GEANFI output                 20               40               12
datarec input                 20               40               12
datarec output                30               60               18
mcr archiving                 30               60               18
Recall to DSTPROD             30               60               18
DST input                     120              240              72
DST output                    5                10               3
DST archiving                 5                10               3
Total (DST input DSTPROD)     140              280              84
Total (DST input /datarec)    230              460              138

Assumes:
- 0.5 B80 s to fully produce 1 event, including DSTs
- 4 DST processes per job, zero overlap in DSTs
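The arithmetic behind the transfer table can be reproduced from the stated assumptions. This is a rough sketch rather than anything taken from the slides: it assumes 1 MB = 1000 KB, reads the 120 KB DST input as 4 DST processes each reading a 30 KB mcr event, and takes the second total to replace the recall to DSTPROD with DST input read directly from /datarec. All names in the code are illustrative.

```python
# Event sizes in KB for transfers that always touch /datarec (from the table).
EVENT_SIZE_KB = {
    "GEANFI output": 20,
    "datarec input": 20,
    "datarec output": 30,
    "mcr archiving": 30,
    "Recall to DSTPROD": 30,
    "DST output": 5,
    "DST archiving": 5,
}

MCR_EVENT_KB = 30        # size of one reconstructed MC event
DST_PROCESSES = 4        # DST processes per job, zero overlap (slide assumption)
SECONDS_PER_EVENT = 0.5  # B80 CPU-seconds to fully produce one event
N_CPUS = 300             # number of B80 CPUs running MC


def rate_per_cpu_kb_s(size_kb):
    """Transfer rate generated by a single CPU, in KB/s."""
    return size_kb / SECONDS_PER_EVENT


def farm_rate_mb_s(size_kb, n_cpus=N_CPUS):
    """Aggregate rate for n_cpus CPUs, in MB/s (assuming 1 MB = 1000 KB)."""
    return rate_per_cpu_kb_s(size_kb) * n_cpus / 1000.0


# Interpretation: each of the 4 DST processes reads every mcr event once.
dst_input_kb = DST_PROCESSES * MCR_EVENT_KB

# Case 1: DSTs read from the DSTPROD area, so DST input does not load /datarec.
total_dstprod_kb = sum(EVENT_SIZE_KB.values())

# Case 2: DSTs read from /datarec; no recall to DSTPROD, DST input added.
total_datarec_kb = (
    total_dstprod_kb - EVENT_SIZE_KB["Recall to DSTPROD"] + dst_input_kb
)

print(total_dstprod_kb, farm_rate_mb_s(total_dstprod_kb))
print(total_datarec_kb, farm_rate_mb_s(total_datarec_kb))
```

With these inputs the two cases come to 140 KB and 230 KB per event, or 84 MB/s and 138 MB/s for 300 CPUs, which agrees with the totals in the table.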

11 Recommended tape space allocation

Type    Current (TB)  Temporary allocation (TB)  Final allocation (TB)
raw     248.7         250                        250
rec     181.2         250                        250
DST     34.7          45                         45
MC      55.2          150                        350
MC DST  9.6           25                         60
Total   529.4         720                        955

1. Allocations include currently occupied space
2. MC DSTs probably appear as datarec files to archiver
3. Current library system capacity ~720 TB
    - New cassettes will have to be ordered in future
4. Temporary allocation based on 720 TB library
    - Assumes MC production slow
5. Final allocation assumes completion of KLOE offline program
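The column totals of the allocation table can be checked with a few lines of code. This is a sketch for illustration only; it assumes that the single allocation value listed for raw, rec, and DST applies to both the temporary and the final column, which is the reading under which the stated totals add up. The name `ALLOCATIONS_TB` is hypothetical.

```python
# (current, temporary allocation, final allocation) in TB per file type.
# Assumption: raw, rec, and DST use the same value in both allocation columns.
ALLOCATIONS_TB = {
    "raw": (248.7, 250, 250),
    "rec": (181.2, 250, 250),
    "DST": (34.7, 45, 45),
    "MC": (55.2, 150, 350),
    "MC DST": (9.6, 25, 60),
}

# Sum each column across all file types.
current_total = round(sum(row[0] for row in ALLOCATIONS_TB.values()), 1)
temporary_total = sum(row[1] for row in ALLOCATIONS_TB.values())
final_total = sum(row[2] for row in ALLOCATIONS_TB.values())

# Space still free under the temporary allocation.
free_temporary = round(temporary_total - current_total, 1)

print(current_total, temporary_total, final_total, free_temporary)
```

The sums reproduce the Total row (529.4, 720, and 955 TB) and leave 190.6 TB unoccupied under the temporary allocation.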

