A proposal: from CDR to CDH. Paolo Valente – INFN Roma [Acknowledgements to A. Di Girolamo]. NA62 collaboration meeting, Liverpool, 25-30 Aug. 2013.




1 A proposal: from CDR to CDH
Paolo Valente – INFN Roma [Acknowledgements to A. Di Girolamo]
NA62 collaboration meeting, Liverpool, 25-30 Aug. 2013

2 Requirements/1
1. Execute each operation [transfer, reconstruction…]
   • Execute/launch the transfer/reconstruction operations
   • Typically done with a set of scripts, partly running as daemons, partly controlled by an operator
   • [Adapted] NA48 CDR and [adapted] COMPASS CDR were used in 2007-2008 and during the technical run
2. Log operations and errors
   • Logging and [in some cases] error recovery

3 Requirements/2
1. Execute each operation, controlling the sequence of all steps
2. Record every operation; keep a catalog of all files and of the operations performed on them
3. Monitor/display the status of the entire process, following each element during its lifetime
   • Not only execute operations, but also know their status, recognize success/failure, handle anomalies, interface to the operator, …
   • Know and control the sequence of operations
   • Handle/notify the status of the “sequence”
Central Data Recording → Central Data Handler

4 States and Transitions
[Diagram: file types RAW, RECO, THIN, NTUP linked by the operations Reconstruction, Filter, Thinning, Split, MakeNtup; example burst 99923-0000 with RAW file cdr00099923-0000 going from RAW to RECO]
• The atomic unit is the burst.
• A burst is connected to a sequence of operations to be performed:
   • First of all, generation of the RAW file
   • From the RAW file, a number of tasks involving generation of other files or file transfers
• Each operation is a transition from one state to another. States:
   • RAW_on_farm_disk
   • RAW_file_on_disk_pool
   • RAW_on_tape
   • RAW_generated
   • RECO-1_generated
   • THIN-1_generated…
• An “operation” is performed for each transition: for RAW → RECO-1, the reconstruction pass-1 has to be executed; for file transfers, the appropriate copy or remote copy command
• A new entry has to be created in the file catalog for each transition
• Essentially, 2 kinds of transition:
   • File generation
   • File transfer
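The state/transition picture on this slide can be sketched as a small lookup table. This is only an illustration: the state names are taken from the slide, but which state follows which, the `Kind` labels and the function names are assumptions made for the example, not part of the proposal.

```python
from enum import Enum


class Kind(Enum):
    # The two kinds of transition named on the slide.
    GENERATION = "file generation"
    TRANSFER = "file transfer"


# Allowed transitions, keyed by (current state, next state).
# State names come from the slide; which state follows which is assumed here.
TRANSITIONS = {
    ("RAW_on_farm_disk", "RAW_file_on_disk_pool"): Kind.TRANSFER,
    ("RAW_file_on_disk_pool", "RAW_on_tape"): Kind.TRANSFER,
    ("RAW_generated", "RECO-1_generated"): Kind.GENERATION,
    ("RECO-1_generated", "THIN-1_generated"): Kind.GENERATION,
}


def transition_kind(current: str, target: str) -> Kind:
    """Return the kind of operation needed to go from current to target."""
    try:
        return TRANSITIONS[(current, target)]
    except KeyError:
        raise ValueError(f"no transition from {current} to {target}") from None


def apply_transition(catalog: list, burst: str, current: str, target: str) -> Kind:
    """Validate the transition and record it as a new catalog entry."""
    kind = transition_kind(current, target)
    catalog.append({"burst": burst, "from": current, "to": target, "kind": kind.value})
    return kind


catalog = []
apply_transition(catalog, "99923-0000", "RAW_on_farm_disk", "RAW_file_on_disk_pool")
apply_transition(catalog, "99923-0000", "RAW_generated", "RECO-1_generated")
```

Keeping the transitions in one table means an unknown transition is rejected before any command is launched, and every accepted one leaves a catalog entry, as the slide requires.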

5 The idea
• We must have the catalog of all the files [+ their metadata, e.g. data quality information, basic information from TDAQ, etc.]
• Link all the files related to a given burst: the logical unit is the burst
• Define the sequence of states through which each burst has to pass
• Each state transition defines an operation to be performed on the files
• Define a “task” as the operations to be applied to a given set of entries in the file catalog [thus causing a state transition for the related bursts]
• We build a “Handler” process to control operations: given a task, the Handler will:
   • Create the list of files on which to execute the command(s)
   • Trigger the execution of the appropriate command(s) on them [typically launching a script]
   • The trigger for starting this can be either automatic or performed by an operator
   • Check the execution and notify/handle anomalies or failures
[Diagram: the file catalog and the “Handler”; sequence for file storage: on_farm_disk, on_disk_pool, on_T0_tape, Distributed_to_T1]
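A minimal sketch of the task/Handler idea, under stated assumptions: the catalog is reduced to an in-memory mapping from burst id to state, and `Task`, `run_task` and the stand-in command are hypothetical names invented for this example. A real Handler would launch a script and talk to the database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    from_state: str                  # bursts in this state are selected
    to_state: str                    # state assigned after a successful command
    command: Callable[[str], bool]   # hypothetical: returns True on success


def run_task(catalog: Dict[str, str], task: Task) -> List[str]:
    """Apply the task to every burst in task.from_state; return the failed burst ids."""
    selected = [burst for burst, state in catalog.items() if state == task.from_state]
    failed = []
    for burst in selected:
        try:
            ok = task.command(burst)
        except Exception:
            ok = False  # any error while running the command counts as a failure
        if ok:
            catalog[burst] = task.to_state
        else:
            failed.append(burst)  # left in from_state, to be notified or retried
    return failed


catalog = {
    "99923-0000": "on_farm_disk",
    "99924-0000": "on_farm_disk",
    "99925-0000": "on_disk_pool",
}
# Stand-in command that fails for one burst, to exercise the failure path.
task = Task("on_farm_disk", "on_disk_pool", command=lambda burst: burst != "99924-0000")
failed = run_task(catalog, task)
```

Whether `run_task` is called by a timer or by an operator is left open, matching the slide's remark that the trigger can be automatic or manual.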

6
• The atomic unit is the burst. A burst is connected to a number of files:
   • There is only one RAW for each burst
   • Many RECO, THIN, NTUP, … files can be generated starting from one burst
   • The files can have multiple copies on different filesystems and at different sites
   • Files of different kinds are generated (RECO, THIN, …)
• Use the burst id as primary key.
• Generate the first entry as soon as the RAW file appears on the farm disk
• Then, attach to it all the following steps in the lifetime of the burst
Files for Burst 99923-0000 (RAW file cdr00099923-0000.dat):
   • cdr00099923-0000.dat
   • cdr00099923-0000.reco-1
   • cdr00099923-0000.reco-1.thin-1
   • cdr00099923-0000.reco-1.thin-2
   • …
   • cdr00099923-0000.reco-2
   • cdr00099923-0000.reco-2.thin-1
   • cdr00099923-0000.reco-2.thin-2
   • …
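The naming pattern in the file list suggests that the burst id and the processing history can be read back from a file name. The parser below is a sketch based only on the example names on this slide; the regular expression, the `ParsedName` type and the treatment of `.dat` as RAW are assumptions of the example.

```python
import re
from typing import List, NamedTuple, Tuple


class ParsedName(NamedTuple):
    burst_id: str                  # e.g. "99923-0000", as written on the slide
    file_type: str                 # type of the last processing step, or "RAW"
    steps: List[Tuple[str, int]]   # e.g. [("RECO", 1), ("THIN", 2)]


# Pattern inferred from the example names only (assumption).
_NAME = re.compile(r"^cdr(\d{8})-(\d{4})\.(.+)$")


def parse_name(name: str) -> ParsedName:
    """Split a file name into burst id, file type and processing steps."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    first, second, suffix = match.groups()
    burst_id = f"{int(first)}-{second}"  # drop the zero padding of the first number
    if suffix == "dat":
        return ParsedName(burst_id, "RAW", [])
    steps = []
    for part in suffix.split("."):
        kind, _, version = part.partition("-")
        steps.append((kind.upper(), int(version)))  # ValueError if the version is missing
    return ParsedName(burst_id, steps[-1][0], steps)


raw = parse_name("cdr00099923-0000.dat")
thin = parse_name("cdr00099923-0000.reco-1.thin-2")
```

Both names resolve to the same burst id, which is what allows the burst to act as the primary key linking all of its files.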

7 Let’s make a toy example: file storage
For the first step it would be ideal to have the MERGER insert a new record for each new burst into the catalog, as soon as it creates a new RAW file (otherwise we’ll have to poll).
1. The Handler queries for bursts in the state on_farm_disk and creates the list of files to be copied
2. The Handler creates the appropriate transfer command
3. The Handler issues the execution of the command on each of the files in the list and checks for success:
   – If success:
      • Create a new entry in the file catalog, corresponding to the new replica of the RAW file
      • Change the status of burst N to on_disk_pool
   – Otherwise: handle or just notify the failure
Intermediate states are probably needed in order to correctly handle the progress of the operation.
[Diagram: files in state on_farm_disk (/merger/../cdr00099923-0000.dat, /merger/../cdr00099924-0000.dat, /merger/../cdr00099925-0000.dat, …) and their copies in state on_disk_pool (//eos/../cdr00099923-0000.dat, //eos/../cdr00099924-0000.dat, //eos/../cdr00099925-0000.dat, …). Command elements shown: xrdcp, root://eosna62.cern.ch, //eos/na62/data/cdr, + file. States shown: on_farm_disk, on_disk_pool_pending, on_disk_pool_started, on_disk_pool, on_disk_pool_canceled, on_disk_pool_failed]
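The three steps of the toy example, including the intermediate states, might look like the sketch below. It is an illustration under assumptions: the function names, the in-memory `states` and `replicas` dictionaries and the paths are hypothetical placeholders, and the copy command is passed to an injected `run` callable so that nothing is actually executed here. In production `run` could be something like `lambda cmd: subprocess.run(cmd).returncode`.

```python
from typing import Callable, Dict, List


def build_copy_command(source: str, destination: str) -> List[str]:
    # xrdcp takes a source and a destination; the values used below are placeholders.
    return ["xrdcp", source, destination]


def copy_to_disk_pool(
    states: Dict[str, str],
    replicas: Dict[str, List[str]],
    burst: str,
    source: str,
    destination: str,
    run: Callable[[List[str]], int],
) -> bool:
    """Copy one RAW file, walking through the intermediate states from the slide."""
    if states[burst] != "on_farm_disk":
        raise ValueError(f"burst {burst} is in state {states[burst]}")
    states[burst] = "on_disk_pool_pending"
    command = build_copy_command(source, destination)
    states[burst] = "on_disk_pool_started"
    if run(command) == 0:  # exit code 0 is taken to mean success
        replicas.setdefault(burst, []).append(destination)  # new replica entry
        states[burst] = "on_disk_pool"
        return True
    states[burst] = "on_disk_pool_failed"  # left for the operator or a retry
    return False


states = {"99923-0000": "on_farm_disk", "99924-0000": "on_farm_disk"}
replicas: Dict[str, List[str]] = {}

# Hypothetical placeholder paths, not the real farm or EOS locations.
ok = copy_to_disk_pool(
    states, replicas, "99923-0000",
    "/path/on/farm/cdr00099923-0000.dat",
    "root://example.invalid//pool/cdr00099923-0000.dat",
    run=lambda cmd: 0,  # fake runner: always succeeds
)
bad = copy_to_disk_pool(
    states, replicas, "99924-0000",
    "/path/on/farm/cdr00099924-0000.dat",
    "root://example.invalid//pool/cdr00099924-0000.dat",
    run=lambda cmd: 1,  # fake runner: always fails
)
```

The on_disk_pool_canceled state from the diagram is not exercised in this sketch; it would be set by an operator action rather than by the copy itself.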

8 The database
• The file catalog and the states plus all necessary information will be in this database
• Basic tasks of the catalog:
   • Give a unique file-id and relate it to the local filename
   • Relate it to its metadata
• We also want to:
   • Keep the relations between all the files related to the same burst
   • Keep the state related to the reconstruction/transfer steps
• The Handler will trigger the transition, based on the current state of the file

Table: File
   Fields: Name*, FileType [FileType], CustodialLevel, Version, CreationTimestamp, ModificationTimestamp, DeletionTimeStamp, Site [Site], Storage [Storage], CopyNumber, Mother [File], …
Table: Storage
   Fields: Name*, StorageType [StorageType], isActive, hasReplica, …
   Values: SCRATCH-1, FARMDISK-1, EOSNA62, CASTORNA62, …
Table: StorageType
   Fields: Name*, isCustodial, …
   Values: TAPE, EOS, DISK, …
Table: SiteType
   Fields: Name*, hasTape, hasDisk, …
   Values: FARM, TIER-0, TIER-1, TIER-2, …
Table: Site
   Fields: Name*, SiteType [SiteType], Location, ContactPerson, isActive, …
   Values: NA62-FARM, CERN-PROD, RAL, INFN-CNAF, …
Table: FileType
   Fields: Name*, isData, hasVersion, …
   Values: RAW, RECO, THIN, NTUP, …
Table: Burst
   Fields: Number*, MotherRAW [File], RunType, RunNumber, …
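A reduced version of the File and Burst tables can be tried out with SQLite from the Python standard library. This is a sketch under assumptions: SQLite stands in for whatever database is eventually chosen, only a subset of the columns is kept, the column types are guesses, and a replica is assumed to point to the file it was copied from through its Mother field. The recursive query shows one way to keep all the files of a burst linked through MotherRAW and Mother.

```python
import sqlite3

# Subset of the File and Burst tables from the slide; column types are assumed.
SCHEMA = """
CREATE TABLE file (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    file_type   TEXT NOT NULL,
    site        TEXT NOT NULL,
    storage     TEXT NOT NULL,
    copy_number INTEGER NOT NULL DEFAULT 1,
    mother      INTEGER REFERENCES file(id)
);
CREATE TABLE burst (
    number     TEXT PRIMARY KEY,
    mother_raw INTEGER NOT NULL REFERENCES file(id)
);
"""

# Start from the burst's RAW file and follow the Mother links downwards.
FILES_OF_BURST = """
WITH RECURSIVE tree(id) AS (
    SELECT mother_raw FROM burst WHERE number = ?
    UNION ALL
    SELECT file.id FROM file JOIN tree ON file.mother = tree.id
)
SELECT file.name, file.storage
FROM file JOIN tree ON file.id = tree.id
ORDER BY file.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO file (id, name, file_type, site, storage, mother) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "cdr00099923-0000.dat", "RAW", "NA62-FARM", "FARMDISK-1", None),
        (2, "cdr00099923-0000.dat", "RAW", "CERN-PROD", "EOSNA62", 1),
        (3, "cdr00099923-0000.reco-1", "RECO", "CERN-PROD", "EOSNA62", 2),
        (4, "cdr00099924-0000.dat", "RAW", "NA62-FARM", "FARMDISK-1", None),
    ],
)
conn.executemany(
    "INSERT INTO burst (number, mother_raw) VALUES (?, ?)",
    [("99923-0000", 1), ("99924-0000", 4)],
)
files = conn.execute(FILES_OF_BURST, ("99923-0000",)).fetchall()
```

An alternative design would store the burst number directly on every File row, trading some redundancy for a simpler, non-recursive lookup.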

9 Example
[Diagram: one Burst linked to its File entries: RAW (farm), RAW (disk pool), RAW (T0 tape), RAW (T1 disk), RAW (T1 tape), RAW (T1 disk) copy 2, RECO-1 (T1 disk), RECO-1 (T1 tape), THIN-1 (T1 disk), THIN-1 (T2 disk), RECO-2 (T1 disk), THIN-2 (T1 disk). Stages shown: Reconstruction & thinning, First reprocessing]
300k bursts/year × 3 years ≈ 1,000,000 bursts; 1,000,000 bursts × O(100) entries = 100M entries

10 Which DB technology?
300k bursts/year × 3 years ≈ 1,000,000 bursts; 1,000,000 bursts × O(100) entries = 100M entries
This looks huge, e.g. for MySQL, but ALiEn (ALICE distributed environment, including CATALOG and JOB management) successfully uses MySQL.
A number of optimizations/tricks can be used:
   • Partitioning
   • Indexes
   • Common queries/caching
   • …
Of course there are alternatives. SQUID caching necessary.
By the way…
   • ALiEn is a very close example: it uses open source software and can be inspirational or even reused
   • The ALiEn project started out to provide a file catalog to ALICE and then expanded
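Of the optimizations listed, the effect of an index is easy to demonstrate on a small scale. The sketch below uses SQLite from the Python standard library as a stand-in, so it says nothing about MySQL's actual behaviour or about partitioning; the table, index name and data are invented for the example. It compares the query plan for the Handler's typical lookup, "all bursts in a given state", before and after an index on the state column is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE burst_state (burst TEXT PRIMARY KEY, state TEXT NOT NULL)")
# 1000 made-up bursts, one in ten still waiting on the farm disk.
conn.executemany(
    "INSERT INTO burst_state VALUES (?, ?)",
    [
        (f"{99923 + i}-0000", "on_disk_pool" if i % 10 else "on_farm_disk")
        for i in range(1000)
    ],
)

QUERY = "SELECT burst FROM burst_state WHERE state = ?"


def plan(connection: sqlite3.Connection) -> list:
    """Return the human-readable steps of SQLite's plan for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("on_farm_disk",))
    return [row[-1] for row in rows]  # the last column holds the description


plan_before = plan(conn)  # without an index the whole table is scanned
conn.execute("CREATE INDEX idx_burst_state ON burst_state (state)")
plan_after = plan(conn)   # with the index the lookup goes through idx_burst_state

pending = conn.execute(QUERY, ("on_farm_disk",)).fetchall()
```

At 100M entries the same question applies to every query the Handler issues repeatedly, which is where the "common queries/caching" item comes in.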

11 ALiEn

12 The other piece to have a complete system…
[Diagram: Catalog, Grid services, User Interface, Handler, Job management]

13 WMS
[Diagram labels: ALICE, ATLAS, LHCb]

14 Pull vs. Push job submission
• gLite: a set of grid middleware components responsible for the distribution and management of tasks across grid resources
• Push model:
   • Works as a super-batch system
   • Jobs are submitted to the WMS, which schedules them to a Grid CE (computing center)
   • Computing centers implement their internal batch queues to schedule jobs on the worker nodes
• Experiments have implemented their own solutions to integrate the middleware and application layers
   • Frameworks born to manage high-level workflows
   • Direct control of the translation from workflow into grid jobs
Independently, the LHC experiments are evolving towards “Pilot job” systems:
• Pull model:
   • Pilot jobs are asynchronously submitted jobs which run on worker nodes
   • Users submit jobs to a centralized queue
   • Pilot jobs communicate with the WMS (pilot aggregator), pulling user jobs from the repository
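The difference between the two models can be shown with a toy scheduler. This is a deliberately simplified sketch: the round-robin rule in the push case, the function names and the site names are assumptions of the example and do not describe how gLite or any experiment's WMS actually matches jobs to resources.

```python
from collections import deque
from typing import Dict, List


def push_schedule(jobs: List[str], sites: List[str]) -> Dict[str, List[str]]:
    """Push: a central scheduler decides up front where each job goes (round-robin here)."""
    assignment: Dict[str, List[str]] = {site: [] for site in sites}
    for index, job in enumerate(jobs):
        assignment[sites[index % len(sites)]].append(job)
    return assignment


def pull_schedule(jobs: List[str], pilot_requests: List[str]) -> Dict[str, List[str]]:
    """Pull: each pilot asks the central queue for the next job when it is ready."""
    queue = deque(jobs)
    taken: Dict[str, List[str]] = {}
    for pilot in pilot_requests:
        if not queue:
            break  # nothing left to hand out
        taken.setdefault(pilot, []).append(queue.popleft())
    return taken


jobs = ["job-1", "job-2", "job-3", "job-4"]

pushed = push_schedule(jobs, ["site-A", "site-B"])
# A pilot at site-A frees up three times before the one at site-B asks once.
pulled = pull_schedule(jobs, ["site-A", "site-A", "site-A", "site-B"])
```

In the push case the split is fixed before anything runs; in the pull case the faster site ends up with more jobs simply because its pilot asks more often.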

15 To be continued…

