1 Production Mode Data-Replication Framework in STAR using the HRM Grid
CHEP ’04, Congress Centre, Interlaken, Switzerland, 27th September – 1st October 2004
Eric Hjort, LBNL (STAR/PPDG Collaborations)
Presented by Doug Olson, LBNL

2 The STAR Experiment (Solenoidal Tracker at RHIC)

3 STAR Computing Overview
Main computing facilities at BNL (RCF) & LBNL (PDSF at NERSC)
– BNL (tier 0) has the raw data, performs event reconstruction (DST production), simulations, and user data analysis
– LBNL (tier 1) replicates some raw data and all DSTs from BNL, performs embedding simulations and user data analysis
Data management and analysis tools:
– Combined metadata/file/replica catalog at each site
– SRM with RRS for file replication and cataloging (this talk)
– STAR Unified Meta-Scheduler (SUMS): user interface to files, catalogs, batch systems, grid computing (J. Lauret’s talk, Thu. 14:20 [318])
– STAR GridCollector for data mining (Thu. 16:50 [319])
RHIC Run IV ended April 2004
– STAR recorded ~200 TB of raw data
– DSTs will be ~40 TB per pass

4 What are SRMs?
Grid middleware developed by the Scientific Data Management (SDM) Group, LBNL
Examples of SRMs (Storage Resource Managers):
– Disk Resource Managers (DRMs)
– Tape Resource Managers (TRMs)
– Hierarchical Resource Managers (HRM = TRM + DRM)
API and CLI are supported
For STAR: HRMs reduce the 3-step transfer (BNL HPSS) → (BNL disk) → (LBNL disk) → (LBNL HPSS) to a 1-step process; see the sketch below. Details of cache management, request queuing, fault recovery, etc. are hidden.
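To make the contrast concrete, here is a minimal shell sketch. The three-step path moves a single file through intermediate disks with separate tools, while the HRM path is one srm-copy call whose options mirror the CLI examples later in this talk. File names, hosts, and ports below are placeholders, not actual STAR endpoints.

# Manual 3-step replication of one file (illustrative only; paths and hosts are placeholders):
hsi "get /data/file.daq : /home/starsink/raw/file.daq"             # 1. BNL HPSS -> BNL disk
globus-url-copy gsiftp://bnlhost.rcf.bnl.gov/data/file.daq \
                gsiftp://pdsfgrid.nersc.gov/cache/file.daq          # 2. BNL disk -> LBNL disk
hsi "put /cache/file.daq : /nersc/projects/starofl/raw/file.daq"    # 3. LBNL disk -> LBNL HPSS

# Equivalent 1-step HRM transfer; staging, WAN transfer, and archiving are handled internally:
srm-copy.linux -conf hrm.rc \
  -sd "srm://bnlhost.rcf.bnl.gov:port/home/starsink/raw/file.daq?remoteobj=HRMServerBNL&msshost=hpss.rcf.bnl.gov&mssport=port" \
  -td "srm://garchive.nersc.gov/nersc/projects/starofl/raw/file.daq"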

5 [Figure-only slide; no transcribed text]

6 Advantages of SRMs I
Same uniform interface to all types of storage systems:
– Disk storage systems
– Mass Storage Systems (HPSS, Castor, Enstore, JASMine) via HSI, FTP, PFTP
Configurable resource utilization (an illustrative configuration sketch follows this slide):
– Degree of parallelism for each transfer
– Maximum total concurrency of transfers
– Cache sizes
– File “pinning”: duration, maximum size and number
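The talk does not show the actual hrm.rc syntax; conceptually, these knobs might be expressed as key–value settings like the hypothetical sketch below. All key names and values are illustrative only, not the real HRM configuration schema.

# Hypothetical HRM/DRM resource settings (key names are illustrative,
# not the actual hrm.rc schema of the LBNL SRM release):
gridftp_parallel_streams = 4      # degree of parallelism for each transfer
max_concurrent_transfers = 10     # maximum total concurrency of transfers
disk_cache_size_gb       = 500    # size of the managed disk cache
pin_lifetime_minutes     = 60     # how long a "pinned" file is kept on disk
max_pinned_files         = 1000   # maximum number of simultaneously pinned files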

7 Advantages of SRMs II
For Mass Storage Systems it provides:
– Queuing and pre-staging
  – Queued multi-file “get” requests (avoid flooding the MSS)
  – Pre-staging of files (concurrent with transfer)
  – Queued archiving of “put” requests (avoid flooding the MSS)
– Robustness and efficiency
  – Recovery from transient MSS failures
  – Reordering of pre-staging requests to minimize tape mounts
– Recovery from failed GridFTP transfers
  – Re-issues requests in case of failure (a minimal sketch of this kind of retry behaviour follows)
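The retry behaviour is internal to the HRM; the sketch below only illustrates the idea as a small wrapper around a GridFTP transfer. It is a simplified illustration, not the HRM implementation, and the URLs are placeholders.

#!/bin/sh
# Illustrative retry wrapper: re-issue a failed GridFTP transfer a few times
# before giving up, roughly what the HRM does internally for each file.
SRC="gsiftp://bnlhost.rcf.bnl.gov/star/data16/file.MuDst.root"   # placeholder source
DST="file:///pdsf/cache/file.MuDst.root"                         # placeholder destination
for attempt in 1 2 3; do
    if globus-url-copy -p 4 "$SRC" "$DST"; then
        echo "transfer succeeded on attempt $attempt"
        break
    fi
    echo "transfer failed (attempt $attempt), retrying..." >&2
    sleep 30
done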

8 DataMover/HRMs as used by STAR for Robust Multi-file Replication
[Diagram: the DataMover (command-line interface, running anywhere) issues an SRM-COPY request for thousands of files, getting the list of files from a directory and issuing SRM-GET for one file at a time. The HRM at BNL (performs reads) stages files into its disk cache; a GridFTP GET in pull mode carries out the network transfer into the disk cache of the HRM at LBNL (performs writes), which then archives the files.]

9 Transfer Schematic
[Figure: per-file event timeline (time in seconds) for a test transfer of 56 files as seen from the HRM at LBNL, with 9 concurrent GridFTPs and 4 concurrent pftps into HPSS. Stages shown for each file: request submitted, space reserved, GridFTP start/end, HPSS request, PFTP->HPSS start/done, file released.]

10 STAR Data Processing: 2004 Run
[Diagram: data flow between BNL and LBNL for the 2004 run. Labels include raw data (200 TB), micro-DST (40 TB), 40+20 TB and 10 TB volumes, HRM replication between the sites, and user analysis (TB's) via the STAR Unified Meta-Scheduler (SUMS) + GridCollector.]

11 STAR File Replication and Catalogs
[Diagram: the BNL File Catalog and the LBNL File Catalog (each a combined metadata + file + replica catalog in MySQL) are mirrored at the other site (BNL FC mirror at LBNL, LBNL FC mirror at BNL); files/datasets are replicated from BNL to LBNL by the HRM.]

12 Prototype Replica Registration Service (RRS)
[Diagram: same catalog layout as the previous slide, with the RRS added at LBNL. As the HRM replicates files/datasets from BNL, the RRS reads metadata from the BNL FC mirror and writes registrations into the LBNL File Catalog (MySQL).]

13 Prototype RRS Details
The DRM server calls the RRS server:
– Each time a single file transfer is completed
– Passes the SRM format for file transfers
– No error handling, multiple attempts, etc.
The RRS server calls a catalog update script (a hedged sketch of such a step follows this slide):
– Once for every N files
– The script parses URLs to get some metadata from the path/filename
– Remaining metadata is read from the BNL catalog mirror
– The LBNL catalog is updated
– On error, the SRM format is saved
The present implementation of RRS, provided by the SDM group, is being tested within the STAR data transfer framework for scalability and feasibility. Formal release of a generic implementation of RRS is scheduled for SRM v2.0 in late ’04.
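The actual update script is not shown in the talk; the sketch below only illustrates the idea of deriving some metadata from the path/filename and copying the rest from the BNL catalog mirror into the LBNL catalog. Host, database, table, and column names are hypothetical, and the mysql calls stand in for whatever the STAR catalog API actually does.

#!/bin/sh
# Illustrative catalog-update step for one replicated file (hypothetical schema).
FILE_URL="srm://garchive.nersc.gov/nersc/projects/starofl/raw/daq/2004/015/st_physics_adc_5015002_raw_1030001.daq"

# 1. Parse some metadata out of the path/filename (run number here).
PATH_PART=${FILE_URL#srm://garchive.nersc.gov}           # /nersc/projects/starofl/raw/daq/2004/015/...
BASENAME=$(basename "$PATH_PART")                         # st_physics_adc_5015002_raw_1030001.daq
RUN=$(echo "$BASENAME" | sed 's/.*_adc_\([0-9]*\)_raw.*/\1/')

# 2. Read the remaining metadata for this file from the BNL catalog mirror (hypothetical host/table).
EVENTS=$(mysql -h bnl-fc-mirror.nersc.gov -N -e \
  "SELECT numEntries FROM FileData WHERE fileName='$BASENAME'" star_catalog)

# 3. Register the new replica in the LBNL catalog (hypothetical host/table).
mysql -h lbnl-fc.nersc.gov -e \
  "INSERT INTO FileLocations (fileName, path, site, runNumber, numEntries)
   VALUES ('$BASENAME', '$(dirname "$PATH_PART")', 'LBNL', '$RUN', '$EVENTS')" star_catalog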

14 CLI example 1
Copy the entire contents of a directory, subject to a wild-card search:

srm-copy.linux -conf hrm.rc \
  -sd "srm://bnlhost.rcf.bnl.gov:port/home/starsink/raw/daq/2004/011/st_physics_adc_5011048_raw_1*?remoteobj=HRMServerBNL&msshost=hpss.rcf.bnl.gov&mssport=port" \
  -td "srm://garchive.nersc.gov/nersc/projects/starofl/raw/daq/2004/011" \
  -at PLAIN -et GSI -al fakeACCT -ap "fakePWD"

- Recursive and conditional transfers are also supported

15 CLI example 2
Supply SRM with a file list:

srm-copy.linux -d -f daq.get -c /auto/u/hjort/hrm2/hrm.rc -l daq.log -w \
  -at PLAIN -et GSI -al hpssaccount -ap "hpsspwd"

daq.get contains (sourceURL, size, targetURL) entries, e.g.:

srm://bnlhost.rcf.bnl.gov:port/home/starsink/raw/daq/2004/015/st_physics_adc_5015002_raw_1030001.daq?remoteobj=HRMServerBNL&msshost=hpss.rcf.bnl.gov&mssport=port 525349883 srm://garchive.nersc.gov/nersc/projects/starofl/raw/daq/2004/015/st_physics_raw_1030001.daq

- File lists are created by comparing the BNL mirror and LBNL catalogs (a sketch of one way to build such a list follows)
- Useful for getting missing files, completing crashed transfers, etc.
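The talk does not show how the file lists are built; below is a hedged sketch of one way to do it by comparing the two catalogs. Host, database, table, and column names are hypothetical stand-ins for the STAR file catalog schema, the directory is fixed for illustration, and the output simply follows the daq.get layout above.

#!/bin/sh
# Illustrative construction of a daq.get file list: files registered in the
# BNL catalog mirror but missing from the LBNL catalog (hypothetical schema).
mysql -h bnl-fc-mirror.nersc.gov -N -e \
  "SELECT fileName, fileSize FROM FileData" star_catalog | sort > bnl_files.txt
mysql -h lbnl-fc.nersc.gov -N -e \
  "SELECT fileName FROM FileData" star_catalog | sort > lbnl_files.txt

# Keep only files present at BNL but not at LBNL, formatted as: sourceURL size targetURL
join -v 1 -1 1 -2 1 bnl_files.txt lbnl_files.txt |
while read name size; do
  echo "srm://bnlhost.rcf.bnl.gov:port/home/starsink/raw/daq/2004/015/$name?remoteobj=HRMServerBNL&msshost=hpss.rcf.bnl.gov&mssport=port $size srm://garchive.nersc.gov/nersc/projects/starofl/raw/daq/2004/015/$name"
done > daq.get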

16 CLI example 3
Get files from remote NFS disks instead of HPSS:

srm-copy.linux -d -f disk_ppLong-1FullFieldP03ih.get -c /auto/u/hjort/hrm2/hrm.rc -l disk_ppLong-1FullFieldP03ih.log

disk_ppLong-1FullFieldP03ih.get:

gsiftp://bnlhost.rcf.bnl.gov/star/data16/reco/ppLong-1/FullField/P03ih/2003/150/st_physics_4150010_raw_0020078.MuDst.root 23315827 srm://garchive.nersc.gov/nersc/projects/starofl/reco/ppLong-1/FullField/P03ih/2003/150/st_physics_4150010_raw_0020078.MuDst.root

Advantage: NFS access is typically faster than HPSS

17 Performance and Statistics
– Maximum WAN transfer rate ~30 MB/s
– Production transfer rates of up to 10 MB/s, limited by memory and I/O of the nodes running HRM as well as data source I/O (HPSS or disk)
– Extremely reliable: typical transfers run for days, tens of thousands of files, TB volumes
– Since RRS was introduced (~1 year): 250k files, 25 TB transferred and cataloged; RRS reliability has been essentially 100%

18 Milestones
April ’02: HRM is a STAR production tool
– Fast, automated, robust file replication achieved
– Simple to use, highly configurable
January ’03: Distributed catalogs deployed
– MySQL catalog replication between sites means all catalog information is available locally
– SUMS allows users to utilize distributed data without seeing the underlying details and complexities
October ’03: RRS in production
– Automatic cataloging necessary for the increased data volume from RHIC Run IV
September ’04: Begin Run IV production
– Integrated replication and cataloging greatly reduces effort
– Immediate accessibility to datasets as files are transferred and registered on tier-1 sites

19 Summary
HRMs play an essential role in STAR computing
– Bulk file replication is simplified and automated
– Fast distribution of data to tier-1 sites, along with immediate cataloging, increases the resources available to analyze the data
– Other STAR tools (SUMS, GridCollector) provide the user interface to STAR data
Faster, more efficient analysis of STAR data, yielding better physics sooner

20 Related URLs
Scientific Data Management group: http://sdm.lbl.gov/
STAR experiment: http://www.star.bnl.gov/
Particle Physics Data Grid (PPDG): http://www.ppdg.net/

