Data Management with SAM at DØ
The 2nd International Workshop on HEP Data Grid, Kyungpook National University, Daegu, Korea, August 22-23, 2003
Lee Lueking


1 Data Management with SAM at DØ
The 2nd International Workshop on HEP Data Grid, Kyungpook National University, Daegu, Korea, August 22-23, 2003
Lee Lueking, Fermilab Computing Division, USA
Roadmap of Talk
– DØ in brief
– SAM feature overview
– SAM operation at DØ
– Summary

2 The DØ Experiment
22 Aug. 2003, Lee Lueking, HEP Data Grid
DØ Collaboration
– 18 countries; 80 institutions
– >600 physicists
Detector Data (Run 2a, ending mid-2004)
– 1,000,000 channels
– Event size 250 KB
– Event rate 25 Hz avg.
– Est. 2-year data totals (incl. processing and analysis): 1 x 10^9 events, ~1.2 PB
Monte Carlo Data (Run 2a)
– 6 remote processing centers
– Estimate ~0.3 PB
Run 2b, starting 2005: >1 PB/year
[Figure: the Tevatron (Chicago), with proton (p) and antiproton (p-bar) beams and the CDF and DØ detectors]
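As a rough cross-check of the detector figures quoted on this slide, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation: it assumes decimal units (1 KB = 1000 bytes) and continuous running at the average rate, neither of which is stated in the talk.

```python
# Figures quoted on the slide; decimal units are an assumption.
EVENT_SIZE_BYTES = 250 * 1000          # 250 KB per event
EVENT_RATE_HZ = 25                     # average event rate
TOTAL_EVENTS = 1_000_000_000           # 1 x 10^9 events over ~2 years
TOTAL_WITH_PROCESSING_BYTES = 1.2e15   # ~1.2 PB incl. processing and analysis

# Raw data rate out of the detector: 6.25 MB/s.
raw_rate_mb_per_s = EVENT_SIZE_BYTES * EVENT_RATE_HZ / 1e6

# Raw volume for the full event sample: 250 TB.
raw_total_tb = EVENT_SIZE_BYTES * TOTAL_EVENTS / 1e12

# Live time needed to record the sample at the average rate: about 463 days.
live_days = TOTAL_EVENTS / EVENT_RATE_HZ / 86_400

# Ratio of the quoted total to the raw volume: about 4.8.
expansion_factor = TOTAL_WITH_PROCESSING_BYTES / (EVENT_SIZE_BYTES * TOTAL_EVENTS)

print(f"{raw_rate_mb_per_s} MB/s, {raw_total_tb} TB raw, "
      f"{live_days:.0f} live days, x{expansion_factor:.1f} with processing")
```

Under these assumptions, the quoted ~1.2 PB is roughly five times the raw event volume, which is consistent with the total including processed and analysis data.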

3 SAM Features http://d0db.fnal.gov/sam

4 Managing Resources in SAM
[Diagram; labels recovered from the slide:]
– Data Resources (Storage + Network)
– Compute Resources (CPU + Memory)
– Local Batch
– SAM Station Servers
– Data and Compute Co-allocation
– SAM Global Optimizer
– SAM metadata
– Fair-share Resource allocation
– User groups
– Consumer(s)
– Project = DS on Station
– Dataset Definitions
– Datasets (DS)
– Batch scheduler
– Replica Catalog
– Replica Management Service
– Legend: SAM Meta-data; SAM servers; Batch + SAM

5 Simplified SAM Database Schema (SAM Metadata)
[Schema diagram; entities recovered from the slide:]
– Files: ID, Name, Format, Size, # Events
– Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail
– Volume
– Project
– Data Tier
– Physical Data Stream
– Trigger Configuration
– Creation & Processing Info
– Run
– Event-File Catalog
– Run Conditions, Luminosity, Calibration, Trigger DB, Alignment
– Group and User information
– Station Config. & Cache info
– File Storage Locations
– MC Request & Info
The SAM schema has over 100 tables. Several other related table spaces are also available.
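To make the relationship between the Files, Events, and Event-File Catalog entities concrete, here is a minimal in-memory sketch. It is not the SAM schema: the class names, field types, and file names are hypothetical, and only the field lists for Files and Events come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class FileRecord:
    """One Files row; field list from the slide, types assumed."""
    file_id: int
    name: str
    format: str
    size_bytes: int
    n_events: int


@dataclass(frozen=True)
class EventRecord:
    """One Events row; trigger fields are plain strings here (an assumption)."""
    event_id: int
    event_number: int
    trigger_l1: str
    trigger_l2: str
    trigger_l3: str


@dataclass
class EventFileCatalog:
    """Hypothetical stand-in for the Event-File Catalog: event ID -> file IDs."""
    _files_by_event: Dict[int, Set[int]] = field(default_factory=dict)

    def add(self, event: EventRecord, file: FileRecord) -> None:
        # An event can appear in several files (for example raw and reconstructed).
        self._files_by_event.setdefault(event.event_id, set()).add(file.file_id)

    def files_for(self, event_id: int) -> List[int]:
        # Sorted file IDs containing the event; empty list if the event is unknown.
        return sorted(self._files_by_event.get(event_id, set()))


# Illustrative records only; the names and values are made up.
raw_file = FileRecord(1, "example_raw.dat", "raw", 250_000_000, 1000)
reco_file = FileRecord(2, "example_reco.dat", "reco", 100_000_000, 1000)
event = EventRecord(42, 1001, "l1-example", "l2-example", "l3-example")

catalog = EventFileCatalog()
catalog.add(event, raw_file)
catalog.add(event, reco_file)
print(catalog.files_for(42))
```

The catalog is kept separate from the file and event records so that the many-to-many link between events and files can be queried in either direction without duplicating the records themselves.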

6 Monte Carlo Request System
The user defines the required data as a set of metadata keyword/value pairs that describe the physics details of the requested MC sample. The request is stored in SAM. When the request is processed, the physics metadata is extracted, augmented with further 'processing mechanics' information, and converted into executable jobs tailored to the resource they run on. The resulting data is stored in SAM, with the physics metadata augmented by the details of the workflow and data provenance. In essence, the system provides a metadata materialization service (a.k.a. a virtual data system).
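The flow described above (physics keywords in, resource-specific jobs out, provenance attached to the output) might be sketched like this. Everything here is hypothetical: the function names, the keyword names, and the job layout are invented for illustration and are not the SAM or MC RunJob interfaces.

```python
import math
from typing import Dict, List


def materialize_request(physics: Dict[str, str], n_events: int,
                        resource: str, events_per_job: int) -> List[dict]:
    """Split one MC request into jobs sized for a given resource."""
    if n_events <= 0 or events_per_job <= 0:
        raise ValueError("n_events and events_per_job must be positive")
    jobs = []
    for index in range(math.ceil(n_events / events_per_job)):
        first = index * events_per_job
        jobs.append({
            # Physics description is copied unchanged from the request.
            "physics": dict(physics),
            # 'Processing mechanics' added when the request is processed.
            "mechanics": {
                "resource": resource,
                "job_index": index,
                "first_event": first,
                "n_events": min(events_per_job, n_events - first),
            },
        })
    return jobs


def output_metadata(job: dict, output_name: str) -> dict:
    """Metadata for a job's output: physics keywords plus provenance."""
    return {**job["physics"], "file_name": output_name,
            "provenance": dict(job["mechanics"])}


# Illustrative request; the keyword names and values are placeholders.
request = {"generator": "example-generator", "process": "example-process"}
jobs = materialize_request(request, n_events=2500, resource="site-a",
                           events_per_job=1000)
print([job["mechanics"]["n_events"] for job in jobs])
```

Keeping the physics keywords separate from the mechanics means the same request can be re-materialized for a different resource by changing only `resource` and `events_per_job`.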

7 SAM File Forwarding and Routing
[Diagram labels: MSS; SAM Station 1, 2, 3, 4; Remote SAM Stations]
Station Responsibilities
– Pre-stage files for consumers
– Manage local cache
– Store files for producers
File Forwarding
– File stores can be forwarded through other stations
File Routing
– Routes for file transfers are configurable
– Extra-domain transfers use bbftp or GridFTP (parallel transfer protocols)
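The idea of forwarding a file store through intermediate stations along configurable routes can be illustrated with a small graph search. This is a sketch only: the talk does not say how SAM selects routes, so the breadth-first search (fewest hops) used here is a stand-in, and the station names and route table are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def find_route(routes: Dict[str, List[str]], source: str,
               destination: str) -> Optional[List[str]]:
    """Return the fewest-hop path from source to destination, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == destination:
            return path
        for next_hop in routes.get(path[-1], ()):
            if next_hop not in visited:
                visited.add(next_hop)
                queue.append(path + [next_hop])
    return None


# Hypothetical route configuration: each station lists where it may forward to.
ROUTES = {
    "remote-station": ["station-2"],
    "station-2": ["station-1"],
    "station-1": ["mss"],
}

# A file stored at the remote station is forwarded hop by hop to mass storage.
print(find_route(ROUTES, "remote-station", "mss"))
# → ['remote-station', 'station-2', 'station-1', 'mss']
```

Because the routes live in a plain table, changing how a remote station reaches mass storage is a configuration change rather than a code change, which is the property the slide emphasizes.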

8 SAM at DØ d0db.fnal.gov/sam

9 Overview of DØ Data Handling
Summary of DØ Data Handling
Registered Users: 600
Number of SAM Stations: 56
Registered Nodes: 900
Total Disk Cache: 40 TB
Number of Files (physical): 1.5M
Number of Files (virtual): 0.7M
Robotic Tape Storage: 400 TB
[Map legend: Regional Center; Analysis site]
[Plot: Integrated Files Consumed vs Month (DØ), Mar 2002 to Mar 2003; 4.0 M files consumed]
[Plot: Integrated GB Consumed vs Month (DØ), Mar 2002 to Mar 2003; 1.2 PB consumed]

10 DØ Data Flows
[Diagram; labels recovered from the slide:]
– Remote sites (all Monte Carlo production): Great Britain 200; Netherlands 50; France 100; Texas 64; Czech R. 32
– fnal.gov
– UNIX hosts; ENSTORE movers
– LINUX farm: 300+ dual PIII/IV nodes
– Startap Chicago; switch; CISCO
– a: production; c: development
– ADIC AML/2; STK 9310 Powderhorn
– ClueDØ: Linux desktop user cluster, 227 nodes
– Fiber to experiment; switch
– DEC4000 d0ola,b,c; L3 nodes; RIP data logger; collector/router
– SUN 4500; Linux quad; d0ora1; d0lxac1; Linux d0dbsrv1
– SGI Origin2000: 128 R12000 processors, 27 TB fiber channel disks
– Central Analysis Backend (CAB): 160 dual 2 GHz Linux nodes, 35 GB cache ea.
– Experimental Hall/office complex
– Datalogger; Worldwide Analysis

11 DØ SAM Station Summary
Name | Location | Nodes/cpu | Cache | Use/comments
Central-analysis | FNAL | 128 SMP*, SGI Origin 2000 | 14 TB | Analysis & D0 code development
CAB (CA Backend) | FNAL | 16 dual 1 GHz + 160 dual 1.8 GHz | 6.2 TB | Analysis and general purpose
FNAL-Farm | FNAL | 100 dual 0.5-1.0 GHz + 240 dual 1.8 GHz | 3.2 TB | Reconstruction
ClueD0 | FNAL | 50 mixed PIII, AMD (may grow >200) | 2 TB | User desktop, general analysis
D0karlsruhe (GridKa) | Karlsruhe, Germany | 1 dual 1.3 GHz gateway, >160 dual PIII & Xeon | 3 TB NFS shared | General. Workers on PN. Shared facility
D0umich (NPACI) | U Mich., Ann Arbor | 1 dual 1.8 GHz gateway, 100 x dual AMD XP 1800 | 1 TB NFS shared | Re-reconstruction. Workers on PN. Shared facility
Many others (> 4 dozen) | Worldwide | Mostly dual PIII, Xeon, and AMD XP | | MC production, gen. analysis, testing
*IRIX; all others are Linux

12 Station Stats: GB Consumed (by jobs) Daily, Feb 14 – Mar 15
[Plots for Central-Analysis, FNAL-farm, ClueD0, and CAB]
Annotated peaks, in slide order: 2.5 TB (Feb 22); 270 GB (Feb 17); 1.1 TB (Mar 6); >1.6 TB (Feb 28)

13 Station Stats: MB Delivered/Sent Daily, Feb 14 – Mar 15
[Plots for Central-Analysis, FNAL-farm, ClueD0, and CAB; legend: Delivered to, Sent from]
Annotated peaks, in slide order: 1 TB (Feb 22); 150 GB (Feb 17); 1.2 TB (Mar 6); 600 GB (Feb 28)
Consumed, in slide order: 2.5 TB (Feb 22); 270 GB (Feb 17); 1.1 TB (Mar 6); 1.6 TB (Feb 28)

14 Challenges (1)
Getting SAM to meet the needs of DØ in its many configurations is and has been an enormous challenge.
– Automating Monte Carlo production and cataloging with the MC request system, in conjunction with the MC RunJob meta system.
– File corruption issues. Solved with CRC.
– Preemptive distributed caching is prone to race conditions and log jams. These have been solved.
– Private networks sometimes require “border” naming services. This is understood.
– NFS shared cache configuration provides additional simplicity and generality, at the price of scalability (star configuration). This works.
– Global routing completed.
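The slide says only that file corruption was "solved with CRC". A minimal sketch of that kind of check is shown below, assuming the standard CRC-32 from Python's `zlib` module; the talk does not state which CRC variant SAM uses or at which points it is computed, and the function names are hypothetical.

```python
import zlib
from typing import Iterable


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Compute CRC-32 incrementally so large files need not fit in memory."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts a running value as its second argument.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def transfer_is_intact(received_chunks: Iterable[bytes], expected_crc: int) -> bool:
    """Compare the checksum of received data with the value recorded at store time."""
    return crc32_of_chunks(received_chunks) == expected_crc


# Checksum recorded when the file is first stored (illustrative data).
original = [b"event data block 1", b"event data block 2"]
recorded_crc = crc32_of_chunks(original)

# After a transfer, an intact copy passes and a corrupted copy is rejected.
print(transfer_is_intact(original, recorded_crc))
print(transfer_is_intact([b"event data block 1", b"event dXta block 2"], recorded_crc))
```

Recording the checksum alongside the file metadata lets any station verify a copy after a transfer without contacting the original source.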

15 Challenges (2)
– Convenient interface for users to build their own applications. A SAM user API is provided for Python.
– Installation procedures for the station servers have been quite complex. They are improving, and we plan to have “push button” and even “opportunistic deployment” installs soon.
– Lots of details with opening ports on firewalls, OS configurations, registration of new hardware, and so on.
– Username clashing issues. Moving to GSI and Grid Certificates.
– Interoperability with many MSS.
– Network attached files. The consumer is given a file URL, and data is delivered to the consumer over the network via RFIO, dCap, etc.

16 Summary
– SAM is a well-hardened, multi-featured, distributed data management and delivery system.
– The DØ experiment has many challenging data management needs, which are being met by SAM on a worldwide scale. (CDF is also using SAM.)
– Many complex issues have been solved to provide the needed level of service to the experiment.
– Now, on to SAMGrid…

17 Thank You

