ATLAS Distributed Data Management (EGEE09, Barcelona). Fernando H. Barreiro Megino on behalf of the ATLAS DDM team. CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it


1 CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
EGEE09 Barcelona
ATLAS Distributed Data Management
Fernando H. Barreiro Megino on behalf of the ATLAS DDM team

2 Overview
1. Introduction
2. ATLAS DDM Architecture
3. ATLAS DDM and the users
4. Summary

3 Necessity of Data Management in ATLAS
Presently
Expected during data taking:
- Generation of RAW data
- Processing and reprocessing
- Simulation production
- …
STEP09: Stress test involving all key elements from data-taking to analysis

4 ATLAS DDM responsibilities
Central link between WLCG and ATLAS analysis components
Manage the experiment’s data:
- Data movement between associated sites
- Bookkeeping & accounting
- Data access to:
  - Production systems
  - Physics meta-data systems
  - Analysis systems
  - End users
…following ATLAS’ Computing model

5 ATLAS Computing Model: Hierarchical Grid Organization
Tier-0 facility (CERN):
- Archival and distribution of primary RAW detector data
- First pass processing of the primary event stream
- Distribute the derived data to the Tier-1s
10 Tier-1 data centers:
- Long-term access to all data
- Reprocessing capacity
~100 Tier-2 institutes:
- Analysis capacity for users & physics-groups
- Monte-Carlo simulation
[Diagram: data flow between Tier 0 (event filter, CASTOR, reconstruction), Tier 1 (re-reconstruction, analysis) and Tier 2 (Monte Carlo, analysis), exchanging RAW, ESD, AOD and MC data]

6 ATLAS Computing Model – WLCG Structure
Cloud model
3 different Grid flavors with different middleware:
- Open Science Grid (OSG): US
- NorduGrid (NDGF): Nordic European countries
- EGEE: Europe and rest of the world

7 DDM Overview
[Diagram: DDM across the three Grids: OSG, EGEE, NorduGrid]

8 DDM Bookkeeping – The Central Catalogues
Datasets: Collection of files acting as transfer and organization units
Subscriptions: Replication requests of ATLAS datasets
DQ2 catalog API (Apache + mod_python)
Dataset catalogues (Oracle):
- Repository: What datasets exist in the system?
- Content: What files are in every dataset?
- Location: Where is the dataset located?
- Subscription: What subscription requests exist in the system?
- Container: What is the hierarchical organization of the datasets?
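The five catalogues can be pictured as five lookups keyed by dataset name. The following is a minimal in-memory sketch under that assumption; it is not the DQ2 catalog API, and the class, its methods, the dataset name and the site names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CentralCatalogues:
    """Toy in-memory stand-in for the five catalogues (hypothetical API)."""
    repository: set = field(default_factory=set)      # which datasets exist
    content: dict = field(default_factory=dict)       # dataset -> list of files
    location: dict = field(default_factory=dict)      # dataset -> set of sites
    subscription: dict = field(default_factory=dict)  # dataset -> requested sites
    container: dict = field(default_factory=dict)     # container -> set of datasets

    def register_dataset(self, name, files, site):
        """Record a dataset, its files, and one site that holds a replica."""
        self.repository.add(name)
        self.content[name] = list(files)
        self.location.setdefault(name, set()).add(site)

    def subscribe(self, name, site):
        """Record a replication request of an existing dataset to a site."""
        if name not in self.repository:
            raise KeyError(f"unknown dataset: {name}")
        self.subscription.setdefault(name, set()).add(site)

    def pending_subscriptions(self):
        """Return (dataset, site) pairs requested but not yet replicated."""
        return sorted(
            (ds, site)
            for ds, sites in self.subscription.items()
            for site in sites - self.location.get(ds, set())
        )


catalogues = CentralCatalogues()
catalogues.register_dataset("data09.example.RAW", ["f1", "f2"], "CERN")
catalogues.subscribe("data09.example.RAW", "SITE_A")
print(catalogues.pending_subscriptions())
```

A subscription stops being pending in this sketch once the target site appears in the location lookup for that dataset.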

9 DDM Data Movement – Site Services (simplified schema)
[Diagram]
- Agents: Fetch Agent, Submit Agent, Poll Agent, Callback Agent, Register Agent
- Connected components: Subscription Catalog, Content Catalog, LFCs, FTS, SS DB
- Data flows: subscriptions; dataset content; updates of subscriptions, sources & dataset content; sources for files; files to transfer; transfer jobs; transfer status; transfer jobs & states; HTTP callbacks; files to register; status of transfers, registrations and subscriptions
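One way to read the schema is as a per-file state machine that the agents advance step by step. The sketch below assumes that reading; the state names, function names and the mapping of agents to steps are illustrative guesses from the agent names, not the actual Site Services code, and the callback path is left out.

```python
from enum import Enum


class State(Enum):
    QUEUED = 1       # found by the fetch step
    SUBMITTED = 2    # handed to the transfer service
    TRANSFERRED = 3  # transfer reported as done
    REGISTERED = 4   # replica recorded in the file catalogue


def fetch(dataset_files, files_at_site):
    """Fetch step: queue every file of the dataset that the site lacks."""
    return {name: State.QUEUED for name in dataset_files if name not in files_at_site}


def submit(files):
    """Submit step: hand queued files to the (here imaginary) transfer service."""
    for name, state in files.items():
        if state is State.QUEUED:
            files[name] = State.SUBMITTED


def poll(files, finished):
    """Poll step: mark submitted files that the transfer service reports done."""
    for name, state in files.items():
        if state is State.SUBMITTED and name in finished:
            files[name] = State.TRANSFERRED


def register(files):
    """Register step: record transferred files in the file catalogue."""
    for name, state in files.items():
        if state is State.TRANSFERRED:
            files[name] = State.REGISTERED


def subscription_complete(files):
    """A subscription is done once every missing file has been registered."""
    return all(state is State.REGISTERED for state in files.values())


files = fetch(["f1", "f2", "f3"], files_at_site={"f3"})
submit(files)
poll(files, finished={"f1"})
register(files)
print(files, subscription_complete(files))
```

In this toy run only `f1` has finished transferring, so the subscription stays incomplete until a later poll reports `f2` as done.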

10 DDM Monitoring – The ATLAS DDM Dashboard
Monitoring for:
- Shifters: 24/7 follow-up of DDM activity
- Site/Cloud operators: Overview of site/cloud activity
- VO Managers: Overview of the whole activity
- End users: State of subscriptions
http://dashb-atlas-data.cern.ch/dashboard/request.py/site

11 DDM and the users
- User access to data is chaotic by nature
- Behavior cannot be foreseen
- Computing model: Users should send their jobs to the data
- Users want their data immediately:
  - dq2-get and direct storage access vs. SS subscriptions
  - dq2-get abuse can lead to:
    - Storage Element overload
    - Network congestion
    - DDM service degradation
[Plots (axes: MB/s, errors/hour): "10th September 2008 - First beam day"; "Single user brings down a site"]

12 DDM and the users – Current work
Not all data will be accessed with equal regularity:
- Tackle user behavior and anticipate future workload by tracing the users’ operations
- Predict workload using ARMA models
- Distribute important data before it is needed: HOTDISK spacetoken
  - Sites are responsible for replicating files in this spacetoken to different pools
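For reference, the standard textbook form of an ARMA(p, q) model is given below. The slide does not state which order or which traced quantity is modelled, so reading X_t as the number of user operations in time bin t is an assumption made only for illustration.

```latex
% ARMA(p, q): X_t is the workload in time bin t, \varepsilon_t is white noise.
X_t = c + \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}

% One-step-ahead forecast, using E[\varepsilon_{t+1}] = 0:
\hat{X}_{t+1} = c + \sum_{i=1}^{p} \varphi_i X_{t+1-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t+1-j}
```

The autoregressive terms carry over recent workload levels, while the moving-average terms correct for recent forecast errors.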

13 DDM and the users - Accounting

14 Summary and conclusions
- ATLAS DDM has been successfully handling all experiment data since 2005
- Major challenges passed successfully: ATLAS DDM is capable of handling far more subscriptions than defined in the Computing Model
- User workload will become the critical factor once we have real data

15 References
- “Distributed Data Management in the ATLAS Experiment”, Mario Lassnig
- “Distributed Data Management in ATLAS”, Ricardo Rocha
- “ATLAS, the Grid and the UK”, Roger Jones
- “The ATLAS Computing Model”, D. Adams et al.
- “ATLAS STEP09”, Graeme Stewart

16 Backup slides

17 Main types of ATLAS data
Name                          Description                                                                              Size
Monte Carlo                   Event generator output
RAW                           Detector output (byte stream format)                                                     1.6 MB/ev
Event Summary Data (ESD)      Full output of reconstruction (object format)                                            1.0 MB/ev
Analysis Object Data (AOD)    Summary of reconstruction. Primary analysis data                                         0.2 MB/ev
Tag                           Thumbnail of each event used for identifying interesting events at the analysis stage   0.01 MB/ev
Derived Physics Data (DPD)    Skimmed, slimmed, thinned events derived from AODs                                       0.01 MB/ev
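The per-event sizes on this slide translate directly into storage volumes. The sketch below works through that arithmetic for a single copy of each format; the sample of 10^9 events is a hypothetical figure chosen only for illustration, and decimal units (1 TB = 10^6 MB) are assumed.

```python
# Per-event sizes in MB, as listed on the slide.
SIZE_MB_PER_EVENT = {"RAW": 1.6, "ESD": 1.0, "AOD": 0.2, "Tag": 0.01, "DPD": 0.01}


def volume_tb(n_events, data_type):
    """Volume in TB (1 TB = 10**6 MB) of one copy of n_events events."""
    return n_events * SIZE_MB_PER_EVENT[data_type] / 10**6


N_EVENTS = 10**9  # hypothetical sample size, not a figure from the slides

for data_type in SIZE_MB_PER_EVENT:
    print(f"{data_type:>3}: {volume_tb(N_EVENTS, data_type):8.1f} TB")
```

The same numbers show why analysis runs on AOD rather than RAW: one copy of the AOD sample is an eighth of the RAW volume.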

18 DDM Deployment Model
- Central Catalogues and Site Services hosted at CERN
- LFC and FTS at Tier-1s
- SRM at every site
[Diagram: Central Catalogs and three Site Services instances]

19 CCRC 08: T0->T1s throughput
- Throughput tests: Burst subscriptions injected every 4 hours and immediately honored
- Failover tests: 12h backlog fully recovered in 30 minutes
- All Experiments in the game
[Plot: throughput in MB/s]

20 STEP 09
- Moved 4 PB in 2 weeks
- Traffic rates up to 5.5 GB/s
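The two STEP 09 figures can be put side by side by converting the total volume into an average rate. The sketch below assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and exactly 14 days of continuous transfer; neither assumption is stated on the slide.

```python
PB = 10**15  # bytes, decimal convention (assumption)
GB = 10**9   # bytes, decimal convention (assumption)

volume_bytes = 4 * PB          # "Moved 4 PB"
duration_s = 14 * 24 * 3600    # "in 2 weeks", taken as exactly 14 days
peak_gb_per_s = 5.5            # "Traffic rates up to 5.5 GB/s"

average_gb_per_s = volume_bytes / duration_s / GB
print(f"average: {average_gb_per_s:.2f} GB/s")
print(f"average / peak: {average_gb_per_s / peak_gb_per_s:.2f}")
```

Under these assumptions the sustained average comes out at roughly 3.3 GB/s, about 60% of the quoted peak.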

