1 Distributed Physics Analysis: Past, Present, and Future
Kaushik De, University of Texas at Arlington (ATLAS & D0 Collaborations)
ICHEP’06, Moscow, July 29, 2006

2 Introduction
- Computing needs for HENP experiments keep growing
- Computing models have evolved to meet those needs
- We have seen many important paradigm shifts in the past: farm computing, distributed information systems (the world-wide web), distributed production, and now distributed analysis (DA)
- Many lessons from the past – from FNAL, SLAC, RHIC and the LHC
- I will discuss some general ideas, with examples from the ATLAS and D0 experiments – see other talks here for additional LHC-specific details

3 Distributed Analysis Goals
- Mission statement: remote physics analysis on globally distributed computing systems
- Scale: set by experimental needs
- Let's look at an LHC example, from ATLAS in 2008:
  - 10,000-20,000 CPUs distributed at ~40 sites
  - 100 TB transferred from CERN per day (100k files per day) – worked through below
  - 20-40 PB of data stored worldwide from the first year at the LHC
  - Simultaneous access to data for distributed production and DA
- Physicists (users) will need access to both large-scale storage and CPU from thousands of desktops worldwide
- DA systems are being designed to meet these challenges at the LHC, while learning from current and past experiments
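As a quick sanity check of the transfer numbers quoted above, the average file size and sustained wide-area rate can be worked out directly. A minimal sketch; the 86,400-second day and decimal units are my assumptions, not figures from the slide:

```python
# Back-of-the-envelope check of the data-rate figures quoted on this slide.
# Assumptions (not from the slide): a day is 86,400 seconds and 1 TB = 10**12 bytes.

TB = 10**12
SECONDS_PER_DAY = 86_400

volume_per_day = 100 * TB          # 100 TB exported from CERN per day
files_per_day = 100_000            # 100k files per day

avg_file_size_gb = volume_per_day / files_per_day / 10**9
sustained_rate_gbps = volume_per_day * 8 / SECONDS_PER_DAY / 10**9

print(f"average file size  ~ {avg_file_size_gb:.1f} GB")       # ~1.0 GB
print(f"sustained WAN rate ~ {sustained_rate_gbps:.1f} Gb/s")   # ~9.3 Gb/s
```

The ~9 Gb/s sustained rate is consistent with the 10 Gb/s network backbone assumed in the ATLAS computing model later in the talk.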

4 Distributed Analysis Challenges
- Distributed production is now routinely done in HENP
  - For MC production and reprocessing of data – not yet at LHC scale
  - Scale: a few TB of data generated/processed daily in ATLAS
  - Scope: organized activity, managed by experts
- Lessons learned from production:
  - Robust software systems that automatically recover from grid failures (sketched below)
  - Robust site services – with hundreds of sites, there are daily failures
  - Robust data management – pre-location of data, cataloguing, transfers
- Distributed analysis is in the early stages of testing
  - Moving from the Regional Analysis Center model (e.g. D0) to a fully distributed analysis model – computing on demand
  - Presents new challenges, in addition to those faced in production
  - Chaotic by nature – hundreds of users, random fluctuations in demand
  - Robustness becomes even more critical – software, sites, services
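The "automatic recovery from grid failures" lesson is, at its core, retry-with-fallback logic. The sketch below is purely illustrative: the submission function, error class, and retry policy are hypothetical stand-ins, not part of any ATLAS or D0 production system.

```python
import random
import time

class TransientGridError(Exception):
    """Placeholder for recoverable failures (site down, transfer timeout)."""

def submit_to_site(job, site):
    """Hypothetical submission call standing in for real grid middleware."""
    if random.random() < 0.2:                  # emulate a flaky site or service
        raise TransientGridError(f"{site} unavailable")
    return f"{job} completed at {site}"

def run_with_recovery(job, sites, max_attempts=5, backoff_s=5):
    """Retry a job across sites; real systems also blacklist failing sites."""
    for attempt in range(max_attempts):
        site = sites[attempt % len(sites)]
        try:
            return submit_to_site(job, site)
        except TransientGridError as err:
            print(f"attempt {attempt + 1} failed ({err}); retrying")
            time.sleep(backoff_s)
    raise RuntimeError(f"{job} failed after {max_attempts} attempts")

print(run_with_recovery("analysis-job-001", ["SITE_A", "SITE_B"], backoff_s=1))
```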

5 Role of Grid Middleware
- Basic grid middleware for distributed analysis:
  - Most HEP experiments use VDT (which includes Globus)
  - Security and accounting – GSI authentication, Virtual Organizations
  - Tools for secure file transfer and job submission to remote systems
  - Data location catalogues (RLS, LFC) – see the toy model below
- Higher-level middleware through international grid projects:
  - Resource brokers (e.g. LCG, gLite, Condor-G…)
  - Tools for reliable file transfer (FTS…)
  - User and group account management (VOMS)
- Experiments build application layers on top of the middleware
  - To manage experiment-specific workflow
  - Data (storage) management tools and database applications
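As a toy illustration of what a data location catalogue (the role played by RLS or LFC) provides, the following maps a logical file name to its physical replicas at different sites. The class, method names, and file/site URLs are invented for this example and do not reflect the actual RLS or LFC interfaces.

```python
from collections import defaultdict

class ReplicaCatalogue:
    """Toy replica catalogue: logical file name (LFN) -> physical replicas."""

    def __init__(self):
        self._replicas = defaultdict(list)   # LFN -> list of physical file URLs

    def register(self, lfn, pfn):
        self._replicas[lfn].append(pfn)

    def lookup(self, lfn):
        return list(self._replicas[lfn])

# Invented example names: the same logical file held at two sites.
catalogue = ReplicaCatalogue()
catalogue.register("aod.000123._0001.pool.root",
                   "srm://tier1.example.org/atlas/aod.000123._0001.pool.root")
catalogue.register("aod.000123._0001.pool.root",
                   "srm://tier2.example.edu/atlas/aod.000123._0001.pool.root")
print(catalogue.lookup("aod.000123._0001.pool.root"))
```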

6 Divide and Conquer
- Experiments optimize/factorize both data and resources
- Data factorization (illustrated below):
  - Successive processing steps lead to compressed physics objects
  - The end user does physics analysis using physics objects only
  - Limited access to detailed data for code development and calibration
  - Periodic centralized reprocessing to improve analysis objects
- Resource factorization:
  - Tiered model of data location and processors
  - Higher tiers hold archival data and perform centralized processing
  - Middle tiers handle MC generation and some (re)processing
  - Middle and lower tiers play an important role in distributed analysis
  - Regional centers are often used to aggregate nearby resources
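The effect of data factorization can be shown with rough per-event sizes. The numbers below are order-of-magnitude assumptions of mine, not figures from the slide; they only illustrate why end users can afford to loop over physics objects but not over raw data.

```python
# Illustrative data factorization: each processing step shrinks the per-event
# size, so analysis on physics objects touches far less data than raw access.
# Sizes are assumed order-of-magnitude values, not taken from the talk.

event_size_mb = {
    "RAW": 1.6,       # detector output
    "ESD": 0.5,       # event summary data (reconstruction output)
    "AOD": 0.1,       # analysis object data (physics objects)
    "TAG": 0.001,     # event-level summaries used for selection
}

n_events = 1_000_000
for fmt, size in event_size_mb.items():
    total_tb = n_events * size / 1_000_000
    print(f"{fmt}: {total_tb:8.3f} TB for {n_events:,} events")
```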

7 Example of Data Factorization in ATLAS
Warning – such projections are often underestimated for DA.

8 Example from D0 (from A. Boehnlein)

9 Resource Factorization Example: D0 Computing Model (from A. Boehnlein)
[Diagram: data handling services (SAM, DB servers) connecting central farms, remote farms, central storage, central and remote analysis systems, and the ClueD0 desktop cluster; data flows include raw data, RECO data, RECO MC, user data, and fix/skim processing.]

10 ATLAS Computing Model
- Expected resources (aggregated in the sketch below):
  - 10 Tier-1s, each with 500-1000 CPUs, ~1 PB disk, ~1 PB tape
  - 30 Tier-2s, each with 100-500 CPUs, 100-500 TB disk
  - Satellite Tier-3 sites – small clusters, user facilities
  - 10 Gb/s network backbone
- Tier-0 – repository for raw data, first-pass processing
- Tier-1 – repository of the full set of processed data, reprocessing capability, repository for MC data generated at Tier-2s
- Tier-2 – MC production, repository of data summaries
- Distributed analysis uses resources at all tiers
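Aggregating the Tier-1 and Tier-2 numbers above gives totals consistent with the 10,000-20,000 CPUs quoted on slide 3. This is a minimal sketch using midpoints of the quoted ranges; Tier-0 and Tier-3 contributions are left out.

```python
# Rough aggregate of the Tier-1/Tier-2 resources listed on this slide
# (midpoints of the quoted ranges; Tier-0 and Tier-3 are not included).

tier1 = {"sites": 10, "cpus": (500, 1000), "disk_pb": 1.0, "tape_pb": 1.0}
tier2 = {"sites": 30, "cpus": (100, 500), "disk_tb": (100, 500)}

def mid(lo_hi):
    return sum(lo_hi) / 2

total_cpus = tier1["sites"] * mid(tier1["cpus"]) + tier2["sites"] * mid(tier2["cpus"])
total_disk_pb = tier1["sites"] * tier1["disk_pb"] + tier2["sites"] * mid(tier2["disk_tb"]) / 1000

print(f"~{total_cpus:,.0f} CPUs and ~{total_disk_pb:.0f} PB of disk outside CERN")
# -> ~16,500 CPUs and ~19 PB of disk, within the 10,000-20,000 CPU range on slide 3
```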

11 ATLAS CM Resource Requirements
Projected resources needed in 2008, assuming 20% MC.

12 Data Management Systems
- DA needs robust distributed data management systems
- Example from D0 – SAM:
  - 10 years of development and operational experience
  - Has evolved from a data/metadata catalogue into a grid-enabled workflow system for central production and user analysis (in progress)
- Example from ATLAS – DQ2:
  - 3 years of development and experience
  - Has evolved from a data catalogue API into a data management system
  - Central catalogue for data collection information (datasets)
  - Distributed catalogues for dataset content – file-level information
  - Asynchronous site services for data movement by subscription (sketched below)
  - Client-server architecture with REST-style HTTP calls
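A concept sketch of the subscription mechanism described for DQ2: a central catalogue lists datasets, per-site catalogues track which files each site holds, and an asynchronous agent moves files until a subscribed site is complete. This is a toy model with invented dataset and file names, not the DQ2 API.

```python
# Toy model of dataset subscriptions: the agent copies whatever files a
# subscribed destination is missing. Names below are invented examples.

central_catalogue = {
    "user.example.analysis.001": ["file_0001.root", "file_0002.root", "file_0003.root"],
}
site_contents = {
    "BNL": set(central_catalogue["user.example.analysis.001"]),  # source holds the full dataset
    "UTA": set(),                                                 # subscribed destination, empty
}
subscriptions = [("user.example.analysis.001", "UTA")]            # dataset -> destination site

def run_site_services():
    """One pass of the asynchronous transfer agent."""
    for dataset, dest in subscriptions:
        missing = set(central_catalogue[dataset]) - site_contents[dest]
        for filename in sorted(missing):
            # a real agent would issue a grid transfer (e.g. via FTS) here
            site_contents[dest].add(filename)
            print(f"replicated {filename} -> {dest}")

run_site_services()
```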

13 The Panda Example
- Production and Distributed Analysis system in ATLAS
- Similar to batch systems, but for the grid (a central job queue – see the sketch below)
- A marriage of three ideas:
  - A common system for distributed production and analysis
    - Distributed production jobs submitted through a web interface
    - Distributed analysis jobs submitted through a command-line interface
    - Jobs processed through the same workflow system (with a common API)
  - A production operations group maintains Panda as a reliable service for users, working closely with site administrators
  - Local analysis jobs and distributed analysis jobs share the same interface
    - Use case – a physicist develops and tests code on local data, then submits to the grid for dataset processing (thousands of files) using the same interface
    - The ATLAS software framework Athena becomes ‘pathena’ in Panda
- Highly optimized for, and coupled to, the ATLAS DDM system DQ2
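The central-job-queue idea can be sketched in a few lines: production submissions (web interface) and analysis submissions (command line) feed the same queue and are dispatched by the same machinery. The code is purely illustrative; the transformation, dataset, and site names are made up, and the real Panda server adds brokerage, pilots, and detailed state tracking on top.

```python
from collections import deque

# One shared queue for both kinds of work.
job_queue = deque()

def submit(job_type, transformation, input_dataset):
    """Common entry point for production and analysis submissions."""
    job_queue.append({"type": job_type,
                      "transform": transformation,
                      "input": input_dataset,
                      "status": "defined"})

def dispatch_next(site):
    """Hand the next queued job to an execution slot at a site."""
    if not job_queue:
        return None
    job = job_queue.popleft()
    job["status"] = f"running at {site}"
    return job

submit("production", "G4 simulation", "mc.example.005300")
submit("analysis", "pathena MyAnalysis_jobOptions.py", "example.AOD.dataset")
print(dispatch_next("SITE_A"))
```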

14 Some ATLAS DA User Examples
- Use case 1 (bookkeeping sketched below):
  - User wants to run analysis on 1000 AOD files (1M events)
  - User copies a few data files using DQ2
  - User develops and tests analysis code (Athena) on these local files
  - User runs pathena over the 1000 files on the grid to create ntuples
  - User retrieves the ntuples for final analysis and to make plots
- Use case 2:
  - User needs to process 20,000 ESD files (1M events), or wants to generate a large signal MC sample
  - User requests centralized production through the web interface
- Use case 3:
  - User needs a small MC sample, or to process a few files on the grid
  - User runs GUI or command-line tools (Ganga, AtCom, LJSF, pathena…)
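The bookkeeping behind use case 1 amounts to splitting the dataset into grid jobs and collecting one ntuple per job. The file count comes from the slide; the chunk size and file-name patterns below are illustrative assumptions.

```python
# Split a 1000-file AOD dataset into grid jobs and name the resulting ntuples.
# The 20-files-per-job chunk size and the file-name patterns are assumptions.

aod_files = [f"AOD._{i:05d}.pool.root" for i in range(1, 1001)]

def split_into_jobs(files, files_per_job=20):
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

jobs = split_into_jobs(aod_files)
print(f"{len(jobs)} grid jobs of up to {len(jobs[0])} files each")   # 50 jobs

# Each job runs the user's Athena code and writes one ntuple,
# which the user later retrieves for final plots.
ntuples = [f"ntuple.job{n:03d}.root" for n in range(len(jobs))]
```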

15 Panda (pathena) DA Status

16 Panda – User Accounting Example

17 Conclusion
- Distributed production works well – but still needs to scale up
- Distributed analysis is the new challenge – both for current and future experiments in HENP
- The scale of resources and users is unprecedented at the LHC
- Many systems are being tested – I showed only one example
- Robustness of services and data management is critically important
- Looking to the future:
  - Self-organizing systems
  - Agent-based systems

