
1 DØ Grid Computing Gavin Davies, Frédéric Villeneuve-Séguier Imperial College London On behalf of the DØ Collaboration and the SAMGrid team The 2007 Europhysics Conference on High Energy Physics Manchester, England 19-25 July 2007

2 Outline
Introduction
–DØ Computing Model
–SAMGrid Components
–Interoperability Activities
–Monte Carlo Generation
–Data Processing
Conclusion
–Next Steps / Issues
–Summary

3 Introduction
Tevatron
–Running experiments (less data than the LHC, but still PBs per experiment)
–Growing: great physics & better still to come. Have >3 fb⁻¹ of data and expect up to 5 fb⁻¹ more by end 2009
Computing model: data grid (SAM) for all data handling, originally with distributed computing, evolving to automated use of common tools/solutions on the grid (SAMGrid) for all tasks
–Started with production tasks, e.g. MC generation and data processing: the greatest need and the easiest to 'gridify', ahead of the wave for a running experiment
–Based on SAMGrid, but with a programme of interoperability from very early on: initially LCG and then OSG
–Increased automation; user analysis considered last, since SAM already gives remote data analysis

4 Computing Model
[Diagram: Data Handling Services connect Central Storage, Central Farms, Central Analysis Systems, Remote Farms, Remote Analysis Systems and User Desktops; the data flows shown are Raw Data, RECO Data, RECO MC and User Data]

5 Components - Terminology
SAM (Sequential Access via Metadata)
–Well-developed metadata & distributed data replication system
–Originally developed by DØ & FNAL-CD, now also used by CDF & MINOS
JIM (Job Information and Monitoring)
–Handles job submission and monitoring (everything but data handling)
–SAM + JIM → SAMGrid, the computational grid
Runjob
–Handles job workflow management
Automation
–d0repro tools, automc
(UK role: project leadership, key technology and operations)
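As a rough illustration of how these components divide the work, the sketch below models a production job as a SAM dataset request plus a JIM submission wrapping a Runjob-style workflow. It is purely conceptual Python: the class and function names (SamCatalog, JimScheduler, runjob_workflow, the file paths) are hypothetical and do not correspond to the real SAM, JIM or Runjob interfaces.

```python
# Conceptual sketch only: hypothetical names, not the real SAM/JIM/Runjob APIs.
from dataclasses import dataclass, field


@dataclass
class SamDataset:
    """A dataset defined by a metadata query, resolved to file replicas by SAM."""
    query: str                      # e.g. "data_tier=raw and run_number>200000"
    files: list = field(default_factory=list)


class SamCatalog:
    """Stands in for SAM: metadata catalogue plus distributed data replication."""
    def resolve(self, dataset: SamDataset) -> SamDataset:
        # In reality SAM would return the nearest cached replicas of each file.
        dataset.files = [f"/pnfs/d0/file_{i}.raw" for i in range(3)]
        return dataset


def runjob_workflow(files):
    """Stands in for Runjob: orders the steps of one job (reco, merge, store)."""
    for f in files:
        yield ("reconstruct", f)
    yield ("merge", "output.root")
    yield ("store", "output.root")


class JimScheduler:
    """Stands in for JIM: submits and monitors the job, but never moves data."""
    def submit(self, dataset: SamDataset, workflow):
        for step, target in workflow(dataset.files):
            print(f"JIM monitoring step: {step} {target}")


if __name__ == "__main__":
    sam = SamCatalog()
    ds = sam.resolve(SamDataset(query="data_tier=raw and run_number > 200000"))
    JimScheduler().submit(ds, runjob_workflow)   # SAM handles data, JIM the job
```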

6 SAMGrid Interoperability
Long programme of interoperability: LCG first and then OSG
Step 1: Co-existence – use shared resources with a SAM(Grid) headnode
–Widely done for both MC and the p17 2004/05 data reprocessing
Step 2: SAMGrid interface
–SAM does the data handling & JIM the job submission
–Basically a forwarding mechanism
SAMGrid-LCG
–First used early 2006 for data fixing
–MC & p20 data reprocessing since
SAMGrid-OSG
–Learnt from SAMGrid-LCG
–p20 data reprocessing (spring 2007)
Replicate (forwarding nodes) as needed
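A minimal sketch of the forwarding idea follows, assuming hypothetical names: a forwarding node accepts a SAMGrid job and resubmits it to LCG or OSG, while data handling stays with SAM. This is not the real SAMGrid implementation, only an illustration of the mechanism described above.

```python
# Minimal sketch of the forwarding mechanism; class and method names are
# hypothetical, not the real SAMGrid code.
class ForwardingNode:
    """Accepts a SAMGrid job and resubmits it to a foreign grid (LCG or OSG).

    Only the job crosses the boundary to the LCG/OSG cluster; the input data
    is still resolved and delivered through SAM.
    """

    def __init__(self, target_grid: str):
        assert target_grid in ("LCG", "OSG")
        self.target_grid = target_grid

    def forward(self, samgrid_job: dict) -> str:
        # Translate the SAMGrid job description into the target grid's dialect.
        foreign_job = {
            "executable": samgrid_job["executable"],
            "input_dataset": samgrid_job["sam_dataset"],  # still resolved via SAM
            "grid": self.target_grid,
        }
        return self._submit_to_foreign_grid(foreign_job)

    def _submit_to_foreign_grid(self, job: dict) -> str:
        # Placeholder for the real LCG/OSG submission step.
        return f"{job['grid']}-job-0001"


# One forwarding node per target grid; more can be replicated as load grows.
lcg_forwarder = ForwardingNode("LCG")
job_id = lcg_forwarder.forward(
    {"executable": "d0reco", "sam_dataset": "p20-reprocessing-run-subset"}
)
print("submitted as", job_id)
```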

7 SAM plots
Over 10 PB (250 billion events) in the last year
Up to 1.6 PB moved per month (a x5 increase over 2 years ago)
SAM TV - monitors SAM and the SAM stations: http://d0om.fnal.gov/sam/samTV/current/
Continued success: SAM shifters - often remote
[Plot annotation: 1 PB / month; diagnostics at http://d0db-prd.fnal.gov/sam/diagnostics.html]

8 SAMGrid plots - I
JIM: >10 active execution sites
[Plot annotations: "Moving to forwarding nodes", "No longer add red dots"]
Monitoring pages:
http://samgrid.fnal.gov:8080/
http://samgrid.fnal.gov:8080/list_of_schedulers.php
http://samgrid.fnal.gov:8080/list_of_resources.php
http://samgrid.fnal.gov:8080/known_scheduler.php?scheduler name=samgrid.fnal.gov

9 SAMGrid plots - II
[Job activity plots: "native" SAMGrid (Europe), SAMGrid-LCG forwarding mechanism (Europe), SAMGrid-OSG forwarding mechanism (US), "native" SAMGrid (China!)]

10 Monte Carlo
Massive increase with the spread of SAMGrid use & LCG (OSG later)
p17/p20 - 550M events since 09/05
Up to 12M events/week
–Downtimes due to software transitions, p20 reprocessing and site availability
80% produced in Europe, 30% in France
UK RAC
–Full details on the web: http://www.hep.ph.ic.ac.uk/~villeneu/d0_uk_rac/d0_uk_rac.html
LCG grid-wide submission reached a scaling problem

11 Data - reprocessing & fixing - I
p14 Reprocessing: Winter 2003/04
–100M events remotely, 25M in the UK
–Distributed computing rather than grid
p17 Reprocessing: Spring - Autumn 2005
–x10 larger, i.e. 1B events, 250 TB, from raw
–SAMGrid as default
–Site certification
p17 Fixing: Spring 2006
–All Run IIa - 1.4B events in 6 weeks
–SAMGrid-LCG 'burnt in'
Increasing functionality
–Primary processing tested, will become default

12 Data - reprocessing & fixing - II
p20 (Run IIb) reprocessing
–Spring 2007
–Improved reconstruction & detector calibration for Run IIb data (2006 and early 2007)
–~500M events (75 TB)
–Reprocessing using native SAMGrid, SAMGrid-OSG (& SAMGrid-LCG); first large-scale use of SAMGrid-OSG
–Up to 10M events produced/merged remotely per day (initial goal was 3M/day)
–Successful reprocessing

13 Integration of a "grid" (OSG) - p20 reprocessing
Such exercises 'debug' a grid
–Revealed some teething troubles
–Solved quickly thanks to GOC, OSG and LCG partners
SAMGrid-LCG experience
–Up to 3M events/day at full speed
[Plots: "LCG" - a lot of green; OSG (initially) - a lot of red]

14 Next steps / issues
Complete endgame development
–Additional functionality/usage: skimming, primary processing on the grid as default (& at multiple sites?)
–Additional resources: completing the forwarding nodes; full data/MC functionality for both LCG & OSG; scaling issues to access the full LCG & OSG worlds
–Data analysis: how gridified do we go? An open issue; need to be 'interoperable' (Fermigrid, LCG sites, OSG, …); will need development, deployment and operations effort
"Steady" state - goal to reach by end of CY 2007 (≥2 yrs running)
–Maintenance of existing functionality
–Continued experimental requests
–Continued evolution as grid standards evolve
Manpower
–Development, integration and operation handled by the dedicated few

15 Summary / plans
Tevatron & DØ performing very well
–A lot of data & physics, with more to come
SAM & SAMGrid critical to DØ
–Grid computing model as important as any sub-detector
–Without the LCG and OSG partners it would not have worked either
–Largest grid 'data challenges' in HEP (I believe)
–Learnt a lot about the technology, and especially how it scales
–Learnt a lot about the organisation / operation of such projects
–Some of these lessons can be abstracted and be of benefit to others…
–Accounting model evolved in parallel (~$4M/yr)
Baseline: ensure (scaling for) production tasks
–Further improvements to operational robustness / efficiency underway
–In parallel, the open question of data analysis - will need to go part way

16 Back-ups

17 SAMGrid Architecture

18 Interoperability architecture
[Diagram: a forwarding node sits at the network boundary between SAMGrid and an LCG/OSG cluster, with a VO-service (SAM) alongside the cluster; arrows show job flow and "offers service" relations]
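To make the back-up diagram concrete, here is a small, purely illustrative sketch of the relationship it shows: jobs reach the cluster through the forwarding node, while a VO service (standing in for a SAM station co-located with the LCG/OSG cluster) offers data handling to the jobs running there. All names are hypothetical, not the real SAMGrid services.

```python
# Purely illustrative; names are hypothetical, not the real SAMGrid services.
class VoService:
    """Stands in for a SAM station deployed alongside the LCG/OSG cluster.

    It "offers service" to jobs on the cluster: they ask it for their input
    files instead of talking to remote SAMGrid services directly.
    """

    def deliver_file(self, logical_name: str) -> str:
        # In reality: look up the replica catalogue and stage the file locally.
        return f"/local/cache/{logical_name}"


class WorkerNode:
    """A batch slot on the LCG/OSG cluster, reached via the forwarding node."""

    def __init__(self, vo_service: VoService):
        self.vo_service = vo_service

    def run(self, job: dict) -> None:
        # The job already crossed the network boundary via the forwarding node;
        # data handling stays with the VO service on this side of the boundary.
        for logical_name in job["inputs"]:
            local_path = self.vo_service.deliver_file(logical_name)
            print(f"processing {local_path}")


WorkerNode(VoService()).run({"inputs": ["d0_raw_0001.evt", "d0_raw_0002.evt"]})
```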

