DØ Grid Computing
Gavin Davies, Frédéric Villeneuve-Séguier, Imperial College London
On behalf of the DØ Collaboration and the SAMGrid team
The 2007 Europhysics Conference on High Energy Physics, Manchester, England, July 2007

EPS-HEP 2007 Manchester 2 Outline
Introduction
– DØ Computing Model
– SAMGrid Components
– Interoperability Activities
– Monte Carlo Generation
– Data Processing
Conclusion
– Next Steps / Issues
– Summary

EPS-HEP 2007 Manchester 3 Introduction
Tevatron
– Running experiments (less data than the LHC, but still PBs per experiment)
– Growing – great physics & better still to come: have >3 fb⁻¹ of data and expect up to 5 fb⁻¹ more by end 2009
Computing model: datagrid (SAM) for all data handling and, originally, distributed computing, evolving to automated use of common tools/solutions on the grid (SAMGrid) for all tasks
– Started with production tasks, e.g. MC generation and data processing: the greatest need and the easiest to 'gridify' – ahead of the wave for a running experiment
– Based on SAMGrid, but with a programme of interoperability from very early on: initially LCG and then OSG
– Increased automation; user analysis considered last – SAM already gives remote data analysis

EPS-HEP 2007 Manchester 4 Computing Model
[Diagram: data handling services connect central storage, central farms, remote farms, central and remote analysis systems, and user desktops; the data categories shown are Raw Data, RECO Data, RECO MC and User Data]
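To make the diagram concrete, here is a small illustrative mapping of which systems produce and consume each data category. This is my reading of the figure together with later slides (remote farms generate MC, central farms do primary processing), not an official specification of the DØ computing model.

```python
# Illustrative reading of the DØ computing-model diagram; not an official specification.
data_flow = {
    "Raw Data":  {"from": ["detector / online"], "to": ["Central Storage", "Central Farms"]},
    "RECO Data": {"from": ["Central Farms"],     "to": ["Central Storage", "Analysis Systems"]},
    "RECO MC":   {"from": ["Remote Farms"],      "to": ["Central Storage", "Analysis Systems"]},
    "User Data": {"from": ["Analysis Systems", "User Desktops"], "to": ["Central Storage"]},
}

# All transfers are mediated by the data handling services (SAM).
for category, route in data_flow.items():
    print(f"{category}: {' + '.join(route['from'])} -> {', '.join(route['to'])}")
```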

EPS-HEP 2007 Manchester 5 Components - Terminology
SAM (Sequential Access via Metadata)
– Well-developed metadata & distributed data replication system
– Originally developed by DØ & FNAL-CD, now used by CDF & MINOS
JIM (Job Information and Monitoring)
– Handles job submission and monitoring (everything but data handling)
– SAM + JIM → SAMGrid, a computational grid
Runjob
– Handles job workflow management
Automation
– d0repro tools, automc
(UK role – project leadership, key technology and operations)
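To illustrate the metadata-driven model that SAM provides – datasets are defined by metadata queries and files are delivered to jobs from distributed replicas – here is a minimal, purely illustrative Python sketch. The class and function names (FileRecord, MetadataCatalogue, deliver) are invented for this example and do not reflect the real SAM interfaces.

```python
# Illustrative sketch only: hypothetical names, not the real SAM API.
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    name: str
    metadata: dict                                   # e.g. {"data_tier": "reconstructed"}
    replicas: list = field(default_factory=list)     # storage locations holding a copy

class MetadataCatalogue:
    """Toy stand-in for a SAM-like catalogue: datasets are defined by metadata queries."""
    def __init__(self, files):
        self.files = files

    def define_dataset(self, **query):
        # A dataset is simply the set of files whose metadata matches the query.
        return [f for f in self.files
                if all(f.metadata.get(k) == v for k, v in query.items())]

def deliver(dataset, local_cache):
    """Hand files to a job one at a time, fetching from the first available replica."""
    for f in dataset:
        source = f.replicas[0] if f.replicas else "tape@FNAL"
        local_cache.append((f.name, source))         # stand-in for an actual transfer
        yield f.name

# Usage: select Run IIa reconstructed data and iterate over it as a job would.
catalogue = MetadataCatalogue([
    FileRecord("reco_194567_001", {"data_tier": "reconstructed", "epoch": "RunIIa"},
               ["dcache@FNAL", "se@Manchester"]),
])
cache = []
for filename in deliver(catalogue.define_dataset(data_tier="reconstructed", epoch="RunIIa"), cache):
    pass  # the experiment's executable would process each delivered file here
```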

EPS-HEP 2007 Manchester 6 SAMGrid Interoperability
Long programme of interoperability – LCG first and then OSG
Step 1: Co-existence – use shared resources with a SAM(Grid) head node
– Widely done for both MC and data reprocessing
Step 2: SAMGrid interface – SAM does data handling & JIM does job submission
– Basically a forwarding mechanism
SAMGrid-LCG
– First used early 2006 for data fixing
– MC & p20 data reprocessing since
SAMGrid-OSG
– Learnt from SAMGrid-LCG
– p20 data reprocessing (spring 07)
– Replicate as needed
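The forwarding mechanism described above can be pictured as a thin broker that accepts SAMGrid job requests and resubmits them to whichever grid flavour backs the target site, while data handling stays with SAM. The sketch below is a simplified illustration under that assumption; the names (ForwardingNode, submit_lcg, submit_osg) are invented for the example and are not SAMGrid code.

```python
# Simplified illustration of a forwarding node: invented names, not SAMGrid code.
from typing import Callable, Dict

def submit_lcg(job: dict) -> str:
    return f"lcg-job-{job['name']}"      # placeholder for a real LCG submission

def submit_osg(job: dict) -> str:
    return f"osg-job-{job['name']}"      # placeholder for a real OSG submission

class ForwardingNode:
    """Route SAMGrid job requests to the grid flavour that backs the chosen site."""
    def __init__(self, site_backends: Dict[str, str]):
        self.site_backends = site_backends
        self.submitters: Dict[str, Callable[[dict], str]] = {
            "LCG": submit_lcg,
            "OSG": submit_osg,
        }

    def forward(self, job: dict) -> str:
        backend = self.site_backends[job["site"]]    # e.g. "Manchester" -> "LCG"
        grid_id = self.submitters[backend](job)
        # Data handling stays with SAM: the job fetches its files via the site's SAM station.
        return grid_id

node = ForwardingNode({"Manchester": "LCG", "Wisconsin": "OSG"})
print(node.forward({"name": "p20_reco_0001", "site": "Wisconsin"}))
```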

EPS-HEP 2007 Manchester 7 SAM plots
– Over 10 PB (250 billion events) consumed in the last year
– Up to 1.6 PB moved per month, a factor of 5 increase over 2 years ago; typically ~1 PB/month
– SAM TV monitors SAM and the SAM stations
– Continued success: SAM shifters, often remote
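As a quick consistency check on the quoted volumes (my own back-of-the-envelope numbers, not from the slides): 10 PB over 250 billion events is roughly 40 kB per event on average, and 1.6 PB in a month corresponds to a sustained rate of about 600 MB/s.

```python
# Back-of-the-envelope check on the quoted SAM data volumes (illustrative only).
PB = 1e15                                       # using decimal petabytes

avg_event_size = 10 * PB / 250e9                # ~40 kB per event on average
sustained_rate = 1.6 * PB / (30 * 24 * 3600)    # ~620 MB/s to move 1.6 PB in a month

print(f"average event size ~ {avg_event_size/1e3:.0f} kB")
print(f"peak monthly transfer ~ {sustained_rate/1e6:.0f} MB/s sustained")
```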

EPS-HEP 2007 Manchester 8 SAMGrid plots - I
[Map of JIM execution sites (samgrid.fnal.gov): more than 10 active execution sites; with the move to forwarding nodes, individual sites ("red dots") are no longer added]

EPS-HEP 2007 Manchester 9 SAMGrid plots - II
[Job activity plots: "native" SAMGrid in Europe and in China, the SAMGrid-LCG forwarding mechanism in Europe, and the SAMGrid-OSG forwarding mechanism in the US]

EPS-HEP 2007 Manchester 10 Monte Carlo
Massive increase with the spread of SAMGrid use & LCG (OSG later)
– p17/p20 – 550M events since 09/05
– Up to 12M events/week; downtimes due to software transitions, p20 reprocessing and site availability
– 80% produced in Europe, 30% in France
UK RAC
– Full details on the web (/d0_uk_rac/d0_uk_rac.html)
LCG grid-wide submission reached a scaling problem
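For scale (an illustrative calculation, not from the slides): 550M events between September 2005 and this conference is roughly 96 weeks of production, so the quoted 12M events/week peak is about twice the long-term average rate.

```python
# Rough production-rate check for the quoted MC numbers (illustrative only).
total_events = 550e6        # p17/p20 MC events produced since 09/2005
weeks = 96                  # approx. Sep 2005 -> Jul 2007

average_rate = total_events / weeks     # ~5.7M events/week on average
peak_rate = 12e6                        # quoted peak of 12M events/week
print(f"average ~ {average_rate/1e6:.1f}M events/week, "
      f"peak/average ~ {peak_rate/average_rate:.1f}x")
```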

EPS-HEP 2007 Manchester 11 Data – reprocessing & fixing - I
p14 reprocessing: winter 2003/04
– 100M events remotely, 25M in the UK
– Distributed computing rather than grid
p17 reprocessing: spring – autumn 05
– x10 larger, i.e. 1B events, 250 TB, from raw
– SAMGrid as default; site certification
p17 fixing: spring 06
– All Run IIa – 1.4B events in 6 weeks
– SAMGrid-LCG 'burnt in'
Increasing functionality
– Primary processing tested, will become default

EPS-HEP 2007 Manchester 12 Data – reprocessing & fixing - II
p20 (Run IIb) reprocessing
– Spring 2007
– Improved reconstruction & detector calibration for Run IIb data (2006 and early 2007)
– ~500M events (75 TB)
– Reprocessing using native SAMGrid, SAMGrid-OSG (& SAMGrid-LCG) – first large-scale use of SAMGrid-OSG
– Up to 10M events produced / merged remotely per day (initial goal was 3M/day)
– Successful reprocessing
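A quick scale check on these figures (my own arithmetic, not from the slides): 75 TB over 500M events is about 150 kB per event, and at the peak rate of 10M events/day the full sample takes around 50 days, versus roughly 170 days at the original 3M/day goal.

```python
# Quick scale check on the p20 reprocessing figures (illustrative only).
events = 500e6
volume_tb = 75.0

event_size_kb = volume_tb * 1e12 / events / 1e3   # ~150 kB per event
days_at_peak = events / 10e6                       # ~50 days at 10M events/day
days_at_goal = events / 3e6                        # ~167 days at the initial 3M/day goal

print(f"~{event_size_kb:.0f} kB/event; ~{days_at_peak:.0f} days at peak "
      f"vs ~{days_at_goal:.0f} days at the initial goal")
```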

EPS-HEP 2007 Manchester 13 Integration of a "grid" (OSG)
p20 reprocessing
– Such exercises 'debug' a grid: revealed some teething troubles, solved quickly thanks to GOC, OSG and LCG partners
– SAMGrid-LCG experience – up to 3M/day at full speed
[Site-efficiency plots: "LCG" shows a lot of green; OSG initially showed a lot of red]

EPS-HEP 2007 Manchester 14 Next steps / issues
Complete endgame development
– Additional functionality / usage – skimming, primary processing on the grid as default (& at multiple sites?)
– Additional resources – completing the forwarding nodes: full data / MC functionality for both LCG & OSG; scaling issues to access the full LCG & OSG worlds
– Data analysis – how gridified do we go? An open issue: need to be 'interoperable' (FermiGrid, LCG sites, OSG, …); will need development, deployment and operations effort
"Steady" state – goal to reach by end of CY07 (≥2 years of running ahead)
– Maintenance of existing functionality
– Continued experimental requests
– Continued evolution as grid standards evolve
Manpower
– Development, integration and operation handled by the dedicated few

EPS-HEP 2007 Manchester 15 Summary / plans
Tevatron & DØ performing very well
– A lot of data & physics, with more to come
SAM & SAMGrid critical to DØ
– The grid computing model is as important as any sub-detector
– Without the LCG and OSG partners it would not have worked either
– Largest grid 'data challenges' in HEP (I believe)
– Learnt a lot about the technology, and especially how it scales
– Learnt a lot about the organisation / operation of such projects
– Some of this can be abstracted and benefit others…
– Accounting model evolved in parallel (~$4M/yr)
Baseline: ensure (scaling for) production tasks
– Further improvements to operational robustness / efficiency underway
– In parallel, the open question of data analysis – will need to go part way

EPS-HEP 2007 Manchester 16 Back-ups

EPS-HEP 2007 Manchester 17 SAMGrid Architecture

EPS-HEP 2007 Manchester 18 Interoperability architecture
[Diagram: job flow crosses network boundaries through a forwarding node to an LCG/OSG cluster, which offers the VO service (SAM)]