28 April 2003, Lee Lueking, PPDG Review: BaBar and DØ Experiment Reports. DOE Review of PPDG, January 28-29, 2003. Lee Lueking, Fermilab Computing Division, D0.


1 BaBar and DØ Experiment Reports
DOE Review of PPDG, January 28-29, 2003
Lee Lueking, Fermilab Computing Division, D0 liaison to PPDG
28 April 2003, Lee Lueking, PPDG Review

2 Introduction
BaBar
BaBar's PPDG effort is concentrating on:
– Data distribution on the Grid (SRB, BdbServer++).
– Job submission on the Grid (EDG, LCG).
People involved:
– Tim Adye (RAL)
– Andy Hanushevsky (SLAC)
– Adil Hasan (SLAC)
– Wilko Kroeger (SLAC)
Interactions with other Grid efforts that are part of BaBar:
– GridPP (UK), EDG (Europe, through Dominique Boutigny), GridKA, Italian Grid groups, etc.
BaBar Grid applications are being designed to be data-format neutral:
– BaBar's new computing model should have little impact on the apps.
DØ
DØ's PPDG effort is concentrating on:
– Data distribution on the Grid (SAM).
– Job submission on the Grid (JIM w/ Condor-G and Globus).
People involved:
– Igor Terekhov (FNAL; JIM Team Lead)
– Gabriele Garzoglio (FNAL)
– Andrew Baranovski (FNAL)
– Parag Mhashilkar & Vijay Murthi (via Contr. w/ UTA CSE)
– Lee Lueking (FNAL; D0 Liaison to PPDG)
Interactions with other Grid efforts that are part of D0:
– GridPP (UK), GridKA (DE), NIKHEF (NL), CCIN2P3 (FR)
Working very closely with the Condor team to achieve:
– Grid job and resource matchmaking service
– Other robustness and usability features

3 Overview of BaBar and DØ Data Handling
Both experiments have extensive distributed computing and data handling systems.
Significant amounts of data are processed at remote sites in the US and Europe.
[Figure: DØ SAM deployment map, showing regional centers and analysis sites]
[Figure: DØ integrated files consumed, Mar'02 to Mar'03: 4.0 M files]
[Figure: DØ integrated data consumed, Mar'02 to Mar'03: 1.2 PB]
[Figure: BaBar deployment map, showing Tier A centers and Monte Carlo sites]
[Figure: BaBar database growth (TB), Jan'02 to Dec'02: 730 TB]
[Figure: BaBar analysis jobs (SLAC), Apr'02 to Mar'03: 140k jobs]

4 BaBar Bulk Data Distribution – SRB
Storage Resource Broker (SRB) from SDSC is being used to test data distribution from Tier A to Tier A, with a view to production this summer.
So far there have been 2 successful demos, at Supercomputing 2001 (SLAC->SLAC) and 2002 (SLAC->ccin2p3).
Have been testing SRB V2 (released Feb 2003); new features include bulk registering in the RDBMS and parallel-stream file replication.
Busy incorporating the newly designed BaBar metadata tables into SRB's RDBMS tables.
Looking to improve file replication performance (playing with streams, etc.).

5 BaBar User-driven Data Distribution: BdbServer++
Attempts to address the use case where a user wants to copy a collection of sparse events with little space overhead (mainly Tier A to Tier C).
BdbServer++ is essentially a set of scripts that:
– Submit a job to the Grid to make a deep copy of the sparse collection (i.e. copy objects for the events of interest only).
– Then copy the files back to the user's institution through the Grid (can use globus-url-copy).
– Poster at CHEP2003.
Currently have tested deep copy through the Grid using EDG and pure Globus.
Just completed a test of extracting data using globus-url-copy (pure Globus request).
To do: integrate with BaBar bookkeeping; robustness and reliability tests; production-level scripts for submission and copying.
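
The deep-copy step described on this slide can be illustrated with a minimal Python sketch. It is not BdbServer++ code: the `Event` class, the `deep_copy_sparse` function, and the in-memory `object_store` dictionary are hypothetical stand-ins for the BaBar event store, chosen only to show why copying the objects of the selected events alone keeps the space overhead small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: an ID plus the IDs of the objects it references."""
    event_id: int
    object_ids: tuple


def deep_copy_sparse(events, object_store, wanted_ids):
    """Return the selected events and a copy of only the objects they reference."""
    wanted = set(wanted_ids)
    selected = [ev for ev in events if ev.event_id in wanted]
    # Collect each referenced object once, even if several events share it.
    needed = {oid for ev in selected for oid in ev.object_ids}
    copied = {oid: bytes(object_store[oid]) for oid in sorted(needed)}
    return selected, copied


if __name__ == "__main__":
    # Toy object store: object ID -> payload (assumed data, for illustration only).
    store = {1: b"a" * 10, 2: b"b" * 20, 3: b"c" * 30, 4: b"d" * 40}
    events = [Event(100, (1, 2)), Event(101, (3,)), Event(102, (2, 4))]

    selected, copied = deep_copy_sparse(events, store, [100, 102])
    print(sorted(copied))  # [1, 2, 4]
    copied_bytes = sum(len(v) for v in copied.values())
    total_bytes = sum(len(v) for v in store.values())
    print(f"copied {copied_bytes} of {total_bytes} bytes")  # copied 70 of 100 bytes
```

In the workflow on the slide, the output of such a step would be written to files at the Tier A site and then transferred back to the user's institution, for example with globus-url-copy.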

6 BaBar Job Submission on the Grid
Many production-like activities could take advantage of using compute resources at more than one site.
– Analysis Production: ccin2p3 (France), UK, SLAC, using EDG installations.
– Simulation Production: Ferrara (Italy) Grid Group, Ohio, using EDG and VDT installations.
– Also very useful for data distribution (BdbServer++): ccin2p3 (France), SLAC.
[Figure: Proposed BaBar Grid Architecture]

7 BaBar Job Submission on the Grid
There was a CHEP 2003 talk and poster, and a grid demo was set up in the UK (running BaBar jobs on the UK grid). Simulation Production and data distribution tests have been run on the Grid.
Plan: test the new EDG2/LCG installations, and increase the number of users as releases stabilize.
BbgUtils.pl: a Perl script that allows easier client-side installation of Globus and CAs (currently works on Sun and Linux).
– The script copies all tar files, signing policies, etc. necessary for client installation for that experiment.
– Can be readily extended to include SRB client-side installation, EDG/LCG client-side installation, etc.

8 DØ: Objectives of SAMGrid
Bring standard grid technologies (including Globus and Condor) to the Run II experiments.
Enable globally distributed computing for DØ and CDF.
JIM (Job and Information Management) complements SAM by adding job management and monitoring to data handling.
Together, JIM + SAM = SAMGrid.

9 JIM Job Management
[Diagram: a JOB passes through the User Interface, Submission Client, Broker, Match Making Service, Information Collector, and Queuing System to Execution Sites #1 through #n. Components shown at the execution sites: Computing Element, Storage Element, Grid Sensors, Queuing System, and the Data Handling System.]
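
The Match Making Service in this diagram pairs jobs with execution sites using what the Information Collector knows about each site. The following Python sketch shows the general idea under loudly stated assumptions: it is a toy, not the Condor matchmaking language or JIM's broker, and the site names, the advertisement fields (`free_cpus`, `cached_datasets`), and the ranking rule are all hypothetical.

```python
def matchmake(job, site_ads):
    """Pick a site for a job: filter by requirements, then rank the survivors.

    Requirement (assumed): the site has at least as many free CPUs as the job needs.
    Rank (assumed): prefer sites that already cache the job's dataset,
    then sites with more free CPUs.
    """
    eligible = [ad for ad in site_ads if ad["free_cpus"] >= job["cpus"]]
    if not eligible:
        return None  # no site can run the job right now
    best = max(
        eligible,
        key=lambda ad: (job["dataset"] in ad["cached_datasets"], ad["free_cpus"]),
    )
    return best["site"]


if __name__ == "__main__":
    # Hypothetical site advertisements, as an information collector might hold them.
    ads = [
        {"site": "site-a", "free_cpus": 50, "cached_datasets": {"mc-sample"}},
        {"site": "site-b", "free_cpus": 200, "cached_datasets": set()},
        {"site": "site-c", "free_cpus": 2, "cached_datasets": {"raw-sample"}},
    ]
    print(matchmake({"cpus": 10, "dataset": "mc-sample"}, ads))   # site-a
    print(matchmake({"cpus": 10, "dataset": "raw-sample"}, ads))  # site-b
    print(matchmake({"cpus": 500, "dataset": "mc-sample"}, ads))  # None
```

Ranking on data locality first reflects the pairing of job management with data handling that the slides describe; a real broker would weigh many more attributes.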

10 DØ JIM Deployment
A site can join SAMGrid with a combination of services:
– Monitoring, and/or
– Execution, and/or
– Submission
May 2003: expect 5 initial execution sites for SAMGrid deployment, and 20 submission sites.
– GridKa (Karlsruhe): analysis site.
– Imperial College and Lancaster: MC sites.
– U. Michigan (NPACI): reconstruction center.
– FNAL: CLueD0 as a submission site.
Summer 2003: continue to add execution and submission sites. The second round of execution site deployments includes Lyon (ccin2p3), Manchester, MSU, Princeton, UTA, and the FNAL CAB system.
Hope to grow to dozens of execution sites and hundreds of submission sites over the next year(s).
Use grid middleware for job submission within a site too!
– Administrators will have general ways of managing resources.
– Users will use common tools for submitting and monitoring jobs everywhere.

11 What's Next for SAMGrid?
After JIM version 1:
Improve job scheduling and decision making.
Improve monitoring: more comprehensive, easier to navigate.
Execution of structured jobs.
Simplify packaging and deployment.
Extend the configuration and advertising features of the uniform, XML-based framework built for JIM.
CDF is adopting SAM and SAMGrid for its data handling and job submission. CDF has also asked to join PPDG.
Interoperability, interoperability, interoperability:
– Working with EDG and LCG to move in common directions.
– Moving to Web services, Globus V3, and all the good things OGSA will provide. In particular, interoperability by expressing SAM and JIM as a collection of services, and mixing and matching with other Grids.

12 Challenges
Meeting the challenges of real data handling and job submission
BaBar and DØ have confronted real-life issues, including:
– File replication integrity
– Preemptive distributed caching
– Private networks
– Routing data in a worldwide system
– Reliable network file transfers, timeouts, and retries
– Simplifying complex installation procedures
– Username clashing issues, moving to GSI and Grid certificates
– Interoperability with many MSS
– Security issues, firewalls, site policies
– Robust job submission on the grid
Troubleshooting is an important and time-consuming activity in distributed computing environments, and many tools are needed to do this effectively.
Operating these distributed systems on a 24/7 basis involves coordination, training, and worldwide effort.
Standard middleware is still hard to use, and requires significant integration, testing, and debugging.
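
Two of the issues listed on this slide, file replication integrity and reliable transfers with retries, combine naturally into one pattern: checksum the source, transfer, verify the copy, and retry on failure. The Python sketch below illustrates that pattern only. It is not SAM or SRB code; `replicate` and `sha256_of` are hypothetical names, and a local `shutil.copyfile` stands in for the network transfer tool, which in practice would also need its own timeout handling.

```python
import hashlib
import shutil
import time


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(src, dst, transfer=shutil.copyfile, max_attempts=3, backoff_s=0.1):
    """Copy src to dst, verify the checksum, and retry with exponential backoff.

    `transfer` is any callable taking (src, dst); the default local copy is an
    assumption standing in for a real grid transfer command.
    Returns the number of attempts used; raises RuntimeError if all attempts fail.
    """
    expected = sha256_of(src)
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(src, dst)
            if sha256_of(dst) == expected:
                return attempt  # replica verified
        except OSError:
            pass  # treat I/O errors like a failed transfer and try again
        if attempt < max_attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))
    raise RuntimeError(f"replication of {src} failed after {max_attempts} attempts")
```

Passing the transfer step in as a parameter keeps the verification and retry logic independent of the tool that actually moves the bytes, and makes failures easy to simulate in tests.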

14 PPDG Benefits to BaBar and DØ
PPDG has provided very useful collaboration with, and feedback to, other Grid and Computer Science groups.
Development of tools and middleware that should be of general interest to the Grid community, e.g.
– BbgUtils.pl
– Condor-G enhancements
Deploying and testing grid middleware under the battlefield conditions of operational experiments hardens the software and helps CS learn what is needed.
The CS groups enable the experiments to examine problems in new, innovative ways, and provide important new technologies for solving them.

15 The End

