Slide 1: HENP Grid Testbeds, Applications and Demonstrations
Rob Gardner, University of Chicago
Ruth Pordes, Fermilab
CHEP03, March 29, 2003

Slide 2: Overview
- High-altitude survey of contributions: group, application, testbed, services/tools
- Discussion of common and recurring issues: grid building, services development, use
- Concluding thoughts
- Acknowledgement to all the speakers who gave fine presentations, and my apologies in advance for providing only this *very limited* sampling

Slide 3: Testbeds, applications, and development of tools and services
Testbeds:
- AliEn grids
- BaBar Grid
- CrossGrid
- DataTAG
- EDG Testbed(s)
- Grid Canada
- IGT Testbed (US CMS)
- Korean DataGrid
- NorduGrid(s)
- SAMGrid
- US ATLAS Testbed
- WorldGrid
Evaluations:
- EDG testbed evaluations and experience in multiple experiments
- Testbed management experience
Applications:
- ALICE production
- ATLAS production
- BaBar analysis, file replication
- CDF/D0 analysis
- CMS production
- LHCb production
- Medical applications in Italy
- PHENIX
- Sloan sky survey
Tools development:
- Use cases (HEPCAL)
- PROOF/Grid analysis
- LCG POOL and grid catalogs
- SRM, Magda
- Clarens, Ganga, Genius, Grappa, JAS

Slide 4: EDG Testbed History (Emanuele Leonardi)

Version  Date
1.1.2    27 Feb 2002
1.1.3    02 Apr 2002
1.1.4    04 Apr 2002
1.2.a1   11 Apr 2002
1.2.b1   31 May 2002
1.2.0    12 Aug 2002
1.2.1    04 Sep 2002
1.2.2    09 Sep 2002
1.2.3    25 Oct 2002
1.3.0    08 Nov 2002
1.3.1    19 Nov 2002
1.3.2    20 Nov 2002
1.3.3    21 Nov 2002
1.3.4    25 Nov 2002
1.4.0    06 Dec 2002
1.4.1    07 Jan 2003
1.4.2    09 Jan 2003
1.4.3    14 Jan 2003
1.4.4    18 Jan 2003
1.4.5    26 Feb 2003
1.4.6    04 Mar 2003
1.4.7    08 Mar 2003

Annotations along the release timeline:
- Successes: matchmaking/job management, basic data management. Known problems: high-rate submissions, long FTP transfers.
- Known problems: GASS cache coherency, race conditions in the gatekeeper, unstable MDS.
- Intense use by applications! Limitations: resource exhaustion, size of logical collections.
- Successes: improved MDS stability, FTP transfers OK. Known problems: interactions with the RC.
- Milestones: ATLAS Phase 1 start; CMS stress test Nov 30 - Dec 20; used by CMS, ATLAS, LHCb, ALICE.

Slide 5: Resumé of experiment Data Challenge use of EDG; see the experiment talks elsewhere at CHEP (Stephen Burke)
- ATLAS were first, in August 2002. The aim was to repeat part of the Data Challenge. Two serious problems were found, and fixed in release 1.3.
- CMS stress test production, Nov-Dec 2002: found more problems in the areas of job submission and RC handling, which led to the 1.4.x releases. (No. of events: 250k; time: 21 days)
- ALICE started on Mar 4: production of 5,000 central Pb-Pb events; 9 TB; 40,000 output files; 120k CPU hours
  - Progressing with similar efficiency levels to CMS
  - About 5% done by Mar 14
  - "Pull" architecture
- LHCb started mid-February
  - ~70K events for physics
  - Like ALICE, using a pull architecture
- BaBar/D0
  - Have so far done small-scale tests
  - Larger scale planned with EDG 2

Slide 6: CMS Data Challenge 2002 on the Grid (C. Grande)
Two "official" CMS productions on the grid in 2002 (see the talk by P. Capiluppi):
- CMS-EDG Stress Test on the EDG testbed plus CMS sites
  - ~260K events, CMKIN and CMSIM steps
  - Top-down approach: more functionality but less robust; large manpower needed
- USCMS IGT Production in the US
  - 1M events, ntuple-only (full chain in a single job)
  - 500K up to CMSIM (two steps in a single job)
  - Bottom-up approach: less functionality but more stable; little manpower needed

Slide 7: CMS production components interfaced to EDG (C. Grande)
- Four submitting UIs: Bologna/CNAF (IT), Ecole Polytechnique (FR), Imperial College (UK), Padova/INFN (IT)
- Several Resource Brokers (WMS), CMS-dedicated and shared with other applications: one RB for each CMS UI, plus a "backup"
- Replica Catalog at CNAF; MDS (and II) at CERN and CNAF; VO server at NIKHEF
[Architecture diagram: the CMS production tools on the UI (IMPALA/BOSS with the BOSS DB and RefDB parameters) submit JDL to the EDG Workload Management System; jobs run on CEs/WNs with CMS software installed, read and write data on SEs via the Replica Manager, register data, and feed job output filtering and runtime monitoring; arrows distinguish "push data or info" from "pull info", and the CMS and EDG domains.]

Slide 8: CMS/EDG Production (P. Capiluppi talk)
- ~260K events produced
- ~7 sec/event average; ~2.5 sec/event peak (12-14 Dec)
[Plot: number of events vs. time, 30 Nov - 20 Dec; annotations mark the CMS Week, an upgrade of the middleware, and hitting some limit of the implementation.]
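As a rough cross-check (my arithmetic, not on the slide), the quoted average rate, read as wall-clock time per event across the whole testbed, reproduces the roughly three-week span of the production and the ~250k events / 21 days quoted on slide 5:

$2.6\times 10^{5}\ \text{events} \times 7\ \text{s/event} \approx 1.8\times 10^{6}\ \text{s} \approx 21\ \text{days}.$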

Slide 9: US-CMS IGT Production (P. Capiluppi talk)
- 25 Oct - 28 Dec: > 1M events
- 4.7 sec/event average; 2.5 sec/event peak (14-20 Dec 2002)
- Sustained efficiency: about 44%

Slide 10: Grid in ATLAS DC1* (G. Poulard)
[Table summarizing grid production in DC1: US-ATLAS Testbed - part of the Phase 1 production and full Phase 2; EDG Testbed production - reproduced part of the Phase 1 production in several tests; NorduGrid - its Phase 1 data production and Phase 2 production.]
* See other ATLAS talks for more details.

Slide 11: ATLAS DC1 Phase 1, July-August 2002 (G. Poulard)
- 3200 CPUs, 110 kSI95, 71,000 CPU-days
- 5x10^7 events generated, 1x10^7 events simulated, 3x10^7 single particles
- 30 TB, 35,000 files
- 39 institutes in 18 countries: Australia, Austria, Canada, CERN, Czech Republic, France, Germany, Israel, Italy, Japan, Nordic countries, Russia, Spain, Taiwan, UK, USA
- Grid tools used at 11 sites
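As a rough consistency check (my arithmetic, not on the slide), the quoted CPU usage corresponds to about three weeks of fully loaded running on the quoted farm size, which fits comfortably inside the July-August window given that the CPUs were not all busy all the time:

$\frac{71{,}000\ \text{CPU-days}}{3200\ \text{CPUs}} \approx 22\ \text{days}.$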

Slide 12: Meta Systems (G. Graham)
- MCRunJob approach by the CMS production team
- A framework for dealing with multiple grid resources and testbeds (EDG, IGT)

Slide 13: Hybrid production model, MCRunJob (C. Grande)
- A Site Manager starts an assignment; a physics group asks for an official dataset (via RefDB); a user can start a private production; the Production Manager defines assignments.
[Diagram: MCRunJob turns assignments into either DAG jobs handled by DAGMan (MOP), shell scripts for a local batch manager on a computer farm, or JDL for the EDG scheduler / LCG-1 testbed; a Chimera VDL planner and Virtual Data Catalogue sit alongside, with the user's site resources as additional targets.]
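To make the routing idea of slides 12-13 concrete, here is a minimal, hypothetical Python sketch of a meta-system dispatcher in the spirit of MCRunJob. It is not MCRunJob's actual API: the class, function and command names are invented for illustration, and the three back ends (a DAG for DAGMan/MOP, a shell script for a local batch manager, JDL for the EDG scheduler) simply mirror the outputs shown in the diagram.

```python
# Hypothetical sketch of a hybrid-production dispatcher (not the real MCRunJob API).
from dataclasses import dataclass

@dataclass
class Assignment:
    dataset: str          # e.g. an official dataset requested by a physics group
    n_events: int
    target: str           # "mop", "local", or "edg"

def to_dag(a: Assignment) -> str:
    # DAG description to be handed to DAGMan via MOP.
    return f"JOB {a.dataset} run_{a.dataset}.submit"

def to_shell(a: Assignment) -> str:
    # Plain shell script for a local batch manager ('simulate' is a stand-in command).
    return f"#!/bin/sh\nsimulate --dataset {a.dataset} --events {a.n_events}\n"

def to_jdl(a: Assignment) -> str:
    # JDL for the EDG scheduler (cf. the GLUE-aware JDL on slide 22).
    return (f'Executable = "run_production.sh";\n'
            f'Arguments  = "{a.dataset} {a.n_events}";\n')

def plan(a: Assignment) -> str:
    """Turn one assignment into the concrete job description for its target back end."""
    backends = {"mop": to_dag, "local": to_shell, "edg": to_jdl}
    return backends[a.target](a)

if __name__ == "__main__":
    # Route one hypothetical assignment to the EDG back end.
    print(plan(Assignment(dataset="example_dataset", n_events=500, target="edg")))
```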

Slide 14: Interoperability, glueCE
[Diagram: a UI, Resource Broker (RB), Storage Element (SE), Replica Catalog (RC) and Information Service (IS), split between a VDT client and a VDT server.]

Slide 15: Integrated Grid Systems
- Two examples of integrating advanced production and analysis across multiple grids: SamGrid and AliEn

Slide 16: SamGrid Map
CDF:
- Kyungpook National University, Korea
- Rutgers State University, New Jersey, US
- Rutherford Appleton Laboratory, UK
- Texas Tech, Texas, US
- University of Toronto, Canada
DØ:
- Imperial College, London, UK
- Michigan State University, Michigan, US
- University of Michigan, Michigan, US
- University of Texas at Arlington, Texas, US

Slide 17: Physics with SAM-Grid (S. Stonjek)
- A standard CDF analysis job submitted via SAM-Grid and executed somewhere on the grid
[Plots: z0(μ1) and z0(μ2) distributions and a J/ψ → μ+μ− signal.]

Slide 18: The BaBar Grid as of March 2003 (D. Boutigny)
- Special challenges faced by a running experiment with heterogeneous data requirements (ROOT, Objectivity)
[Diagram: a VO with a Resource Broker and Replica Catalog serving several sites, each with a Computing Element, Storage Element and Worker Nodes.]

Slide 19: Grid Applications, Interfaces, Portals
- Clarens
- Ganga
- Genius
- Grappa
- JAS-Grid
- Magda
- PROOF-Grid
- and higher-level services:
  - Storage Resource Manager (SRM)
  - Magda data management
  - POOL-Grid interface

Slide 20: PROOF and Data Grids (Fons Rademakers)
- Many grid services are a good fit: authentication; file catalog and replication services; resource brokers; monitoring
- Use abstract interfaces to these services (see the sketch below)
- Phased integration: static configuration first, then use of one or multiple grid services, driven by the grid infrastructure
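A minimal sketch of what "use abstract interfaces" could look like, in Python and purely for illustration; the class and method names here are invented and are not taken from PROOF, ROOT or any grid middleware. The point is only that analysis code programs against the abstract service and gets either a static or a grid-backed implementation depending on the deployment, matching the phased integration above.

```python
# Illustrative abstraction layer; names are invented, not PROOF's or EDG's real API.
from abc import ABC, abstractmethod

class FileCatalog(ABC):
    """Abstract interface that PROOF-like analysis code could program against."""
    @abstractmethod
    def replicas(self, logical_name: str) -> list[str]:
        ...

class StaticCatalog(FileCatalog):
    # Scenario 1: static configuration, file locations read from a local mapping.
    def __init__(self, mapping: dict[str, list[str]]):
        self.mapping = mapping
    def replicas(self, logical_name: str) -> list[str]:
        return self.mapping.get(logical_name, [])

class GridCatalog(FileCatalog):
    # Later scenarios: delegate to a grid replica catalog (details deliberately stubbed).
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def replicas(self, logical_name: str) -> list[str]:
        raise NotImplementedError("would query the grid replica catalog at this endpoint")

def locate(catalog: FileCatalog, lfn: str) -> list[str]:
    # The caller does not care which concrete catalog sits behind the interface.
    return catalog.replicas(lfn)
```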

Slide 21: Different PROOF-Grid Scenarios (Fons Rademakers)
- Static stand-alone: current version; static config file; pre-installed
- Dynamic, PROOF in control: using a grid file catalog and resource broker; pre-installed
- Dynamic, AliEn in control: idem, but installed and started on the fly by AliEn
- Dynamic, Condor in control: idem, but additionally allowing slave migration in a Condor pool

Slide 22: GLUE-aware JDL files on the GLUE testbed (see the WorldGrid poster at this conference)
- A GLUE-Schema-based Information System underpins the GLUE testbed; the JDL job is submitted from a GENIUS UI to the RB/JSS, which uses the Information Index (II), the Replica Catalog and the top GIIS to locate the input data and a CE; the job runs on a WN with the ATLAS software and registers its output data.

JDL for a GLUE-aware ATLAS DC1 job:

  Executable           = "/usr/bin/env";
  Arguments            = "zsh prod.dc1_wrc 00001";
  VirtualOrganization  = "datatag";
  Requirements         = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
  Rank                 = other.GlueCEStateFreeCPUs;
  InputSandbox         = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
  OutputSandbox        = {"dc1.002000.test.00001.hlt.pythia_jet_17.log",
                          "dc1.002000.test.00001.hlt.pythia_jet_17.his",
                          "dc1.002000.test.00001.hlt.pythia_jet_17.err",
                          "plot.kumac"};
  ReplicaCatalog       = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
  InputData            = {"LF:dc1.002000.evgen.0001.hlt.pythia_jet_17.root"};
  StdOutput            = "dc1.002000.test.00001.hlt.pythia_jet_17.log";
  StdError             = "dc1.002000.test.00001.hlt.pythia_jet_17.err";
  DataAccessProtocol   = "file";

Slide 23: Ganga, ATLAS and LHCb (C. Tull)
[Architecture diagram: the GANGA core module sits on a Python software bus together with OS, Athena/GAUDI, GaudiPython and PythonROOT modules; an XML-RPC server and module serve remote (client) users over LAN/WAN alongside a GUI; GANGA talks to a local job DB, a job configuration DB, and server-side bookkeeping and production DBs, and submits either to the Grid via the EDG UI or to a local LRMS.]

Slide 24: Ganga EDG Grid Interface (C. Tull)
The Ganga components (Job class, JobsRegistry class, Job Handler class) map onto the EDG UI command-line tools:
- Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
- Security service: grid-proxy-init, MyProxy
- Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
- Data management service: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP
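As an illustration of how a job-handler class might wrap these command-line tools, here is a minimal, hypothetical Python sketch. It only shells out to EDG UI commands named on the slide; the class itself, its method names, and the output handling are invented for illustration and are not Ganga's real interface (the real commands also print more than a bare job identifier, and their options are omitted here).

```python
# Hypothetical wrapper around the EDG UI commands listed above.
# Not Ganga's actual Job Handler class; for illustration only.
import subprocess

class EDGJobHandler:
    def __init__(self, jdl_path: str):
        self.jdl_path = jdl_path
        self.job_id = None

    def _run(self, *args: str) -> str:
        # Shell out to an EDG UI command and return its stdout.
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return result.stdout.strip()

    def submit(self) -> str:
        # Assumes a valid proxy already exists (grid-proxy-init / MyProxy).
        # Output parsing is simplified: the real tool prints more than the job id.
        self.job_id = self._run("dg-job-submit", self.jdl_path)
        return self.job_id

    def status(self) -> str:
        return self._run("dg-job-status", self.job_id)

    def get_output(self) -> str:
        # Retrieval options (e.g. a destination directory) are deliberately left out.
        return self._run("dg-job-get-output", self.job_id)
```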

Slide 25: Comment on Building Grid Applications
- P is a dynamic configuration script: it turns an abstract bundle into a concrete one
- Challenge: building integrated systems with distributed developers and support
[Diagram: a Grid Component Library (CTL, ATL, GTL) provides abstract bundles and templates (U1, U2, P1a); the configuration script P combines them with user-info and grid-info attributes to produce concrete bundles (P1c).]
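A toy Python sketch of the "abstract bundle plus attributes gives a concrete bundle" step described above. It is entirely illustrative: the bundle format, keys and values are invented and are not those of the Grid Component Library.

```python
# Toy illustration: fill an abstract bundle (templates) with user/grid attributes
# to obtain a concrete bundle. Format, keys and values are invented.
from string import Template

abstract_bundle = {
    "gatekeeper": Template("$site_host:2119/jobmanager-$batch_system"),
    "storage":    Template("gsiftp://$site_host/$vo_name/data"),
}

attributes = {              # the ":user info" and ":grid info" from the slide
    "site_host": "ce.example.org",
    "batch_system": "pbs",
    "vo_name": "atlas",
}

concrete_bundle = {key: tmpl.substitute(attributes)
                   for key, tmpl in abstract_bundle.items()}
print(concrete_bundle)
```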

Slide 26: In summary, common issues
- Installation and configuration of middleware
- Application packaging and run-time environments
- Authentication mechanisms
- Policies differing among sites
- Private networks, firewalls, ports
- Fragility of services and of the job submission chain
- Inaccuracies and poor performance of information services
- Monitoring at several levels
- Debugging and site cleanup

Slide 27: Conclusions
- Progress in the past 18 months has been dramatic!
  - Lots of experience gained in building integrated grid systems
  - Demonstrated functionality with large-scale production
  - More attention now being given to analysis
- Many pitfalls exposed and areas for improvement identified
  - Some of these are in core middleware; feedback has been given to the technology providers
  - Policy issues remain: use of shared resources, authorization
  - Operation of production services
  - User interactions and support models still to be developed
- Many thanks to the contributors to this session

