HENP Grid Testbeds, Applications and Demonstrations. Rob Gardner, University of Chicago; Ruth Pordes, Fermilab. CHEP03, March 29, 2003.


2 Overview
- High-altitude survey of contributions: group, application, testbed, services/tools
- Discussion of common and recurring issues: grid building, services development, use
- Concluding thoughts
  - Acknowledgement to all the speakers who gave fine presentations, and my apologies in advance for providing only this *very limited* sampling

3 Testbeds, applications, and development of tools and services
- Testbeds:
  - AliEn grids
  - BaBar Grid
  - CrossGrid
  - DataTAG
  - EDG Testbed(s)
  - Grid Canada
  - IGT Testbed (US CMS)
  - Korean DataGrid
  - NorduGrid(s)
  - SAMGrid
  - US ATLAS Testbed
  - WorldGrid
- Evaluations:
  - EDG testbed evaluations and experience in multiple experiments
  - Testbed management experience
- Applications:
  - ALICE production
  - ATLAS production
  - BaBar analysis, file replication
  - CDF/D0 analysis
  - CMS production
  - LHCb production
  - Medical applications in Italy
  - PHENIX
  - Sloan sky survey
- Tools development:
  - Use cases (HEPCAL)
  - PROOF/Grid analysis
  - LCG POOL and grid catalogs
  - SRM, Magda
  - Clarens, Ganga, Genius, Grappa, JAS

4 EDG TB History (Emanuele Leonardi)
[Timeline table of EDG testbed releases from Feb 2002 to Mar 2003; the version numbers and dates are garbled in this transcript.] Annotations along the timeline:
- Successes: matchmaking/job management, basic data management
- Known problems: high-rate submissions, long FTP transfers
- Known problems: GASS cache coherency, race conditions in the gatekeeper, unstable MDS
- Intense use by applications! Limitations: resource exhaustion, size of logical collections
- Successes: improved MDS stability, FTP transfers OK; known problems: interactions with the RC
- Experiment activity: ATLAS phase 1 start; CMS stress test Nov. 30 - Dec. 20; CMS, ATLAS, LHCb, ALICE

5 Résumé of experiment DC use of EDG - see experiment talks elsewhere at CHEP (Stephen Burke)
- ATLAS were first, in August 2002. The aim was to repeat part of the Data Challenge. Found two serious problems, which were fixed in 1.3.
- CMS stress test production Nov-Dec 2002: found more problems in the areas of job submission and RC handling; led to 1.4.x.
- ALICE started on Mar 4: production of 5,000 central Pb-Pb events - 9 TB; 40,000 output files; 120k CPU hours
  - Progressing with similar efficiency levels to CMS
  - About 5% done by Mar 14
  - "Pull" architecture
- LHCb started mid-Feb
  - ~70K events for physics
  - Like ALICE, using a pull architecture
- BaBar/D0
  - Have so far done small-scale tests
  - Larger scale planned with EDG 2
(Callout on slide: No. of evts - 250k; Time - 21 days)

6 CMS Data Challenge 2002 on Grid (C. Grandi)
Two "official" CMS productions on the grid in 2002:
- CMS-EDG Stress Test on EDG testbed + CMS sites
  - ~260K events, CMKIN and CMSIM steps
  - Top-down approach: more functionality but less robust; large manpower needed
- USCMS IGT Production in the US
  - 1M events ntuple-only (full chain in a single job)
  - 500K up to CMSIM (two steps in a single job)
  - Bottom-up approach: less functionality but more stable; little manpower needed
- See talk by P. Capiluppi

7 CMS production components interfaced to EDG
- Four submitting UIs: Bologna/CNAF (IT), Ecole Polytechnique (FR), Imperial College (UK), Padova/INFN (IT)
- Several Resource Brokers (WMS), CMS-dedicated and shared with other applications: one RB for each CMS UI + "backup"
- Replica Catalog at CNAF, MDS (and II) at CERN and CNAF, VO server at NIKHEF
[Architecture diagram: CMS production tools (IMPALA/BOSS, RefDB parameters, BOSS DB) on the UI; jobs described in JDL are handed to the EDG Workload Management System; the Replica Manager and Replica Catalog handle data registration and input data location; CEs/WNs with CMS software and SEs for read/write; job output filtering and runtime monitoring; data and info are pushed or pulled across the CMS/EDG boundary.]
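To make the chain on this slide concrete, here is a minimal sketch, with entirely invented names, of what the submission side does: split a RefDB-style assignment into jobs, submit each one, and keep a BOSS-like bookkeeping record. It is an illustration only, not the actual IMPALA/BOSS or RefDB code.

# Purely illustrative sketch of the production chain on this slide; none of the
# names come from the real CMS tools (IMPALA, BOSS, RefDB), and the "submit" step
# is a stub standing in for handing the JDL to the EDG Workload Management System.

def make_jobs(assignment):
    """Split one RefDB-style assignment into per-run job descriptions (IMPALA's role)."""
    return [{"name": "%s_run%04d" % (assignment["dataset"], run),
             "jdl": "%s_run%04d.jdl" % (assignment["dataset"], run)}
            for run in range(1, assignment["n_runs"] + 1)]

def submit(job):
    """Stub for submission of the job's JDL to the EDG WMS via the UI."""
    return {"job": job["name"], "status": "submitted"}

def run_assignment(assignment):
    """Create jobs, submit them, and keep a BOSS-like bookkeeping record of each."""
    bookkeeping = {}
    for job in make_jobs(assignment):
        bookkeeping[job["name"]] = submit(job)
    return bookkeeping

if __name__ == "__main__":
    print(run_assignment({"dataset": "example_dataset", "n_runs": 3}))  # made-up values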

8 CMS/EDG Production (P. Capiluppi talk)
- ~260K events produced
- ~7 sec/event average; ~2.5 sec/event peak ( Dec)
[Plot: cumulative # events vs. time, 30 Nov - 20 Dec, with annotations "CMS Week", "Upgrade of MW", "Hit some limit of implementation".]

9 US-CMS IGT Production (P. Capiluppi talk)
- 25 Oct - 28 Dec: > 1 M events
- 4.7 sec/event average; 2.5 sec/event peak (14-20 Dec 2002)
- Sustained efficiency: about 44%

10 Grid in ATLAS DC1 * (G. Poulard)
[Table relating grids to DC1 phases; the layout is garbled in this transcript. Roughly: US-ATLAS testbed - part of Phase 1 production and full Phase 2 production; EDG Testbed Prod - reproduced part of Phase 1, several tests; NorduGrid - full Phase 1 & 2 data production.]
[* See other ATLAS talks for more details]

11 ATLAS DC1 Phase 1: July-August 2002 (G. Poulard)
- CPU resources: 110 kSI [the CPU and CPU-day counts are garbled in the transcript]
- 5x10^7 events generated
- 1x10^7 events simulated
- 3x10^7 single particles
- 30 Tbytes of files
- 39 institutes in 18 countries: Australia, Austria, Canada, CERN, Czech Republic, France, Germany, Israel, Italy, Japan, Nordic, Russia, Spain, Taiwan, UK, USA
- Grid tools used at 11 sites

12 Meta Systems (G. Graham)
- MCRunJob approach by the CMS production team
- Framework for dealing with multiple grid resources and testbeds (EDG, IGT)

13 Hybrid production model: MCRunJob (C. Grandi)
- Requests enter via RefDB: a physics group asks for an official dataset, the Production Manager defines assignments, a Site Manager starts an assignment, or a user starts a private production.
- The planner emits jobs in the form each target understands: DAG jobs for DAGMan (MOP), JDL for the EDG Scheduler / LCG-1 testbed, shell scripts for a Local Batch Manager on a computer farm or the user's site resources; Chimera VDL feeds a Virtual Data Catalogue.
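The planner's job, as the slide shows it, is to rewrite one abstract job description into whatever each backend consumes: a DAG for DAGMan (MOP), JDL for the EDG scheduler, a shell script for a local batch manager. A minimal sketch of that dispatch, with hypothetical names and no claim to match MCRunJob's real interfaces:

# Illustrative sketch only (not CMS's MCRunJob code): a meta-system that turns one
# abstract job description into the concrete form each backend expects, as on the slide.
# All names here are hypothetical.

def to_jdl(job):
    # EDG/LCG route: emit a JDL description for the EDG Scheduler
    return ('Executable = "%s";\nArguments = "%s";\nInputSandbox = {%s};' % (
        job["executable"],
        " ".join(job["args"]),
        ", ".join('"%s"' % f for f in job["sandbox"])))

def to_dag(job):
    # US grid route: emit a trivial DAGMan description for MOP-style submission
    return "JOB %s %s.submit\n" % (job["name"], job["name"])

def to_shell(job):
    # Local farm route: emit a shell script for a local batch manager
    return "#!/bin/sh\n%s %s\n" % (job["executable"], " ".join(job["args"]))

EMITTERS = {"edg": to_jdl, "dagman": to_dag, "local": to_shell}

def plan(job, backend):
    """Return the concrete job text for the requested backend."""
    return EMITTERS[backend](job)

if __name__ == "__main__":
    job = {"name": "cmkin_0001", "executable": "cmsim.sh",
           "args": ["run0001.cfg"], "sandbox": ["cmsim.sh", "run0001.cfg"]}
    for backend in EMITTERS:
        print("----", backend, "----")
        print(plan(job, backend))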

14 Interoperability: glueCE
[Diagram: UI, SE, RB, Replica Catalog (RC) and Information Service (IS) with a VDT Client talking to a VDT Server-based "glue" CE.]

15 Integrated Grid Systems
- Two examples of integrating advanced production and analysis across multiple grids: SAMGrid and AliEn

16 SAMGrid Map
- CDF
  - Kyungpook National University, Korea
  - Rutgers State University, New Jersey, US
  - Rutherford Appleton Laboratory, UK
  - Texas Tech, Texas, US
  - University of Toronto, Canada
- DØ
  - Imperial College, London, UK
  - Michigan State University, Michigan, US
  - University of Michigan, Michigan, US
  - University of Texas at Arlington, Texas, US

17 Physics with SAM-Grid (S. Stonjek)
- Standard CDF analysis job submitted via SAM-Grid and executed somewhere
[Plots: z0(µ1) and z0(µ2) distributions and a J/ψ -> µ+µ- signal.]

18 The BaBar Grid as of March 2003 (D. Boutigny)
[Diagram: a VO with a central Replica Catalog and Resource Broker serving several sites, each with a CE, SE and WNs.]
- Special challenges faced by a running experiment with heterogeneous data requirements (ROOT, Objectivity)

19 Grid Applications, Interfaces, Portals
- Clarens
- Ganga
- Genius
- Grappa
- JAS-Grid
- Magda
- PROOF-Grid
- ... and higher-level services
  - Storage Resource Manager (SRM)
  - Magda data management
  - POOL-Grid interface

20 PROOF and Data Grids (Fons Rademakers)
- Many grid services are a good fit: authentication; file catalog and replication services; resource brokers; monitoring -> use abstract interfaces (a sketch of such an interface follows below)
- Phased integration
  - Static configuration
  - Use of one or multiple Grid services
  - Driven by Grid infrastructure
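As an illustration of the "abstract interfaces" point above, the sketch below (in Python rather than ROOT/C++, with invented class names that do not correspond to real PROOF or grid APIs) shows how a file-catalog lookup can move from a static configuration to a grid replica catalog without touching the analysis code:

# Illustration only of the "abstract interfaces" idea on this slide, in Python rather
# than ROOT/C++; none of these classes correspond to real PROOF or grid APIs.

from abc import ABC, abstractmethod

class FileCatalog(ABC):
    """Abstract interface that PROOF-like analysis code would program against."""
    @abstractmethod
    def locate(self, logical_name):
        """Return a list of physical replicas for a logical file name."""

class StaticCatalog(FileCatalog):
    # Phase 1: static configuration, e.g. read from a local config file
    def __init__(self, mapping):
        self.mapping = mapping
    def locate(self, logical_name):
        return self.mapping.get(logical_name, [])

class GridCatalog(FileCatalog):
    # Later phases: delegate to a grid replica catalog service (hypothetical client)
    def __init__(self, client):
        self.client = client
    def locate(self, logical_name):
        return self.client.list_replicas(logical_name)

def choose_input(catalog, logical_name):
    # The analysis layer only ever sees the abstract interface
    replicas = catalog.locate(logical_name)
    return replicas[0] if replicas else None

if __name__ == "__main__":
    static = StaticCatalog({"lfn:run1.root": ["file:/data/run1.root"]})
    print(choose_input(static, "lfn:run1.root"))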

21 Different PROOF-GRID Scenarios (Fons Rademakers)
- Static stand-alone: current version, static config file, pre-installed
- Dynamic, PROOF in control: using grid file catalog and resource broker, pre-installed
- Dynamic, AliEn in control: idem, but installed and started on the fly by AliEn
- Dynamic, Condor in control: idem, but additionally allowing slave migration in a Condor pool

22 GLUE Testbed: JDL job submitted from the GENIUS UI (see WorldGrid poster, this conference)
[Diagram: the JDL job goes from the GENIUS UI to the RB/JSS; the II/TOP GIIS (GLUE-Schema based Information System) and Replica Catalog provide input data location; the job runs on a CE/WN with ATLAS software, reads GLUE-aware files from an SE and registers output data.]
The JDL (some digits in the dataset file names were lost in the transcript):
  Executable = "/usr/bin/env";
  Arguments = "zsh prod.dc1_wrc 00001";
  VirtualOrganization = "datatag";
  Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
  Rank = other.GlueCEStateFreeCPUs;
  InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
  OutputSandbox = {"dc test hlt.pythia_jet_17.log", "dc test hlt.pythia_jet_17.his", "dc test hlt.pythia_jet_17.err", "plot.kumac"};
  ReplicaCatalog = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
  InputData = {"LF:dc evgen.0001.hlt.pythia_jet_17.root"};
  StdOutput = "dc test hlt.pythia_jet_17.log";
  StdError = "dc test hlt.pythia_jet_17.err";
  DataAccessProtocol = "file";

23 Ganga: ATLAS and LHCb (C. Tull)
[Architecture diagram: the GANGA Core Module sits on a Python software bus together with the GUI, job configuration DB, local job DB, OS module and application modules (Athena/GAUDI, GaudiPython, PythonROOT); a remote user (client) connects over LAN/WAN via an XML-RPC module to an XML-RPC server on the server side, which holds the bookkeeping and production DBs and the EDG UI; jobs go out to the GRID or an LRMS.]

24 Ganga EDG Grid Interface (C. Tull)
[Diagram: Ganga's Job, JobsRegistry and Job Handler classes drive four services, each mapped onto EDG UI commands:]
- Security service: grid-proxy-init, MyProxy
- Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
- Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
- Data management service: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP
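The mapping above lends itself to a thin wrapper layer. The sketch below records the slide's service-to-command mapping in a dictionary and shells out to the chosen command; the dispatch helper is hypothetical and is not Ganga's real job-handler code:

# Illustrative mapping of the Ganga services on this slide to the EDG UI commands
# they wrap; the dictionary mirrors the slide, while the dispatch helper is a
# hypothetical stand-in for Ganga's job-handler layer.

import subprocess

EDG_COMMANDS = {
    "security":        ["grid-proxy-init"],                           # plus MyProxy
    "submission":      ["dg-job-list-match", "dg-job-submit", "dg-job-cancel"],
    "monitoring":      ["dg-job-status", "dg-job-get-logging-info"],  # plus GRM/PROVE
    "data_management": ["edg-replica-manager", "dg-job-get-output", "globus-url-copy"],  # plus GDMP
}

def run(service, command, *args):
    """Run one of the EDG UI commands registered for a given Ganga-style service."""
    if command not in EDG_COMMANDS[service]:
        raise ValueError("%s is not a %s command" % (command, service))
    return subprocess.run([command, *args], capture_output=True, text=True).stdout

# Example (would require an EDG UI installation; the JDL file name is hypothetical):
#   run("submission", "dg-job-submit", "prod.dc1_wrc.jdl")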

25 Comment: Building Grid Applications
- P is a dynamic configuration script
- It turns an abstract bundle into a concrete one
- Challenge:
  - building integrated systems
  - distributed developers and support
[Diagram: a Grid Component Library (CTL, ATL, GTL) supplies abstract bundles and templates (U1, U2, P1a); the configuration script P applies attributes (user info, grid info) to produce concrete bundles (P1c).]
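A purely illustrative rendering of that idea: a small "configuration script" that fills an abstract bundle (a template with holes) with user and grid attributes to produce a concrete bundle. All names and fields are invented:

# Minimal sketch of the slide's abstract-to-concrete bundle step; the template
# fields, attribute names and example values are all invented for illustration.

from string import Template

ABSTRACT_BUNDLE = Template(
    "subject=$user_dn\n"
    "replica_catalog=$replica_catalog\n"
    "compute_element=$compute_element\n"
)

def concretize(abstract, user_info, grid_info):
    """Merge user and grid attribute dictionaries into the abstract bundle template."""
    return abstract.substitute(**user_info, **grid_info)

if __name__ == "__main__":
    user = {"user_dn": "/O=Grid/CN=Some User"}
    grid = {"replica_catalog": "ldap://rc.example.org:9211",
            "compute_element": "ce.example.org:2119/jobmanager-pbs"}
    print(concretize(ABSTRACT_BUNDLE, user, grid))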

26 In summary... Common issues
- Installation and configuration of middleware
- Application packaging, runtime environments
- Authentication mechanisms
- Policies differing among sites
- Private networks, firewalls, ports
- Fragility of services and of the job submission chain
- Inaccuracies and poor performance of information services
- Monitoring at several levels
- Debugging, site cleanup

27 Conclusions
- Progress in the past 18 months has been dramatic!
  - lots of experience gained in building integrated grid systems
  - demonstrated functionality with large-scale production
  - more attention being given to analysis
- Many pitfalls exposed, areas for improvement identified
  - some of these are in core middleware -> feedback given to technology providers
  - policy issues remain: using shared resources, authorization
  - operation of production services
  - user interactions and support models to be developed
- Many thanks to the contributors to this session