CMS Report – GridPP Collaboration Meeting IX
Peter Hobson, Brunel University, 4/2/2004

CMS Status
Progress towards GridPP milestones:
Data management – the Data Challenge 2004
Batch analysis framework
Monitoring using the EDG R-GMA middleware
The GridPP-funded cast list: Tim Barrass, Barry MacEvoy, Owen Maroney, JJ, Henry Nebrensky, Hugh Tallini (Bristol, Brunel and Imperial College)

CMS and LCG2
The CMS Data Challenge DC04 has three components:
Tier-0 challenge: reconstruction at CERN. Complex enough; it doesn't need the Grid per se, but it will publish its catalogue to the CERN RLS service.
Distribution challenge: push/pull data to the Tier-1s. We want to use LCG tools and can use SRB; questions of MCAT/RLS coherence, SRB pool issues etc. remain.
Analysis/calibration aspects: at Tier-1/2 centres (not at CERN during DC04 proper); sites are encouraged to use LCG2 and Grid3 to run these.
The aim is to complete the first two in March. The last is expected to continue and be repeated over the next 6 months as LCG matures; it is factorised from the Tier-0 and distribution challenges. CMS expert manpower is saturated with work for DC04.

DC04 Production
Data Challenge: March 2004 (nominal). An end-to-end test of the CMS offline computing system, with 25% of the full world-wide system to be run flat-out for one month; a key test of our Grid-enabled software components.
Play back digitised data, emulating the CMS DAQ -> storage, reconstruction, calibration, data reduction and analysis at T0 and external T1s, with some T2 involvement as clients of local T1 centres.
T0 to T1 data transfer: new transfer management database, refinements to the schema, CASTOR issues to be resolved (SRM export, 3 TB buffer needed).

DC04: Catalogue Deployment
Synchronised RLS (LRC & RMC) deployment: an Oracle DB is expected to be deployed at CERN and CNAF. Deployment at RAL and FZK could follow, but may not be achieved on the timescale of DC04, so we cannot afford to plan on it being in place.
Tier-1s without an RLS will need a POOL MySQL catalogue (FNAL, RAL, Lyon, FZK). The catalogue should be updated by a Tier-1 agent: the FCatalog tool copies POOL data from the CERN RLS to the local MySQL catalogue, so the catalogue is updated as files are transferred from CERN. An illustrative sketch of such an agent follows.
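Purely as an illustration of the agent's role (not the actual FCatalog/RLS code), the Python sketch below polls for files whose transfer from CERN has completed and copies their catalogue entries into the local POOL MySQL catalogue. The helper functions are placeholders for site-specific plumbing, not real POOL or RLS APIs.

```python
# Hypothetical sketch of a Tier-1 catalogue-population agent for DC04.
# The three helpers are placeholders for site-specific plumbing
# (transfer bookkeeping, CERN RLS lookup, local POOL MySQL insert).
import time

def list_completed_transfers():
    """Return GUIDs of files whose transfer from CERN has just finished."""
    return []                      # placeholder: read the transfer management DB

def lookup_catalogue_entry(guid):
    """Fetch (lfn, pfn, metadata) for a GUID from the central RLS/RMC."""
    raise NotImplementedError      # placeholder: query the CERN RLS

def insert_local_entry(guid, lfn, pfn, metadata):
    """Register the entry in the local POOL MySQL catalogue."""
    raise NotImplementedError      # placeholder: INSERT into the local catalogue

def run_agent(poll_seconds=60):
    """Keep the local catalogue in step with files arriving from CERN."""
    seen = set()
    while True:
        for guid in list_completed_transfers():
            if guid not in seen:
                lfn, pfn, metadata = lookup_catalogue_entry(guid)
                insert_local_entry(guid, lfn, pfn, metadata)
                seen.add(guid)
        time.sleep(poll_seconds)
```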

Catalogue Use
CERN RLS: initial registration of POOL data by the reconstruction jobs; registration of files and replicas held in SRB by GMCat (RAL, Lyon); registration of files produced by analysis jobs outside LCG-2, but only those files which are made globally accessible in an SRB or SRM server.
CNAF RLS: replication of files to LCG-2 SEs; queries by LCG-2 analysis jobs; registration of files produced by LCG-2 analysis jobs.

Distribution of Data from T1 to T2
LCG-2 sites: distribution through the EDG replica manager, registration in the CNAF RLS, and jobs accessing data via the CNAF RLS.
Non-LCG-2 sites: a Tier-2 can access POOL data from the Tier-1 MySQL catalogue using the POOL catalogue tools, creating a local catalogue (XML or local MySQL).

SRB MCat Failover
Backup MCat server at Daresbury: an Oracle failover solution is to be installed soon, maintaining a mirror copy of the Oracle backend between RAL and Daresbury. If the RAL MCat has problems, the service will switch to Daresbury by a change in DNS registration, with no need to change SRB servers or clients and minimal downtime of the MCat.
GMCat service in deployment: optimisation testing on a local MySQL LRC & RLS.

Catalogue Summary
RLS catalogue deployment at CERN and CNAF is still expected; the MCat server is operational and the backup service has been improved. Given the likely absence of RLS catalogues at RAL and FZK, those Tier-1s are to install local POOL MySQL catalogues, with agents to populate them and onward distribution to Tier-2s. The detailed configuration requires knowing how the data are to be streamed and to which Tier-2s.

Batch Analysis Framework
GROSS, the Gridified ORCA Submission System:
a simple UI suitable for the non-expert end user;
an extensible architecture (as requirements change and become better defined for DC04 and beyond);
no modification required to ORCA (transparent running on the Grid) and no additional software required remotely;
integrates directly with BOSS.

GROSS System Design
[Schematic architecture diagram: GRID USER, UI, job submission module, monitoring module, data interface, physics meta-catalogue, common BOSS/AF database, BOSS, RB and WN.]

How it works
The user submits to the AF a single analysis TASK, which comprises the ORCA executable, the ORCA user libraries and a metadata catalogue query.
The user additionally specifies: which BOSS DB to use, any additional DB to write output details to, which metadata catalogue to query, what to do with output data and logs (keep in the sandbox, register somewhere, etc.), and a suffix for the output filenames.
The submission module then makes the data query on the catalogue, splits the TASK into multiple JOBS (one job per run), creates a JDL for each JOB, creates a wrapper script and steering file for each JOB, and submits each job (through BOSS). An illustrative sketch of the splitting step is shown below.
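The Python sketch below is purely illustrative of the task-splitting step, not GROSS's actual code: it turns one TASK into per-run JOBs, each with its own JDL and steering file. The JDL attribute names (Executable, Arguments, InputSandbox, OutputSandbox, InputData) follow the EDG/LCG conventions of the time; the file layout, helper names and steering-file format are assumptions.

```python
# Illustrative sketch only: split an analysis TASK into one JOB per run,
# writing a JDL and a steering file for each. Not the real GROSS code.
from pathlib import Path

def split_task(task_name, executable, user_libs, runs, output_suffix, workdir="."):
    """runs: mapping of run number -> list of file GUIDs from the metadata query."""
    jobs = []
    for run, guids in sorted(runs.items()):
        stem = Path(workdir) / f"{task_name}_run{run}"

        # Per-job steering file, read by the standard wrapper script on the WN.
        steer = stem.with_suffix(".steer")
        steer.write_text(
            f"executable={executable}\n"
            f"user_libs={','.join(user_libs)}\n"
            f"run={run}\n"
            f"output_suffix={output_suffix}\n"
        )

        # Per-job JDL; InputData lets the broker send the job to where the data is.
        jdl = stem.with_suffix(".jdl")
        jdl.write_text(
            'Executable = "wrapper.sh";\n'
            f'Arguments = "{steer.name}";\n'
            f'InputSandbox = {{"wrapper.sh", "{steer.name}", "{executable}"}};\n'
            'OutputSandbox = {"job.log", "job.err"};\n'
            "InputData = {" + ", ".join(f'"guid:{g}"' for g in guids) + "};\n"
        )
        jobs.append((str(jdl), str(steer)))
    return jobs  # each entry would then be submitted through BOSS
```

For example, split_task("higgs_ana", "orca_exe", ["libUser.so"], {1: ["abc-123"]}, "_v1") would write higgs_ana_run1.jdl and higgs_ana_run1.steer in the current directory.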

Wrapping the ORCA job
The ORCA executable wrapper running on the WN will: set up the appropriate ORCA environment; copy the input sandbox/input data to the working area; link to the correct user libraries; run the executable; and deal with the output files.
The wrapper is a shell script plus a steering file: one (or many) standard shell script(s) registered in the DB (but easy to modify and re-register), and a unique steering file created for each job by the submission system. An illustrative rendering of the wrapper's logic follows.
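The real wrapper is a shell script; the Python sketch below only mirrors its sequence of steps (parse the steering file, set up the environment, run ORCA, collect the outputs). All paths, file names and the steering-file format are illustrative assumptions, not the actual wrapper.

```python
# Illustrative Python rendering of the wrapper's logic (the real wrapper is a
# shell script). Paths, file names and output conventions are assumptions.
import glob
import os
import shutil
import subprocess

def read_steer(path):
    """Parse the key=value steering file produced by the submission system."""
    conf = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            conf[key] = value
    return conf

def run_wrapped_job(steer_file, workdir="work"):
    conf = read_steer(steer_file)
    os.makedirs(workdir, exist_ok=True)

    # 1. Environment setup: the real wrapper sources the ORCA/SCRAM setup here.
    env = os.environ.copy()

    # 2. Copy sandbox contents / user libraries into the working area.
    for lib in conf.get("user_libs", "").split(","):
        if lib:
            shutil.copy(lib, workdir)

    # 3. Run the executable, capturing stdout/stderr for the log files.
    with open("job.log", "w") as out, open("job.err", "w") as err:
        rc = subprocess.call([os.path.abspath(conf["executable"])],
                             cwd=workdir, env=env, stdout=out, stderr=err)

    # 4. Deal with output files: here, tag ROOT files with the requested suffix.
    for f in glob.glob(os.path.join(workdir, "*.root")):
        shutil.move(f, f + conf.get("output_suffix", ""))
    return rc
```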

Data Handling
The data handling part is missing right now. What we need:
a definition of the physics meta-catalogue;
the ability to query this meta-catalogue to give a list of GUIDs per run of data, to be included in the Grid submission JDL; this will direct where the job runs (given that no movement of input data will take place, i.e. the job will always run where the data is);
where to catalogue output data for group analysis (the AF can handle writing to multiple DBs, e.g. writing to a private local BOSS DB and to the metadata catalogue).
A sketch of what the missing interface might look like is given below.
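As a hedged sketch only, the interface below shows one possible contract for the missing data-handling piece: a query returning the GUIDs per run, and a hook for registering output files. The class and method names are hypothetical, not part of any existing catalogue API.

```python
# Hypothetical contract for the missing data interface: given a metadata query,
# return GUIDs per run so the submission module can build the JDL InputData lists.
from abc import ABC, abstractmethod

class MetaCatalogue(ABC):
    """What the GROSS submission module needs from a physics meta-catalogue."""

    @abstractmethod
    def guids_by_run(self, query: str) -> dict[int, list[str]]:
        """Map each run selected by `query` to the GUIDs of its files."""

    @abstractmethod
    def register_output(self, run: int, lfn: str, guid: str) -> None:
        """Record an output file so the group analysis can find it later."""
```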

GROSS Summary
Tested extensively with ORCA on LCG-1; now installed on the LCG-2 UI at Imperial College.
Next steps: build extra functionality (multiple DB support, ...).
The most important missing piece is the data interface to the meta-catalogue; the rest of the framework is ready for it to be plugged in.

R-GMA + BOSS: Overview
CMS jobs are submitted via a UI node for data analysis. Individual jobs are wrapped in a BOSS executable and then delegated to a local batch farm managed by (for example) PBS. When a job is executed, the BOSS wrapper spawns a separate process to catch the input, output and error streams, and the job status is then redirected to a local DB. The sketch below illustrates this wrapping step.
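As an illustration only (not BOSS's actual implementation), the sketch below runs a job, scans its output for status markers, and writes them to a local SQLite table standing in for the BOSS database. The KEY=VALUE convention and the table layout are assumptions made for the example.

```python
# Illustration only (not BOSS itself): run a job, tee its output, and record
# status updates in a local SQLite table standing in for the BOSS database.
import sqlite3
import subprocess
import time

def run_and_monitor(job_id, command, db_path="boss_local.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS job_status "
               "(job_id TEXT, ts REAL, key TEXT, value TEXT)")

    # Spawn the real job; the wrapper reads its combined stdout/stderr.
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        print(line, end="")                      # tee to the terminal/log
        if "=" in line:                          # toy convention: KEY=VALUE lines
            key, _, value = line.strip().partition("=")
            db.execute("INSERT INTO job_status VALUES (?,?,?,?)",
                       (job_id, time.time(), key, value))
            db.commit()

    rc = proc.wait()
    db.execute("INSERT INTO job_status VALUES (?,?,?,?)",
               (job_id, time.time(), "EXIT_CODE", str(rc)))
    db.commit()
    db.close()
    return rc
```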

[Diagram: BOSS within the CMS/EDG production chain – the UI runs IMPALA/BOSS with the BOSS DB and RefDB parameters; JDL goes to the Workload Management System; CEs, WNs and SEs host the CMS software; the Replica Manager handles data registration and input data location; job output filtering and runtime monitoring are shown, with arrows marked "push data or info" and "pull info".]

Where R-GMA Fits In
BOSS was designed for use within a local batch farm:
– if a job is scheduled on a remote compute element, status info needs to be sent back to the submitter's site;
– within a Grid environment we want to collate job info from potentially many different farms;
– job status info should be filtered depending on where the user is located, so the use of predicates would be ideal.
R-GMA fits in nicely:
– the BOSS wrapper makes use of the R-GMA API;
– job status updates are then published using a stream producer;
– an archiver positioned at each UI node can then scoop up the relevant job info and dump it into the locally running BOSS DB;
– users can then access the job status info from the UI node.
The producer/archiver pattern is sketched below.
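The real R-GMA API calls are not reproduced here; the sketch below uses hypothetical publish_tuple/consume_tuples stand-ins purely to show the shape of the pattern: the wrapper publishes status tuples as the job runs, and an archiver at the UI inserts the tuples matching its predicate into the local BOSS DB.

```python
# Shape of the BOSS/R-GMA pattern, with hypothetical stand-ins for the R-GMA
# API (publish_tuple on the WN side, consume_tuples on the UI side).
import sqlite3
import time

def publish_tuple(table, tup):
    """Stand-in for an R-GMA stream-producer insert; here it just prints."""
    print(f"PUBLISH {table}: {tup}")

def consume_tuples(table, predicate):
    """Stand-in for an R-GMA archiver/consumer query; yields matching tuples."""
    return iter(())                # placeholder: tuples would arrive via R-GMA

# Worker-node side: the BOSS wrapper publishes status updates as the job runs.
def publish_job_status(job_id, ui_host, key, value):
    publish_tuple("JOB_STATUS",
                  {"job_id": job_id, "ui": ui_host,
                   "key": key, "value": value, "ts": time.time()})

# UI side: the archiver scoops tuples destined for this UI into the BOSS DB.
def archive_into_boss(ui_host, db_path="boss_local.db"):
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS job_status "
               "(job_id TEXT, key TEXT, value TEXT, ts REAL)")
    for t in consume_tuples("JOB_STATUS", predicate=f"WHERE ui = '{ui_host}'"):
        db.execute("INSERT INTO job_status VALUES (?,?,?,?)",
                   (t["job_id"], t["key"], t["value"], t["ts"]))
    db.commit()
    db.close()
```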

Use of R-GMA in BOSS
[Diagram: the UI with IMPALA/BOSS, the BOSS DB and receiver servlets; on the CE/GK and WN, the job runs in its sandbox under the BOSS wrapper, with a "tee" writing the output file and the R-GMA API publishing through the producer servlets; the R-GMA registry connects producers and consumers.]

Test Motivation
We want to ensure that R-GMA can cope with the volume of expected traffic and is scalable. The CMS production load is estimated at around 5000 jobs. Initial tests* with v… only managed about …; must do better. (Note: the first tests at Imperial College a year ago fell over at around 10 jobs!)
*Reported at the IEEE NSS Conference, Oregon, USA, October 2003.

Test Design
A simulation of the CMS production system was created:
– an MC simulation was designed to represent a typical job;
– each job creates a stream producer;
– each job publishes a number of tuples depending on the job phase;
– each job contains 3 phases with varying time delays.
An Archiver collects the published tuples:
– the Archiver DB used is a representation of the BOSS DB;
– archived tuples are compared with published tuples to verify the test outcome.
A minimal sketch of this simulation loop is given below.
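This is a minimal, self-contained sketch of the test logic just described: simulated jobs publish tuples in three phases, an in-memory list stands in for the Archiver, and the counts are compared at the end. The tuple counts, delays and phase names are illustrative values, not those used in the real test.

```python
# Minimal sketch of the scalability test: simulate jobs that publish tuples in
# three phases, collect them, and verify that everything reached the archiver.
import random
import time

PHASES = [("start", 1), ("running", 2), ("done", 1)]   # (phase, tuples per phase)

def simulate_job(job_id, archive, delay_range=(0.0, 0.01)):
    """One MC job: a 'stream producer' publishing tuples phase by phase."""
    published = 0
    for phase, n_tuples in PHASES:
        for i in range(n_tuples):
            archive.append((job_id, phase, i))          # stands in for R-GMA
            published += 1
        time.sleep(random.uniform(*delay_range))        # varying phase delays
    return published

def run_test(n_jobs=2000):
    archive = []                                        # stands in for the Archiver DB
    published = sum(simulate_job(j, archive) for j in range(n_jobs))
    # Verification: every published tuple must have reached the archiver.
    assert published == len(archive), "lost tuples!"
    print(f"{n_jobs} jobs published {published} tuples; all archived.")

if __name__ == "__main__":
    run_test()
```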

Topology
[Diagram: MC sims feeding SP MON boxes, an Archiver MON box and Archiver client feeding the IC BOSS DB, with the test verification step comparing against the test output.]

Test Setup
Archiver & SP MON box set up at Imperial College; SP MON box & IC set up at Brunel. Archiver and MC-sim clients were positioned at various nodes within both sites.
We tried 1 MC sim and 1 Archiver with a variable number of job submissions, and also set up a similar test on the WP3 test bed using 2 MC sims and 1 Archiver.

Results
1 MC sim creating 2000 jobs and publishing 7600 tuples was proven to work without a glitch.
Bi-directional jobs from Imperial College to Brunel (and vice versa) worked without problems.
We demonstrated 2 MC sims, each running 4000 jobs (with published tuples), on the WP3 test bed.
Peak loading was ~1000 jobs producing data simultaneously.

Pitfalls Encountered
Lots of integration problems:
– a limitation on the number of open streaming sockets (1K);
– lots of OutOfMemoryErrors discovered;
– various configuration problems at both the Imperial College and Brunel sites;
– the usual firewall challenges.
These probably explain some of the poor initial performance. The scalability of the test is largely dependent on the specs of the Stream Producer / Archiver MON boxes, i.e. > 1 GB of memory and a fast processor.

Overall Summary
Preparations for a full-scale test of CMS production over a Grid (T0 + T1 + some T2) are well underway; still on target for a 1 March start-up.
The new batch analysis framework GROSS is being deployed (with BOSS and R-GMA) via RPM for DC04.
The scalability of R-GMA is now approaching what is needed for the full production load of CMS.