Successful Distributed Analysis ~ a well-kept secret
K. Harrison
LHCb Software Week, CERN, 27 April 2006


Shocking News!
LHCb Distributed Analysis system is up and running
DIRAC and Ganga working well together
People with little or no knowledge of Grid technicalities are using the system for physics analysis
More than 30 million events processed in the past two months
Fraction of jobs completing successfully averaging about 88%
Extended periods with success rate >94%
How can this be happening? Did he say 30 million? Who's doing this?

Beginnings of a success story
2nd LHCb-UK Software Course held at Cambridge, 10th-12th January 2006
Half a day dedicated to Distributed Computing: presentations and 2 hours of practical sessions
– U.Egede: Distributed Computing & Ganga
– R.Nandakumar: UK Tier-1 Centre
– S.Paterson: DIRAC
– K.Harrison: Grid submission made simple
Made clear to participants a number of things
– Tier-1 centres have a lot of resources
– It is easy to submit jobs to the Grid using Ganga
– DIRAC ensures a high success rate
⇒ Distributed analysis is not just possible in theory but possible in practice
Photographs by P.Koppenburg

Cambridge HEP Group
People
– Theory: 14, including 4 PhD students
– Experiment: 36 (10 LHCb), including 10 PhD students (4 LHCb)
– Also have project students (5 LHCb about to finish)
Computing resources
– Condor pool of 37 Linux machines, all but 2 with a single CPU; these are people's desktop machines, also used interactively
– 8-10 TB of disk space
Local resources usually fine for LHCb analyses of 50k-100k events
For larger-scale analyses rely on access to other resources
– Grid is an attractive option: develop locally and run remotely, without needing to copy files around manually

Setting up at Cambridge for distributed analysis
LHCb software installed locally and updated for new releases
DIRAC installed together with the LCG client tools, and regularly updated (currently using v2r10)
– Have a script to take care of post-install configuration
EDG job-submission tools installed
– This has allowed testing of direct submission to LCG
Using Ganga beta3 (released December 2005) with additions and patches
– No built-in job splitting and no graphical interface
– Ganga public release installed, but needs bug fixes
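
For illustration, a minimal submission through the locally installed DIRAC client might look like the sketch below. It is only a sketch in the style of contemporary DIRAC v2 Python-client examples: the module path, method names, application version and sandbox contents are assumptions, not taken from this talk.

    # Hedged sketch: submit a trivial DaVinci job through the locally
    # installed DIRAC client, as a quick check that the installation works.
    # Class and method names follow DIRAC v2 client examples of the time
    # and may differ in detail from the installed version.
    from DIRAC.Client.Dirac import *

    job = Job()
    job.setApplication('DaVinci', 'v12r15')    # example version, not prescriptive
    job.setInputSandbox(['myAnalysis.opts'])   # assumed local options file
    job.setOutputSandbox(['std.out', 'std.err'])

    dirac = Dirac()
    jobid = dirac.submit(job)                  # returns the DIRAC job identifier
    print 'Submitted DIRAC job %s' % jobid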

User group
C.Lazzeroni: B+ → D0(KS0 π+π−) K+
J.Storey: Flavour tagging with protons
Project students:
– M.Dobrowolski: B+ → D0(KS0 K+K−) K+
– S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
– B.Lum: B0 → D0(KS0 π+π−) K*0
– R.Dixon del Tufo: Bs0 → …
– A.Willans: B0 → K*0 …
R.Dixon del Tufo had previous experience of the Grid, Ganga and HEP software
Others encountered these for the first time at the LHCb-UK software course
Cristina decided she preferred Cobra to Python
Photograph by A.Buckley, CHEP06, Mumbai

Work model (1)
The usual strategy has been to develop/test/tune algorithms using signal samples and small background samples on local disks, then process (many times) larger samples (>700k events) on the Grid
Job submission performed using a GPI (Python) script that implements simple-minded job splitting
– Users need only look at the few lines for specifying the DaVinci version, master package, job options and splitting requirements
– Splitting parameters are files per job and maximum total number of files (very useful for testing on a few files)
– The script-based approach is popular with both new users (very little to remember) and experienced users (similar to what they usually do to submit to a batch system)
– Jobs submitted to both DIRAC and Condor
A sketch of the kind of script meant here is shown below.
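
The sketch assumes it is run inside a Ganga session (so Job, DaVinci and Dirac come from the GPI); the field names, the example values and the helper getInputFiles() are illustrative assumptions, not the actual Cambridge script.

    # Hedged sketch of a simple-minded splitting script for Ganga's GPI.
    # Users would only edit the few lines below: DaVinci version, master
    # package, job options and the splitting requirements.
    daVinciVersion = 'v12r15'          # example values only
    masterPackage  = 'PhysSel'
    optsFile       = 'myAnalysis.opts'
    filesPerJob    = 25                # files per job
    maxTotalFiles  = 500               # cap on total files (handy for quick tests)

    # getInputFiles() is a hypothetical helper returning the list of input
    # data files (e.g. from a bookkeeping query or a saved options fragment).
    dataFiles = getInputFiles()[:maxTotalFiles]

    # One job per slice of filesPerJob input files
    for first in range(0, len(dataFiles), filesPerJob):
        j = Job()
        j.application = DaVinci(version=daVinciVersion,
                                masterpackage=masterPackage,
                                optsfile=optsFile)
        j.inputdata = dataFiles[first:first + filesPerJob]
        j.backend = Dirac()            # or Condor() to use the local pool
        j.submit()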

Work model (2)
An interactive Ganga session is started to obtain status updates and output retrieval; the DIRAC monitoring page is also used for checking job progress
Jobs usually split so that output files are small enough to be returned in the sandbox (i.e. retrieved automatically by Ganga)
Large outputs placed on the CERN storage element (CASTOR) by DIRAC
– Outputs retrieved manually using the LCG transfer command (lcg-cp) and the logical-file name given by DIRAC (see the sketch after this slide)
Hbook files merged in the Ganga framework using a GPI script:
– ganga merge 16,27, myAnalysis.hbook
ROOT files merged using a standalone ROOT script (from C.Jones)
Excellent support from S.Paterson and A.Tsaregorodtsev for DIRAC problems/queries, and from M.Bargiotti for LCG catalogue problems
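
A hedged illustration of that manual retrieval step, wrapped in Python for consistency with the other examples; the actual step was a direct lcg-cp command, and the LFN and destination path below are placeholders.

    # Hedged illustration: fetch a large output file left on CASTOR by DIRAC,
    # using the LCG transfer command and the logical file name (LFN) reported
    # for the job. Both the LFN and the local destination are placeholders.
    import subprocess

    lfn  = 'lfn:/grid/lhcb/user/k/kharrison/myAnalysis.root'
    dest = 'file:///data/analysis/myAnalysis.root'

    rc = subprocess.call(['lcg-cp', '--vo', 'lhcb', lfn, dest])
    if rc != 0:
        print 'lcg-cp failed with exit code %d' % rc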

Example plots from jobs run on the distributed-analysis system
J.Storey: Flavour tagging with protons
– Analysis run on 100k Bs → J/ψ φ tagHLT events
C.Lazzeroni: Evaluation of background for B+ → D0(K0 π+π−) K+
– Analysis run on 400k B+ → D0(K0 π+π−) K*0 events
Results presented at the CP Measurements WG meeting, 16 March 2006

Job statistics (1)
Statistics taken from the DIRAC monitoring page for analysis jobs submitted from Cambridge (user ids: cristina, deltufo, kelly, lum, martad, storey, willans) between 20 February 2006 (the week after CHEP06) and 26 April 2006

    DIRAC job state:   outputready   stalled   failed   other    all
    Number of jobs:       2546          95       209      42     2892

Estimated success rate: outputready/all = 2546/2892 = 88%
An individual job typically processes 25 files of 500 events each
– Estimated number of events successfully processed: 25 × 500 × 2546 = 3.18 × 10^7
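
A quick cross-check of the arithmetic, using only the numbers quoted above:

    # Cross-check of the slide's figures (values taken from the text above).
    outputready, stalled, failed, all_jobs = 2546, 95, 209, 2892

    print float(outputready) / all_jobs   # 0.880 -> ~88% success rate
    print 25 * 500 * outputready          # 31825000 -> ~3.18e7 events processed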

Job statistics (2)
Stalled jobs: 95/2892 = 3.3%
– Proxy expired before the job completed; this problem has been essentially eliminated by having Ganga create a proxy with a long lifetime
– Problems staging data?
Failed jobs: 209/2892 = 7.2%
– 73 failures where the input data were listed in the bookkeeping database (and physically at CERN), but not in the LCG file catalogue; once the files were registered by M.Bargiotti, the jobs ran successfully
– 115 failures between 7 and 20 April because of a transient problem with the DIRAC installation of software (associated with the upgrade to v2r10)
⇒ Excluding the above failures, the job success rate is 2546/2704 = 94%
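
The corrected success rate quoted above can be cross-checked the same way:

    # Remove the two understood failure classes (73 catalogue-registration
    # failures + 115 transient-installation failures) before recomputing.
    excluded = 73 + 115
    print float(2546) / (2892 - excluded)   # 0.942 -> ~94% success rate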

Areas for improvement (1)
More helpful messages when something goes wrong
– Ganga error messages are fairly unintelligible to users, and rarely explain how to fix a known problem (e.g. a spurious LOCK file)
– Difficult for users to tell whether they have done something wrong or whether there is a problem with the system
Robustness of the Workload Management System
– A single server being down or unreachable can halt the entire system (luckily this doesn't happen often!)
– Add redundancies (e.g. job-server mirrors)?
Control over what happens to output
– Need a way of automatically doing what the user wants with larger files: they shouldn't just end up on CASTOR

Areas for improvement (2)
Configuration possibilities
– Some things that a site manager (and some users) may want to change are only possible by hacking the code, e.g. loading only the backend plugins relevant to a site, or customising the information displayed when jobs are listed
User support structure
– Users submitting to the Grid don't make a strong distinction between Ganga and DIRAC (we have a seamless system!), so it is better to have a single point of entry for problems/queries
Obtaining a Grid certificate and registering with the Virtual Organisation
– The current procedure is very convoluted and drawn out; any improvements would be much appreciated

Conclusions
LHCb distributed-analysis system is being successfully used for physics studies
Ganga makes the system easy to use
DIRAC ensures the system has high efficiency
Extended periods with job success rate >94%
More than 30 million events processed in the past two months
This isn't the finished product, but it is already a useful tool
– No need to keep it a secret!
He did say 30 million!