Distributed Analysis
K. Harrison
LHCb Collaboration Week, CERN, 1 June 2006

1 June 2006 (2/20): Aims of distributed analysis

A physicist defines a job to analyse (large) dataset(s), using distributed resources (the computing Grid)
A single job is submitted; the workload is distributed across subjobs (Subjob 1, Subjob 2, Subjob 3, ..., Subjob n), and the combined output is returned
The LHCb distributed-analysis system is based on LCG (Grid infrastructure), DIRAC (workload management) and Ganga (user interface)
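To make the split/distribute/merge picture concrete, here is a minimal, Ganga-independent sketch in plain Python; every name in it is hypothetical (the real system delegates these steps to Ganga and DIRAC), and the 500 events/file figure echoes the job-statistics slide later in this talk.

# Illustrative only: one job, split into subjobs, outputs merged.
def split_dataset(files, files_per_subjob):
    '''Divide the input file list into chunks, one chunk per subjob.'''
    return [files[i:i + files_per_subjob]
            for i in range(0, len(files), files_per_subjob)]

def run_subjob(chunk):
    '''Stand-in for running the analysis on one chunk on a Grid worker.'''
    return {'events_processed': 500 * len(chunk)}   # assume ~500 events/file

def merge(results):
    '''Combine the per-subjob outputs into a single result.'''
    return sum(r['events_processed'] for r in results)

files = ['lfn:/lhcb/data/file%03d' % i for i in range(120)]   # placeholder LFNs
subjobs = split_dataset(files, files_per_subjob=30)
total = merge(run_subjob(chunk) for chunk in subjobs)
print(len(subjobs), 'subjobs,', total, 'events')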

1 June 2006 (3/20): LHCb computing model

[Map of Tier-1 and Tier-2 centres]
Baseline solution: analysis at Tier-1 centres
Analysis at Tier-2 centres is not in the baseline solution, but not ruled out

1 June 2006 (4/20): DIRAC submission to LCG: Pilot Agents

[Architecture diagram: Job Receiver, Data Optimiser, Matcher, Job DB, Task Queue, Agent Director and Agent Monitor inside DIRAC; the LFC; Pilot Agents submitted through the LCG WMS to computing resources]
Data Optimiser queries the Logical File Catalogue (LFC) to identify sites for job execution
Agent Director submits Pilot Agents for jobs in the waiting state
Agent Monitor tracks Agent status, and triggers further submission as needed
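The control flow of this pilot-agent scheme can be sketched in a few lines of plain Python; all names below are hypothetical stand-ins for the DIRAC services named on the slide, not DIRAC's actual API.

# Illustrative only: the pilot-agent control flow, reduced to plain Python.
import queue

task_queue = queue.Queue()   # Task Queue: jobs waiting to be matched
pilots = []                  # stand-in for pilots submitted via the LCG WMS

def agent_director(waiting_jobs):
    '''Submit one Pilot Agent for each job in the waiting state.'''
    for _ in waiting_jobs:
        pilots.append(pilot_agent)

def pilot_agent():
    '''Runs on a worker node; pulls a payload only once the slot is known good.'''
    try:
        payload = task_queue.get_nowait()   # Matcher role, much simplified
        payload()
    except queue.Empty:
        pass                                # no matching work: exit quietly

def agent_monitor(waiting_jobs):
    '''Track pilot status; trigger further submission as needed.'''
    if len(pilots) < len(waiting_jobs):
        agent_director(waiting_jobs)

# Demo: two waiting jobs, two pilots, payloads executed on the 'Grid'.
jobs = [lambda: print('analysing dataset A'),
        lambda: print('analysing dataset B')]
for job in jobs:
    task_queue.put(job)
agent_monitor(jobs)
for pilot in pilots:
    pilot()

The point of the pattern is that a pilot occupies and validates a worker slot before any user payload is bound to it, so a broken slot costs only a pilot, not a user job.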

1 June 2006 (5/20): DIRAC submission to LCG: Bond Analogy

[The same architecture diagram as the previous slide, presented as a secret-agent analogy]
Data Optimiser queries the Logical File Catalogue to identify sites for job execution
Agent Director submits Pilot Agents for jobs in the waiting state
Agent Monitor tracks Agent status, and triggers further submission as needed

1 June 2006 (6/20): Ganga job abstraction

A job in Ganga is constructed from a set of building blocks, not all required for every job:
– Application: what to run
– Backend: where to run
– Input Dataset: data read by the application
– Output Dataset: data written by the application
– Splitter: rule for dividing the job into subjobs
– Merger: rule for combining outputs
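A job built from these blocks, written in Ganga's Python interface and run inside a Ganga session, might look roughly like the sketch below. Job, DaVinci and Dirac appear elsewhere in this talk; LHCbDataset, SplitByFiles, RootMerger and every attribute value are assumptions, and exact names vary between Ganga versions.

# Illustrative only: assembling a Ganga job from the building blocks above.
j = Job()
j.application = DaVinci(version='v12r15')      # what to run (placeholder version)
j.application.optsfile = 'myOpts.txt'          # job options, as on the CLIP slide
j.backend = Dirac()                            # where to run
j.inputdata = LHCbDataset(files=['LFN:/lhcb/production/file.dst'])  # data read (assumed class)
j.splitter = SplitByFiles(filesPerJob=30)      # rule for dividing into subjobs (assumed)
j.merger = RootMerger(files=['myAnalysis.root'])  # rule for combining outputs (assumed)
j.submit()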

1 June 2006 (7/20): Framework for plugin handling

Ganga provides a framework for handling different types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes
Each plugin class has its own schema
[Class diagram: plugin interfaces IApplication, IBackend, IDataset, ISplitter and IMerger all derive from GangaObject]
Example plugins and schemas:
– DaVinci (Application): version, cmt_user_path, masterpackage, optsfile, extraopts (user-settable)
– Dirac (Backend): CPUTime, destination (user-settable); id, status (system-set)
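In plain Python (not Ganga's real internals), the plugin-plus-schema idea can be illustrated as follows; only the class and field names are taken from the slide, and everything else is invented for illustration.

# Much-simplified illustration of plugins with declared schemas.
class GangaObject:
    schema = {}
    def __init__(self, **kwargs):
        # Start every schema attribute at its declared default...
        for name, spec in self.schema.items():
            setattr(self, name, spec.get('default'))
        # ...then apply overrides, but only for user-settable items.
        for name, value in kwargs.items():
            if self.schema.get(name, {}).get('user', False):
                setattr(self, name, value)
            else:
                raise AttributeError('%s is not user-settable' % name)

class IApplication(GangaObject):
    '''Plugin interface: what to run.'''

class IBackend(GangaObject):
    '''Plugin interface: where to run.'''

class DaVinciApp(IApplication):
    # Schema fields as listed on the slide; all user-settable.
    schema = {name: {'user': True, 'default': None}
              for name in ('version', 'cmt_user_path', 'masterpackage',
                           'optsfile', 'extraopts')}

class DiracBackend(IBackend):
    # CPUTime and destination are set by the user; id and status by the system.
    schema = {'CPUTime':     {'user': True,  'default': None},
              'destination': {'user': True,  'default': None},
              'id':          {'user': False, 'default': None},
              'status':      {'user': False, 'default': None}}

app = DaVinciApp(version='v12r15', optsfile='myOpts.txt')  # placeholder values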

1 June 2006 (8/20): Ganga Command-Line Interface in Python (CLIP)

CLIP provides interactive job definition and submission from an enhanced Python shell (IPython)
– Especially good for trying things out, and for understanding how the system works

# List the available application plug-ins
list_plugins("application")
# Create a job for submitting DaVinci to DIRAC
j = Job(application="DaVinci", backend="Dirac")
# Set the job-options file
j.application.optsfile = "myOpts.txt"
# Submit the job
j.submit()
# Search for a string in the job's standard output
!grep "Selected events" $j.outputdir/stdout

1 June 2006 (9/20): Ganga scripting

From the command line, a script myScript.py can be executed in the Ganga environment using: ganga myScript.py
– Allows automation of repetitive tasks
Scripts for basic tasks are included in the distribution:

# Create a job for submitting Gauss to DIRAC
ganga make_job Gauss DIRAC test.py
# Edit test.py to set Gauss properties, then submit the job
ganga submit test.py
# Query status, triggering output retrieval if the job is completed
ganga query

Approach similar to the one typically used when submitting to a local batch system
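The slides do not show what the generated test.py contains; a plausible minimal sketch, reusing only GPI names that appear elsewhere in this talk plus placeholder values, might be:

# Hypothetical contents of test.py (illustrative, not from the slides).
j = Job(application='Gauss', backend='Dirac')
j.application.optsfile = 'myGaussOpts.txt'   # placeholder options file
j.submit()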

1 June 2006 (10/20): Ganga Graphical User Interface (GUI)

The GUI consists of a central monitoring panel and dockable windows
Job definition is based on mouse selections and field completion
Highly configurable: choose what to display and how
[Screenshot: Job builder, Job details, Logical Folders, Scriptor, Job Monitoring and Log window]

1 June 2006 (11/20): Shocking News!

The LHCb Distributed Analysis system is working well
DIRAC and Ganga provide complementary functionality
People with little or no knowledge of Grid technicalities are using the system for physics analysis
More than 75 million events processed in the past three months
Fraction of jobs completing successfully averaging about 92%
Extended periods with success rate >95%
"How can this be happening?" "Did he say 75 million?" "Who's doing this?"

1 June 2006 (12/20): Beginnings of a success story

2nd LHCb-UK Software Course held at Cambridge, 10th-12th January 2006
Half a day dedicated to Distributed Computing: presentations and 2 hours of practical sessions
– U.Egede: Distributed Computing & Ganga
– R.Nandakumar: UK Tier-1 Centre
– S.Paterson: DIRAC
– K.Harrison: Grid submission made simple
Made clear to participants a number of things:
– Tier-1 centres have a lot of resources
– Easy to submit jobs to the Grid using Ganga
– DIRAC ensures a high success rate
Distributed analysis is not just possible in theory but possible in practice
[Photographs by P.Koppenburg]

1 June 2006 (13/20): Cambridge pioneers of distributed analysis

C.Lazzeroni: B+ → D0(KS0 π+π−)K+
J.Storey: Flavour tagging with protons
Project students:
– M.Dobrowolski: B+ → D0(KS0 K+K−)K+
– S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
– B.Lum: B0 → D0(KS0 π+π−)K*0
– R.Dixon del Tufo: Bs0 → φφ
– A.Willans: B0 → K*0 μ+μ−
R.Dixon del Tufo had previous experience of Grid, Ganga and HEP software; the others encountered these for the first time at the LHCb-UK software course
Cristina decided she preferred Cobra to Python
[Photograph by A.Buckley, CHEP06, Mumbai]

1 June 2006 (14/20): Work model (1)

The usual strategy has been to develop/test/tune algorithms using signal samples and small background samples on local disks, then process (many times) larger samples (>700k events) on the Grid
Used a pre-GUI version of Ganga, with job submission performed through the Ganga scripting interface (see the sketch below)
– Users need only look at the few lines for specifying the DaVinci version, master package, job options and splitting requirements
– Splitting parameters are files per job and maximum total number of files (very useful for testing on a few files)
– Script-based approach popular with both new users (very little to remember) and experienced users (similar to what they usually do to submit to a batch system)
– Jobs submitted to both DIRAC and a local batch system (Condor)
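A hypothetical sketch of such a submission script follows; the few user-edited lines specify the DaVinci version, master package, job options and splitting requirements. SplitByFiles and its parameter names are assumptions, as are all placeholder values.

# Illustrative submission script (GPI style); edit only the lines below.
j = Job(application='DaVinci', backend='Dirac')
j.application.version = 'v12r15'                 # placeholder version
j.application.masterpackage = 'Phys/MyAnalysis'  # placeholder package
j.application.optsfile = 'myOpts.txt'            # job options
j.splitter = SplitByFiles(filesPerJob=30,        # files per job (assumed splitter)
                          maxFiles=1200)         # maximum total number of files
j.submit()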

1 June 2006 (15/20): Work model (2)

An interactive Ganga session was started to have status updates and output retrieval
The DIRAC monitoring page was also used for checking job progress
Jobs were usually split so that output files were small enough to be returned in the sandbox (i.e. retrieved automatically by Ganga)
Large outputs were placed on the CERN storage element (CASTOR) by DIRAC
– Outputs retrieved manually using the LCG transfer command (lcg-cp) and the logical-file name given by DIRAC (see the sketch below)
Hbook files were merged in the Ganga framework using a GPI script:
– ganga merge 16,27, myAnalysis.hbook
ROOT files were merged using a standalone ROOT script (from C.Jones)
Excellent support from S.Paterson and A.Tsaregorodtsev for DIRAC problems/queries, and from M.Bargiotti for LCG catalogue problems
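The manual retrieval of a large output might look like the hypothetical sketch below, calling lcg-cp with the logical-file name reported by DIRAC; the LFN and destination are placeholders, and lcg-cp options depend on the installed LCG utilities.

# Illustrative retrieval of a large output from CASTOR via lcg-cp.
import subprocess

lfn = 'lfn:/grid/lhcb/user/k/kharrison/myAnalysis.root'   # placeholder LFN from DIRAC
dest = 'file:///tmp/myAnalysis.root'                      # placeholder local destination
subprocess.check_call(['lcg-cp', '--vo', 'lhcb', lfn, dest])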

1 June 2006 (16/20): Example plots from jobs run on the distributed-analysis system

J.Storey: Flavour tagging with protons; analysis run on 100k Bs → J/ψ φ tagHLT events
C.Lazzeroni: Evaluation of background for B+ → D0(K0 π+π−)K+; analysis run on 400k B0 → D0(K0 π+π−)K*0 events
Results presented at the CP Measurements WG meeting, 16 March 2006

1 June 2006 (17/20): Project reports

– R.Dixon del Tufo: Bs0 → φφ
– M.Dobrowolski: B+ → D0(KS0 K+K−)K+
– B.Lum: B0 → D0(KS0 π+π−)K*0
– A.Willans: B0 → K*0 μ+μ−
– S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
Reports make extensive use of results obtained with the distributed-analysis system, especially for background estimates
Aim to have all reports turned into LHCb notes

1 June 2006 (18/20): Job statistics (1)

DIRAC job state:  outputready  stalled  failed  other   all
Number of jobs:          5036      127     257     68  5488

Statistics taken from the DIRAC monitoring page for analysis jobs submitted from Cambridge (user ids: cristina, deltufo, kelly, lum, martad, storey, willans) between 20 February 2006 (the week after CHEP06) and 15 May 2006
Estimated success rate: outputready/all = 5036/5488 = 92%
An individual job typically processes 20 to 40 files of 500 events each
– Estimated number of events successfully processed: 30 × 500 × 5036 = 7.55 × 10^7

1 June 2006 (19/20): Job statistics (2)

Stalled jobs: 127/5488 = 2.3%
– Proxy expires before the job completes; problem essentially eliminated by having Ganga create a proxy with a long lifetime
– Problems accessing data?
Failed jobs: 257/5488 = 4.7%
– 73 failures where input data was listed in the bookkeeping database (and physically at CERN), but not in the LCG file catalogue; files were registered by M.Bargiotti, then the jobs ran successfully
– 115 failures 7-20 April because of a transient problem with the DIRAC installation of software (associated with the upgrade to v2r10)
Excluding the above failures, the job success rate is: 5036/5300 = 95%
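The success-rate arithmetic from these two statistics slides, reproduced as a quick numerical check (only the 'other' count is inferred, as the remainder of the total):

# Numerical check of the job statistics quoted on slides 18 and 19.
jobs = {'outputready': 5036, 'stalled': 127, 'failed': 257, 'other': 68}
total = sum(jobs.values())                                   # 5488
print('success rate: %.0f%%' % (100.0 * jobs['outputready'] / total))      # ~92%
# ~30 files/job (typical range 20-40) x 500 events/file x successful jobs:
print('events processed: %.2e' % (30 * 500 * jobs['outputready']))         # 7.55e+07
# Excluding the 73 catalogue failures and 115 transient installation failures:
adjusted = total - 73 - 115                                  # 5300
print('adjusted success rate: %.0f%%' % (100.0 * jobs['outputready'] / adjusted))  # ~95%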

1 June 2006 (20/20): Conclusions

The LHCb distributed-analysis system is being successfully used for physics studies
Ganga makes the system easy to use; DIRAC ensures the system has high efficiency
Extended periods with job success rate >95%
More than 75 million events processed in the past three months
Working on improvements, but this is already a useful tool
To get started using the system, see the user documentation on the Ganga web site
"He did say 75 million!"