Distributed Analysis. K. Harrison, LHCb Collaboration Week, CERN, 1 June 2006

1  Distributed Analysis, K. Harrison, LHCb Collaboration Week, CERN, 1 June 2006

2  Aims of distributed analysis
A physicist defines a job to analyse (large) dataset(s), using distributed resources (the computing Grid): a single job is submitted, the workload is distributed over subjobs (Subjob 1, Subjob 2, ..., Subjob n), and the combined output is returned.
The LHCb distributed-analysis system is based on LCG (Grid infrastructure), DIRAC (workload management) and Ganga (user interface).

3  LHCb computing model
Baseline solution: analysis at Tier-1 centres.
Analysis at Tier-2 centres is not in the baseline solution, but is not ruled out.

4  DIRAC submission to LCG: Pilot Agents
(Diagram: Job → Job Receiver → Job DB; Data Optimiser → LFC; Matcher → Task Queue → Agent Director → Pilot Agent → LCG WMS → Computing Resource; Agent Monitor.)
The DIRAC Data Optimiser queries the Logical File Catalogue (LFC) to identify sites for job execution.
The Agent Director submits Pilot Agents for jobs in the waiting state.
The Agent Monitor tracks Agent status, and triggers further submission as needed.
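The pilot-agent flow on this slide can be sketched as a toy state machine. This is an illustration of the pattern only, not DIRAC code: every class and method name below is invented for the sketch.

```python
# Toy sketch of the pilot-agent submission flow described above.
# Not DIRAC code: all names here are illustrative.

class TaskQueue:
    def __init__(self):
        self.waiting = []               # jobs waiting for a resource
    def add(self, job):
        self.waiting.append(job)
    def pop(self):
        return self.waiting.pop(0) if self.waiting else None

class AgentDirector:
    """Submits one Pilot Agent per waiting job (via the LCG WMS)."""
    def __init__(self, queue):
        self.queue = queue
        self.pilots = 0
    def submit_pilots(self):
        self.pilots += len(self.queue.waiting)
        return self.pilots

class PilotAgent:
    """Runs on a computing resource and pulls a matching job."""
    def __init__(self, queue):
        self.queue = queue
    def run(self):
        job = self.queue.pop()          # the Matcher step, much simplified
        return f"ran {job}" if job else "no job matched"

queue = TaskQueue()
queue.add("analysis-job-1")
director = AgentDirector(queue)
print(director.submit_pilots())          # one pilot submitted
print(PilotAgent(queue).run())           # the pilot pulls and runs the job
print(PilotAgent(queue).run())           # queue now empty
```

In the real system the Agent Monitor would watch pilot status and call for further submissions; here that feedback loop is left out for brevity.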

5  DIRAC submission to LCG: Bond Analogy
(Same diagram and captions as the previous slide: Job Receiver, LFC, Matcher, Job DB, Task Queue, Agent Director, Pilot Agent, LCG WMS, Computing Resource, Agent Monitor; the Data Optimiser queries the LFC, the Agent Director submits Pilot Agents for waiting jobs, and the Agent Monitor triggers further submission as needed.)

6  Ganga job abstraction
A job in Ganga is constructed from a set of building blocks, not all of which are required for every job:
–Application: what to run
–Backend: where to run
–Input Dataset: data read by the application
–Output Dataset: data written by the application
–Splitter: rule for dividing the job into subjobs
–Merger: rule for combining outputs
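The building blocks above compose roughly as in the following sketch. These are illustrative stand-ins, not the real Ganga classes, and the splitter and merger rules are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Minimal sketch of the Ganga job abstraction described above.
# Illustrative stand-ins only, not the real Ganga classes.

@dataclass
class Job:
    application: str                      # what to run
    backend: str                          # where to run
    input_dataset: List[str] = field(default_factory=list)  # data read
    splitter: Optional[Callable] = None   # rule for dividing into subjobs
    merger: Optional[Callable] = None     # rule for combining outputs

    def submit(self):
        # Split into subjobs if a splitter is given, else run as one job
        if self.splitter:
            subjobs = self.splitter(self.input_dataset)
        else:
            subjobs = [self.input_dataset]
        outputs = [f"output-of-{len(files)}-files" for files in subjobs]
        # Combine the subjob outputs if a merger is given
        return self.merger(outputs) if self.merger else outputs

# Example: two files per subjob, outputs merged by concatenation
j = Job(application="DaVinci", backend="Dirac",
        input_dataset=["f1", "f2", "f3", "f4", "f5"],
        splitter=lambda files: [files[i:i+2] for i in range(0, len(files), 2)],
        merger=lambda outs: ";".join(outs))
print(j.submit())
```

The point of the abstraction is that the same Job can be re-targeted (e.g. Dirac vs. a local batch backend) by swapping one building block while the others stay unchanged.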

7  Framework for plugin handling
Ganga provides a framework for handling different types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes. Each plugin class has its own schema.
Plugin interfaces (all derived from GangaObject): IApplication, IBackend, IDataset, ISplitter, IMerger.
Example plugins and their schemas:
–DaVinci (an IApplication): version, cmt_user_path, masterpackage, optsfile, extraopts
–Dirac (an IBackend): CPUTime, destination, id, status
Schema attributes are divided into user-settable and system-set fields.
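The plugin idea can be illustrated with a minimal schema-driven base class. The `schema`-dictionary mechanism below is an assumption made for the sketch; only the plugin names and schema fields are taken from the slide.

```python
# Illustrative sketch of plugin classes with per-class schemas,
# loosely modelled on the interfaces named above. The real Ganga
# framework differs; the `schema` mechanism here is an assumption.

class GangaObject:
    schema = {}  # maps attribute name -> default value
    def __init__(self, **kwargs):
        for name, default in self.schema.items():
            setattr(self, name, kwargs.get(name, default))

class IApplication(GangaObject):
    pass

class IBackend(GangaObject):
    pass

class DaVinci(IApplication):
    # Schema fields as listed on the slide
    schema = {"version": None, "cmt_user_path": None,
              "masterpackage": None, "optsfile": None, "extraopts": None}

class Dirac(IBackend):
    schema = {"CPUTime": None, "destination": None, "id": None, "status": None}

# "some-version" is a placeholder value, not a real DaVinci release
app = DaVinci(version="some-version", optsfile="myOpts.txt")
print(app.version, app.optsfile)
```

Because each plugin declares its own schema, the framework can build generic tooling (listing plug-ins, validating attributes, driving the GUI forms) without knowing anything about a specific application or backend.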

8  Ganga Command-Line Interface in Python (CLIP)
CLIP provides interactive job definition and submission from an enhanced Python shell (IPython)
–Especially good for trying things out, and for understanding how the system works

# List the available application plug-ins
list_plugins( "application" )
# Create a job for submitting DaVinci to DIRAC
j = Job( application = "DaVinci", backend = "Dirac" )
# Set the job-options file
j.application.optsfile = "myOpts.txt"
# Submit the job
j.submit()
# Search for a string in the job's standard output
!grep "Selected events" $j.outputdir/stdout

9  Ganga scripting
From the command line, a script myScript.py can be executed in the Ganga environment using: ganga myScript.py
–Allows automation of repetitive tasks
Scripts for basic tasks are included in the distribution:

# Create a job for submitting Gauss to DIRAC
ganga make_job Gauss DIRAC test.py
# Edit test.py to set Gauss properties, then submit the job
ganga submit test.py
# Query status, triggering output retrieval if the job is completed
ganga query

This approach is similar to the one typically used when submitting to a local batch system.

10  Ganga Graphical User Interface (GUI)
The GUI consists of a central monitoring panel and dockable windows: job builder, job details, logical folders, scriptor, job monitoring and log window.
Job definition is based on mouse selections and field completion.
Highly configurable: choose what to display, and how.

11  Shocking news!
The LHCb Distributed Analysis system is working well, with DIRAC and Ganga providing complementary functionality.
People with little or no knowledge of Grid technicalities are using the system for physics analysis:
–More than 75 million events processed in the past three months
–Fraction of jobs completing successfully averaging about 92%
–Extended periods with success rate >95%
How can this be happening? Did he say 75 million? Who's doing this?

12  Beginnings of a success story
The 2nd LHCb-UK Software Course was held at Cambridge, 10th-12th January 2006, with half a day dedicated to Distributed Computing: presentations and 2 hours of practical sessions.
–U.Egede: Distributed Computing & Ganga
–R.Nandakumar: UK Tier-1 Centre
–S.Paterson: DIRAC
–K.Harrison: Grid submission made simple
The course made a number of things clear to participants:
–Tier-1 centres have a lot of resources
–It is easy to submit jobs to the Grid using Ganga
–DIRAC ensures a high success rate
Distributed analysis is not just possible in theory, but possible in practice.
Photographs by P.Koppenburg

13  Cambridge pioneers of distributed analysis
C.Lazzeroni: B+ → D0(KS0 π+π−)K+
J.Storey: Flavour tagging with protons
Project students:
–M.Dobrowolski: B+ → D0(KS0 K+K−)K+
–S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
–B.Lum: B0 → D0(KS0 π+π−)K*0
–R.Dixon del Tufo: Bs0 → φφ
–A.Willans: B0 → K*0 μ+μ−
R.Dixon del Tufo had previous experience of Grid, Ganga and HEP software; the others encountered these for the first time at the LHCb-UK software course.
Cristina decided she preferred Cobra to Python.
Photograph by A.Buckley, CHEP06, Mumbai

14  Work model (1)
The usual strategy has been to develop/test/tune algorithms using signal samples and small background samples on local disks, then process (many times) larger samples (>700k events) on the Grid.
A pre-GUI version of Ganga was used, with job submission performed through the Ganga scripting interface:
–Users need only look at the few lines specifying the DaVinci version, master package, job options and splitting requirements
–Splitting parameters are files per job and maximum total number of files (very useful for testing on a few files)
–The script-based approach is popular with both new users (very little to remember) and experienced users (similar to what they usually do to submit to a batch system)
–Jobs were submitted both to DIRAC and to a local batch system (Condor)
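The two splitting parameters described above (files per job, maximum total number of files) behave as in this sketch; the function name and signature are illustrative, not the actual Ganga interface.

```python
# Sketch of the splitting parameters described above: files per job,
# plus a cap on the total number of files. Illustrative only; not
# the actual Ganga splitter interface.

def split_dataset(files, files_per_job, max_files=None):
    """Divide an input file list into per-subjob file lists."""
    if max_files is not None:
        files = files[:max_files]   # cap the total, handy for quick tests
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

dataset = [f"data_{n}.dst" for n in range(100)]
# Full run: 100 files at 20 per subjob gives 5 subjobs
print(len(split_dataset(dataset, files_per_job=20)))
# Test run: cap at 4 files, 2 per subjob, gives 2 small subjobs
print(split_dataset(dataset, files_per_job=2, max_files=4))
```

The cap is what makes quick validation cheap: the same script that drives a full production run can be rerun on a handful of files before submitting the lot.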

15  Work model (2)
An interactive Ganga session was kept running for status updates and output retrieval; the DIRAC monitoring page was also used for checking job progress.
Jobs were usually split so that output files were small enough to be returned in the sandbox (i.e. retrieved automatically by Ganga).
Large outputs were placed on the CERN storage element (CASTOR) by DIRAC
–Outputs were retrieved manually using the LCG transfer command (lcg-cp) and the logical-file name given by DIRAC
Hbook files were merged in the Ganga framework using a GPI script:
–ganga merge 16,27,32-101 myAnalysis.hbook
ROOT files were merged using a standalone ROOT script (from C.Jones).
Excellent support from S.Paterson and A.Tsaregorodtsev for DIRAC problems/queries, and from M.Bargiotti for LCG catalogue problems.

16  Example plots from jobs run on the distributed-analysis system
J.Storey: Flavour tagging with protons; analysis run on 100k Bs → J/ψ φ tagHLT events.
C.Lazzeroni: Evaluation of background for B+ → D0(K0 π+π−)K+; analysis run on 400k B+ → D0(K0 π+π−)K*0 events.
Results presented at the CP Measurements WG meeting, 16 March 2006.

17  Project reports
R.Dixon del Tufo: Bs0 → φφ
M.Dobrowolski: B+ → D0(KS0 K+K−)K+
B.Lum: B0 → D0(KS0 π+π−)K*0
A.Willans: B0 → K*0 μ+μ−
S.Kelly: B0 → D+D− and Bs0 → Ds+Ds−
The reports make extensive use of results obtained using the distributed-analysis system, especially for background estimates.
The aim is to have all reports turned into LHCb notes.

18  Job statistics (1)
Statistics taken from the DIRAC monitoring page for analysis jobs submitted from Cambridge (user ids: cristina, deltufo, kelly, lum, martad, storey, willans) between 20 February 2006 (the week after CHEP06) and 15 May 2006:

DIRAC job state:  outputready  stalled  failed  other   all
Number of jobs:          5036      127     257     68  5488

Estimated success rate: outputready/all = 5036/5488 = 92%
An individual job typically processes 20 to 40 files of 500-1000 events each
–Estimated number of events successfully processed: 30 × 500 × 5036 = 7.55 × 10^7
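The quoted figures can be cross-checked with a few lines of arithmetic:

```python
# Quick cross-check of the job statistics quoted above.
jobs = {"outputready": 5036, "stalled": 127, "failed": 257, "other": 68}
total = sum(jobs.values())
print(total)                                 # matches the "all" column: 5488
success_rate = jobs["outputready"] / total
print(round(100 * success_rate))             # 92 (percent)
# ~30 files/job at ~500 events/file, over the successful jobs:
events = 30 * 500 * jobs["outputready"]
print(events)                                # 75540000, i.e. about 7.55e7
```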

19  Job statistics (2)
Stalled jobs: 127/5488 = 2.3%
–Proxy expired before the job completed: this problem was essentially eliminated by having Ganga create a proxy with a long lifetime
–Problems accessing data?
Failed jobs: 257/5488 = 4.7%
–73 failures where the input data were listed in the bookkeeping database (and physically at CERN), but not in the LCG file catalogue: the files were registered by M.Bargiotti, and the jobs then ran successfully
–115 failures between 7 and 20 April, because of a transient problem with the DIRAC installation of the software (associated with the upgrade to v2r10)
Excluding the above failures, the job success rate is: 5036/5300 = 95%

20  Conclusions
The LHCb distributed-analysis system is being successfully used for physics studies.
Ganga makes the system easy to use; DIRAC ensures the system has high efficiency:
–Extended periods with job success rate >95%
–More than 75 million events processed in the past three months
Work on improvements continues, but this is already a useful tool.
To get started using the system, see the user documentation on the Ganga web site: http://cern.ch/ganga
He did say 75 million!

