Distributed Analysis using Ganga I.Ideas behind Ganga II.Getting started III.Running ATLAS applications Distributed Analysis Tutorial ATLAS Computing &


1 Distributed Analysis using Ganga
I. Ideas behind Ganga
II. Getting started
III. Running ATLAS applications
Distributed Analysis Tutorial, ATLAS Computing & Software Workshop, München, 26-30 March 2007
http://cern.ch/ganga
Karl Harrison / University of Cambridge

2 29 March 2007 2/31 Ganga basics
Depending on context, Ganga can be any of:
(A) a Hindu goddess
(B) a hallucinogenic drug
(C) a job-management framework (Gaudi/Athena and Grid Alliance), implemented in Python, that simplifies running jobs on the Grid
Anyone expecting a presentation on (A) or (B) is going to be disappointed
Some have suggested: A + B = C
[Image captions: Sculpture of Ganga in cave temple, Elephanta Island, Mumbai harbour; Ganga, or ganja, is prepared from the plant Cannabis sativa]

3 Ganga as a job-management framework (1)
Ganga is developed as an ATLAS-LHCb common project
Ganga 4.2.12 (current release) has built-in support for applications based on the Athena framework, for JobTransforms and for the DQ2 data-management system
- Ganga 4.3 (release early in April) will additionally be interfaced with AMI and TNT
The component model allows customisations for other types of application, e.g. ROOT
Ganga provides a uniform interface for accessing different types of processing system
- Allows trivial switching between testing on a local batch system and running full-scale analysis on the Grid
[Diagram labels: Job definition, Job submission]

4 Ganga as a job-management framework (2)
Whenever started, Ganga runs a monitoring loop in the background
- Tracks progress of submitted jobs
- Retrieves outputs of completed jobs
- Checks validity of user credentials: Grid proxy and/or AFS token
Ganga stores job information locally or (Ganga 4.3) on a remote server with certificate-based authentication
Job inputs and outputs are kept in the Ganga workspace until moved or deleted by the user
The user can modify code without affecting a submitted job
[Diagram labels: Monitoring, Archival]
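The idea of a background monitoring loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `poll_once`, `start_monitoring` and the `query_status` callback are hypothetical names, not Ganga's own code, and the real loop also retrieves outputs and checks credentials.

```python
import threading


def poll_once(jobs, query_status):
    """Ask the backend for each job's status and record any changes.

    jobs is a dict {job_id: last_known_status}; query_status is a
    hypothetical callback standing in for a backend status query.
    """
    changed = []
    for job_id, old_status in list(jobs.items()):
        new_status = query_status(job_id)
        if new_status != old_status:
            jobs[job_id] = new_status
            changed.append((job_id, old_status, new_status))
    return changed


def start_monitoring(jobs, query_status, interval, stop_event):
    """Run poll_once repeatedly in a daemon thread until stop_event is set."""
    def loop():
        while not stop_event.is_set():
            poll_once(jobs, query_status)
            stop_event.wait(interval)  # sleep, but wake at once on stop

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Running the polling in a daemon thread is what lets the interactive prompt stay usable while job statuses are refreshed.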

5 Ganga job abstraction
A job in Ganga is constructed from a set of building blocks, not all required for every job
Job building blocks:
Application: What to run
Backend: Where to run
Input Dataset: Data read by application
Output Dataset: Data written by application
Splitter: Rule for dividing into subjobs
Merger: Rule for combining outputs
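The building-block picture can be mimicked with a small data class. This is a sketch only: `JobSketch` and `blocks_in_use` are hypothetical stand-ins written for illustration (in Python 3), not Ganga's Job class, and the blocks are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobSketch:
    """Hypothetical stand-in for a job: only some blocks are mandatory."""
    application: str                   # what to run
    backend: str                       # where to run
    inputdata: Optional[str] = None    # data read by application
    outputdata: Optional[str] = None   # data written by application
    splitter: Optional[str] = None     # rule for dividing into subjobs
    merger: Optional[str] = None       # rule for combining outputs

    def blocks_in_use(self):
        """Return the names of the building blocks that have been set."""
        return [name for name, value in vars(self).items() if value is not None]


# A simple job needs only an application and a backend
hello = JobSketch(application="Executable", backend="Local")

# An analysis job typically uses all six blocks
analysis = JobSketch(
    application="Athena",
    backend="LCG",
    inputdata="DQ2Dataset",
    outputdata="AthenaOutputDataset",
    splitter="AthenaSplitterJob",
    merger="AthenaOutputMerger",
)
```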

6 Plugin classes
Ganga handles many types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes
Each plugin class has its own schema
New plugin classes can readily be added: the system is extensible
[Diagram: plugin interfaces and example plugins with their schemas]
Base class: GangaObject
Plugin interfaces: IApplication, IBackend, IDataset, ISplitter, IMerger
Example plugin Athena, with schema: atlas_release, max_events, options, option_file, user_setupfile, user_area
Example plugin LCG, with schema: CE, requirements, jobtype, middleware, id, status, reason, actualCE, exitcode
Schema attribute categories shown in diagram: User, System
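A plugin system of this kind, where each class declares a schema and registers itself under a category, might look roughly as follows. Everything here is an assumption made for illustration: `PluginSketch`, `register`, `REGISTRY` and the two example classes are hypothetical, the default values are invented, and only a few of the attribute names from the slide are used.

```python
REGISTRY = {}  # category name -> {plugin name: plugin class}


def register(cls):
    """Class decorator: file the plugin under its category."""
    REGISTRY.setdefault(cls.category, {})[cls.__name__] = cls
    return cls


class PluginSketch:
    """Hypothetical base class: the schema maps attribute names to defaults."""
    category = None
    schema = {}

    def __init__(self, **values):
        unknown = set(values) - set(self.schema)
        if unknown:
            raise AttributeError("not in schema: %s" % sorted(unknown))
        self.values = dict(self.schema)  # start from the defaults
        self.values.update(values)       # then apply user settings


@register
class AthenaSketch(PluginSketch):
    category = "applications"
    schema = {"atlas_release": "", "max_events": -1, "option_file": ""}


@register
class LCGSketch(PluginSketch):
    category = "backends"
    schema = {"CE": "", "requirements": ""}


def plugins(category):
    """List the registered plugin names of the given category."""
    return sorted(REGISTRY.get(category, {}))
```

Adding a new plugin then amounts to writing one more decorated class, which is the sense in which such a system is extensible.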

7 Applications and Backends
Running of a particular Application on a given Backend is enabled by implementing an appropriate adapter component or Runtime Handler
–Can often use same Runtime Handler for more than one Backend: less coding
[Diagram: matrix of applications against backends]
Backends: PBS, OSG, NorduGrid, Local, LSF, PANDA US-ATLAS WMS, LHCb WMS
Applications:
Experiment neutral: Executable
ATLAS: Athena (Simulation/Digitisation/Reconstruction/Analysis), AthenaMC (Production)
LHCb: Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis)
Legend: Available in Ganga 4.2; Work in progress; New in Ganga 4.3
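The Runtime Handler idea, one adapter per (application, backend) pair with sharing across backends where possible, can be illustrated with a lookup table. This is a sketch under assumptions: `HANDLERS`, `register_handler`, `find_handler` and the two handler functions are hypothetical names, and the pairings shown are examples rather than the actual Ganga handler assignments.

```python
HANDLERS = {}  # (application name, backend name) -> handler function


def register_handler(application, backends, handler):
    """Register one handler for an application on several backends."""
    for backend in backends:
        HANDLERS[(application, backend)] = handler


def find_handler(application, backend):
    """Look up the adapter needed to run application on backend."""
    try:
        return HANDLERS[(application, backend)]
    except KeyError:
        raise LookupError("no handler for %s on %s" % (application, backend))


def batch_handler(job_name):
    """Hypothetical adapter preparing a job for a local batch system."""
    return "batch wrapper for " + job_name


def grid_handler(job_name):
    """Hypothetical adapter preparing a job for the Grid."""
    return "grid wrapper for " + job_name


# The same handler serves several backends: less coding
register_handler("Executable", ["Local", "LSF", "PBS"], batch_handler)
register_handler("Executable", ["LCG"], grid_handler)
```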

8 Help with Ganga
Ganga documentation can be found in the User Guides section of the Ganga web site: http://cern.ch/ganga/
–Most relevant items are:
Installation
Working with Ganga - general introduction to functionality
GUI manual - introduction to graphical interface
Link to ATLAS Wiki page for distributed analysis using Ganga
–https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427
–Tomorrow’s hands-on sessions will use this
For problems or feature requests, do any of the following:
–Use the hypernews forum for Ganga users and developers: https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
–Send e-mail to hn-atlas-GANGAUserDeveloper@cern.ch
–Submit a report via Ganga’s bug-submission page in Savannah: https://savannah.cern.ch/bugs/?func=additem&group=ganga
Should either log in to Savannah first, or give an e-mail address

9 Installation for distributed analysis with Ganga
Software for distributed analysis with Ganga is already installed at CERN and a number of other sites
If needed, you can perform your own installation
–Install the ATLAS software
See: https://twiki.cern.ch/twiki/bin/view/Atlas/InstallingAtlasSoftware
–To be able to access LCG resources, install the LCG user interface
See: https://twiki.cern.ch/twiki/bin/view/LCG/TarUIInstall
–Install the DQ2 client
See: https://twiki.cern.ch/twiki/bin/view/Atlas/DDMClientDQ2
–Install Ganga
Download installation script: http://cern.ch/ganga/download/ganga-install
Perform installation of latest release using:
    python ganga-install --extern=GangaAtlas,GangaGUI,GangaPlotter last
With Ganga 4.3, will be able to add GangaNorduGrid to package list
–Automatically install NorduGrid client software

10 Setting up for distributed analysis with Ganga
Setup sequence is as follows
–Ensure that you have a Grid certificate installed, and that you are registered with the ATLAS Virtual Organisation
–Set up the environment for Athena, then check out and build the UserAnalysis package (or equivalent)
–Set up the environment for using LCG client tools
–Set up the environment for using DQ2
–Set up the environment for using Ganga
On an lxplus account at CERN, Ganga setup is performed using:
    source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
Ganga setup at other sites should ensure the following:
–Directory containing ganga executable is added to PATH
–GANGA_CONFIG_PATH is set to GangaAtlas/Atlas.ini
Detailed setup instructions given as part of hands-on exercises
[Slide annotation: Optional, but sometimes useful]

11 Using Ganga
Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython)
–Especially good for trying things out, and seeing how the system works
Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks
Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system
Graphical User Interface (GUI) allows job management based on mouse selections and field completion
–Lots of configuration possibilities
Ganga allows users to work in a variety of ways

12 Ganga startup and configuration files
Ganga can be invoked in any of the following ways:
    ganga             start CLIP session
    ganga [script]    run specified script in Ganga environment
    ganga --gui &     start GUI session
    ganga --help      print Ganga help information
– If user doesn’t have a valid proxy then his/her Grid passphrase is requested
When Ganga is first run, a configuration file .gangarc is created in the user’s home directory
– The file includes comments on the configuration possibilities
– The latest default configuration file can always be obtained with:
    ganga -g
Before processing .gangarc, Ganga processes, in the order they are specified, any configuration files pointed to by the environment variable GANGA_CONFIG_PATH
– This makes possible the use of group configuration files, but allows settings to be overridden on a user-by-user basis
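The override behaviour described above (group files first, user file last, later values winning) is the standard pattern for layered INI-style configuration, and can be tried out with Python's own configparser. This sketch makes several assumptions: the `[Demo]` section and its options are invented, `load_settings` and `demo` are hypothetical helpers, and the path list is taken to be separated by the platform's path separator. It shows the pattern only, not Ganga's configuration code.

```python
import configparser
import os
import tempfile


def load_settings(config_path, user_file):
    """Read group files in the order given, then the user's file.

    ConfigParser.read() processes files in sequence, so a value in a
    later file replaces the same option from an earlier one.
    """
    parser = configparser.ConfigParser()
    ordered_files = [p for p in config_path.split(os.pathsep) if p]
    ordered_files.append(user_file)  # user settings are processed last
    parser.read(ordered_files)
    return parser


def demo():
    """Write a group file and a user file, then show which values win."""
    with tempfile.TemporaryDirectory() as tmp:
        group_file = os.path.join(tmp, "group.ini")
        user_file = os.path.join(tmp, "user.gangarc")
        with open(group_file, "w") as handle:
            handle.write("[Demo]\nbackend = LSF\nqueue = group-queue\n")
        with open(user_file, "w") as handle:
            handle.write("[Demo]\nbackend = LCG\n")
        settings = load_settings(group_file, user_file)
        return settings["Demo"]["backend"], settings["Demo"]["queue"]
```

In the demo, the user's file overrides `backend`, while `queue` is inherited from the group file because the user file does not mention it.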

13 Ganga workspace
Ganga creates a directory gangadir in your home directory and uses this for storing job-related files and information
–You can’t move this directory but, before running Ganga, you can create ~/gangadir as a link to another location
–Should delete jobs when they are no longer needed, so that Ganga input/output files don’t exhaust disk quota
[Diagram: gangadir directory tree, with labels repository, workspace, gui, Local, Remote, jobs, templates, 66, 67, input, output]

14 Python commands
Ganga is developed in Python, making use of IPython extensions
All Python/IPython commands can be used at the prompt in a Ganga CLIP session, and the syntax for CLIP and Python commands is the same
Information about Python can be found at: http://www.python.org/
–If you’re new to Python, the on-line tutorial is very helpful
The following are often useful
    # A hash (#) marks the start of a comment
    # A backslash (\) at the end of a line indicates that
    # the following line is a continuation
    dir()          # List currently available objects
    help()         # Give help
    help( item )   # Give help on specified item
    x = 5          # Assign value to variable
    print x        # Print value of variable
    ctrl-D         # Exit from session

15 IPython commands
Information about IPython extensions can be found at: http://ipython.scipy.org/
One useful extension is the possibility to use shell commands from Python, together with both shell variables and Python variables
    # Use ! before shell commands
    # Use $ before Python variables
    # Use $$ before shell variables
    here = 'where the heart is'
    !echo $$HOME is $here
    !ls $$HOME/mySubdir
    !emacs    # Start emacs session
    !zsh      # Give shell prompt
    Exit      # Exit from session

16 Ganga CLIP commands (1)
Ganga commands are explained in the guide Working with Ganga: http://cern.ch/ganga/user/html/GangaIntroduction
From a CLIP session, available classes, objects and functions may be listed, and help can be requested for each
Useful commands include the following
    plugins( 'type' )              # List plugins of specified type:
                                   # 'applications', 'backends', etc
    j1 = Job( backend = LSF() )    # Create a new job for LSF
    a1 = Executable()              # Create Executable application
    j1.application = a1            # Set value for job's application
    j1.backend = LCG()             # Change job's backend to LCG
    export( j1, 'myJob.py' )       # Write job to specified file
    load( 'myJob.py' )             # Load job(s) from specified file
    j2 = j1.copy()                 # Create j2 as a copy of job j1
    jobs                           # List jobs
    jobs[ i ].subjobs              # List subjobs for split job i

17 Ganga CLIP commands (2)
When a job j has been defined, the following methods can be used
    j.submit()    # Submit the job
    j.kill()      # Kill the job (if running)
    j.remove()    # Kill the job and delete associated files
    j.peek()      # List files in job's output directory
Once a job has been submitted, it can no longer be modified, and it cannot be resubmitted, but the job can be copied and the copy can be modified/submitted
Ganga supports use of templates, which can be used as the basis of a job definition
    t = JobTemplate()             # Create template
    templates                     # List templates
    j3 = Job( templates[ i ] )    # Create job from template i
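The rule that a submitted job is frozen, while a copy is free to change, can be modelled in a few lines. This is a toy illustration: `JobSketch` and `FrozenJobError` are hypothetical names, and the sketch does not reproduce how Ganga itself enforces the rule.

```python
class FrozenJobError(Exception):
    """Raised on an attempt to change or resubmit a submitted job."""


class JobSketch:
    """Hypothetical job whose attributes lock once it is submitted."""

    def __init__(self, exe="/bin/echo"):
        self._status = "new"
        self.exe = exe

    def __setattr__(self, name, value):
        # Only the internal status may change after submission
        if name != "_status" and getattr(self, "_status", "new") != "new":
            raise FrozenJobError("job already submitted; copy it instead")
        super().__setattr__(name, value)

    @property
    def status(self):
        return self._status

    def submit(self):
        if self._status != "new":
            raise FrozenJobError("job cannot be resubmitted; copy it instead")
        self._status = "submitted"

    def copy(self):
        """Return a fresh, unsubmitted job with the same definition."""
        return JobSketch(self.exe)
```

A typical pattern is then to submit a job, copy it, adjust the copy and submit that, leaving the record of the original job untouched.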

18 CLIP: “Hello World” example
From a Ganga CLIP session, a job that writes “Hello World” can be created, and then submitted to LCG, as follows
    app = Executable()
    app.exe = '/bin/echo'
    app.env = {}
    app.args = [ 'Hello World' ]
    # Property values set above are in fact the defaults
    # for Executable application
    j = Job( application = app, backend = LCG() )
    j.submit()
    # Check on job progress
    jobs
    # When job has completed, check the output
    j.peek( 'stdout' )

19 Using Ganga commands from a Linux shell
Ganga includes scripts that can be used from a Linux shell (i.e. outside of CLIP)
    # Create a job for submitting Executable to LCG
    ganga make_job Executable LCG test.py
    [ Edit test.py to set Executable and/or LCG properties ]
    # Submit job
    ganga submit test.py
    # Query status, triggering output retrieval if job is completed,
    # but not recommended because of risk of time-outs for status queries
    ganga query
Given job name or id as returned by query, also have possibilities such as
    # Kill job
    ganga kill id
    # Remove job from Ganga repository and workspace
    ganga remove id
Same syntax can be used from inside CLIP, with no overheads for startup

20 Ganga plugins for ATLAS jobs
[Diagram: ATLAS plugin classes, all deriving from GangaObject through the plugin interfaces]
IApplication: Athena (Analysis); AthenaMC, AthenaMCpyJY (Production)
IBackend: LCG, LSF, Other
IDataset, input data:
DQ2Dataset: Dataset in DQ2/DDM
ATLASLocalDataset: Files on local storage
ATLASDataset: Old mc10 data in old LFC
ATLASCastorDataset: Older data on CASTOR at CERN
IDataset, output data:
DQ2OutputDataset: Dataset in DQ2/DDM
ATLASOutputDataset: Files on local storage
ISplitter: AthenaSplitterJob, AthenaMCSplitterJob, AthenaMCpyJTSplitterJob
IMerger: AthenaOutputMerger

21 Starting point for using Ganga to run ATLAS applications
Need usual setup for running Athena
For analysis:
–Need steering package that defines the physics analysis
This is any package where cmt/requirements defines all dependencies
In the hands-on exercises, and for anyone who’s followed the analysis examples in the ATLAS Workbook, the steering package is UserAnalysis
–Work from /run subdirectory of steering package
For user-level production
–Should download JobTransform archive to directory where Ganga is run
–Archive used in hands-on exercises is: http://cern.ch/atlas-computing/links/kitsDirectory/Production/kits/AtlasProduction_12_0_4_1_noarch.tar.gz

22 Using Ganga’s athena script to submit analysis job to LCG
From the Linux shell, job can be submitted to LCG using the syntax:
    ganga athena \
      --inDS misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104 \
      --outputdata AnalysisSkeleton.aan.root \
      --split 3 \
      --maxevt 100 \
      --lcg \
      --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas \
      AnalysisSkeleton_topOptions.py
Meaning of each part of the command:
ganga athena: Use Ganga’s athena script
--inDS: Input dataset
--outputdata: Output data
--split 3: Split job into 3 subjobs
--maxevt 100: Limit analysis to 100 events per subjob
--lcg: Submit to LCG
--ce: Force use of particular compute element
AnalysisSkeleton_topOptions.py: Job options
Replace --lcg with --lsf, and omit --ce, to submit to LSF
–Trivial switching between running locally and running on Grid
Help available on options accepted by Ganga’s athena script
    ganga athena --help
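For anyone scripting such submissions, the switch between LCG and LSF can be captured in a small helper that assembles the argument list. This is a sketch only: `athena_command` is a hypothetical helper, it uses just the options shown on the slide, and it builds the command without running it.

```python
import shlex


def athena_command(job_options, in_ds, output, split, maxevt, backend, ce=None):
    """Build the argument list for Ganga's athena script.

    backend is "lcg" or "lsf"; ce (compute element) applies to LCG only.
    """
    if backend not in ("lcg", "lsf"):
        raise ValueError("backend must be 'lcg' or 'lsf'")
    args = [
        "ganga", "athena",
        "--inDS", in_ds,
        "--outputdata", output,
        "--split", str(split),
        "--maxevt", str(maxevt),
        "--" + backend,
    ]
    if backend == "lcg" and ce:
        args += ["--ce", ce]  # omitted when submitting to LSF
    args.append(job_options)
    return args


grid = athena_command(
    "AnalysisSkeleton_topOptions.py",
    "misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104",
    "AnalysisSkeleton.aan.root", 3, 100, "lcg",
    ce="ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas",
)
local = athena_command(
    "AnalysisSkeleton_topOptions.py",
    "misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104",
    "AnalysisSkeleton.aan.root", 3, 100, "lsf",
)

# Print in a form that could be pasted into a shell
print(" ".join(shlex.quote(a) for a in local))
```

The resulting list could be passed to `subprocess.run` on a machine where the `ganga` executable is on PATH.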

23 Monitoring job progress and retrieving output
To monitor job progress, you should start a Ganga CLIP or GUI session
In CLIP, changes in the status of jobs/subjobs are buffered, and are printed when you hit return
At any time, you can also explicitly request status information
    # Print status information for all jobs
    jobs
    # Print status information for particular subjob
    print jobs[5].subjobs[27].status
When a job completes, the Ganga monitoring loop takes care of storing the output, and registers it with DQ2 with a dataset name of the form user.username.ganga.jobid
Output can be listed and retrieved using DQ2 client tools
    dq2_ls -f user.username.ganga.jobid
    dq2_get -r user.username.ganga.jobid

24 Running an analysis job from CLIP (1)
Create application object, set job options and prepare tar file of user area
–Other properties filled automatically, based on user setup
    app = Athena()
    app.option_file = 'myOpts.py'
    app.prepare( athena_compile = False )
Define the input dataset
    inData = DQ2Dataset()
    inData.dataset = 'interestingDataset.AOD.v12003104'
    inData.type = 'DQ2_Local'
Define the output dataset
    outData = AthenaOutputDataset()
    outData.outputdata = 'myOutput.root'

25 Running an analysis job from CLIP (2)
Define splitter, merger and backend
    splitter = AthenaSplitterJob( numsubjobs = 2 )
    merger = AthenaOutputMerger()
    backend = LCG( CE = 'reliableCE' )
Create job template from defined objects
    t = JobTemplate( name = 'TestAnalysis' )
    t.application = app
    t.backend = backend
    t.inputdata = inData
    t.outputdata = outData
    t.splitter = splitter
    t.merger = merger

26 Running an analysis job from CLIP (3)
Create job from the template and submit the job
    j = Job( t )
    j.submit()
Check job status
    jobs
When job has completed, check standard outputs of subjobs, then retrieve and merge ROOT output files
    j.subjobs[0].peek( "stdout" )
    j.subjobs[1].peek( "stdout" )
    j.outputdata.retrieve()
    j.merge()
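What a splitter and a merger do in principle can be shown without Ganga. The sketch below assumes a round-robin division of input files and a merge that sums per-subjob counts; both rules, and the names `split_files` and `merge_counts`, are illustrative assumptions, not the behaviour of AthenaSplitterJob or AthenaOutputMerger.

```python
from collections import Counter


def split_files(files, numsubjobs):
    """Divide the input files among subjobs, round-robin."""
    if numsubjobs < 1:
        raise ValueError("need at least one subjob")
    return [files[i::numsubjobs] for i in range(numsubjobs)]


def merge_counts(subjob_outputs):
    """Combine per-subjob results (dicts of counts) by summing them."""
    total = Counter()
    for output in subjob_outputs:
        total.update(output)  # adds counts key by key
    return dict(total)


# Two subjobs share five input files
inputs = ["f1.root", "f2.root", "f3.root", "f4.root", "f5.root"]
subjob_inputs = split_files(inputs, 2)

# Each subjob reports how many events it selected per category
merged = merge_counts([{"electrons": 12, "muons": 7}, {"electrons": 9}])
```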

27 User-level production
Event production is broken down into three steps:
–evgen: generate particle kinematics
–simul+digit: simulate particles passing through detector - RDO output
–recon: event reconstruction - AOD, ESD, CBNT output
With Ganga 4.3, submission of production jobs from Linux shell will be possible using Ganga’s athena script
As CLIP example, consider generation of 30 events containing single electron with E_T > 40 GeV
–Same example used in hands-on exercises

28 Running user-level production from CLIP (1)
Create application object, and set properties
    app = AthenaMC()
    app.atlas_release = '12.0.4'
    app.transform_archive = 'AtlasProduction_12_0_4_1_noarch.tar.gz'
    app.production_name = 'tutorial'
    app.mode = 'evgen'
    app.evgen_job_option = 'DC3.007004.singlepart_e_Et40.py'
    app.process_name = 'single_e_Et40'
    app.run_number = '000001'
    app.firstevent = '1'
    app.random_seed = '1102362401'
    app.number_events_job = '30'
    app.se_name = 'NIKHEF'

29 Running user-level production from CLIP (2)
Define the output dataset
    outData = DQ2OutputDataset()
–The output is stored at the site specified by app.se_name
–Naming convention explained in hands-on exercises
Define LCG backend, with execution forced at a particular site
    backend = LCG()
    backend.CE = 'tbn20.nikhef.nl:2119/jobmanager-pbs-atlas'
Create job template from defined objects
    t = JobTemplate( name = 'TestGeneration' )
    t.application = app
    t.backend = backend
    t.outputdata = outData
Create job from template and submit
    Job( t ).submit()

30 Ganga Graphical User Interface (GUI)
GUI consists of central monitoring panel and dockable windows
Essentially everything that can be done in CLIP can be done with the GUI
–More details in presentation tomorrow
[Screenshot labels: Job details, Logical Folders, Scriptor, Job Monitoring, Log window, Job builder]

31 Conclusions
Have given an overview of:
–the ideas behind Ganga
–getting started with Ganga, running a “Hello World” job
–using Ganga to run ATLAS applications
Have probably made it seem more complicated than it is in practice
To see that Ganga is quite easy to use, you just have to try it
–Chance for this, and more detailed explanations of the functionality, in the Ganga hands-on sessions tomorrow

