
Wouter Verkerke, NIKHEF

Using ‘stoomboot’ for NIKHEF-ATLAS batch computing

What is ‘stoomboot’ – Hardware
–16 machines, each with 2x quad-core Xeon = 128 cores
–Processor = ‘Intel(R) Xeon(R) CPU E5335 @ 2.00GHz’
–Each core is roughly 50% faster than my desktop (yonne = Intel(R) Pentium(R) D CPU 2.80GHz). The metric is not precise: the performance difference differs for integer and FP processing, so the effective speed ratio depends on the application

What is ‘stoomboot’ – Software
–OS = ‘Scientific Linux CERN SLC release 4.6 (Beryllium)’
–Can access the same NFS disks as the desktops (/data/atlas, /project/atlas etc.)
–Can run and compile all ATLAS software

What is ‘stoomboot’ – ATLAS computing model
–Tier-3 computing facility; not organized or controlled by ATLAS
–Can access T1 AOD data (in progress; multiple protocols possible, e.g. dcap, xrootd), but this is also not organized by ATLAS
–Note that stoomboot is shared by ATLAS, LHCb and ALICE

Running jobs on stoomboot
–No direct login on nodes stbc-01 through stbc-16
–Access to stoomboot goes through the Torque/Maui batch system (formerly known as PBS)
–Batch commands are available on all desktops

Submitting batch jobs – qsub
–Simplest example: submit a script for batch execution

  unix> qsub test.sh
  9714.allier.nikhef.nl

–The returned string is the job identifier

Checking the status of batch jobs – qstat
–Simplest example:

  unix> qstat
  Job id          Name       User      Time Use S Queue
  --------------- ---------- --------- -------- - -----
  9714.allier     test.sh    verkerke  00:00:00 C test

–Status codes: Q = queued, R = running, C = completed
–Only jobs that completed in the last 10 minutes are listed
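The submit step above can be sketched end to end. The job script contents are illustrative, and qsub is only echoed here so the sketch can be tried without access to the batch system:

```shell
# Example job script (contents are illustrative)
cat > test.sh <<'EOF'
#! /bin/sh
echo "job running on $(hostname) in $PWD"
EOF
chmod +x test.sh

# On a desktop you would now run 'qsub test.sh' and get back a job id
# such as 9714.allier.nikhef.nl; the command is echoed here so the
# sketch works anywhere.
echo "qsub test.sh"
```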

Running jobs – Default settings

Examining output
–Output appears in a file named <script>.o<jobid>, e.g. test.sh.o9714 for the example on the previous page

Default settings for jobs
–The job runs in your home directory ($HOME)
–The job starts with a clean shell (environment variables from the shell from which you submit are not transferred to the batch job). E.g. if you need the ATLAS software setup, it should be done in the submitted script
–Job output (stdout) is sent to a file in the directory from which the job was submitted; stderr is sent to a separate file. For the example of the previous slide, file ‘test.sh.o9714’ contains stdout and file ‘test.sh.e9714’ contains stderr. If there is no stdout or stderr, an empty file is created
–A mail is sent to you if the output files cannot be created
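Because the batch job starts with a clean shell in $HOME, everything the job needs must be set up inside the submitted script; a minimal pattern (the paths are examples, not real stoomboot locations):

```shell
#! /bin/sh
# The batch shell starts clean in $HOME, so do all setup here rather
# than relying on your desktop's environment. Paths are examples.
export MYWORK="${TMPDIR:-/tmp}"      # stand-in for your /project area
cd "$MYWORK"
echo "running in $PWD"               # ends up in <script>.o<jobid>
echo "errors end up in <script>.e<jobid>" >&2
```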

Running jobs – Some useful qsub options

Merge stdout and stderr into a single file
–Add option ‘-j oe’ to the qsub command (a single *.o* file is written)

Choose the batch queue
–Right now there are two queues: test (30 min) and qlong (48 h)
–Add option ‘-q <queue name>’ to the qsub command

Choose a different output file for stdout
–Add option ‘-o <filename>’ to the qsub command

Pass all environment variables of the submitting shell to the batch job (with the exception of $PATH)
–Add option ‘-V’ to the qsub command
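The options above can be combined in a single submission. The queue name and log path below are examples, and the command is echoed rather than executed:

```shell
# -j oe : merge stderr into the stdout file
# -q    : pick the qlong (48 h) queue
# -o    : choose where the merged output file goes (example path)
# -V    : export the submitting shell's environment to the job
QSUB_OPTS="-j oe -q qlong -o /tmp/test.log -V"
echo "qsub $QSUB_OPTS test.sh"
```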

Running ATLAS software in batch

Setup the environment in the submitted script
–Following Manuel's wiki instructions for 13.0.30
–Note that the SLC4 hosts of stoomboot can run both SLC3- and SLC4-compiled executables

  #! /bin/sh

  # 1 -- setup Athena 13.0.30
  source $HOME/cmthome/13.0.30_slc3/setup.sh -tag=setup,13.0.30

  # 2 -- setup working area
  export USERPATH=/project/atlas/users/<username>/<workarea>
  export CMTPATH=${USERPATH}:${CMTPATH}

  cd /project/atlas/users/<username>/13.0.30
  athena.py <jobOptions>

Compiling ATLAS software in batch / for use in batch

Compiling ATLAS software for use in batch
–If your project area is on /project/atlas or on /data/atlas it is visible to jobs running in batch
–No need to compile your executables in the batch job (as is often required for GRID jobs)
–Compile interactively on your desktop (SLC3 or SLC4) and set up your batch job to use the compiled executables and libraries of your project area

Compiling ATLAS software in batch
–You can create a script to drive the compilation
–But it is easier to submit an interactive batch job
–Command ‘qsub -X -I -q qlong’ submits a batch job connected to your terminal (feels very much like an interactive login using ssh)
–Compile the software (SLC4 is the only option there) and exit the shell when done (this terminates the interactive batch job)
–Then submit separate job(s) to run the executables

Are the queues full?

–The ‘qstat’ command only lists your own jobs
–The lower-level command ‘showq --host=allier’ can show the jobs of all users. For now it seems to be available only on the login node
–The ‘qstat -Q’ command summarizes the number of currently pending/running/completed jobs per queue for all users

  Queue            Max Tot Ena Str Que Run Hld Wat Trn Ext T
  ---------------- --- --- --- --- --- --- --- --- --- --- -
  qlong              0 603 yes yes 534  65   0   0   0   0 E
  test               0   0 yes yes   0   0   0   0   0   0 E

–Web page with a load graph vs. time: http://www.nikhef.nl/grid/stats/stbc/

Current configuration of the scheduler
–Max 96 jobs for ATLAS, max 64 jobs per user

A wrapper script for simple interactive use

I have written a small utility script ‘bsub’ that wraps the qsub command, allowing a command line to be submitted directly
–E.g. you have a ROOT analysis macro that you run in your current directory as

  unix> root -l -b -q analyzeNtuples.C

–It is a hassle to write a small script that initializes the batch shell with the same ROOT version, moves to $PWD and executes the same command. The bsub wrapper does all this for you:

  unix> bsub -l -b -q analyzeNtuples.C

 submits a batch job that does exactly the same as the interactive command
–Caveat: command line arguments cannot contain quotes, parentheses etc., as these get mangled by the scripts
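The slides do not show the bsub script itself; a minimal sketch of what such a wrapper has to do (generate a one-off job script that restores the working directory and repeats the ROOT invocation, then hand it to qsub) might look like this. The function name is hypothetical and the qsub call is echoed rather than executed:

```shell
bsub_sketch() {
    # Write a one-off job script that cd's back to the submission
    # directory and repeats the interactive ROOT invocation.
    job=$(mktemp /tmp/bsub_job.XXXXXX)
    {
        echo '#! /bin/sh'
        echo "cd $PWD"       # run where the job was submitted from
        echo "root $*"       # same arguments as the interactive command
    } > "$job"
    chmod +x "$job"
    # A real wrapper would call qsub here; echoed for illustration.
    echo "qsub -j oe $job"
}

bsub_sketch -l -b -q analyzeNtuples.C
```

Note that naive expansion of "$*" into the generated script is exactly why quotes and parentheses in the arguments get mangled, as the caveat above warns.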

Next time: AOD data access from stoomboot

You can access files on /data/atlas
–However, performance will not scale: NFS performance and network bandwidth will quickly saturate once more than O(few) jobs on stoomboot are reading from /data/atlas
–OK for debugging & testing, but a better solution is needed to be able to use the full stoomboot capacity

Several better solutions (e.g. dcap, xrootd) exist in principle
–They make different tradeoffs in performance
–Work in progress (Folkert is trying dcap). More in the next meeting.

