1 Running Jobs on Jacquard: an overview of interactive and batch computing, with comparisons to Seaborg. David Turner, NUG Meeting, 3 Oct 2005

2 Topics
Interactive
– Serial
– Parallel
– Limits
Batch
– Serial
– Parallel
– Queues and Policies
Charging
Comparison with Seaborg

3 Execution Environment
Four login nodes
– Serial jobs only
– CPU limit: 60 minutes
– Memory limit: 64 MB
320 compute nodes
– “Interactive” parallel jobs
– Batch serial and parallel jobs
– Scheduled by PBSPro
Queue limits and policies established to meet system objectives
– User input is critical!
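The login-node limits above are per-process limits; a minimal sketch for inspecting them, assuming a bash login shell (the values shown simply restate the limits listed above):
  % ulimit -t     # CPU time limit, in seconds (60 minutes = 3600)
  3600
  % ulimit -v     # virtual memory limit, in kilobytes (64 MB = 65536)
  65536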

4 Interactive Jobs
Serial jobs run on login nodes
– cd, ls, pathf90, etc.
– ./a.out
Parallel jobs run on compute nodes
– Controlled by PBSPro
– mpirun -np 16 ./a.out
Example interactive sessions:
  % qsub -I -q interactive -l nodes=8:ppn=2
  % cd $PBS_O_WORKDIR
  % mpirun -np 16 ./a.out

  % qsub -I -q batch -l nodes=32:ppn=2,walltime=18:00:00

5 PBSPro
Marketed by Altair Engineering
– Based on the open source Portable Batch System developed for NASA
– Also installed on DaVinci
Batch scripts contain directives:
  #PBS -o myjob.out
Directives may also appear as command-line options:
  qsub -o myjob.out …
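For example (the script name myjob is illustrative), the following two forms supply the same option, once as a directive and once on the command line:
  # inside the batch script myjob:
  #PBS -o myjob.out

  # or, equivalently, at submission time:
  % qsub -o myjob.out myjob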

6 Simple Batch Script
  #PBS -l nodes=8:ppn=2,walltime=00:30:00
  #PBS -N myjob
  #PBS -o myjob.out
  #PBS -e myjob.err
  #PBS -A mp999
  #PBS -q debug
  #PBS -V
  cd $PBS_O_WORKDIR
  mpirun -np 16 ./a.out
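The slides show no serial batch example; a minimal sketch, assuming a serial batch job is requested as a single processor on one compute node (nodes=1:ppn=1), would look like:
  #PBS -l nodes=1:ppn=1,walltime=00:30:00
  #PBS -N serialjob
  #PBS -j oe
  cd $PBS_O_WORKDIR
  ./a.out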

7 Useful PBS Options (1)
-A repo
  Charge this job to repository repo
  Default: your default repository
-N jobname
  Provide a name for the job; up to 15 printable, non-whitespace characters
  Default: name of the batch script
-q qname
  Submit job to batch queue qname
  Default: batch

8 Useful PBS Options (2)
-S shell
  Specify shell as the scripting language
  Default: your login shell
-V
  Export current environment variables into the batch job environment
  Default: do not export

9 Useful PBS Options (3)
-o outfile
  Write STDOUT to outfile
  Default: <jobname>.o<jobid>
-e errfile
  Write STDERR to errfile
  Default: <jobname>.e<jobid>
-j [ eo | oe ]
  Join STDOUT and STDERR on STDOUT (oe) or STDERR (eo)
  Default: do not join

10 Useful PBS Options (4)
-m [ a | b | e | n ]
  E-mail notification
  a = send mail when job is aborted by the system
  b = send mail when job begins
  e = send mail when job ends
  n = do not send mail
  Options a, b, and e may be combined
  Default: a
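A short sketch combining the options from the last four slides in one script header (the values are illustrative; mp999 and myjob are taken from the earlier example):
  #PBS -A mp999          # charge to repository mp999
  #PBS -N myjob          # job name, up to 15 characters
  #PBS -q batch          # submit to the batch queue
  #PBS -S /usr/bin/ksh   # run the script under ksh
  #PBS -V                # export the current environment
  #PBS -j oe             # merge STDERR into STDOUT
  #PBS -m abe            # mail on abort, begin, and end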

11 Batch Queues
Submit        Execute     Nodes       Walltime
interactive               1 – 16      30 mins
debug                     1 – 32      30 mins
batch         batch16     1 – 16      48 hours
batch         batch32     17 – 32     24 hours
batch         batch64     33 – 64     12 hours
batch         batch128    65 – 128    6 hours
batch         batch256    129 – 256   6 hours
low                       1 – 64      6 hours
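Reading the table, jobs are submitted to interactive, debug, batch, or low, and the batch queue routes each job to an execution queue by node count; a hedged example, assuming that routing, for a 48-node job (which lands in batch64 and must fit its 12-hour limit):
  % qsub -q batch -l nodes=48:ppn=2,walltime=10:00:00 myjob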

12 Batch Queue Policies
Each user may have:
– One running interactive job
– One running debug job
– Four jobs running over the entire system
Only one batch128 job is allowed to run at a time.
The batch256 queue usually has a run limit of zero; NERSC staff will arrange to run jobs of this size.

13 Submitting Batch Jobs
  % qsub myjob
  93935.jacin03
  %
Record the jobid for tracking!
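A hedged convenience sketch, not from the slides: when submitting from a shell script, the jobid that qsub prints can be captured for later use with qstat or qdel:
  #!/bin/sh
  # submit the job and keep the jobid that qsub prints on stdout
  jobid=$(qsub myjob)
  echo "submitted $jobid" >> submitted_jobs.log
  qstat "$jobid"    # check its status immediately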

14 Deleting Batch Jobs
  % qdel 93935.jacin03
  %

15 Monitoring Batch Jobs (1)
PBS command qstat:
  % qstat
  Job id           Name             User             Time Use S Queue
  ---------------- ---------------- ---------------- -------- - -----
  93295.jacin03-ib job5             einstein         00:00:00 R batch16
  93894.jacin03    EV80fl02_3       legendre         0        H batch16
  93330.jacin03    test.script      laplace          00:00:23 R batch32
  93897.jacin03    runlu8x8         rasputin         0        Q batch32
  93334.jacin03-m  mtp_mg_3wat_o2a  fibonacci        00:00:11 R batch16
  ...
Use the -u option for single-user output:
  % qstat -u einstein
  Job id           Name             User             Time Use S Queue
  ---------------- ---------------- ---------------- -------- - -----
  93295.jacin03-ib job5             einstein         00:00:00 R batch16
  %
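qstat also accepts a jobid argument, which pairs well with the jobid recorded at submission time (standard PBS usage, not shown on the slide; -f gives the full per-job listing):
  % qstat 93935.jacin03        # one-line status of a single job
  % qstat -f 93935.jacin03     # full listing: resources, variables, comments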

16 Monitoring Batch Jobs (2)
NERSC command qs:
  % qs
  JOBID ST USER      NAME        NDS REQ      USED     SUBMIT
  93939 R  gauss     STDIN       1   00:30:00 00:10:43 Oct 2 16:47:00
  93891 R  einstein  runlu4x8    16  01:00:00 00:38:48 Oct 2 15:23:36
  93918 R  inewton   r4_16       8   01:00:00 00:10:37 Oct 2 15:36:35
  ...
  93785 Q  inewton   r4_64       32  01:00:00 -        Oct 2 08:42:36
  93828 Q  rasputin  nodemove    64  00:05:00 -        Oct 2 12:00:11
  93897 Q  einstein  runlu8x8    32  01:00:00 -        Oct 2 15:24:27
  ...
  93893 H  legendre  EV80fl02_2  4   03:00:00 -        Oct 2 15:24:23
  93894 H  legendre  EV80fl02_3  4   03:00:00 -        Oct 2 15:24:24
  93917 H  legendre  EV80fl98_5  4   03:00:00 -        Oct 2 15:26:06
  ...
qs also provides the -u option.

17 Monitoring Batch Jobs (3)
The NERSC website has a current queue look:
  http://www.nersc.gov/nusers/status/jacquard/qstat
It also has a completed jobs list:
  http://www.nersc.gov/nusers/status/jacquard/pbs_summary
Numerous filtering options are available:
– Owner
– Account
– Queue
– Jobid

18 Charging
Machine charge factor (cf) = 4
– Based on benchmarks and user applications
– Currently under review
Serial interactive
– Charge = cf × cputime
– Always charged to the default repository
All parallel
– Charge = cf × 2 × nodes × walltime
– Charged to the default repo unless -A is specified
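A hedged worked example, assuming the factor of 2 is the number of CPUs per node and walltime is measured in hours: the 8-node, 30-minute job from the earlier script would be charged 4 × 2 × 8 × 0.5 = 32 units.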

19 Things To Look Out For (1)
– Do not set group write permission on your home directory; it will prevent PBS from running your jobs.
– Library modules must be loaded at runtime as well as at link time.
– Propagation of environment variables to remote processes is incomplete; contact NERSC consulting for help.
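A hedged sketch of the corresponding fixes; the module name acml is purely illustrative, and this assumes the module command is initialized by your shell startup files inside the batch job:
  % chmod g-w $HOME              # make sure $HOME is not group-writable
then, inside the batch script:
  module load acml               # same library module that was used at link time
  cd $PBS_O_WORKDIR
  mpirun -np 16 ./a.out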

20 Things To Look Out For (2)
Do not run more than one MPI program in a single batch script.
If your login shell is bash, you may see:
  accept: Resource temporarily unavailable
  done.
In this case, specify a different shell using the -S directive, such as:
  #PBS -S /usr/bin/ksh

21 Things To Look Out For (3)
Batch jobs always start in $HOME. To get to the directory where the job was submitted:
  cd $PBS_O_WORKDIR
For jobs that work with large files:
  cd $SCRATCH/some_subdirectory
PBS buffers output and error files until the job completes. To view the files (in your home directory) while the job is running, use:
  -k oe
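A hedged sketch pulling these together (the scratch subdirectory name is illustrative):
  #PBS -l nodes=8:ppn=2,walltime=00:30:00
  #PBS -k oe                       # keep STDOUT/STDERR in $HOME while the job runs
  cd $SCRATCH/some_subdirectory    # large files belong in $SCRATCH
  mpirun -np 16 ./a.out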

22 Things To Look Out For (4)
The following is just a warning and can be ignored:
  Warning: no access to tty (Bad file descriptor).
  Thus no job control in this shell.

23 LoadLeveler vs. PBS
LL                     PBS
#@ node                #PBS -l nodes
#@ tasks_per_node      #PBS -l ppn
#@ wall_clock_limit    #PBS -l walltime
#@ class               #PBS -q
#@ job_name            #PBS -N
#@ account_no          #PBS -A
#@ notification        #PBS -m
#@ shell               #PBS -S
#@ output              #PBS -o
#@ error               #PBS -e
#@ environment         #PBS -V
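As a hedged illustration of the mapping, here is roughly how the header of the earlier PBS example script might have looked under LoadLeveler on Seaborg; the values and the trailing #@ queue line are for orientation only and may be incomplete for a real Seaborg job:
  #@ job_name         = myjob
  #@ class            = debug
  #@ node             = 8
  #@ tasks_per_node   = 2
  #@ wall_clock_limit = 00:30:00
  #@ output           = myjob.out
  #@ error            = myjob.err
  #@ account_no       = mp999
  #@ environment      = COPY_ALL
  #@ queue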

24 Resources
NERSC Website
  http://www.nersc.gov/nusers/resources/jacquard/running_jobs.php
  http://www.nersc.gov/vendor_docs/altair/PBSPro_7.0_User_Guide.pdf
NERSC Consulting
  1-800-66-NERSC, menu option 3, 8 am - 5 pm, Pacific time
  (510) 486-8600, menu option 3, 8 am - 5 pm, Pacific time
  consult@nersc.gov
  http://help.nersc.gov/

