ISG We build general capability
Job Submission on the Olympus Cluster
J. DePasse; S. Brown, PhD; T. Maiden
Pittsburgh Supercomputing Center
Public Health Applications
Learning Objectives
After this tutorial, you should:
- Be comfortable submitting and monitoring jobs through the batch queuing system on Olympus
- Be able to modify the supplied job scripts for your own work
- Understand how to efficiently use the filesystems on Olympus
- Know where to go for help
Resources
- The Olympus git repository contains the source code for the examples used in this tutorial; all of the presentations today are available for download.
- The Olympus cluster wiki provides documentation specific to Olympus.
- The website of the Pittsburgh Supercomputing Center contains additional training materials and account management tools.
- Email email@example.com with questions about available software, problems with your account, and requests for advanced support.
Types of Jobs
- Serial Jobs: individual, independent jobs that each run on a single core of a single processor on a single node
- Multicore Parallel Jobs: use multiple cores on a single node, e.g., OpenMP
- Message Passing Jobs: can use multiple cores distributed over multiple nodes, e.g., Open MPI
Notes:
- These categories are fuzzy; jobs that fit into more than one (or all) of them aren’t uncommon
- It boils down to what resources are needed: How many nodes? How many cores?
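As a rough illustration of how the three categories differ in the resources they ask for, the sketch below formats a request in the `nodes=N:ppn=M` form used with `#PBS -l` in this tutorial's job scripts. The helper name `resource_request` and the node and core counts are hypothetical; suitable values depend on the cluster hardware and on your program.

```shell
# Hypothetical helper: format a resource request as nodes=N:ppn=M,
# the form passed to "#PBS -l" in a job script.
resource_request() {
    echo "nodes=$1:ppn=$2"
}

# Serial job: one core on one node.
serial_request=$(resource_request 1 1)

# Multicore (e.g., OpenMP) job: several cores on ONE node.
# The count of 8 cores is an assumption for illustration.
multicore_request=$(resource_request 1 8)

# Message passing (e.g., Open MPI) job: cores spread over several nodes.
# 4 nodes with 8 cores each is an assumption for illustration.
mpi_request=$(resource_request 4 8)

echo "serial:          $serial_request"
echo "multicore:       $multicore_request"
echo "message passing: $mpi_request"
```

The only difference between the three requests is the answer to the two questions in the notes: how many nodes, and how many cores per node.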
Job Scripts
- A job script is a step-by-step recipe for completing work on a compute cluster
- The recipe is written in a scripting language; we will use bash in our examples
- To submit this job script on the Olympus compute cluster, we will use the qsub command
Examples can be found on the Olympus gitlab site: https://git.isg.pitt.edu/depasse/olympus/blob/master/examples/fred/fred.bash
PBS Directives in a Job Submission Script

    #!/bin/bash -f
    # Remarks: A line beginning with # is a comment.
    # A line beginning with #PBS is a PBS directive.
    # PBS directives must come first; any directives after the first executable statement are ignored.
    #PBS -N test.bash
    # #PBS -o stdout_file
    # #PBS -e stderr_file

Callouts:
- “#!/bin/bash -f”: the “hash-bang” or “shebang” specifies the scripting language used
- “#PBS -N test.bash”: an active PBS directive
- “# #PBS -o stdout_file” and “# #PBS -e stderr_file”: commented-out PBS directives
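One quick way to see which directives in a script are active is to search for lines that begin with `#PBS` in the first column. This is only a sketch: the throwaway file created with `mktemp` stands in for your own job script, and the variable names are made up for illustration.

```shell
# Write a small throwaway script; the temporary file stands in for a real job script.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash -f
#PBS -N test.bash
# #PBS -o stdout_file
# #PBS -e stderr_file
echo "hello"
EOF

# Active directives start with "#PBS" at the very beginning of the line.
active=$(grep -c '^#PBS' "$demo_script")

# Commented-out directives carry an extra "# " in front, so they do not match above.
inactive=$(grep -c '^# #PBS' "$demo_script")

echo "active directives: $active"
echo "commented-out directives: $inactive"

# Print the active directives themselves.
grep '^#PBS' "$demo_script"

rm -f "$demo_script"
```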
Simple Job Submission Script

    #!/bin/bash -f
    # Set PBS Directives…
    #PBS -N test.bash
    #PBS -l nodes=1:ppn=1

    # Get your input files together
    cp ~/inputs.txt myInput.txt

    # Run your program
    myProgram -i myInput.txt -o myOutput.txt

    # Collect the output
    cp myOutput.txt ~/outputs
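Because a job may wait in the queue before it starts, it can be worth catching syntax errors before submitting. A minimal sketch, assuming bash is available where you prepare your scripts: `bash -n` parses a script without executing any of its commands. The temporary file below stands in for your own job script.

```shell
# Throwaway copy of the simple job script; replace with the path to your own script.
job_script=$(mktemp)
cat > "$job_script" <<'EOF'
#!/bin/bash -f
#PBS -N test.bash
#PBS -l nodes=1:ppn=1
cp ~/inputs.txt myInput.txt
myProgram -i myInput.txt -o myOutput.txt
cp myOutput.txt ~/outputs
EOF

# "bash -n" only checks syntax; nothing in the script is run.
if bash -n "$job_script"; then
    syntax_status="ok"
else
    syntax_status="error"
fi
echo "syntax check: $syntax_status"

rm -f "$job_script"
```

This check does not verify that the program or the input files exist; it only confirms that the shell can parse the script.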
Submitting a Job and Monitoring Progress
- After you submit your script with qsub, it is entered into the queue
- A queue is a prioritized list of jobs to be completed
- Once submitted, the status of your job can be viewed with qstat
- “qstat -a” gives you more verbose output
- After your job completes, its output will be available in your home directory
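The submit-then-monitor cycle can be sketched as a small shell function. The function name `submit_and_check` is hypothetical, and the example assumes a job script named test.bash in the current directory; qsub prints the identifier of the new job, which can then be passed to qstat.

```shell
# Hypothetical helper: submit a job script and show its status in the queue.
submit_and_check() {
    script="$1"
    # qsub prints the identifier of the newly queued job on standard output.
    job_id=$(qsub "$script") || return 1
    echo "submitted: $job_id"
    # Show the status of just this job.
    qstat "$job_id"
}

# Only attempt a submission where the batch system is installed
# and the example script is present.
if command -v qsub >/dev/null 2>&1 && [ -f test.bash ]; then
    submit_and_check test.bash
else
    echo "qsub or test.bash not found; run this on the Olympus head node"
fi
```

Running qstat again later shows whether the job is still queued, running, or finished.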
Example Jobs
- Clone the git repository with the command: “git clone https://git.isg.pitt.edu/depasse/olympus.git”
- Enter the examples directory: “cd olympus/examples”
- View the directories by typing “ls”; you should see:
  - “sanity”: a basic diagnostic sanity check
  - “mpihello”: a simple example of parallel, multinode MPI code
  - “flute”: a basic, real-world example of parallel MPI code
  - “fred”: a basic, real-world example of OpenMP multithreaded code
Example: “sanity”
- Go to the examples/sanity directory
- View the contents using “less”: “less sanity.bash”
  - Navigate with the up and down arrows; exit by pressing ‘q’
  - The script is heavily commented, explaining each step
- Submit your job! “qsub sanity.bash”
Can also view here: https://git.isg.pitt.edu/depasse/olympus/tree/master/examples/sanity
Using Olympus File Systems
- Each node in Olympus has a “local” disk, physically located inside the node.
  - Fast and reliable for work on its own node
- Olympus has a “shared” file system that is accessible to all nodes via the network.
  - This is where your home directory is
  - The home directory is persistent, and its contents are never deleted
- While running, jobs should write output to the “local” disks.
  - Local disks are for temporary work and will be periodically “scrubbed”
- For convenience, the “local” directory can be accessed on the head node through the path /net/<node name>/tmp
  - Example: to reach node n002’s local disk, the path would be /net/n002/tmp.
- The local directory’s location is stored in the $LOCAL environment variable
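The path convention for reaching a node's local disk from the head node can be sketched as follows. The helper name `node_local_path` is hypothetical, and whether $LOCAL is set outside a running job is an assumption, so the sketch prints a placeholder when it is not.

```shell
# Hypothetical helper: path to a node's local disk as seen from the head node.
node_local_path() {
    echo "/net/$1/tmp"
}

# Example from the slide: node n002.
n002_path=$(node_local_path n002)
echo "n002 local disk: $n002_path"

# $LOCAL holds the local directory's location; it may be unset outside a job
# (an assumption), so fall back to a placeholder message.
echo "local directory: ${LOCAL:-<LOCAL is not set here>}"

# List the node's local disk only if the path actually exists on this machine.
if [ -d "$n002_path" ]; then
    ls "$n002_path"
fi
```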
What does this look like in a job submission script?

    local_scratch_path="/net/$execution_compute_node$LOCAL"

    # make a directory for this job; name created using job id
    local_working_dir_name="$PBS_JOBID.output.directory"
    local_working_dir_net_path="$local_scratch_path/$local_working_dir_name"

    # create the directory
    mkdir -p "$local_working_dir_net_path"

    # dump all environment variables to a compressed file
    env | gzip > "$local_working_dir_net_path/$PBS_JOBID.env.gzip"

    # create a symlink to the local working dir, available through the execution
    # compute node's NFS export
    ln -s "$local_working_dir_net_path" "$PBS_O_WORKDIR/$local_working_dir_name"

Callouts, in order:
1. Set a variable defining the path to the “local” directory.
2. Define a directory name that is unique to your job.
3. Make that directory.
4. Make the output of your job go to that directory.
5. Create a shortcut to your output so you can access it on the head node.
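Since local disks are scrubbed periodically, a job may also want to copy its results back to persistent storage before it exits. The following is a sketch of that final step, not part of the supplied scripts: the `.results` directory name and the stand-in output file are assumptions, and the fallbacks to temporary directories exist only so the sketch can run outside PBS, where $PBS_JOBID, $PBS_O_WORKDIR, and $LOCAL are not set.

```shell
# Inside a job these come from PBS and the cluster environment; outside a job
# they are unset, so fall back to throwaway values (assumption for this sketch).
job_id="${PBS_JOBID:-demo.job}"
submit_dir="${PBS_O_WORKDIR:-$(mktemp -d)}"
scratch_root="${LOCAL:-$(mktemp -d)}"

# Job-specific working directory on the local disk.
work_dir="$scratch_root/$job_id.output.directory"
mkdir -p "$work_dir"

# Stand-in for the real program: write a result file on the local disk.
echo "result" > "$work_dir/myOutput.txt"

# Copy everything back to persistent storage. "dir/." copies the directory's
# contents without relying on wildcard expansion.
results_dir="$submit_dir/$job_id.results"
mkdir -p "$results_dir"
if cp -r "$work_dir/." "$results_dir/"; then
    # Remove the scratch copy only after the copy back succeeded.
    rm -rf "$work_dir"
fi

echo "results saved in: $results_dir"
```

Removing the scratch directory only after a successful copy means a failed copy leaves the data on the local disk, where it can still be retrieved until the next scrub.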
Try the other examples
- Navigate to the other directories (flute, fred, mpihello)
- Each contains a “README” text file with instructions for submission
- Each job should take only a few minutes and will produce output in the directory from which you run qsub
Working in the shell
- “man”: Most important command of all. Opens the manual page for a command. “man man” to start. Type “q” to quit.
- “ls”: List files in a directory. Similar to “dir” in Windows/DOS
- “cd”: Change directory. Move up and down the directory tree.
- “less”: A pager that allows you to view (but not edit) a file’s contents.
- “vi”: The ubiquitous CLI text editor.
Text Editing with VI
- Type “i” to enter insert mode. Now you can navigate, delete, and type much the same as in other editors.
- Press “ESC” to exit insert mode
- Type “:w” to write your changes to disk
- Type “:q” to quit the vi editor