Bigben
Pittsburgh Supercomputing Center
J. Ray Scott

Outline

Running A Job
  – Scheduling Policies
  – Batch Access
  – Interactive Access
  – Packing Jobs
Monitoring And Killing Jobs
Programming Tools

Workshop Scheduling

For the workshop, users should submit jobs to the "training" queue, either on the qsub command line:

    qsub -q training

or in their job scripts as:

    #PBS -q training

At the end of a user job, PBS extracts the relevant lines from the system console logs (the console output from each cpu) for the user's job and places them in a file in the user's $HOME named job_<jobid>_console.log, where <jobid> is the PBS job id. Viewing this file after the run can provide some details about a job failure.
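As a concrete sketch (the script name and job id are hypothetical, and the console-log name assumes the numeric part of the job id), a workshop submission and post-run check might look like:

    qsub -q training myscript.job
    123456.bigben.psc.edu
    # ... after the job finishes ...
    cat $HOME/job_123456_console.log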

Scheduling Policies

The Portable Batch System (PBS) controls all access to bigben's compute processors, for both batch and interactive jobs. PBS on bigben currently has two queues, and interactive and batch jobs compete in these queues for scheduling. The two queues, "batch" and "debug", are controlled through two different modes during a 24-hour day. The "batch" (default) queue does not need to be explicitly named in a job submission and is active during both the day and night modes described below. The "debug" queue must be explicitly named in a job script:

    #PBS -q debug

and is limited to 32 cpus and 15 minutes of wall-clock time. PBS specifications are discussed below.

Day Mode
During the day, defined to be 8am-8pm, 64 cpus are reserved for debugging jobs (jobs run from the "debug" queue). Jobs submitted to the "debug" queue may request no more than 32 cpus and 15 minutes of wall-clock time. Jobs submitted to the "batch" (default) queue may be any size up to the limit of the machine, but only jobs of 1024 cpus or less will be scheduled to start during Day Mode. "batch" jobs are limited to 6 wall-clock hours. Jobs in the "debug" and "batch" queues are ordered FIFO, and also in a way that keeps any one user from dominating usage and ensures fair turnaround. Jobs started during Day Mode must finish by 8pm, at which time the machine is rebooted.

Night Mode
During the night, defined to be 8pm-8am (starting after the machine reboot), jobs of 2048 cpus or less are allowed to run and are limited to 6 wall-clock hours. Jobs are ordered largest to smallest, and in a way that keeps any one user from dominating usage. Jobs in the "debug" queue are not allowed to run during Night Mode.

Scheduling Queues

Mode    Hours      Queue    Limits
Day     8a – 8p    debug    32 cpus, 15 min
                   batch    1024 cpus, 6 hrs
Night   8p – 8a    batch    2048 cpus, 6 hrs
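For example (the script name is hypothetical), a short debugging run and a full-size production run could be submitted as follows; the limits shown are the queue limits from the table above:

    # debug queue: at most 32 cpus and 15 minutes of wall-clock time
    qsub -q debug -l size=32 -l walltime=15:00 myscript.job

    # default "batch" queue: up to 1024 cpus by day (2048 by night), 6 hours
    qsub -l size=1024 -l walltime=6:00:00 myscript.job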

Batch Access

You use the qsub command to submit a job script to PBS. A PBS job script consists of PBS directives, comments and executable commands. The last line of your job script must end with a newline character. A sample job script is:

    #!/bin/csh
    #PBS -l size=4
    #PBS -l walltime=5:00
    #PBS -j oe
    set echo

    # move to my /scratch directory
    cd /scratch/myscratchdir

    # run my executable
    pbsyod ./hellompi

The first line in the script cannot be a PBS directive; any PBS directive in the first line is ignored. Here, the first line identifies which shell should be used. The next three lines are PBS directives.

Batch Access (cont’d)

#PBS -l size=4
  – The first directive requests 4 processors.

#PBS -l walltime=5:00
  – The second directive requests 5 minutes of wallclock time. Specify the time in the format HH:MM:SS. At most two digits can be used for minutes and seconds. Do not use leading zeroes in your walltime specification.

#PBS -j oe
  – The final PBS directive combines your .o and .e output into one file, in this case your .o file. This will make your program easier to debug.

The remaining lines in the script are comments or command lines.

set echo
  – This command causes your batch output to display each command next to its corresponding output, which will make your program easier to debug. If you are using the Bourne shell or one of its descendants, use 'set -x' instead of 'set echo'.

Comment lines
  – The other lines in the sample script that begin with '#' are comment lines. The '#' for comments and PBS directives must begin in column one of your script file. The remaining lines in the sample script are executable commands.

pbsyod
  – The pbsyod command is used to launch your executable on your compute processors. Only programs executed with pbsyod run on your compute processors; all other commands are executed on the front-end processor. Thus, you must use pbsyod to run your executable or it will run on the front end, where it will probably not work. If it does work, it will degrade system performance.
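For reference, a Bourne-shell version of the same sample script would look like the sketch below, with 'set -x' in place of 'set echo' (the scratch directory and executable names are the same illustrative ones used above):

    #!/bin/sh
    #PBS -l size=4
    #PBS -l walltime=5:00
    #PBS -j oe
    set -x

    # move to my /scratch directory
    cd /scratch/myscratchdir

    # run my executable
    pbsyod ./hellompi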

Batch Access (cont’d)

Within your batch script the variable PBS_O_WORKDIR is set to the directory from which you issued your qsub command. The variable PBS_O_SIZE is set to the number of processors you requested.

After you create your script you must make it executable with the chmod command:

    chmod 755 myscript.job

Then you can submit it to PBS with the qsub command:

    qsub myscript.job

Your batch output--your .o and .e files--is returned to the directory from which you issued the qsub command after your job finishes.

You can also specify PBS directives as command-line options to qsub. Thus, you could omit the PBS directives in the sample script above and submit the script with:

    qsub -l size=4 -l walltime=5:00 -j oe myscript.job

Command-line options override PBS directives included in your script. The -M and -m options can be used to have the system send you email when your job undergoes specified state transitions.
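As an illustration (the email address is hypothetical), the standard PBS -M and -m options request mail on specific state transitions -- a for abort, b for begin, e for end -- and PBS_O_WORKDIR can replace a hard-coded working directory in the script:

    #PBS -M user@example.edu
    #PBS -m abe

    # run from the directory where qsub was issued
    cd $PBS_O_WORKDIR
    pbsyod ./hellompi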

Interactive Access

A form of interactive access is available by using the -I option to qsub. For example, the command

    qsub -I -l walltime=10:00 -l size=2

requests interactive access to 2 processors for 10 minutes. The system will respond with a message similar to:

    qsub: waiting for job 54.bigben.psc.edu to start

Your qsub request will wait until it can be satisfied. If you want to cancel your request you should type ^C. When your job starts you will receive the message

    qsub: job 54.bigben.psc.edu ready

and then your shell prompt. You can use the -M and -m options to qsub to have the system send you email when your job has started. At this point any commands you enter will be run as if you had entered them in a batch script. To run on the compute processors allocated to your interactive job you must use the pbsyod command. Stdin, stdout, and stderr are all connected to your terminal, although you will need to use input redirection for your MPI code to read stdin. When you are finished with your interactive session type ^D. The system will respond:

    qsub: job 54.bigben.psc.edu completed

When you use qsub -I you are charged for the total time that you hold your processors and your memory, whether you are computing or not. Thus, as soon as you are done running executables you should type ^D.
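Putting the pieces together, a complete interactive session might look like the following sketch (the job id, executable, and input file are hypothetical):

    qsub -I -l walltime=10:00 -l size=2
    qsub: waiting for job 54.bigben.psc.edu to start
    qsub: job 54.bigben.psc.edu ready
    pbsyod ./mympi < mympi.in    # input redirection so the MPI code can read stdin
    ^D
    qsub: job 54.bigben.psc.edu completed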

Packing Jobs

You can pack several pbsyod commands into a single job and have each of them run on a distinct set of processors. This will allow you to increase the total number of processors your job asks for, which will become important once the scheduler is changed to favor large jobs. For example, the job

    #!/bin/csh
    #PBS -l size=12
    #PBS -l walltime=30:00
    #PBS -j oe
    set echo
    cd /scratch/myscratchdir
    pbsyod -size 4 -base 0 ./mympi < mpi1.in
    pbsyod -size 4 -base 4 ./mympi < mpi2.in
    pbsyod -size 4 -base 8 ./mympi < mpi3.in

will launch three executions, each on a distinct set of 4 processors. The -size option to pbsyod indicates how many processors a pbsyod is to use; the default is to use all of your compute processors. The -base option indicates on which processor a pbsyod should begin executing, with your first processor having a base of 0. Thus, the first pbsyod above will begin executing on your first processor and use 4 processors, the second will run on the next 4 processors starting with your fifth processor, and the third pbsyod will run on your final 4 processors. If you do not use the -base option, all of your executions will run on top of each other on the same set of processors.
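Because all three executions write their stdout through the same job script (and hence into the single .o file produced by #PBS -j oe), it can help to redirect each run's output to its own file. A sketch, with illustrative output file names that are not part of the original example:

    pbsyod -size 4 -base 0 ./mympi < mpi1.in > mpi1.out
    pbsyod -size 4 -base 4 ./mympi < mpi2.in > mpi2.out
    pbsyod -size 4 -base 8 ./mympi < mpi3.in > mpi3.out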

Monitoring and Killing Jobs

The qstat -a command is used to display the status of the PBS queue. It includes running and queued jobs. For each job in the queue it shows the amount of walltime and the number of processors requested. This information can be useful in predicting when your job might run. The -f option to qstat provides more extensive status information for a single job.

The shownids command, located in /usr/local/bin, shows you the status of all the compute processors on bigben. A nid is a node id, or processor. The output of shownids shows the number of processors in certain types of states. Enabled processors are all processors available to PBS for scheduling. Allocated processors are those enabled processors that are currently running jobs. Free processors are those enabled processors that are currently free. You can use the output from shownids and qstat -a to determine when your jobs might start.

The qdel command is used to kill queued and running jobs:

    qdel 54

The argument to qdel is the jobid of the job you want to kill. If you cannot kill a job that you want to kill, send email to PSC user support.
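For example (the job id is hypothetical), a typical monitoring sequence might be:

    qstat -a                      # list running and queued jobs, with walltime and processor counts
    qstat -f 54.bigben.psc.edu    # full status for a single job
    /usr/local/bin/shownids       # enabled / allocated / free processor counts
    qdel 54                       # kill the job if it is no longer needed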