
PBSpro Advanced Information Systems & Technology Advanced Campus Services Prepared by Chao “Bill” Xie, PhD student Computer Science Fall 2005.


1 PBSpro Advanced Information Systems & Technology Advanced Campus Services Prepared by Chao “Bill” Xie, PhD student Computer Science Fall 2005

2 Using PBSpro Advanced, IS&T Advanced Campus Services: Syllabus
- Environment Variables
- Checkpointing

3 Environment variables
- Environment variables seen by a job are:
  - Taken from the user's environment
  - Created by PBS
  - Created by users
- All names created by PBS start with "PBS_"
- Some names start with "PBS_O_", indicating that the variable is taken from the job's originating environment

4 Important variables
- PBS_O_HOME: value of HOME from the submission environment
- PBS_O_HOST: host name on which the qsub command was executed
- PBS_O_PATH: value of PATH from the submission environment
- PBS_O_QUEUE: original queue name to which the job was submitted
- PBS_O_SHELL: value of SHELL from the submission environment
- PBS_O_SYSTEM: operating system name where qsub was executed
- PBS_O_WORKDIR: absolute path of the directory where qsub was executed
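As an illustration of the PBS_O_* variables above, here is a minimal sketch of the opening lines of a job script. The `:-` fallbacks and the variable names `workdir`, `submit_host`, and `queue` are assumptions added so the fragment also runs outside PBS; they are not part of PBS itself.

```shell
#!/bin/sh
# Sketch of a job script preamble. Under PBS the PBS_O_* variables are set
# for the job; the ":-" fallbacks are an assumption so that the script can
# also be tried by hand outside PBS.
workdir="${PBS_O_WORKDIR:-$PWD}"
submit_host="${PBS_O_HOST:-$(uname -n)}"
queue="${PBS_O_QUEUE:-unknown}"

# The job does not necessarily start in the submission directory, so
# change to it before using relative paths.
cd "$workdir" || exit 1

echo "Submitted from $submit_host:$workdir to queue $queue"
```

Changing to PBS_O_WORKDIR first keeps relative input and output paths pointing at the directory where qsub was run.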

5 Important variables (cont1)
- PBS_DEFAULT: name of the default PBS server
- PBS_ENVIRONMENT: indicates the job type, PBS_BATCH or PBS_INTERACTIVE
- PBS_JOBID: job identifier assigned to the job or job array
- PBS_JOBNAME: job name supplied by the user
- PBS_MOMPORT: port number on which this job's MOMs will communicate
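A short sketch of how a script might branch on PBS_ENVIRONMENT and build a per-job log name from PBS_JOBNAME and PBS_JOBID. The function name `job_kind`, the `not-in-pbs` label, and the fallback values are hypothetical choices for this example.

```shell
#!/bin/sh
# Report the job type based on PBS_ENVIRONMENT.
# "not-in-pbs" is a label invented for this sketch, used when the
# variable is unset (for example, when running the script by hand).
job_kind() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_BATCH)       echo "batch" ;;
        PBS_INTERACTIVE) echo "interactive" ;;
        *)               echo "not-in-pbs" ;;
    esac
}

# Build a log file name that is unique per job; outside PBS fall back to
# "manual" and the shell's process id (an assumption of this sketch).
log_file="${PBS_JOBNAME:-manual}.${PBS_JOBID:-$$}.log"

echo "Running as $(job_kind) job, logging to $log_file"
```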

6 Important variables (cont2)
- PBS_NODEFILE: name of the file containing the list of nodes assigned to the job
- PBS_NODENUM: logical node number of this node within those allocated to the job
- PBS_QUEUE: name of the queue from which the job is executed
- PBS_TASKNUM: task (process) number for the job on this node
- TMPDIR: job-specific temporary directory for this job
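A minimal sketch of reading PBS_NODEFILE and TMPDIR inside a job. It assumes the node file holds one host name per line and that a host may appear more than once; `count_nodes` and `scratch` are names made up for this example.

```shell
#!/bin/sh
# Count the distinct hosts listed in a node file.
# $1: path to a file with one host name per line (assumed format).
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only report when PBS has actually provided a readable node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Job has $(count_nodes "$PBS_NODEFILE") distinct node(s)"
fi

# Use the job-specific temporary directory when it is set; the /tmp
# fallback is an assumption for runs outside PBS.
scratch="${TMPDIR:-/tmp}"
echo "Scratch directory: $scratch"
```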

7 Checkpointing
- Two methods of checkpoint/restart:
  - OS-specific method (SGI IRIX and Cray UNICOS)
  - Generic site-specific method
- Specify the checkpointing directory through one of:
  - "-C path" command-line option to pbs_mom
  - PBS_CHECKPOINT_PATH environment variable
  - "$checkpoint_path path" option in MOM's config file
  - Default value
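The four sources for the checkpoint directory can be pictured as a fallback chain. The sketch below assumes they are consulted in the order the slide lists them, which the slide itself does not state explicitly. It is an illustration only, not how pbs_mom is implemented, and `resolve_checkpoint_path` and its arguments are hypothetical.

```shell
#!/bin/sh
# Pick the checkpoint directory from the first source that is set.
# $1: value given with the "-C" option (may be empty)
# $2: value of "$checkpoint_path" from MOM's config file (may be empty)
# $3: default value (a placeholder supplied by the caller in this sketch)
resolve_checkpoint_path() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "${PBS_CHECKPOINT_PATH:-}" ]; then
        echo "$PBS_CHECKPOINT_PATH"
    elif [ -n "$2" ]; then
        echo "$2"
    else
        echo "$3"
    fi
}
```

For example, `resolve_checkpoint_path "" /cfg/dir /default/dir` prints the value of PBS_CHECKPOINT_PATH when that variable is set, and `/cfg/dir` otherwise.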

8 Checkpointing (cont)
- Manually checkpointing a job
  - Use the qhold command
- Checkpointing jobs during PBS shutdown
  - Append the "-t immediate" option to the qterm statement in the PBS start/stop script
- Suspending/checkpointing multi-node jobs
  - Save the complete session state in a file
  - An open socket will cause the operation to fail

9 Site-specific method
- Modify the file mom_priv/config
- "Periodic" job checkpoint action (during job execution):
  $action checkpoint TIME_OUT SCRIPT_PATH ARGS [...]
- Checkpoint just before the job is to be terminated:
  $action checkpoint_abort TIME_OUT SCRIPT_PATH ARGS [...]
- Job restart action:
  $action restart TIME_OUT SCRIPT_PATH ARGS [...]
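The script named by SCRIPT_PATH is written by the site. As a rough sketch only, the function below shows the general shape such a script might take. It assumes the ARGS on the $action line pass a job identifier and a checkpoint directory, and that a zero exit status signals success; both are assumptions of this example, and the actual checkpoint tool is replaced by a placeholder that writes a marker file.

```shell
#!/bin/sh
# Hypothetical site checkpoint routine.
# $1: job identifier (assumed to be passed via ARGS)
# $2: directory to hold checkpoint files (assumed to be passed via ARGS)
site_checkpoint() {
    job_id="$1"
    ckpt_dir="$2"

    # Refuse to run without both arguments.
    if [ -z "$job_id" ] || [ -z "$ckpt_dir" ]; then
        return 2
    fi

    mkdir -p "$ckpt_dir" || return 1

    # Placeholder: a real script would call the site's checkpoint tool
    # here. This sketch only records when it was invoked.
    date > "$ckpt_dir/$job_id.marker" || return 1
    return 0
}
```

A script of this kind should finish well within TIME_OUT, since that value bounds how long the action is allowed to run.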

10 Site-specific method (cont)
- $restart_background (true|false)
  - A boolean flag that modifies how MOM performs a restart
  - "false" (the default): MOM runs the restart operation and waits for the result
  - "true": restart operations are done by a child of MOM, which returns only when all the restarts for all the local tasks of a job are done, while the parent (main) MOM continues processing without being blocked
- $restart_transmogrify (true|false)
  - A boolean flag that controls how MOM launches the restart script/program
  - "false" (the default): MOM runs the restart script and blocks until the restart operation is complete
  - "true": MOM runs the restart script/program in such a way that the script "becomes" the task it is restarting

11 Specify checkpoint in job
- The "-c interval" option defines the checkpoint interval (in minutes)
- The interval argument is specified as:
  - n: no checkpointing is to be performed
  - s: checkpointing is to be performed only when the server executing the job is shut down
  - c: checkpointing is to be performed at the default minimum time for the server executing the job
  - c=minutes: checkpointing is to be performed at an interval of the given number of minutes
  - u: checkpointing is unspecified, resulting in the same behavior as "s"
- If "-c" is not specified, the checkpoint attribute is set to the value "u"
- Example: qsub -c c=10 myjob
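The same request can be placed inside the job script as a "#PBS" directive instead of on the qsub command line. The sketch below shows that directive together with a small helper that checks a value against the forms listed above. The helper `is_valid_checkpoint_arg` is hypothetical and only mirrors the slide's list; it is not part of PBS.

```shell
#!/bin/sh
#PBS -N myjob
#PBS -c c=10
# The directive above asks for the same checkpoint interval as
# "qsub -c c=10 myjob". Outside PBS these lines are plain comments.

# Return 0 if $1 is one of the forms listed on the slide:
# n, s, c, u, or c=<minutes> with a non-negative integer.
is_valid_checkpoint_arg() {
    case "$1" in
        n|s|c|u)
            return 0 ;;
        c=*)
            minutes="${1#c=}"
            case "$minutes" in
                ''|*[!0-9]*) return 1 ;;
                *)           return 0 ;;
            esac ;;
        *)
            return 1 ;;
    esac
}
```

A wrapper script could call `is_valid_checkpoint_arg "$interval"` before passing the value to qsub, so that typos are caught before submission.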

12 References
- PBS Professional 7 Quick Start
- PBS Professional 7 User Guide
- PBS Professional 7 Administration Guide
- www.pbspro.com

13 Thank you!
Contacts:
- Bill Xie, cxie@cs.gsu.edu
- Victor Bolet, vbolet@gsu.edu

