OpenPBS – Distributed Workload Management System


1 OpenPBS – Distributed Workload Management System
Presented by Anup Bhatt and William Schneble

2 PBS – Portable Batch System[2,3]
Distributed workload management system with threading and MPI integration:
- Job management
- Queuing
- Scheduling
- Monitoring
Uses a Message Oriented Middleware (MoM) daemon that runs on each execution host
Manages jobs on the host: starts and tracks them, stages files, returns output, and runs cleanup

3 What is a job?[1]
Job: a task in the form of a shell script or batch file containing PBS directives and the application to run
Directives come before the program commands and specify the shell, resources, time, and paths:
- Shell specification
- PBS directive for nodes and number of CPUs
- PBS directive requesting wall-clock time
- Program directory
- Analyze program
- Run program
Submit the job to the PBS server using qsub; request time, nodes, and other resources with the submission
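The directives above can be sketched as a minimal job script. Node counts, times, and the work performed are hypothetical, and directive syntax varies between PBS variants (TORQUE-style nodes=/ppn= shown here; PBS Professional uses select= instead):

```shell
#!/bin/bash
# Shell specification
#PBS -S /bin/bash
# Request 2 nodes with 4 CPUs each (hypothetical sizes)
#PBS -l nodes=2:ppn=4
# Request one hour of wall-clock time
#PBS -l walltime=01:00:00

# Move to the directory the job was submitted from
cd "${PBS_O_WORKDIR:-$PWD}"

# Analyze and run the program (placeholder command)
MSG="running analysis on $(hostname)"
echo "$MSG"
```

The script is submitted with `qsub job.sh`, which prints the job ID that PBS assigns.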

4 Typical Lifecycle of a Job[3]
1. Write the job script and submit it
2. PBS accepts the job and returns an ID
3. The scheduler finds an execution host and sends the job there
4. Licenses are obtained
5. PBS creates a job-specific staging and execution directory on the host(s)
6. Input files and directories are copied to the primary execution host
7. CPUsets are created (if needed)
8. The job runs
9. Output files and directories are copied to the specified locations
10. Temporary files and directories are cleaned up
11. Licenses are returned
12. CPUsets are deleted
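The staging and cleanup steps of the lifecycle can be sketched as a toy shell sequence. Directory and file names are hypothetical and no real PBS daemons are involved:

```shell
# Toy sketch of the staging, execution, stage-out, and cleanup steps.
stage="${TMPDIR:-/tmp}/pbs_stage_demo"        # job-specific staging/execution dir
out="${TMPDIR:-/tmp}/pbs_out_demo.dat"        # user-specified output location
mkdir -p "$stage"
echo "input data" > "$stage/input.dat"        # input staged to the execution host
( cd "$stage" && tr 'a-z' 'A-Z' < input.dat > output.dat )   # the job runs
cp "$stage/output.dat" "$out"                 # output copied to its destination
rm -rf "$stage"                               # temporary files cleaned up
```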

5 Checkpointing[4]
What happens when PBS shuts down or crashes?
- PBS automatically checkpoints jobs during shutdown
- A checkpoint creates an image of the job containing all information necessary for recovery
- PBS will not checkpoint jobs submitted with the qsub -c n option
- If checkpoint space is exceeded, un-checkpointed jobs are lost
- Manual checkpointing is needed to survive crashes: use qsub -c <interval> or write explicit checkpointing code in the program
Why checkpoint?
- If the system crashes, PBS cannot automatically checkpoint the running processes
- You might have a job you want to run in stages
- Even if PBS checkpoints a running job, recovery could fail: for example, from a lack of disk space while PBS writes the checkpoint file, or because you modified or removed a file associated with one of the job's processes before the job was recovered
- Your job might exceed the CPU time limit specified at submission (a recovered job has the same time left that it had when it was checkpointed)
- You might have a calculation within a job and want to change its parameters as the job proceeds; checkpointing for this purpose may require saving more data
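The "explicit code in program" option can be sketched as follows: the job records its progress so a requeued run resumes where it left off. The checkpoint file name and step count are made up for illustration:

```shell
# Minimal application-level checkpointing sketch.
ckpt="${TMPDIR:-/tmp}/progress.ckpt"     # hypothetical checkpoint file
start=1
[ -f "$ckpt" ] && start=$(cat "$ckpt")   # resume from the last checkpoint
for i in $(seq "$start" 5); do
    echo "step $i"                        # the real work would happen here
    echo $((i + 1)) > "$ckpt"             # checkpoint after each unit of work
done
rm -f "$ckpt"                             # finished cleanly; drop the checkpoint
```

If the job is killed mid-run, the checkpoint file survives and the next run starts from the recorded step instead of step 1.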

6 Virtual Nodes[2,3]
- Virtual node (vnode): an abstract object representing a set of resources on a machine
- A vnode can be an entire host, a nodeboard, or a blade; a single host can have multiple vnodes
- A host with multiple vnodes must have a natural vnode, which defines invariant host information and dynamic and shared resources
- Chunks can be broken up across vnodes on the same host
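As a sketch of chunk-based requests, a job script might ask for resources in PBS Professional's select syntax; the chunk sizes here are hypothetical, and older OpenPBS/TORQUE setups use the nodes=/ppn= form instead:

```shell
# Request two chunks, each with 4 CPUs and 8 GB of memory.
#PBS -l select=2:ncpus=4:mem=8gb
# "free" placement lets PBS place chunks on any vnodes,
# including several vnodes of the same host.
#PBS -l place=free
```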

7 Message Oriented Middleware[2,3]
- MoM: the daemon that runs on each execution host and manages jobs
- Runs any prologue and epilogue scripts
- At job start, creates a new session identical to the user's login session
- Handles communication with the job and with other MoMs
- The MoM on the first node on which a job is running manages all communication with the others and is called Mother Superior
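As an illustration, a prologue hook might log each job start before MoM launches the user's session. The argument order (job ID, then user) follows TORQUE's prologue conventions and should be treated as an assumption; the log path is made up:

```shell
# Toy prologue: record the job start, then let the job proceed.
prologue() {
    jobid=$1                                 # first argument: job ID
    user=$2                                  # second argument: job owner
    log="${TMPDIR:-/tmp}/prologue_demo.log"  # hypothetical log location
    echo "prologue: job $jobid for user $user" >> "$log"
    return 0                                 # nonzero would abort the job
}
prologue "1234.server" "alice"               # hypothetical job ID and user
```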

8 PBS Server[2,3]
- The PBS server maintains a file of the nodes managed by PBS
- On startup, each MoM sends its list of managed vnodes to the server
- Scheduling: the basic policy places a job on the first host that can satisfy its resource requirements, working down the list of hosts in order
- Priority can be modified for execution (when to run each job) and preemption (which queued jobs can preempt running jobs)
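The server's node file is a flat list of execution hosts; a minimal sketch, assuming the TORQUE/OpenPBS server_priv/nodes format (host names, CPU counts, and the property are hypothetical):

```text
# $PBS_HOME/server_priv/nodes -- one execution host per line
node01 np=8
node02 np=8
node03 np=16 bigmem
```

Here np= gives the number of CPUs on the host, and a trailing word like bigmem is a node property that jobs can request.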

9 OpenPBS Extensions/Utilities[5]
Maui Scheduler – each iteration, the scheduler contacts the resource manager and requests up-to-date information on compute resources, workload, and policy configuration, then:
1. Updates state information
2. Refreshes reservations
3. Schedules reserved jobs
4. Schedules priority jobs
5. Backfills jobs
6. Updates statistics
7. Handles user requests: any call requesting state information, configuration changes, or job or resource manipulation commands
pestat – node resource monitor that collects node statistics to check for availability

10 UnderLord Scheduler
- Job Age Stage: considers the time each job has been waiting in the queue and assigns a weight based on that value
- Job Duration Stage: considers the projected duration of each job and assigns a weight based on that value; in systems running parallel jobs, the administrator can configure this stage to multiply the duration by the number of processors requested
- Queue Priority Stage: evaluates the priority specified for each queue on the PBS server, as well as that queue's historical system utilization; each job from the corresponding queue is then given a weight based on this value
- The system administrator can set thresholds to vary the significance of each stage relative to the other stages

11 UnderLord Scheduler – User Stages
- User Share Stage: each user's fair share of system resources is specified in the configuration file; the scheduler considers the user's historic utilization (CPU time, wall time, and job count) and weights the job accordingly, so that, absent more significant weight factors, each user receives a fair share of system resources over time
- User Priority Stage: a user can specify a priority for each job submitted to the system; if a user has multiple waiting jobs, this stage alters their weights to favor the job with the highest priority, and has no impact on the ordering of jobs submitted by different users
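The per-stage weights above combine into a single score per job. A toy sketch of such a weighted sum follows; the formula, multipliers, and scales are invented for illustration and are not UnderLord's actual algorithm:

```shell
# Combine per-stage scores into one job weight; the highest weight runs first.
job_weight() {
    age=$1          # minutes waiting in the queue (job age stage)
    duration=$2     # projected runtime in minutes (job duration stage)
    queue_prio=$3   # priority of the job's queue (queue priority stage)
    # administrator-tunable stage multipliers (hypothetical values)
    w_age=2; w_dur=1; w_queue=3
    echo $(( w_age * age - w_dur * duration + w_queue * queue_prio ))
}
job_weight 10 5 1   # prints 18: older, shorter, higher-priority jobs score higher
```

Tuning the multipliers is how an administrator varies the significance of one stage relative to the others.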

12 OpenPBS Industry Usage[7]
- NASA's Pleiades supercomputer used OpenPBS for workload management
- Each NASA mission is allocated a percentage of the CPUs on Pleiades
- A job cannot start if doing so would cause its mission to exceed its share of computing power
- PBS was originally developed for NASA under a contract project that began on June 17, 1991
- Altair Engineering currently owns and maintains the intellectual property associated with PBS, and also employs the original development team from NASA

13 Variants of PBS[6]
- Several variants of PBS exist in industry
- One is Altair PBS, used by the supercomputer company Cray to schedule jobs that simulate crash/safety parameter evaluation in vehicles
- Other versions include TORQUE and PBS Professional
- These variants demonstrate the versatility of PBS

14 Works Cited

