J. Skovira 5/05 v11 Introduction to IBM LoadLeveler Batch Scheduling System.


1 J. Skovira 5/05 v11 Introduction to IBM LoadLeveler Batch Scheduling System

2 Agenda
- Batch Scheduling Basics
- LoadLeveler basics
- LoadLeveler configuration
- Basic Commands
- Job Submission
- Job cancellation
- Job monitoring
- Job command files
- Advanced Functions
- Questions and Answers

3 Who Needs a Job Scheduler?
- Single machine: Job 1, Job 2, ..., Job N. The OS multi-tasks a single CPU (time-shared scheduling).
- HPC machine: User 1, User 2, and User 3 each submit Job 1, Job 2, ..., Job N.
  - Many machines and users: more jobs
  - Parallel dimension: a user may impact a distant job
- The scheduler runs jobs according to:
  - Scheduling theory
  - Site-defined policy

4 Scheduling Terms
- HPC Cluster
- Resource manager
- Scheduler: starts jobs on specific resources at specific times
- Job Queue: Job 1, Job 2, Job 3, ...
- Batch Scheduler

5 More Tasks for User?
- A job command file is a small set of job directives
- Job command files can be “borrowed” from samples
- Simple command files take predefined defaults
- Experienced users may enhance command files
- Once control is handed to the job, the scheduler is out of the way
[Figure labels: Application Code; Job Meta Data]

6 LoadLeveler Components
[Figure labels: LoadLeveler Central Manager (Negotiator daemon); Schedd machine; IBM cluster worker nodes (Startd daemon); High Performance Switch]

7 LoadLeveler Components

8 Priority and Scheduling
Jobs arrive:
- from different users
- at different times
- in different job classes
- with different priorities
[Figure: jobs as they arrive: Job A (8, 2), Job B (12, 1), Job C (10, 1), Job D (4, 1), Job E (4, 5); sorted queue: Job E, Job A, Job C, Job D, Job B]
LoadLeveler sorts the job queue.
LoadLeveler schedules the jobs in queue order.
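The sort-then-schedule idea on this slide can be sketched in a few lines of Python. The priority values below are hypothetical (the slide does not give any); they were chosen only so that the sorted order matches the slide's E, A, C, D, B. LoadLeveler's real priority calculation is site-configurable and is not modeled here.

```python
# Hypothetical priorities (not from the slide); list order is arrival order.
arrivals = [
    ("Job A", 80),
    ("Job B", 10),
    ("Job C", 60),
    ("Job D", 30),
    ("Job E", 90),
]


def sort_queue(jobs):
    """Return job names ordered by descending priority.

    sorted() is stable, so jobs with equal priority keep their arrival order.
    """
    return [name for name, _priority in sorted(jobs, key=lambda job: -job[1])]


queue = sort_queue(arrivals)
print(queue)
```

The scheduler would then walk `queue` from the front, which is the "queue order" the slide refers to.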

9 Reservation vs Backfill
Reservation (standard) scheduling:
- Top job waits a short time for resources to free
- Defer the job if resources are not available
Backfill:
- Top job starts if it can
- If there are not enough resources, compute when they will become available and which resources the job will use
- Backfill other jobs onto the available nodes
Backfill is superior for parallel machines.

10 Backfill
Job Queue
Job    Nodes  Time
Job A  8      2
Job B  12     1
Job C  10     1
Job D  4      1
Job E  4      5

11 Backfill
Job Queue
Job    Nodes  Time
Job A  8      2
Job B  12     1
Job C  10     1
Job D  4      1
Job E  4      5
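To make the backfill idea concrete, here is a small Python simulation of the job queue from the slide. It is a sketch under stated assumptions, not LoadLeveler's actual algorithm: the 16-node machine size is hypothetical (the slides do not state one), the queue order is taken to be the table order, time is in the table's arbitrary units, and the policy is a simple "reserve for the top job, backfill the rest" scheme.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    time: int


def backfill_schedule(queue, total_nodes):
    """Return {job name: start time} for a simple backfill policy."""
    if any(job.nodes > total_nodes for job in queue):
        raise ValueError("a job requests more nodes than the machine has")
    pending = list(queue)
    running = []  # (end_time, nodes) for jobs currently running
    starts = {}
    now = 0
    while pending:
        running = [(end, n) for end, n in running if end > now]
        free = total_nodes - sum(n for _, n in running)
        # Start jobs from the top of the queue while they fit.
        while pending and pending[0].nodes <= free:
            job = pending.pop(0)
            starts[job.name] = now
            running.append((now + job.time, job.nodes))
            free -= job.nodes
        if not pending:
            break
        # Top job is blocked: find when enough nodes will be free for it.
        top = pending[0]
        avail, reserved_at = free, now
        for end, n in sorted(running):
            avail += n
            reserved_at = end
            if avail >= top.nodes:
                break
        spare = avail - top.nodes  # nodes the top job will not need
        # Backfill later jobs that cannot delay the top job's start.
        for job in list(pending[1:]):
            ends_in_time = now + job.time <= reserved_at
            if job.nodes <= free and (ends_in_time or job.nodes <= spare):
                pending.remove(job)
                starts[job.name] = now
                running.append((now + job.time, job.nodes))
                free -= job.nodes
                if not ends_in_time:
                    spare -= job.nodes
        now = min(end for end, _ in running)  # advance to next completion
    return starts


jobs = [Job("Job A", 8, 2), Job("Job B", 12, 1), Job("Job C", 10, 1),
        Job("Job D", 4, 1), Job("Job E", 4, 5)]
print(backfill_schedule(jobs, total_nodes=16))
```

Under these assumptions, Job D and Job E start at time 0 alongside Job A, while Job B still starts at time 2, the moment it would have started anyway.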

12 Job Command File Basics
- Command file contains job “directives”
- Basic items include:
  - Shell
  - Class
  - Input/output directories
  - Notification control
  - Queue keyword
- 2 ways to specify the job executable:
  - Executable keyword
  - Script invocation after the queue keyword
[Figure labels: Application Code; Job Command File]

13 Basic Job Command File
    #!/bin/ksh
    # @ class = demo
    # @ queue
    perlspin2 > /tmp

14 More Job Command File Keywords
- Requirements allow you to select:
  - I/O directives
  - Node requirements
  - Wallclock limit
  - Locally defined requirements
  - Etc…
- notification controls what LL sends about the job
  - From never to always
- notify_user tells LL where to send job info
  - An email address

15 Serial Job Command File
    #!/bin/ksh
    # @ error = ./out/job2.$(jobid).err
    # @ output = ./out/job2.$(jobid).out
    # @ wall_clock_limit = 180
    # @ class = demo
    # @ notification = complete
    # @ notify_user = skoviraj@us.ibm.com
    # @ queue
    perlspin2

16 Communication on the System
- Each node has a connection to the high-performance switch
- There are 2 ways to use the switch:
  - ip mode
    - "unlimited" channels
    - slower communication performance
  - User space mode
    - limited number of channels
    - faster than ip mode
- The mode can be selected in the job command file

17 Parallel Job Command File Keywords
- node: how many nodes your job requires
- tasks_per_node: how many tasks will run on each node
- network: how your job will communicate
- wall_clock_limit: an estimate of how long your job runs

18 The Network Keyword
    network.protocol = network_type, usage, mode
- protocol: MPI, LAPI, PVM
- network_type: sn_single or sn_all for switch adapter
- usage: shared or not_shared
- mode: IP, US
An example:
    # @ network.MPI = sn_single, shared, us
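As an illustration of the keyword's structure, the following Python sketch splits a network directive into the four parts named on the slide. The function name `parse_network` and the validation sets are assumptions for illustration; the sets contain only the values listed on this slide, and a real installation may accept others.

```python
import re

# Matches lines such as "# @ network.MPI = sn_single, shared, us".
NETWORK_RE = re.compile(r"^#\s*@\s*network\.(\w+)\s*=\s*(.+)$")

# Only the values listed on the slide; real sites may accept more.
NETWORK_TYPES = {"sn_single", "sn_all"}
USAGES = {"shared", "not_shared"}
MODES = {"IP", "US"}


def parse_network(line):
    """Return the directive's parts as a dict, or None if it is not a network line."""
    match = NETWORK_RE.match(line.strip())
    if match is None:
        return None
    fields = [field.strip() for field in match.group(2).split(",")]
    if len(fields) != 3:
        raise ValueError("expected network_type, usage, mode")
    network_type, usage, mode = fields
    mode = mode.upper()  # the slides write the mode both as "us" and "US"
    if network_type not in NETWORK_TYPES or usage not in USAGES or mode not in MODES:
        raise ValueError(f"unrecognized value in: {line!r}")
    return {
        "protocol": match.group(1),
        "network_type": network_type,
        "usage": usage,
        "mode": mode,
    }


print(parse_network("# @ network.MPI = sn_single, shared, us"))
```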

19 Parallel Job Command File
    #!/bin/ksh
    # @ job_type = parallel
    # @ node = 1
    # @ tasks_per_node = 4
    # @ error = ./out/job3.$(jobid).err
    # @ output = ./out/job3.$(jobid).out
    # @ wall_clock_limit = 05:00
    # @ class = demo
    # @ notification = complete
    # @ notify_user = skovira@tc.cornell.edu
    # @ network.MPI = sn_all,shared,us
    # @ queue
    poe perlspin2

20 Basic LoadLeveler Commands
- llsubmit – submits a job to LoadLeveler
- llcancel – cancels a submitted job
- llq – queries the status of jobs in the job queue
- llstatus – queries the status of machines in the cluster

21 llq example
    v01n08:/u/skoviraj $ llsubmit mybasic.cmd
    llsubmit: The job "v01n08.vendor.pok.ibm.com.205" has been submitted
    v01n08:/u/skoviraj $ llq
    Id                       Owner      Submitted   ST PRI Class        Running On
    ------------------------ ---------- ----------- -- --- ------------ -----------
    v01n08.204.0             skoviraj   11/11 22:29 R  50  No_Class     v01n02
    v01n08.205.0             skoviraj   11/11 22:30 R  50  No_Class     v01n02
    v01n08.203.0             skoviraj   11/11 22:28 I  50  No_class

    3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held
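For scripted monitoring, the summary line at the end of the llq output is the easiest part to pick apart. The Python sketch below extracts its counts; it assumes the summary has exactly the wording shown on the slide, and the function name `parse_llq_summary` is made up for this example.

```python
import re

# Wording taken from the slide's llq output; other versions may differ.
SUMMARY_RE = re.compile(
    r"(\d+) job steps? in queue, (\d+) waiting, (\d+) pending, "
    r"(\d+) running, (\d+) held"
)


def parse_llq_summary(text):
    """Return the counts from an llq summary line, or None if none is found."""
    match = SUMMARY_RE.search(text)
    if match is None:
        return None
    keys = ("total", "waiting", "pending", "running", "held")
    return dict(zip(keys, map(int, match.groups())))


line = "3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held"
print(parse_llq_summary(line))
```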

22 llstatus example
    v01n08:/u/skoviraj/suspender1.0/suspender_stuff $ llstatus v01n02
    Name                      Schedd InQ Act Startd Run LdAvg Idle Arch  OpSys
    v01n02.vendor.pok.ibm.com Avail    0   0 Run      1 0.00  9999 R6000 AIX43
    v01n08:/u/skoviraj/suspender1.0/suspender_stuff $ llstatus | more
    Name                      Schedd InQ Act Startd Run LdAvg Idle Arch  OpSys
    v01n01.vendor.pok.ibm.com Avail    0   0 Idle     0 0.05  9999 R6000 AIX43
    v01n02.vendor.pok.ibm.com Avail    0   0 Run      1 0.00  9999 R6000 AIX43
    v01n03.vendor.pok.ibm.com Avail    0   0 Idle     0 0.00  9999 R6000 AIX43
    v01n04.vendor.pok.ibm.com Avail    0   0 Idle     0 0.00  9999 R6000 AIX43
    v01n05.vendor.pok.ibm.com Avail    0   0 Idle     0 0.02  9999 R6000 AIX43
    v01n06.vendor.pok.ibm.com Avail    0   0 Idle     0 0.05  9999 R6000 AIX43
    v01n07.vendor.pok.ibm.com Avail    1   0 Idle     0 0.06   155 R6000 AIX43
    v01n08.vendor.pok.ibm.com Avail    1   0 Idle     0 0.00    83 R6000 AIX43
    v01n09.vendor.pok.ibm.com Avail    0   0 Idle     0 0.00  9999 R6000 AIX43
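The llstatus listing above is regular enough to filter with a short script. This Python sketch lists the machines whose Startd column reads Idle; it assumes the ten whitespace-separated columns shown on the slide and silently skips any line that does not match that shape. The function name `idle_machines` is hypothetical.

```python
def idle_machines(llstatus_output):
    """Return names of machines whose Startd state is "Idle".

    Assumes the ten-column layout shown on the slide:
    Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
    """
    idle = []
    for line in llstatus_output.splitlines():
        fields = line.split()
        if len(fields) != 10 or fields[0] == "Name":
            continue  # skip the header and anything that is not a machine row
        name, startd = fields[0], fields[4]
        if startd == "Idle":
            idle.append(name)
    return idle


# Three rows copied from the slide's output.
sample = """\
Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
v01n01.vendor.pok.ibm.com Avail 0 0 Idle 0 0.05 9999 R6000 AIX43
v01n02.vendor.pok.ibm.com Avail 0 0 Run 1 0.00 9999 R6000 AIX43
v01n03.vendor.pok.ibm.com Avail 0 0 Idle 0 0.00 9999 R6000 AIX43
"""
print(idle_machines(sample))
```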

23 llctl Examples
    llctl -h hostname command
Useful commands:
- reconfig - Forces all daemons to reread the configuration files.
- start - Starts the LoadLeveler daemons on the specified machine.
- stop - Stops the LoadLeveler daemons on the specified machine.
Commands sometimes used:
- flush - Terminates running jobs on this machine, places jobs in idle
- recycle - Stops all LoadLeveler daemons and restarts them.

24 llctl Example
    drain [schedd|startd [classlist|allclasses]]
With no options:
  (1) no more LoadLeveler jobs can begin running on this machine,
  (2) no more LoadLeveler jobs can be submitted through this machine.
When you issue drain schedd, the following happens:
  (1) the schedd machine accepts no more LoadLeveler jobs for submission.
  (2) jobs in the Starting or Running state in the queue are allowed to continue running.
  (3) jobs in the Idle state in the schedd queue are drained.
When you issue drain startd, the following happens:
  (1) the startd machine accepts no more LoadLeveler jobs to be run.
  (2) jobs already running on the startd machine are allowed to complete.

25 More LoadLeveler Commands
- llclass - returns information about available classes
- llprio - changes the user priority of a job step

26 llclass Example
    v60n129:/u/skoviraj $ llclass -l X_Class
    =============== Class X_Class ===============
    Name: X_Class
    Priority: 0
    Exclude_Users:
    Include_Users:
    Exclude_Groups:
    Include_Groups:
    Admin:
    NQS_class: F
    NQS_submit:
    NQS_query:
    Max_processors: -1
    Maxjobs: -1
    Resource_requirement:
    Class_comment:
    Class_ckpt_dir:
    Ckpt_limit: undefined, undefined
    Wall_clock_limit: 11+13:46:39, 11+13:46:39 (999999 seconds, 999999 seconds)
    Job_cpu_limit: undefined, undefined
    …
    v60n129:/u/skoviraj $ llclass
    Name            MaxJobCPU      MaxProcCPU     Free  Max   Description
                    d+hh:mm:ss     d+hh:mm:ss     Slots Slots
    --------------- -------------- -------------- ----- ----- ---------------------
    inter_class     undefined      undefined        192   192
    X_Class         undefined      undefined        192   192

27 llprio Example
    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llq
    Id                       Owner      Submitted   ST PRI Class        Running On
    v01n07.137.0             skoviraj   11/11 22:51 I  50  No_class

    1 job steps in queue, 1 waiting, 0 pending, 0 running, 0 held
    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llprio -p 100 v01n07.137.0
    llprio: Priority command has been sent to the central manager.
    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llq
    Id                       Owner      Submitted   ST PRI Class        Running On
    v01n07.137.0             skoviraj   11/11 22:51 I  100 No_class

    1 job steps in queue, 1 waiting, 0 pending, 0 running, 0 held

28 Advanced Topics
- Job Preemption
- Job Checkpointing
- Submit filter
- LoadLeveler APIs (data access, scheduling)
- Workload Manager (WLM) integration
- Advance Reservation
- Consumable resource control

29 Job Suspension
[Figure: timeline of job states]
- 4 Node job runs
- 4 Node suspended
- 16 way job runs
- 16 way job completes
- 4 way restarts

30 Job Checkpoint
[Figure: timeline of job states]
- 4 Node job runs
- 4 Node Checkpoints and ends
- 4 Node job state saved (GPFS)
- 16 way job runs
- 16 way job completes
- 4 way restarts from saved state

31 Submit Filter
    use strict;
    use warnings;

    my $NetKey = 0;                      # No network keyword seen yet in this job step
    while ( my $value = <STDIN> ) {      # Read the job command file from standard input
        chomp($value);
        if ( $value =~ /network/ ) {     # If we find the network keyword....
            $NetKey = 1;                 # remember it!
        }
        if ( $value =~ /queue/ ) {       # If at the end of LL keywords for this job step...
            if ( !$NetKey ) {            # if No network keyword...
                # Add one which uses the switch
                print "# @ network.MPI = sn_all,not_shared,US\n";
            }
            $NetKey = 0;                 # Reset network keyword memory
        }
        print "$value\n";                # Copy a single ll cmd file line to new cmd file
    }
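The same filter logic can be sketched as a Python function, which makes it easy to try on a list of lines without a LoadLeveler installation. This is an illustrative port, not part of the original deck: the function name `add_default_network` is made up, and it keeps the slide's simple substring matching, so any line containing "network" or "queue" (even inside a comment) is treated as that keyword.

```python
DEFAULT_NETWORK = "# @ network.MPI = sn_all,not_shared,US"


def add_default_network(lines, default=DEFAULT_NETWORK):
    """Insert a default network directive before each queue line that lacks one."""
    output = []
    saw_network = False
    for line in lines:
        if "network" in line:
            saw_network = True  # this job step already chooses a network
        if "queue" in line:
            if not saw_network:
                output.append(default)  # add one which uses the switch
            saw_network = False  # reset for the next job step
        output.append(line)
    return output


job = ["#!/bin/ksh", "# @ class = demo", "# @ queue", "perlspin2"]
for line in add_default_network(job):
    print(line)
```

Resetting the flag at each queue line means every job step in a multi-step command file is checked independently.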

32 Tips for Efficient Job Processing
Assumptions:
- One task per CPU
- Classes configured
Get your job to the TOP of the queue:
- Short run
- Small number of nodes
- Use ip communication over the switch
- Priority?
- Submit during low use periods (evening)
These are FREE! All of the above tips (except priority) will impact no other job.

33 More Tips for Efficient Job Processing
Allow your job to run as QUICKLY as possible:
- Balance node operations
- Keep data entirely in physical memory
- Use processors of similar types (system admin?)
- Use distributed data load and store
- Profile your applications for efficient compiler use
This could be an entirely new presentation!

34 Questions and Answers

