GangLL Gang Scheduling on the IBM SP Andy B. Yoo and Morris A. Jette Lawrence Livermore National Laboratory.


GangLL Gang Scheduling on the IBM SP
Andy B. Yoo and Morris A. Jette, Lawrence Livermore National Laboratory {yoo2,
Jose Moreira and Liana Fong, IBM T. J. Watson Research Center {jmoreira,

Gang Scheduling Overview
- Permits time-sharing or preemption of parallel jobs
  » All tasks of a parallel job are grouped into a "gang", then suspended and resumed synchronously
- Adds another dimension to scheduling
  » Virtual machines created as needed
- Responsiveness improved by 30% in our tests
  » More users/jobs can make progress (at a reduced rate)
  » High-priority work started quickly
- Utilization improved
  » Large jobs can be started without accumulating resources piecemeal while smaller jobs complete
  » Well-utilized virtual machines can be allocated more resources

LoadLeveler Scheduling Example: Poor Responsiveness and System Utilization
(Timeline diagram) Jobs A through E occupy the machine when high-priority Job X arrives at time 0:00. Nodes fall idle one job at a time as Jobs C, A, B, and D complete (0:01 through 1:30), but Job X cannot start until 2:00, alongside the still-running Job E. Initiation is delayed for 2 hours.

GangLL Scheduling: Preemption. Good Responsiveness and System Utilization
(Diagram) Initial state: Jobs A through E are running when high-priority Job X arrives. In the new configuration, reached seconds later, Jobs B, C, and D are stopped until Job X completes, while Jobs A and E continue running alongside Job X.

GangLL Scheduling: Timesharing. Good Responsiveness and System Utilization
(Diagram) Successive time slices alternate between job sets: Job A runs in every slice, high-priority Job X time-shares with Jobs E and F in most slices (Times 1, 2, 4, 5), and Jobs B, C, and D run together in the remaining slices (Times 0, 3), and so on.

GangLL Design
- Built into LoadLeveler
- Can schedule multiple jobs per node
- Global time scheduling performed by the GangLL central manager
  » Scheduling matrix changes when jobs are initiated or terminated
  » Scheduling matrix distributed to nodes as needed
  » Each node follows its individual schedule
  » Nodes must have synchronized clocks
- Context switching (see the sketch below)
  » All processes of a job stop (SIGSTOP) or resume (SIGCONT) synchronously
  » User Space switch window state saved/restored at context-switch time by the Communications Sub-System (CSS)
- Joint LLNL and IBM design and development effort
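A minimal sketch in C of the node-local stop/resume step described above. The gang_t layout and function names are illustrative assumptions, not GangLL source, and only the synchronous SIGSTOP/SIGCONT delivery is shown; the CSS switch-window handling is omitted.

/* Minimal sketch (not GangLL source) of the node-local context switch:
 * every process of the outgoing gang is stopped and every process of
 * the incoming gang is resumed, so the whole parallel job is suspended
 * or running at (approximately) the same time on every node.
 * The gang_t layout and function names are illustrative assumptions. */
#include <signal.h>
#include <sys/types.h>

typedef struct {
    pid_t *pids;    /* local process IDs belonging to this gang */
    int    npids;
} gang_t;

static void signal_gang(const gang_t *g, int sig)
{
    for (int i = 0; i < g->npids; i++)
        kill(g->pids[i], sig);      /* deliver SIGSTOP or SIGCONT */
}

/* Called on each node at a time-slice boundary.  The real GangLL also
 * saves/restores the User Space switch window state via CSS here; this
 * sketch shows only the signal step. */
void gang_context_switch(const gang_t *out, const gang_t *in)
{
    signal_gang(out, SIGSTOP);      /* suspend outgoing job's tasks */
    signal_gang(in,  SIGCONT);      /* resume incoming job's tasks  */
}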

Scheduling Matrix Distribution
(Diagram: scheduling matrix from the GangLL central manager, fanned out to Nodes 1 through 7)
- A new matrix is divided and propagated through the nodes, with acknowledgement and commit, before taking effect
- Each node keeps only its own column of the scheduling matrix and operates according to it (a data-structure sketch follows)
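An illustrative C sketch of the scheduling-matrix idea described above: rows are time slices, columns are nodes, and each node receives only its own column. The field names and job-id encoding are assumptions for illustration, not the actual GangLL data structures.

/* Illustrative sketch of the scheduling matrix: rows are time slices,
 * columns are nodes, and each entry names the job that owns that node
 * during that slice.  Field names and the job-id encoding are
 * assumptions, not the actual GangLL data structures. */
#include <stdlib.h>

typedef struct {
    int  nslices;   /* rows: time slices in one scheduling cycle     */
    int  nnodes;    /* columns: nodes managed by the central manager */
    int *job_id;    /* nslices * nnodes entries, stored row-major    */
} sched_matrix_t;

/* The central manager sends node 'node' only its own column; the node
 * then walks that column one entry per time slice, in lock step with
 * the other nodes (hence the need for synchronized clocks). */
int *extract_node_column(const sched_matrix_t *m, int node)
{
    int *col = malloc(m->nslices * sizeof *col);
    if (col == NULL)
        return NULL;
    for (int s = 0; s < m->nslices; s++)
        col[s] = m->job_id[s * m->nnodes + node];
    return col;
}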

GangLL Configurability
- Each LoadLeveler job class has several scheduling parameters (an illustrative representation follows)
  » Which job classes it can time-share with
    – How large of a slice should it get relative to other job classes in a time-sharing mode?
  » Which job classes it can preempt (stop)
- Each node has a multi-programming level (the number of concurrent jobs allowed)
- Duration of the time slice is configurable
  » Typical values range from 15 seconds to 15 minutes
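One possible way to represent the per-class and per-node parameters listed above, sketched as C structures. These names are purely illustrative and do not correspond to actual LoadLeveler configuration keywords.

/* Purely illustrative C structures for the per-class and per-node
 * parameters listed above; these names are not actual LoadLeveler
 * configuration keywords. */
#define MAX_CLASSES 16

struct gang_class_cfg {
    const char *name;                 /* job class name                         */
    int timeshare_with[MAX_CLASSES];  /* classes this class may time-share with */
    int slice_weight;                 /* relative slice size when time-sharing  */
    int preempts[MAX_CLASSES];        /* classes this class is allowed to stop  */
};

struct gang_node_cfg {
    int multiprogramming_level;       /* concurrent jobs allowed on the node    */
    int timeslice_seconds;            /* typically 15 to 900 (15 s to 15 min)   */
};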

GangLL User Considerations
- Only thread-safe User Space communications support preemption (due to manpower constraints)
  » MPL not supported (impacts MPICH-G)
- Jobs must link with thread-safe libraries OR use IP communications OR use a non-preemptable job class
  » Size and time limits on non-preemptable job classes may be restricted
  » Jobs will be killed if a GangLL preemption request is ignored

GangLL User Considerations
- Ptrace, DPCL, and TotalView jobs need to be made non-preemptable
  » TotalView modifies the application with time-critical connections
  » The LLNL version of TotalView is integrated with the GangLL tool
- Real-time clock no longer reflects actual run time
  » Run-time clock to be added at a later time
  » Use the clock() function for CPU time used, for now (see the sketch below)
- The xgang tool will show real-time scheduling activities
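A plain ISO C example of the clock() workaround mentioned above. clock() reports processor time consumed by the process, which does not advance while the gang is stopped, so it remains meaningful under gang scheduling even though wall-clock time does not. This is standard C, not GangLL-specific code.

/* Plain ISO C example of the clock() workaround: clock() reports CPU
 * time consumed by the process, which does not advance while the gang
 * is stopped, unlike the wall clock. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start = clock();

    /* ... application work goes here ... */

    clock_t end = clock();
    double cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    printf("CPU time used: %.2f s\n", cpu_seconds);
    return 0;
}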

Sample xgang Displays

Context Switch Issues
- Context switches are fast if paging is not induced
  » Paging is painfully slow (79 minutes for 2.5 GB in+out; a rough calculation follows this slide)
  » Most LLNL applications have modest memory demands
  » Development is underway to avoid paging by preventing large-memory jobs from being time-shared
- Need to avoid oversubscribing disk space
  » Does not appear to be a problem at LLNL
- Which LoadLeveler job classes can preempt or time-share with other specific job classes is configurable
- Jobs can also be explicitly preempted or made non-preemptable
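A back-of-the-envelope reading of the paging figure above, assuming "2.5GB in+out" means 2.5 GB paged out plus 2.5 GB paged back in (about 5 GB of paging traffic): 5120 MB over 4740 seconds is roughly 1.1 MB/s of effective paging bandwidth, so a context switch that forces the full memory image of a large job through the paging device costs more than an hour, versus an essentially instantaneous switch when no paging is induced.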

LLNL Configuration Before GangLL
- Parallel debug class - fast response for development
  » Only has access to a small portion of the computer (8 nodes)
  » Small (4-node) and short (1-hour) jobs
- Parallel batch class - "production" jobs
  » Majority of machine resources (315 nodes)
  » Relatively short time limits to provide daytime responsiveness
    – 8AM-5PM: 2 hours (OK for development, but too short for production work)
    – 5PM-8AM weeknights: <= 8 hours (gradually lowered through the morning)
    – 5PM Friday to 8AM Monday: <= 12 hours (gradually lowered through the morning)
  » Scheduled with a backfill algorithm
  » Scheduling large node-count jobs wastes significant resources due to the limited selection of jobs for backfill
- Expedited jobs - to be run ASAP
  » Use parallel batch class nodes
  » Other jobs are manually terminated to free resources
(Partition diagram: pdebug; pbatch + expedited)

LLNL Configuration After Gang Scheduling (preliminary)
- Parallel debug class - responsive development runs
  » Separate partition eliminated
  » Larger job sizes (32 nodes) and longer run times (2 hours) permitted; access to all nodes permitted on demand
- Parallel batch class - "production" jobs
  » Longer run times permitted without reducing responsiveness
    – Always 12 hours (or more?)
- Expedited class - to be run ASAP
  » Other jobs are automatically preempted to free resources
- Non-stop class (new) - jobs which cannot be preempted
  » Access to a limited node count and shorter run times
- Large job class - jobs with large node counts
  » Can preempt jobs as desired; no need to accumulate resources; backfill is less critical (the LLNL workload has few short-running jobs)

GangLL Status
- In production use at LLNL since November 1999
- Not an IBM-supported product, but limited support is available
- Management of memory to avoid paging still needs work
  » Essential for a fully operational system
  » Need an accurate estimate of each job's memory requirement (user issue)
  » Need enforcement of memory limits (GangLL + AIX issue)
- Biggest technical problem is slow AIX paging
  » Being addressed