1 Multiprocessor Scheduling Module 3.1 For a good summary on multiprocessor and real-time scheduling, visit:

2 Classifications of Multiprocessor Systems
Loosely coupled multiprocessors, or clusters
–Each processor has its own memory and I/O channels
Functionally specialized processors
–Such as an I/O processor or an Nvidia GPGPU
–Controlled by a master processor
Tightly coupled multiprocessors (MCMP)
–Processors share main memory
–Controlled by the operating system
–More economical than clusters

3 Types of Parallelism
Bit-level parallelism
Instruction-level parallelism
Data parallelism
Task parallelism – our focus
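
The distinction between the last two types can be sketched in a few lines of Python. This is an illustration of my own, not from the slides; all function names (square_chunk, word_count, char_count) are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def square_chunk(chunk):
    # Data parallelism: the same operation applied to different slices of the data.
    return [x * x for x in chunk]

def word_count(text):
    return len(text.split())

def char_count(text):
    return len(text)

data = list(range(8))
with ThreadPoolExecutor(max_workers=2) as pool:
    # Data parallelism: identical work on different partitions of the input.
    halves = pool.map(square_chunk, [data[:4], data[4:]])
    squared = [x for chunk in halves for x in chunk]

    # Task parallelism: distinct tasks running concurrently on shared input.
    text = "multiprocessor scheduling module"
    words = pool.submit(word_count, text)
    chars = pool.submit(char_count, text)

print(squared)                        # [0, 1, 4, 9, 16, 25, 36, 49]
print(words.result(), chars.result())  # 3 32
```

Task parallelism, the focus of this module, is the form the scheduler sees: independently schedulable threads of control rather than partitions of one data set.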

4 Synchronization Granularity
Refers to the frequency of synchronization among processes in the system
Five classes exist:
–Independent (SI is not applicable)
–Very coarse (2000 < SI < 1M)
–Coarse (200 < SI < 2000)
–Medium (20 < SI < 200)
–Fine (SI < 20)
SI is the synchronization interval, measured in instructions.
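
The five classes above can be captured in a small helper function. This sketch is mine, not from the slides; the thresholds follow the table, and boundary values (exactly 20, 200, 2000) are assigned to the lower class as an arbitrary simplification. Treating a very large SI as "independent" is likewise my own approximation.

```python
def granularity(si):
    """Classify parallelism by synchronization interval SI (instructions).

    si=None means the processes never synchronize (independent parallelism).
    """
    if si is None:
        return "independent"   # no synchronization at all
    if si < 20:
        return "fine"
    if si < 200:
        return "medium"
    if si < 2000:
        return "coarse"
    if si < 1_000_000:
        return "very coarse"
    return "independent"       # simplification: effectively no synchronization

print(granularity(15))      # fine
print(granularity(500))     # coarse
print(granularity(50_000))  # very coarse
```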

5 Wikipedia on Fine-Grained, Coarse-Grained, and Embarrassing Parallelism
Applications are often classified according to how often their subtasks need to synchronize or communicate with each other. An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second, and it is embarrassingly parallel if they rarely or never have to communicate. Embarrassingly parallel applications are considered the easiest to parallelize.

6 Independent Parallelism
Multiple unrelated processes
Each is a separate application or job, e.g. a spreadsheet or a word processor
No synchronization
When more than one processor is available
–Average response time to users is lower

7 Coarse and Very Coarse-Grained Parallelism
Very coarse: distributed processing across network nodes to form a single computing environment
Coarse: synchronization among processes at a very gross level
Good for concurrent processes running on a multiprogrammed uniprocessor
–Can be supported on a multiprocessor with little change

8 Medium-Grained Parallelism
Parallel processing or multitasking within a single application
A single application is a collection of threads
Threads usually interact frequently, leading to medium-level synchronization.

9 Fine-Grained Parallelism
Highly parallel applications
Synchronization every few instructions (on very short events)
Fills the gap between ILP (instruction-level parallelism) and medium-grained parallelism
Can be found in small inner loops
–Often expressed with parallel programming interfaces such as MPI and OpenMP
The OS should not intervene; synchronization is usually handled in hardware
In practice, this is a very specialized and fragmented area

10 Scheduling
Scheduling on a multiprocessor involves three interrelated design issues:
–Assignment of processes to processors
–Use of multiprogramming on individual processors
Makes sense for processes (coarse-grained)
May not be good for threads (medium-grained)
–Actual dispatching of a process
Which scheduling policy should we use: FCFS, RR, etc.?
Sometimes a very sophisticated policy becomes counterproductive.

11 Assignment of Processes to Processors
Treat processors as a pooled resource and assign processes to processors on demand
Or permanently assign each process to a processor
–Dedicated short-term queue for each processor
–Less overhead: each processor does its own scheduling on its own queue
–Disadvantage: a processor could be idle (have an empty queue) while another processor has a backlog

12 Assignment of Processes to Processors
Global queue
–Schedule to any available processor
–During its lifetime, a process may run on different processors at different times
–In an SMP architecture, context switching can be done at small cost
Master/slave architecture
–Key kernel functions always run on a particular processor
–The master is responsible for scheduling
–Slaves send service requests to the master
–Synchronization is simplified
–Disadvantages: failure of the master brings down the whole system, and the master can become a performance bottleneck
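
The master/slave protocol can be sketched with ordinary threads and queues. This is a toy model of my own, not from the slides: "slaves" never touch the ready list themselves; they send requests to a single "master", which makes every scheduling decision.

```python
import queue
import threading

requests = queue.Queue()                       # slave -> master service requests
replies = {i: queue.Queue() for i in range(2)}  # master -> each slave
ready = ["P1", "P2", "P3", "P4"]               # master's private ready list

def master():
    served = 0
    while served < 4:
        slave_id = requests.get()              # a slave asks for work
        replies[slave_id].put(ready.pop(0))    # master picks the next process
        served += 1

def slave(slave_id, log):
    for _ in range(2):
        requests.put(slave_id)                 # request service from the master
        log.append((slave_id, replies[slave_id].get()))

log = []
m = threading.Thread(target=master)
slaves = [threading.Thread(target=slave, args=(i, log)) for i in range(2)]
m.start()
for s in slaves:
    s.start()
for s in slaves:
    s.join()
m.join()
print(sorted(p for _, p in log))   # ['P1', 'P2', 'P3', 'P4']
```

The single master thread is exactly the bottleneck and single point of failure the slide warns about: every dispatch serializes through it.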

13 Assignment of Processes to Processors
Peer architecture
–The operating system can execute on any processor
–Each processor does self-scheduling from a pool of available processes
–Complicates the operating system: it must make sure two processors do not choose the same process, which requires lots of synchronization

14 Process Scheduling in Today’s SMP
M/M/m/K queueing system
–Single queue for all processes
–If multiple queues are used for priorities, all queues feed the common pool of processors
The specific scheduling discipline is less important with more than one processor
–A simple FCFS discipline with static priorities may suffice for a multiprocessor system
–Illustrated by the graph on p. 460
–In conclusion, the specific scheduling discipline is much less important with SMP than with a uniprocessor

15 Threads
A thread executes separately from the rest of its process
An application can be a set of threads that cooperate and execute concurrently in the same address space
Threads running on separate processors can yield a dramatic gain in performance

16 Multiprocessor Thread Scheduling – Four General Approaches (1/2)
Load sharing
–Processes are not assigned to a particular processor
Gang scheduling
–A set of related threads is scheduled to run on a set of processors at the same time

17 Multiprocessor Thread Scheduling – Four General Approaches (2/2)
Dedicated processor assignment
–Each thread is assigned to a specific processor
–When the program terminates, its processors are returned to the available pool
Dynamic scheduling
–The number of threads can be altered during the course of execution

18 Load Sharing
Load is distributed evenly across the processors
No centralized scheduler required
–The OS runs on every processor to select the next thread
Uses a global queue of ready threads
–Usually an FCFS policy
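
A minimal sketch of load sharing, assuming worker threads stand in for processors: every "processor" pulls its next thread from one global FCFS ready queue. Python's `queue.Queue` supplies the mutual exclusion on the shared queue. The names (processor, ready, completed) are invented for the example.

```python
import queue
import threading

ready = queue.Queue()          # global ready queue, FCFS order
completed = []
done_lock = threading.Lock()

def processor(cpu_id):
    """Each 'processor' self-selects the next ready thread from the global queue."""
    while True:
        task = ready.get()     # blocks; access is internally synchronized
        if task is None:       # sentinel: no more work for this processor
            break
        with done_lock:
            completed.append((cpu_id, task))

for t in range(6):
    ready.put(f"thread-{t}")

cpus = [threading.Thread(target=processor, args=(i,)) for i in range(2)]
for c in cpus:
    c.start()
for _ in cpus:
    ready.put(None)            # one sentinel per processor
for c in cpus:
    c.join()

print(sorted(task for _, task in completed))
```

Note that which processor runs which task is nondeterministic: exactly the point of the next slide's cache-affinity complaint, since a preempted thread may well resume on a different processor.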

19 Disadvantages of Load Sharing
The central queue needs mutual exclusion
–May become a bottleneck when more than one processor looks for work at the same time
–A noticeable problem when there are many processors
Preempted threads are unlikely to resume execution on the same processor
–Cache use is less efficient
If all threads go through the global queue, all the threads of a program will not gain access to processors at the same time
–Performance suffers if coordination among the threads is high

20 Gang Scheduling
Simultaneous scheduling of the threads that make up a single process
Useful for applications whose performance severely degrades when any part of the application is not running
Threads often need to synchronize with each other
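
The idea can be sketched as a schedule builder: each gang of related threads is placed on processors within the same time slice, so threads that synchronize with one another always run together. This toy function is my own illustration, not an algorithm from the slides.

```python
def gang_schedule(gangs, n_cpus):
    """Return a list of time slices; each slice maps cpu index -> thread name.

    gangs: list of (application_name, thread_count) pairs.
    A gang larger than n_cpus is split across consecutive slices.
    """
    slices = []
    for name, n_threads in gangs:
        for start in range(0, n_threads, n_cpus):
            count = min(n_cpus, n_threads - start)
            slices.append({cpu: f"{name}.{start + cpu}" for cpu in range(count)})
    return slices

# Two applications: A has 4 threads, B has 2; 4 processors available.
for i, sl in enumerate(gang_schedule([("A", 4), ("B", 2)], n_cpus=4)):
    print(f"slice {i}: {sl}")
# slice 0 runs all of A's threads together; slice 1 runs both of B's.
```

Because all of a gang's threads are co-scheduled, a thread that reaches a synchronization point finds its partners actually running, which is where the blocking and context-switch savings on the next slide come from.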

21 Advantages of Gang Scheduling
Closely related threads execute in parallel
Synchronization blocking is reduced
Less context switching
Scheduling overhead is reduced, as a single signal() may affect many threads

22 Scheduling Groups – two time-slicing divisions

23 Dedicated Processor Assignment – Affinitization (1/2)
When an application is scheduled, each of its threads is assigned a processor that remains dedicated (affinitized) to that thread until the application runs to completion
No multiprogramming of processors: one processor per thread, and no other thread runs on it
Some processors may be idle while their threads are blocked
Eliminates context switches; with certain types of applications, this can save enough time to compensate for the possible idle-time penalty
In a highly parallel environment with hundreds of processors, processor utilization is not a major issue, but performance is; the total avoidance of switching yields a substantial speedup

24 Dedicated Processor Assignment – Affinitization (2/2)
When the number of threads exceeds the number of available processors, efficiency drops
–Due to thread preemption, context switching, suspension of other threads, and cache pollution
Note that both gang scheduling and dedicated assignment are more concerned with allocation issues than scheduling issues: the important question becomes "How many processors should a process be assigned?" rather than "How shall I choose the next process?"
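
A sketch of the allocation step, with invented names: each thread of an application is bound to one processor from a free pool for the application's whole lifetime, and the pool shrinks accordingly. (On Linux the actual binding primitive would be something like `os.sched_setaffinity`; this model stays platform-neutral.)

```python
def dedicate(free_cpus, threads):
    """Bind each thread to its own processor for the application's lifetime.

    Returns (binding, remaining_pool). Refuses the degenerate case the
    slide warns about: more threads than processors.
    """
    if len(threads) > len(free_cpus):
        raise RuntimeError("more threads than processors: efficiency would drop")
    binding = {t: free_cpus[i] for i, t in enumerate(threads)}
    remaining = free_cpus[len(threads):]
    return binding, remaining

binding, pool = dedicate(list(range(8)), ["t0", "t1", "t2"])
print(binding)   # {'t0': 0, 't1': 1, 't2': 2}
print(pool)      # [3, 4, 5, 6, 7]
```

When the application terminates, its three processors simply go back into the free list, which is the "returned to the available pool" step from slide 17.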

25 Dynamic Scheduling
The number of threads within a process can vary during execution
The work is shared between the OS and the application:
–When a job originates, it requests a certain number of processors
–The OS grants some or all of the request, based on the number of processors currently available
–The application itself then decides which threads run when, and on which processors; this requires language or runtime support, as provided by thread libraries
–When processors become free due to the termination of threads or processes, the OS can allocate them to satisfy pending requests
Simulations have shown that this approach is superior to both gang scheduling and dedicated processor assignment.
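
The OS side of this division of labor can be modeled in a few lines. This is an illustrative model of my own (class and method names invented): jobs request processors, the OS grants what it can, and freed processors are handed to pending requests in FCFS order. Mapping threads onto the granted processors is left to the application, as the slide describes.

```python
class ProcessorPool:
    """Toy model of the OS side of dynamic scheduling."""

    def __init__(self, total):
        self.available = total
        self.pending = []                 # FCFS list of (job, still_wanted)

    def request(self, job, wanted):
        """Grant up to `wanted` processors now; queue the shortfall."""
        granted = min(wanted, self.available)
        self.available -= granted
        if granted < wanted:
            self.pending.append((job, wanted - granted))
        return granted                    # the application schedules its own threads

    def release(self, count):
        """Return freed processors and hand them to pending requests, FCFS."""
        self.available += count
        grants = []
        while self.pending and self.available > 0:
            job, need = self.pending.pop(0)
            g = min(need, self.available)
            self.available -= g
            grants.append((job, g))
            if g < need:                  # partially satisfied: stays at queue head
                self.pending.insert(0, (job, need - g))
        return grants

pool = ProcessorPool(4)
print(pool.request("A", 3))   # 3
print(pool.request("B", 3))   # 1  (only one processor left; B still wants 2)
print(pool.release(2))        # [('B', 2)]  (A's freed processors go to B)
```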