
Chapter 5: CPU Scheduling


1 Chapter 5: CPU Scheduling

2 Chapter 5: CPU Scheduling
Basic Concepts
Scheduling Criteria
Scheduling Algorithms
Thread Scheduling
Multiple-Processor Scheduling
Real-Time CPU Scheduling
Operating Systems Examples
Algorithm Evaluation

3 Objectives
To introduce CPU scheduling, which is the basis for multiprogrammed operating systems
To describe various CPU-scheduling algorithms
To discuss evaluation criteria for selecting a CPU-scheduling algorithm for a particular system
To examine the scheduling algorithms of several operating systems

4 Shortest-Job-First (SJF) Scheduling
Associate with each process the length of its next CPU burst
Use these lengths to schedule the process with the shortest time
SJF is optimal – it gives the minimum average waiting time for a given set of processes
The difficulty is knowing the length of the next CPU request
Could ask the user
If two processes have the same next CPU burst length, FCFS scheduling is used to break the tie
Two schemes:
Nonpreemptive – once the CPU is given to a process, it cannot be preempted until it completes its CPU burst
Preemptive – if a new process arrives with a CPU burst length less than the remaining time of the currently executing process, preempt. This scheme is known as Shortest-Remaining-Time-First (SRTF)

5 Example of Non-preemptive SJF
Process   Arrival Time   Burst Time
P1        0.0            6
P2        2.0            8
P3        4.0            7
P4        5.0            3
SJF scheduling chart: P4 (0–3), P1 (3–9), P3 (9–16), P2 (16–24)
Average waiting time = (3 + 16 + 9 + 0) / 4 = 7

6 Determining Length of Next CPU Burst
Long-term scheduling (batch system): the process time limit is specified by the user
SJF scheduling is used frequently in long-term scheduling
SJF cannot be implemented exactly at the level of short-term CPU scheduling
Can only estimate the length – it should be similar to the previous ones
Then pick the process with the shortest predicted next CPU burst
Can be done by using the lengths of previous CPU bursts, with exponential averaging:
1. tn = actual length of the nth CPU burst
2. τn+1 = predicted value for the next CPU burst
3. α, 0 ≤ α ≤ 1
4. Define: τn+1 = α tn + (1 − α) τn
Commonly, α is set to ½
The preemptive version is called shortest-remaining-time-first

7 Prediction of the Length of the Next CPU Burst

8 Examples of Exponential Averaging
 =0 n+1 = n Recent history does not count  =1 n+1 = tn Only the actual last CPU burst counts  = 1/2 recent and past history are equally weighted 0 can be a constant or overall system average If we expand the formula, we get: n+1 =  tn+(1 - ) tn -1 + … +(1 -  )j  tn -j + … +(1 -  )n +1 0 Since both  and (1 - ) are less than or equal to 1, each successive term has less weight than its predecessor

9 Example of Shortest-remaining-time-first
Now we add the concepts of varying arrival times and preemption to the analysis
Process   Arrival Time   Burst Time
P1        0              8
P2        1              4
P3        2              9
P4        3              5
Preemptive SJF Gantt chart: P1 (0–1), P2 (1–5), P4 (5–10), P1 (10–17), P3 (17–26)
Average waiting time = [(10 − 1) + (1 − 1) + (17 − 2) + (5 − 3)] / 4 = 26/4 = 6.5 msec

10 SRTF Discussion
SRTF pros and cons:
Optimal (average response time) (+)
Hard to predict the future (−)
Unfair (−)

11 Priority Scheduling
A priority number (integer) is associated with each process
The CPU is allocated to the process with the highest priority (smallest integer ≡ highest priority)
Can be preemptive or nonpreemptive
Equal-priority processes are scheduled in FCFS order
SJF is priority scheduling where priority is the inverse of the predicted next CPU burst time
Problem: starvation – low-priority processes may never execute
Solution: aging – as time progresses, increase the priority of the process (sketched below)
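
A minimal sketch in C of one way aging could be applied to a ready queue; the structure, field names, and aging interval are illustrative assumptions, not from the slides:

#include <stddef.h>

/* Illustrative process descriptor: a smaller number means higher priority. */
struct proc {
    int pid;
    int priority;     /* current effective priority */
    int wait_ticks;   /* time spent waiting in the ready queue */
};

#define AGE_INTERVAL 100   /* illustrative: promote after every 100 ticks of waiting */

/* Called on every clock tick: long waiters slowly migrate toward
 * priority 0, so a low-priority process cannot starve forever. */
void age_ready_queue(struct proc *ready, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ready[i].wait_ticks++;
        if (ready[i].wait_ticks % AGE_INTERVAL == 0 && ready[i].priority > 0)
            ready[i].priority--;   /* aging: raise the priority of a long waiter */
    }
}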

12 Example of Priority Scheduling
Process   Burst Time   Priority
P1        10           3
P2        1            1
P3        2            4
P4        1            5
P5        5            2
Priority scheduling Gantt chart: P2 (0–1), P5 (1–6), P1 (6–16), P3 (16–18), P4 (18–19)
Average waiting time = (6 + 0 + 16 + 18 + 1) / 5 = 8.2 msec

13 Round Robin (RR)
Each process gets a small unit of CPU time (a time quantum q), usually 10–100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue.
RR scheduling is implemented with a FIFO queue of processes; new processes are added to the tail of the ready queue.
A process may itself leave the CPU if its CPU burst is less than one quantum; otherwise the timer will cause an interrupt and the process will be put at the tail of the queue.

14 Example: RR with Time Quantum = 20
Process   Burst Time
P1        53
P2        17
P3        68
P4        24
The Gantt chart is: P1 (0–20), P2 (20–37), P3 (37–57), P4 (57–77), P1 (77–97), P3 (97–117), P4 (117–121), P1 (121–134), P3 (134–154), P3 (154–162)
Waiting times:
P1 = (77 − 20) + (121 − 97) = 81
P2 = (20 − 0) = 20
P3 = (37 − 0) + (97 − 57) + (134 − 117) = 94
P4 = (57 − 0) + (117 − 77) = 97
Average waiting time = (81 + 20 + 94 + 97) / 4 = 73
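
A minimal sketch in C that replays this round-robin example and reproduces the waiting times above; the data layout and variable names are illustrative:

#include <stdio.h>

#define NPROC   4
#define QUANTUM 20

int main(void)
{
    int burst[NPROC] = {53, 17, 68, 24};   /* P1..P4 from the example */
    int remaining[NPROC], waiting[NPROC] = {0};
    int queue[64], head = 0, tail = 0;     /* FIFO ready queue (64 slots is plenty here) */

    for (int i = 0; i < NPROC; i++) {
        remaining[i] = burst[i];
        queue[tail++] = i;                 /* all processes are ready at time 0 */
    }

    while (head < tail) {
        int p = queue[head++];             /* dequeue the next ready process */
        int slice = remaining[p] < QUANTUM ? remaining[p] : QUANTUM;
        remaining[p] -= slice;
        for (int q = head; q < tail; q++)  /* everyone still queued waits for `slice` */
            waiting[queue[q]] += slice;
        if (remaining[p] > 0)
            queue[tail++] = p;             /* preempted by the timer: back to the tail */
    }

    int total = 0;
    for (int i = 0; i < NPROC; i++) {
        printf("P%d waiting time = %d\n", i + 1, waiting[i]);
        total += waiting[i];
    }
    printf("average waiting time = %.1f\n", total / (double)NPROC);
    return 0;
}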

15 Round Robin (RR)
Typically, RR gives higher average turnaround time than SJF, but better response time.
Round-robin pros and cons:
Better for short jobs, fair (+)
Context-switching time adds up for long jobs (−)

16 Round Robin (RR)
If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n − 1)q time units.
Example: 5 processes, time quantum = 20 → each process waits no more than (5 − 1) × 20 = 80 ms between time slices.
Performance:
q large → behaves like FCFS
q small → processor sharing: it appears as though each of the n processes has its own processor running at 1/n the speed of the real processor
q must be large with respect to context-switch time, otherwise the overhead is too high
q is usually 10 ms to 100 ms; a context switch takes < 10 μsec

17 Time Quantum and Context Switch Time

18 Turnaround Time Varies With The Time Quantum
80% of CPU bursts should be shorter than q

19 Turnaround Time Varies With The Time Quantum
The average turnaround time can be improved if most processes finish their next CPU burst in a single time quantum.
Example: 3 processes, CPU burst = 10 each
quantum = 1 → average turnaround time = 29 (the bursts interleave and finish at times 28, 29, and 30)
quantum = 10 → average turnaround time = 20 (the bursts finish at times 10, 20, and 30)
If context-switch time is added, the average turnaround time increases for smaller time quanta, since there are more context switches.
A rule of thumb: 80 percent of the CPU bursts should be shorter than the time quantum.

20 Multilevel Queue
Ready queue is partitioned into separate queues, e.g.:
foreground (interactive)
background (batch)
Processes are permanently assigned to a given queue
Each queue has its own scheduling algorithm:
foreground – RR
background – FCFS
Scheduling must also be done between the queues:
Fixed-priority scheduling (i.e., serve all from foreground, then from background); possibility of starvation
Time slice – each queue gets a certain amount of CPU time which it can schedule amongst its processes; e.g., 80% to foreground in RR, 20% to background in FCFS

21 Multilevel Queue Scheduling

22 Multilevel Feedback Queue
A process can move between the various queues; aging can be implemented this way
CPU-bound processes are moved to a lower-priority queue; I/O-bound and interactive processes get the higher-priority queues
A process that waits too long in a lower-priority queue can be moved to a higher-priority queue
A multilevel-feedback-queue scheduler is defined by the following parameters:
number of queues
scheduling algorithm for each queue
method used to determine when to upgrade a process
method used to determine when to demote a process
method used to determine which queue a process will enter when it needs service

23 Example of Multilevel Feedback Queue
Three queues:
Q0 – RR with time quantum 8 milliseconds
Q1 – RR with time quantum 16 milliseconds
Q2 – FCFS
Scheduling:
A new job enters queue Q0, which is served FCFS; when it gains the CPU, the job receives 8 milliseconds
If it does not finish in 8 milliseconds, the job is moved to queue Q1
At Q1 the job is again served FCFS and receives 16 additional milliseconds
If it still does not complete, it is preempted and moved to queue Q2
Processes in Q2 run on an FCFS basis, only when Q0 and Q1 are empty
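
A minimal sketch in C of this three-queue demotion path for a single job; the queue bookkeeping for multiple jobs is omitted, and the names and the 30 ms burst are illustrative assumptions:

#include <stdio.h>

enum level { Q0, Q1, Q2 };

static const int quantum[] = {8, 16, 0};   /* per-queue quantum; Q2 is FCFS (no quantum) */

struct job {
    int id;
    int remaining;     /* remaining CPU burst in ms */
    enum level lvl;    /* queue the job currently sits in */
};

/* Give the job one scheduling turn at its current level and demote it
 * if the burst did not finish there. Returns 1 when the job is done. */
int run_once(struct job *j)
{
    int slice = (j->lvl == Q2) ? j->remaining : quantum[j->lvl];
    if (slice > j->remaining)
        slice = j->remaining;
    j->remaining -= slice;
    printf("job %d ran %d ms in Q%d\n", j->id, slice, j->lvl);
    if (j->remaining > 0 && j->lvl != Q2)
        j->lvl++;      /* did not finish: move down to the next queue */
    return j->remaining == 0;
}

int main(void)
{
    struct job j = {1, 30, Q0};   /* a 30 ms burst: 8 ms in Q0, 16 ms in Q1, 6 ms in Q2 */
    while (!run_once(&j))
        ;
    return 0;
}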

24 Fairness
Unfair in scheduling: STCF *must* be unfair to be optimal
Favor I/O-bound jobs? That penalizes CPU-bound jobs.
How do we implement fairness?
Could give each queue a fraction of the CPU. But this isn't always fair: what if there is one long-running job and 100 short-running ones?
Could adjust priorities: increase the priority of jobs that don't get service. This is what's done in UNIX.
The problem is that this is ad hoc: at what rate should you increase priorities? And as the system gets overloaded, no job gets CPU time, so every job's priority increases.
The result is that interactive jobs suffer – both short and long jobs have high priority!

25 Lottery scheduling: random simplicity
Give every job some number of lottery tickets
On each time slice, randomly pick a winning ticket
On average, CPU time is proportional to the number of tickets given to each job
How do you assign tickets?
To approximate SRTF, short-running jobs get more, long-running jobs get fewer
To avoid starvation, every job gets at least one ticket (so everyone makes progress)
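
A minimal sketch in C of one lottery draw per time slice; the ticket counts are illustrative, and over many slices the win counts approach the ticket proportions:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Draw a ticket uniformly at random, then walk the per-job ticket
 * counts until the draw falls inside some job's range of tickets. */
int draw_winner(const int tickets[], int njobs)
{
    int total = 0;
    for (int i = 0; i < njobs; i++)
        total += tickets[i];
    int winner = rand() % total;       /* the winning ticket number */
    for (int i = 0; i < njobs; i++) {
        if (winner < tickets[i])
            return i;
        winner -= tickets[i];
    }
    return njobs - 1;                  /* not reached */
}

int main(void)
{
    int tickets[] = {10, 10, 1};       /* two short jobs, one long job */
    int wins[3] = {0};

    srand((unsigned)time(NULL));
    for (int slice = 0; slice < 100000; slice++)
        wins[draw_winner(tickets, 3)]++;
    for (int i = 0; i < 3; i++)        /* expect wins roughly 10 : 10 : 1 */
        printf("job %d won %d slices\n", i, wins[i]);
    return 0;
}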

26 Grace under load change
Adding or deleting a job (and its tickets) affects all jobs proportionately, independent of how many tickets each job has.
For example, if short jobs get 10 tickets each and long jobs get 1 ticket each:

# short jobs / # long jobs    % of CPU each short job gets    % of CPU each long job gets
1 / 1                         91%                             9%
0 / 2                         NA                              50%
2 / 0                         50%                             NA
10 / 1                        9.9%                            1%
1 / 10                        50%                             5%

27 Thread Scheduling
Distinction between user-level and kernel-level threads
When threads are supported, threads are scheduled, not processes
In the many-to-one and many-to-many models, the thread library schedules user-level threads to run on an LWP
Known as process-contention scope (PCS), since the scheduling competition is within the process
Typically done via a priority set by the programmer; priorities are not adjusted by the thread library
The library may preempt the currently running thread in favor of a higher-priority thread; there is no guarantee of time slicing among threads of equal priority
A kernel thread scheduled onto an available CPU uses system-contention scope (SCS) – competition among all threads in the system
Systems using the one-to-one model, such as Windows and Linux, schedule threads using only SCS

28 Pthread Scheduling
The API allows specifying either PCS or SCS during thread creation:
PTHREAD_SCOPE_PROCESS schedules threads using PCS scheduling
PTHREAD_SCOPE_SYSTEM schedules threads using SCS scheduling
In the many-to-many model, the PTHREAD_SCOPE_PROCESS policy schedules user-level threads onto available LWPs; the number of LWPs is maintained by the thread library
The PTHREAD_SCOPE_SYSTEM scheduling policy will create and bind an LWP for each user-level thread
The Pthread API provides two functions for setting and getting the contention-scope attribute:
pthread_attr_setscope(pthread_attr_t *attr, int scope)
pthread_attr_getscope(pthread_attr_t *attr, int *scope)
Can be limited by the OS – Linux and Mac OS X allow only PTHREAD_SCOPE_SYSTEM

29 Pthread Scheduling API
#include <pthread.h>
#include <stdio.h>
#define NUM_THREADS 5

void *runner(void *param);   /* thread start routine, defined below */

int main(int argc, char *argv[])
{
    int i, scope;
    pthread_t tid[NUM_THREADS];
    pthread_attr_t attr;

    /* get the default attributes */
    pthread_attr_init(&attr);

    /* first inquire on the current scope */
    if (pthread_attr_getscope(&attr, &scope) != 0)
        fprintf(stderr, "Unable to get scheduling scope\n");
    else {
        if (scope == PTHREAD_SCOPE_PROCESS)
            printf("PTHREAD_SCOPE_PROCESS");
        else if (scope == PTHREAD_SCOPE_SYSTEM)
            printf("PTHREAD_SCOPE_SYSTEM");
        else
            fprintf(stderr, "Illegal scope value.\n");
    }

30 Pthread Scheduling API
    /* set the scheduling algorithm to PCS or SCS */
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    /* create the threads */
    for (i = 0; i < NUM_THREADS; i++)
        pthread_create(&tid[i], &attr, runner, NULL);

    /* now join on each thread */
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    return 0;
}

/* Each thread will begin control in this function */
void *runner(void *param)
{
    /* do some work ... */
    pthread_exit(0);
}

31 Multiple-Processor Scheduling
CPU scheduling is more complex when multiple CPUs are available
Homogeneous processors within a multiprocessor: any available processor can run any process in the queue
Asymmetric multiprocessing – only one processor, the master server, accesses the system data structures, alleviating the need for data sharing
Downside: the master server becomes a potential bottleneck, and overall system performance may be reduced
Symmetric multiprocessing (SMP) – each processor is self-scheduling; two strategies for organizing the ready processes:
all processes in a common ready queue: possible race condition on the shared queue
each processor has its own private queue of ready processes: the most common approach

32 SMP - Multicore Processors
Recent trend: place multiple processor cores on the same physical chip
Faster and consumes less power
Complicates scheduling issues
Memory stall – a processor accessing memory spends a significant amount of time waiting for the data to become available
modern processors operate at much faster speeds than memory
a memory stall can also occur because of a cache miss
a processor can spend up to 50 percent of its time waiting for data to become available from memory

33 SMP - Multithreaded Multicore System
Multiple hardware threads per core is also a growing trend: if one hardware thread stalls while waiting for memory, the core can switch to another thread

34 SMP - Multithreaded Multicore System
Chip multithreading – each hardware thread maintains its architectural state, such as instruction pointer and register set, and thus appears as a logical CPU that is available to run a software thread
Hyper-threading (simultaneous multithreading, or SMT) – assigning multiple hardware threads to a single processing core
Intel processors, such as the i7, support two threads per core
The Oracle SPARC M7 processor supports eight threads per core, with eight cores per processor, thus providing the operating system with 64 logical CPUs

35 SMP - Multithreaded Multicore System
Two ways to multithread a processing core: coarse-grained and fine-grained multithreading
Coarse-grained multithreading – a thread executes on a core until a long-latency event such as a memory stall occurs
the core then switches to another thread to begin execution
the cost of switching is high, since the instruction pipeline must be flushed
Fine-grained (or interleaved) multithreading – switches between threads at a much finer level of granularity, typically at the boundary of an instruction cycle
the architectural design includes logic for thread switching
the cost of switching between threads is small
Resources of the physical core (such as caches and pipelines) must be shared among its hardware threads, so a processing core can only execute one hardware thread at a time
Consequently, a multithreaded, multicore processor actually requires two different levels of scheduling

36 SMP - Multithreaded Multicore System
Level 1 scheduling decisions: the operating system chooses which software thread to run on each hardware thread (logical CPU)
Level 2 scheduling decisions: how each core decides which hardware thread to run
UltraSPARC T3 – round-robin
Intel Itanium – each hardware thread is assigned a dynamic urgency value from 0 to 7, with 0 representing the lowest urgency and 7 the highest; five different events may trigger a thread switch; the thread with the highest urgency value executes on the processor core
The two levels of scheduling are not necessarily mutually exclusive

37 SMP – Load Balancing
On SMP systems, all CPUs need to be kept loaded for efficiency
Load balancing attempts to keep the workload evenly distributed across all processors in an SMP system
Two general approaches to load balancing: push migration and pull migration
Push migration – a specific task periodically checks the load on each processor and, if it finds an imbalance, pushes tasks from overloaded CPUs to idle or less-busy CPUs
Pull migration – an idle processor pulls a waiting task from a busy processor
Push and pull migration need not be mutually exclusive and are, in fact, often implemented in parallel on load-balancing systems

38 SMP - Processor Affinity
Processor affinity – a process has an affinity for the processor on which it is currently running
Successive memory accesses by the thread are often satisfied in cache memory (known as a “warm cache”)
Processor migration due to load balancing invalidates and repopulates caches – a high-cost operation
So: keep a thread running on the same processor and take advantage of a warm cache
Soft affinity – the operating system attempts to keep a process on a single processor, but the process may migrate between processors during load balancing
Hard affinity – system calls allow a process to specify the subset of processors on which it may run (sketched below)
Linux provides both soft and hard affinity
Variations exist, including processor sets
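
For hard affinity, a minimal Linux sketch using the sched_setaffinity system call, pinning the calling process to CPU 0 (error handling kept minimal):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);       /* start from an empty CPU set */
    CPU_SET(0, &set);     /* allow only CPU 0 */

    /* pid 0 means "the calling process": from now on the kernel
       schedules this process only on the CPUs present in `set` */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0\n");
    return 0;
}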

39 Processor Affinity in NUMA systems
The main-memory architecture of a system can affect processor-affinity issues
NUMA (non-uniform memory access) – e.g., two physical processor chips, each with its own CPUs and local memory
all CPUs in a NUMA system share one physical address space, but a CPU has faster access to its local memory than to memory local to another CPU
If the operating system's CPU scheduler and memory-placement algorithms are NUMA-aware, a thread scheduled onto a particular CPU can be allocated memory close to where that CPU resides, providing the fastest possible memory access

