1
Scheduling
2
Review: Process Manager
[Diagram: the abstract computing environment. A program becomes a process; the file manager, process manager (protection, deadlock, synchronization), device manager, memory manager, resource manager, and scheduler manage the devices, memory, CPU, and other hardware.]
3
Scheduling The scheduling mechanism is the part of the process manager that removes the running process from the CPU and selects another process according to a particular strategy. The scheduler chooses one of the ready threads to use the CPU when it is available. The scheduling policy determines when it is time for a thread to be removed from the CPU and which ready thread should be allocated the CPU next.
4
Thread Scheduler Organization
[Diagram: thread scheduler organization. A new thread job enters the ready list ("Ready"); the scheduler allocates the CPU ("Running"); a running thread may finish ("Done"), be preempted or voluntarily yield back to the ready list, or request a resource and block ("Blocked") until the resource manager allocates it.]
5
Thread Scheduling New threads are put into the ready state and added to the ready list. A running thread may cease using the CPU for any of four reasons: the thread has completed its execution; the thread requests a resource but can't get it; the thread voluntarily releases the CPU; or the thread is preempted by the scheduler and involuntarily releases the CPU. The scheduling policy determines which thread gets the CPU and when a thread is preempted.
6
The Scheduler Organization
[Diagram: scheduler organization. A ready process (arriving from other states) passes through the enqueuer onto the ready list; the dispatcher, using the context switcher and the process descriptor, switches the CPU from the running process to the selected process.]
7
Scheduling Mechanism Depends partly on hardware: a clock device is needed. The rest of the scheduler is implemented in OS software, with 3 logical parts: the enqueuer, the dispatcher, and the context switcher. Analogy: for a conference room, the scheduling mechanism is a sign-up chart; the policy may prevent a group of warehouse workers from booking a birthday party when the president needs the room.
8
Enqueuer Adds processes that are ready to run to the ready list
May compute the priority for the waiting process (could also be determined by the dispatcher)
9
Context Switcher Saves contents of processor registers for a process being taken off the CPU If hardware has more than one set of processor registers, OS may just switch between sets Typically, one set is used for supervisor mode, others for applications Depending on OS, process could be switched out voluntarily or involuntarily
10
Dispatcher The dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves: switching context (from the old process to the dispatcher, then to the new process), switching to user mode, and jumping to the proper location in the user program to restart it.
11
Process/Thread Context
[Diagram: process/thread context. The general registers R0..Rn, the status registers, the PC and IR in the control unit, and the ALU's left-operand, right-operand, and result registers together form the context that must be saved and restored.]
12
Context Switching [Diagram: the CPU state is saved into the old thread's descriptor and loaded from the new thread's descriptor.]
13
Context Switch Timing Context switching is a time-consuming process
Assuming n general registers and m status registers, each requiring b store operations, and K time units per store, the time required to save one context is (n + m) × b × K time units. For a processor requiring 50 ns to store one unit of information, with n = 32, m = 8, and b = 1, the time required is 2000 ns = 2 μs. A complete context switch involves removing the old process, loading the dispatcher, removing the dispatcher, and loading the new process: at least 8 μs. A 1 GHz processor could have executed thousands of instructions in this time. Some processors reduce this switching time by providing several sets of registers (one set for supervisor mode, another for user mode).
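The arithmetic above can be sketched in a few lines; the function and parameter names are illustrative, not from any real OS.

```python
# A sketch of the slide's context-switch cost model.
def switch_time_ns(n_general, m_status, stores_per_reg=1, ns_per_store=50):
    """Time to save (or load) one register context: (n + m) * b * K."""
    return (n_general + m_status) * stores_per_reg * ns_per_store

one_context = switch_time_ns(32, 8)   # 2000 ns = 2 us
full_switch = 4 * one_context         # save p_i, load dispatcher,
                                      # save dispatcher, load p_j: 8 us
print(one_context, full_switch)       # 2000 8000
```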
14
Invoking the Scheduler
Need a mechanism to call the scheduler. Voluntary call: the process blocks itself and calls the scheduler (non-preemptive scheduling). Involuntary call: an external force (an interrupt) blocks the process (preemptive scheduling).
15
Voluntary CPU Sharing Each process will voluntarily share the CPU
By calling the scheduler periodically. The simplest approach requires a yield instruction to allow the running process to release the CPU:
yield(pi.pc, pj.pc) {
  memory[pi.pc] = PC;
  PC = memory[pj.pc];
}
16
Voluntary CPU Sharing – cont.
pi can be "automatically" determined from the processor status registers, so the function can be written as:
yield(*, pj.pc) {
  memory[pi.pc] = PC;
  PC = memory[pj.pc];
}
17
Scheduler as CPU Resource Manager
[Diagram: the scheduler as CPU resource manager. The CPU is a shared, time-multiplexed resource: a ready-to-run process is dispatched, runs, and releases the CPU back to the scheduler, which dispatches the next one, so execution alternates process/scheduler/process/scheduler/... across units of time.]
18
More on Yield pi and pj can resume one another’s execution
yield(*, pj.pc);
. . .
yield(*, pi.pc);
Suppose pj is the scheduler:
// pi yields to the scheduler
yield(*, pj.pc);
// the scheduler chooses pk
yield(*, pk.pc);
// pk yields to the scheduler
// the scheduler chooses the next process ...
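The voluntary-yield pattern can be modeled with Python generators. This is only an analogy for the slide's yield(*, pj.pc) idea, not how any real OS implements it: each "process" runs until its next yield, and a trivial scheduler decides who runs next.

```python
# A sketch of voluntary CPU sharing: generators yield control explicitly.
def process(name, steps, trace):
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield                      # voluntarily give up the CPU

trace = []
ready = [process("p0", 2, trace), process("p1", 2, trace)]
while ready:                       # trivial scheduler: cycle over ready list
    p = ready.pop(0)
    try:
        next(p)                    # run p until its next voluntary yield
        ready.append(p)
    except StopIteration:
        pass                       # p completed its execution
print(trace)                       # ['p0:0', 'p1:0', 'p0:1', 'p1:1']
```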
19
Voluntary Sharing Nonpreemptive Scheduler
- Scheduler using voluntary CPU sharing - used in the Xerox Alto PC - used in earlier versions of Mac OS. Problem: what happens if some processes do not voluntarily yield? What if some geek in a time-sharing environment decides to write a program without any yields? Same idea as congestion control in the TCP protocol: the application must back off voluntarily.
20
Voluntary CPU Sharing – cont.
Every process periodically yields to the scheduler. This relies on correct process behavior; a process may fail to yield maliciously or accidentally. We need a mechanism to override the running process.
21
Voluntary CPU Sharing – cont.
22
Involuntary CPU Sharing
Periodic involuntary interruption, through an interrupt from an interval timer device that generates an interrupt whenever the timer expires. The scheduler is called in the interrupt handler. A scheduler that uses involuntary CPU sharing is called a preemptive scheduler.
23
Programmable Interval Timer
SetInterval(<programmableValue>) {
  K = <programmableValue>;
  InterruptCount = K;
}
On each clock tick:
InterruptCount--;
if (InterruptCount <= 0) {
  InterruptRequest = TRUE;
  InterruptCount = K;
}
An interrupt occurs every K clock ticks.
24
Involuntary CPU Sharing – cont
Interval timer device handler: keeps an in-memory clock up to date and invokes the scheduler:
IntervalTimerHandler() {
  Time++;            // update the clock
  TimeToSchedule--;
  if (TimeToSchedule <= 0) {
    <invoke scheduler>;
    TimeToSchedule = TimeSlice;
  }
}
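A minimal simulation of this handler's logic; the names mirror the pseudocode but are otherwise illustrative, and "invoking the scheduler" is just counted.

```python
# Sketch: every time_slice ticks, the timer handler invokes the scheduler.
def run_ticks(total_ticks, time_slice):
    time, to_schedule, invocations = 0, time_slice, 0
    for _ in range(total_ticks):
        time += 1                      # keep the in-memory clock up to date
        to_schedule -= 1
        if to_schedule <= 0:
            invocations += 1           # <invoke scheduler>
            to_schedule = time_slice
    return time, invocations

print(run_ticks(100, 10))  # (100, 10): the scheduler runs every 10 ticks
```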
25
Contemporary Scheduling
Involuntary CPU sharing – timer interrupts Time quantum determined by interval timer – usually fixed size for every process using the system Sometimes called the time slice length
26
Choosing a Process To Run
Mechanism never changes Strategy = policy the dispatcher uses to select a process from the ready list Different policies for different requirements
27
Policy Considerations
Policy can control/influence: CPU utilization Average time a process waits for service Average amount of time to complete a job Could strive for any of: Equitability Favor very short or long jobs Meet priority requirements Meet deadlines
28
Optimal Scheduling Suppose the scheduler knows each process pi’s service time, t(pi) -- or it can estimate each t(pi) : Policy can optimize on any criteria, e.g., CPU utilization Waiting time Deadline To find an optimal schedule: Have a finite, fixed # of pi Know t(pi) for each pi Enumerate all schedules, then choose the best
29
However ... The t(pi) are almost certainly just estimates
A general algorithm to choose the optimal schedule is O(n²). Other processes may arrive while these processes are being serviced. Usually the optimal schedule is only a theoretical benchmark; scheduling policies try to approximate an optimal schedule.
30
Strategy Selection The scheduling criteria will depend in part on the goals of the OS and on priorities of processes, fairness, overall resource utilization, throughput, turnaround time, response time, and deadlines
31
Process Model and Metrics
P will be a set of processes, p0, p1, ..., pn-1 S(pi) is the state of pi {running, ready, blocked} τ(pi), the service time The amount of time pi needs to be in the running state before it is completed W (pi), the waiting time The time pi spends in the ready state before its first transition to the running state TTRnd(pi), turnaround time The amount of time between the moment pi first enters the ready state and the moment the process exits the running state for the last time
32
Simplified Model Simplified, but still provides analysis results
[Diagram: simplified model: new process jobs enter the ready list ("Ready"), the scheduler allocates the CPU ("Running"), and processes depart when done; preemption/voluntary yield and the blocked state are set aside.] Simplified, but it still provides analysis results: performance is easy to analyze, and there is no issue of voluntary vs. involuntary sharing.
33
Estimating CPU Utilization
[Diagram: new processes enter the ready list; the scheduler allocates the CPU; completed processes depart.] Let λ = the average rate at which processes are placed in the ready list (the arrival rate). Let μ = the average service rate, so 1/μ = the average t(pi). λ processes pi arrive per second, and each pi uses 1/μ units of the CPU.
34
Estimating CPU Utilization
[Diagram: new processes enter the ready list; the scheduler allocates the CPU; completed processes depart.] Let λ = the average arrival rate and μ = the average service rate, so 1/μ = the average t(pi). Let ρ = the fraction of the time that the CPU is expected to be busy: ρ = (# of pi that arrive per unit time) × (average time each spends on the CPU), i.e., ρ = λ × 1/μ = λ/μ. Note: we must have λ < μ (i.e., ρ < 1). What if ρ approaches 1? The ready list, and the time spent waiting in it, grows without bound.
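A one-line sketch of ρ = λ/μ; the function name and the guard are ours, for illustration.

```python
# Sketch: expected CPU utilization from arrival and service rates.
def utilization(arrival_rate, service_rate):
    """rho = lambda / mu; must stay below 1 for the ready list to be stable."""
    assert arrival_rate < service_rate, "ready list would grow without bound"
    return arrival_rate / service_rate

print(utilization(8, 10))  # 0.8: the CPU is expected to be busy 80% of the time
```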
35
Optimization Criteria
Max CPU utilization Max throughput Min turnaround time Min waiting time Min response time Which one to use depends on the system’s design goal
36
Nonpreemptive Schedulers
[Diagram: new processes enter the ready list, pass through the scheduler to the CPU, and depart when done; blocked or preempted processes leave the model.] Try to use the simplified scheduling model: consider only the running and ready states and ignore time in the blocked state. A new process is "created" when it enters the ready state and "destroyed" when it enters the blocked state; we are really just looking at small phases of a process.
37
Everyday scheduling methods
First-come, first served (FCFS) Shorter jobs first (SJF) or Shortest job next (SJN) Higher priority jobs first Job with the closest deadline first
38
FCFS at the supermarket
39
SJF at the supermarket
40
Gantt Chart Used to illustrate deterministic schedules
Can also show dependencies of a process on other processes. Plots processor(s) against time, showing which processes are executing on which processors at which times. Also shows idle time, so it illustrates the utilization of each processor. In the following we assume a single processor.
41
First-Come-First-Served
Assigns priority to processes in the order in which they request the processor.
i:     0    1    2    3    4
t(pi): 350  125  475  250  75
42
First-Come-First-Served – cont.
[Gantt: p0 runs from 0 to 350] TTRnd(p0) = t(p0) = 350; W(p0) = 0
43
First-Come-First-Served – cont.
[Gantt: p0 to 350, then p1 to 475] TTRnd(p0) = t(p0) = 350; TTRnd(p1) = t(p1) + TTRnd(p0) = 125 + 350 = 475; W(p0) = 0; W(p1) = TTRnd(p0) = 350
44
First-Come-First-Served – cont.
[Gantt: p0 to 350, p1 to 475, then p2 to 950] TTRnd(p0) = 350; TTRnd(p1) = 475; TTRnd(p2) = t(p2) + TTRnd(p1) = 475 + 475 = 950; W(p0) = 0; W(p1) = 350; W(p2) = TTRnd(p1) = 475
45
First-Come-First-Served – cont.
[Gantt: p0 to 350, p1 to 475, p2 to 950, then p3 to 1200] TTRnd(p0) = 350; TTRnd(p1) = 475; TTRnd(p2) = 950; TTRnd(p3) = t(p3) + TTRnd(p2) = 250 + 950 = 1200; W(p0) = 0; W(p1) = 350; W(p2) = 475; W(p3) = TTRnd(p2) = 950
46
First-Come-First-Served – cont.
[Gantt: p0 to 350, p1 to 475, p2 to 950, p3 to 1200, then p4 to 1275] TTRnd(p0) = 350; TTRnd(p1) = 475; TTRnd(p2) = 950; TTRnd(p3) = 1200; TTRnd(p4) = t(p4) + TTRnd(p3) = 75 + 1200 = 1275; W(p0) = 0; W(p1) = 350; W(p2) = 475; W(p3) = 950; W(p4) = TTRnd(p3) = 1200
47
FCFS Average Wait Time Easy to implement Ignores service time, etc
[Gantt: p0 to 350, p1 to 475, p2 to 950, p3 to 1200, p4 to 1275]
TTRnd(p0) = 350; TTRnd(p1) = 475; TTRnd(p2) = 950; TTRnd(p3) = 1200; TTRnd(p4) = 1275
W(p0) = 0; W(p1) = 350; W(p2) = 475; W(p3) = 950; W(p4) = 1200
Wavg = (0 + 350 + 475 + 950 + 1200)/5 = 2975/5 = 595
Easy to implement, but it ignores service time, etc. Not a great performer.
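The FCFS numbers above can be reproduced with a short sketch (the helper function is ours, not from the slides):

```python
# Sketch: FCFS turnaround and wait times from a list of service times.
def fcfs(service):
    t, ttrnd, wait = 0, [], []
    for s in service:
        wait.append(t)       # waits until all earlier jobs finish
        t += s
        ttrnd.append(t)      # completes at the cumulative time t
    return ttrnd, wait

ttrnd, wait = fcfs([350, 125, 475, 250, 75])
print(ttrnd)                   # [350, 475, 950, 1200, 1275]
print(sum(wait) / len(wait))   # 595.0
```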
48
Predicting Wait Time in FCFS
In FCFS, when a process arrives, everything in the ready list will be serviced before this job. Let μ be the service rate and L the ready-list length. Then Wavg(p) = L × 1/μ (jobs in the queue) + 0.5 × 1/μ (the active process, on average half done) = L/μ + 1/(2μ). Compare this predicted wait with the actual values in the earlier examples.
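A sketch of the predicted-wait formula. The example rate is illustrative: μ = 1/255, since 255 is the average service time of the running example.

```python
# Sketch: expected FCFS wait for a newly arriving process.
def predicted_fcfs_wait(L, mu):
    """L jobs ahead at 1/mu each, plus half of the job in service."""
    return L / mu + 1 / (2 * mu)

# e.g. 4 jobs queued, average service time 255 time units (mu = 1/255):
print(predicted_fcfs_wait(4, 1 / 255))  # 1147.5
```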
49
First-Come-First-Served – cont.
Example: Process / Burst Time: P1 24, P2 3, P3 3. Suppose the processes arrive in the order P1, P2, P3. The Gantt chart for the schedule is: [P1 from 0 to 24 | P2 to 27 | P3 to 30]. Waiting time for P1 = 0; P2 = 24; P3 = 27. Average waiting time: (0 + 24 + 27)/3 = 17.
50
First-Come-First-Served – cont.
Suppose that the processes arrive in the order P2, P3, P1. The Gantt chart for the schedule is: [P2 from 0 to 3 | P3 to 6 | P1 to 30]. Waiting time for P1 = 6; P2 = 0; P3 = 3. Average waiting time: (6 + 0 + 3)/3 = 3. Much better than the previous case.
51
Shortest-Job-Next Scheduling
Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time. SJN is optimal: it gives the minimum average waiting time for a given set of processes.
52
Shortest-Job-Next Scheduling – cont.
Two schemes: non-preemptive: once the CPU is given to a process it cannot be preempted until it completes its CPU burst. Preemptive: if a new process arrives with a CPU burst length less than the remaining time of the currently executing process, preempt. This scheme is known as Shortest-Remaining-Time-Next (SRTN).
53
Shortest Job Next (nonpreemptive)
[Gantt: p4 runs from 0 to 75] TTRnd(p4) = t(p4) = 75; W(p4) = 0
54
Shortest Job Next – cont.
[Gantt: p4 to 75, then p1 to 200] TTRnd(p1) = t(p1) + t(p4) = 125 + 75 = 200; TTRnd(p4) = 75; W(p1) = 75; W(p4) = 0
55
Shortest Job Next – cont.
[Gantt: p4 to 75, p1 to 200, then p3 to 450] TTRnd(p1) = 200; TTRnd(p3) = t(p3) + t(p1) + t(p4) = 250 + 125 + 75 = 450; TTRnd(p4) = 75; W(p1) = 75; W(p3) = 200; W(p4) = 0
56
Shortest Job Next – cont.
[Gantt: p4 to 75, p1 to 200, p3 to 450, then p0 to 800] TTRnd(p0) = t(p0) + t(p3) + t(p1) + t(p4) = 350 + 250 + 125 + 75 = 800; TTRnd(p1) = 200; TTRnd(p3) = 450; TTRnd(p4) = 75; W(p0) = 450; W(p1) = 75; W(p3) = 200; W(p4) = 0
57
Shortest Job Next – cont.
[Gantt: p4 to 75, p1 to 200, p3 to 450, p0 to 800, then p2 to 1275] TTRnd(p0) = 800; TTRnd(p1) = 200; TTRnd(p2) = t(p2) + t(p0) + t(p3) + t(p1) + t(p4) = 475 + 350 + 250 + 125 + 75 = 1275; TTRnd(p3) = 450; TTRnd(p4) = 75; W(p0) = 450; W(p1) = 75; W(p2) = 800; W(p3) = 200; W(p4) = 0
58
Shortest Job Next – cont.
[Gantt: p4 to 75, p1 to 200, p3 to 450, p0 to 800, p2 to 1275]
Minimizes wait time, but may starve large jobs, and the service times must be known.
TTRnd(p0) = 800; TTRnd(p1) = 200; TTRnd(p2) = 1275; TTRnd(p3) = 450; TTRnd(p4) = 75
W(p0) = 450; W(p1) = 75; W(p2) = 800; W(p3) = 200; W(p4) = 0
Wavg = (450 + 75 + 800 + 200 + 0)/5 = 1525/5 = 305
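Sorting the same workload by service time reproduces the SJN numbers above (the helper function is ours):

```python
# Sketch: non-preemptive SJN wait times, indexed by original process number.
def sjn(service):
    order = sorted(range(len(service)), key=lambda i: service[i])
    t, wait = 0, [0] * len(service)
    for i in order:
        wait[i] = t          # starts once all shorter jobs have finished
        t += service[i]
    return wait

wait = sjn([350, 125, 475, 250, 75])
print(wait, sum(wait) / len(wait))  # [450, 75, 800, 200, 0] 305.0
```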
59
Determining Length of Next CPU Burst
Can only estimate the length. Can be done by using the length of previous CPU bursts, using exponential averaging.
60
Exponential Averaging
τ(n+1) = α·t(n) + (1 − α)·τ(n), where t(n) is the length of the nth burst and τ(n) the previous estimate.
If α = 0: τ(n+1) = τ(n); recent history does not count.
If α = 1: τ(n+1) = t(n); only the actual last CPU burst counts.
If we expand the formula, we get: τ(n+1) = α·t(n) + (1 − α)·α·t(n−1) + … + (1 − α)^j·α·t(n−j) + … + (1 − α)^(n+1)·τ(0).
Since both α and (1 − α) are less than or equal to 1, each successive term has less weight than its predecessor.
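A sketch of the recurrence, with α = 0.5 and an illustrative initial estimate τ0 = 10:

```python
# Sketch: exponential averaging of CPU-burst lengths.
def next_estimate(alpha, burst, tau):
    """tau_next = alpha * t_n + (1 - alpha) * tau_n"""
    return alpha * burst + (1 - alpha) * tau

tau = 10.0
for burst in [6, 4, 6]:          # observed bursts
    tau = next_estimate(0.5, burst, tau)
print(tau)                        # 6.0: estimates 8.0, 6.0, 6.0 in turn
```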
61
Priority Scheduling In priority scheduling, processes/threads are allocated the CPU on the basis of an externally assigned priority. A commonly used convention is that lower numbers have higher priority. Static vs. dynamic priorities: static priorities are computed once at the beginning and are not changed; dynamic priorities allow a thread to become more or less important depending on how much service it has recently received.
62
Priority Scheduling – cont.
There are non-preemptive and preemptive priority scheduling algorithms. SJN is priority scheduling where the priority is the predicted next CPU burst time; FCFS is priority scheduling where the priority is the arrival time.
63
Non-preemptive Priority Scheduling
[Gantt: by assigned priority, p3 runs from 0 to 250, p1 to 375, p2 to 850, p4 to 925, p0 to 1275; service times as in the running example]
Reflects the importance of external use. May cause starvation; starvation can be addressed with aging.
TTRnd(p0) = t(p0)+t(p4)+t(p2)+t(p1)+t(p3) = 350+75+475+125+250 = 1275
TTRnd(p1) = t(p1)+t(p3) = 125+250 = 375
TTRnd(p2) = t(p2)+t(p1)+t(p3) = 475+125+250 = 850
TTRnd(p3) = t(p3) = 250
TTRnd(p4) = t(p4)+t(p2)+t(p1)+t(p3) = 75+475+125+250 = 925
TTRnd_avg = (1275+375+850+250+925)/5 = 3675/5 = 735
W(p0) = 925; W(p1) = 250; W(p2) = 375; W(p3) = 0; W(p4) = 850
Wavg = (925+250+375+0+850)/5 = 2400/5 = 480
64
Deadline Scheduling Allocates service by deadline May not be feasible
Allocates service by deadline. May not be feasible. [Table/Gantt residue: each process pi has a deadline or "(none)"; timeline marks at 200, 550, 575, 1050, 1275]
65
Real-Time Scheduling Hard real-time systems – required to complete a critical task within a guaranteed amount of time. Soft real-time computing – requires that critical processes receive priority over less fortunate ones.
66
Preemptive Schedulers
[Diagram: new processes enter the ready list; the scheduler allocates the CPU; processes finish ("Done") or return to the ready list by preemption or voluntary yield.] The highest-priority process is guaranteed to be running at all times, or at least at the beginning of a time slice. The dominant form of contemporary scheduling, but complex to build and analyze.
67
Preemptive Shortest Job Next
Also called shortest-remaining-time-next. When a new process arrives, its next CPU burst is compared to the remaining time of the running process; if the new arrival's time is shorter, it preempts the currently running process.
68
Example of Preemptive SJF
Process / Arrival Time / Burst Time: P1 0, 7; P2 2, 4; P3 4, 1; P4 5, 4.
[Gantt: P1 from 0 to 2 | P2 to 4 | P3 to 5 | P2 to 7 | P4 to 11 | P1 to 16]
Average time spent in ready queue = (9 + 1 + 0 + 2)/4 = 3
69
Comparison of Non-Preemptive and Preemptive SJF
Process / Arrival Time / Burst Time: P1 0, 7; P2 2, 4; P3 4, 1; P4 5, 4.
SJN (non-preemptive): [Gantt: P1 from 0 to 7 | P3 to 8 | P2 to 12 | P4 to 16]
Average time spent in ready queue = (0 + 6 + 3 + 7)/4 = 4
70
Round Robin (RR) Each process gets a small unit of CPU time (a time quantum), usually tens of milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue. If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n-1)q time units.
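As a sketch, a small round-robin simulation (our own helper, with no context-switch overhead) reproduces the TQ=50 numbers worked through on the following slides: finish times match the slides' TTRnd values, and the first-run times match their W values.

```python
from collections import deque

# Sketch: round robin with quantum q; returns per-process completion time
# and the time each process first gets the CPU.
def round_robin(service, q):
    remaining = list(service)
    first_run = [None] * len(service)
    finish = [0] * len(service)
    t, rq = 0, deque(range(len(service)))
    while rq:
        i = rq.popleft()
        if first_run[i] is None:
            first_run[i] = t
        run = min(q, remaining[i])      # run one quantum (or less, to finish)
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            rq.append(i)                # back to the end of the ready queue
        else:
            finish[i] = t
    return finish, first_run

finish, first = round_robin([350, 125, 475, 250, 75], 50)
print(first)    # [0, 50, 100, 150, 200]
print(finish)   # [1100, 550, 1275, 950, 475]
```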
71
Round-robin scheduling
Good way to upset customers!
72
Round Robin (TQ=50)
i:     0    1    2    3    4
t(pi): 350  125  475  250  75
[Gantt: p0 runs from 0 to 50] W(p0) = 0
73
Round Robin (TQ=50) – cont.
[Gantt: p0 runs 0 to 50, p1 runs 50 to 100] W(p0) = 0; W(p1) = 50
74
Round Robin (TQ=50) – cont.
[Gantt: p0, p1, p2 run 50 units each, ending at 150] W(p0) = 0; W(p1) = 50; W(p2) = 100
75
Round Robin (TQ=50) – cont.
[Gantt: p0, p1, p2, p3 run 50 units each, ending at 200] W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150
76
Round Robin (TQ=50) – cont.
[Gantt: p0 through p4 run 50 units each, ending at 250] W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
77
Round Robin (TQ=50) – cont.
[Gantt: after the first round ends at 250, p0 runs again from 250 to 300] W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
78
Round Robin (TQ=50) – cont.
[Gantt: the second round continues; p4 runs its final 25 units and finishes at 475] TTRnd(p4) = 475; W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
79
Round Robin (TQ=50) – cont.
[Gantt: p0 runs to 525; p1 runs its final 25 units and finishes at 550] TTRnd(p1) = 550; TTRnd(p4) = 475; W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
80
Round Robin (TQ=50) – cont.
[Gantt: p0, p2, p3 continue to alternate past 650, 750, 850; p3 finishes at 950] TTRnd(p1) = 550; TTRnd(p3) = 950; TTRnd(p4) = 475; W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
81
Round Robin (TQ=50) – cont.
[Gantt: p0 and p2 alternate from 950; p0 runs its final 50 units from 1050 and finishes at 1100] TTRnd(p0) = 1100; TTRnd(p1) = 550; TTRnd(p3) = 950; TTRnd(p4) = 475; W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
82
Round Robin (TQ=50) – cont.
[Gantt: p2 runs alone from 1100 and finishes at 1275] TTRnd(p0) = 1100; TTRnd(p1) = 550; TTRnd(p2) = 1275; TTRnd(p3) = 950; TTRnd(p4) = 475; W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
83
Round Robin (TQ=50) – cont.
Equitable; the most widely used scheme; fits naturally with an interval timer.
TTRnd(p0) = 1100; TTRnd(p1) = 550; TTRnd(p2) = 1275; TTRnd(p3) = 950; TTRnd(p4) = 475
TTRnd_avg = (1100 + 550 + 1275 + 950 + 475)/5 = 4350/5 = 870
W(p0) = 0; W(p1) = 50; W(p2) = 100; W(p3) = 150; W(p4) = 200
Wavg = (0 + 50 + 100 + 150 + 200)/5 = 500/5 = 100
84
Round Robin – cont. Performance: if q is large, RR degenerates into FIFO. If q is small, overhead dominates: q must be large with respect to the context-switch time, otherwise the overhead is too high.
85
Turnaround Time Varies With The Time Quantum
86
How a Smaller Time Quantum Increases Context Switches
87
Round Robin (TQ=50) – cont.
Overhead must be considered. With a context-switch overhead of 10 time units per dispatch, the same workload gives:
TTRnd(p0) = 1320; TTRnd(p1) = 660; TTRnd(p2) = 1535; TTRnd(p3) = 1140; TTRnd(p4) = 565
TTRnd_avg = (1320 + 660 + 1535 + 1140 + 565)/5 = 5220/5 = 1044
W(p0) = 0; W(p1) = 60; W(p2) = 120; W(p3) = 180; W(p4) = 240
Wavg = (0 + 60 + 120 + 180 + 240)/5 = 600/5 = 120
88
Multi-Level Queues Each list may use a different policy FCFS SJN RR
[Diagram: multiple ready lists, Ready List0 through Ready Listn, feed the scheduler and CPU; preemption or voluntary yield returns a process to a list; completed processes depart.] Each list may use a different policy: FCFS, SJN, RR.
89
Multilevel Queues Ready queue is partitioned into separate queues
foreground (interactive) background (batch) Each queue has its own scheduling algorithm foreground – RR background – FCFS
90
Multilevel Queues – cont.
Scheduling must be done between the queues. Fixed priority scheduling (i.e., serve all from foreground, then from background) risks starvation. Time slicing: each queue gets a certain amount of CPU time which it can schedule among its processes, e.g., 80% to the foreground queue in RR and 20% to the background queue in FCFS.
91
Multilevel Queue Scheduling
92
Multilevel Feedback Queue
A process can move between the various queues; aging can be implemented this way. Multilevel-feedback-queue scheduler defined by the following parameters: number of queues scheduling algorithms for each queue method used to determine when to upgrade a process method used to determine when to demote a process method used to determine which queue a process will enter when that process needs service
93
Multilevel Feedback Queues – cont.
94
Example of Multilevel Feedback Queue
Three queues: Q0 – time quantum 8 milliseconds Q1 – time quantum 16 milliseconds Q2 – FCFS Scheduling A new job enters queue Q0 which is served FCFS. When it gains CPU, job receives 8 milliseconds. If it does not finish in 8 milliseconds, job is moved to queue Q1. At Q1 job is again served FCFS and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to queue Q2.
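A tiny sketch of where a job ends up under these rules; the function name and the quanta tuple are illustrative, and only queue demotion (not scheduling order) is modeled.

```python
# Sketch: the slide's MLFQ demotion rule (Q0 quantum 8, Q1 quantum 16, then FCFS).
def final_queue(burst_ms, quanta=(8, 16)):
    for level, q in enumerate(quanta):
        if burst_ms <= q:
            return level          # finishes within this level's quantum
        burst_ms -= q             # used the full quantum; demoted
    return len(quanta)            # lands in the final FCFS queue

print([final_queue(b) for b in (5, 20, 40)])  # [0, 1, 2]
```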
95
Two-queue scheduling
96
Three-queue scheduling
97
Multiple-Processor Scheduling
CPU scheduling more complex when multiple CPUs are available. Homogeneous processors within a multiprocessor. Load sharing Asymmetric multiprocessing – only one processor accesses the system data structures, alleviating the need for data sharing.
98
Algorithm Evaluation Deterministic modeling – takes a particular predetermined workload and defines the performance of each algorithm for that workload. Queuing models Implementation
99
Evaluation of CPU Schedulers by Simulation
100
Contemporary Scheduling
Involuntary CPU sharing -- timer interrupts Time quantum determined by interval timer -- usually fixed for every process using the system Sometimes called the time slice length Priority-based process (job) selection Select the highest priority process Priority reflects policy With preemption Usually a variant of Multi-Level Queues
101
Operating System Examples - Scheduling
102
References Silberschatz et al, Chapter 5.6, Chapter 22.3
103
Windows 7
104
Windows Windows refers to a collection of graphical operating systems developed by Microsoft. There are recent versions of Windows for PCs, server computers, smartphones, and embedded devices, and a specialized version of Windows that runs on the Xbox One game console. Windows was originally designed for desktop machines.
105
Priorities Similar to that used in Windows XP
The scheduler is called a dispatcher 32 priorities Priorities are divided into two classes: User class: priorities 1 to 15 Real-time class: priorities 16 to 31 Priority 0 is used for memory management processes There is a queue for each priority
106
Selecting a Process The dispatcher traverses the set of queues from highest to lowest until it finds a process that is ready to run If there are no processes ready to run the dispatcher executes the idle process Priority of a preempted process may be modified before being returned to a ready state Round robin
107
Adjusting Priority If process was in user class
Time quantum expires: if the process is in the user class, its priority is lowered. Process switches from blocked to running: its priority is increased; the amount depends on what the process was doing (keyboard I/O gets a large increase, while disk I/O gets a moderate increase). Some processes always have a low priority, e.g., the disk defragmenter.
108
Adjusting Priority The priority of a process cannot be lowered past the base priority (the lower threshold value) of the process. Windows 7 distinguishes between the foreground process currently selected on the screen and the background processes that are not. This tends to give good response times to interactive processes that are using the mouse and windows.
109
Linux
110
Linux Linux is free and open-source. As of Dec 2015: Webservers: W3Cook reports that 96.5% of web servers run Linux (1.5% run Windows). Desktops/laptops: 1.5% use Linux. Mobile devices: Android (based on the Linux kernel) is used in 80% of all mobile devices. Linux is also the platform of choice for the film industry. Please note that Linux uses the term "task".
111
History of Linux Scheduler
Linux version 1.2 Used circular queue for runnable task management Round-robin Efficient for adding and removing processes Fast and Simple
112
History of Linux Scheduler
Linux version 2.2 Introduced the idea of scheduling class Permitting scheduling policies for Real-time tasks Non-preemptible tasks Non-real time task
113
History of Linux Scheduler
Linux version 2.4 Divided time into epochs; within each epoch, every task was allowed to execute up to its time slice. A goodness function was applied to determine which task to execute next. Simple, but O(N): inefficient, poor scalability, and weak for real-time systems.
114
History of Linux Scheduling
Linux version 2.5 Implemented scheduling algorithms in O(1) time; scaled well to multiple processors, each with many processes. Problems: not responsive to interactive applications; complex, error-prone logic; no guarantee of fairness.
115
Scheduling in Linux 2.5 kernel
Priority-based, preemptive Two priority ranges (real time and nice) Time quantum longer for higher priority processes (ranges from 10ms to 200ms) Tasks are runnable while they have time remaining in their time quantum; once exhausted, must wait until others have exhausted their time quantum
116
O(1) Background Briefly – the scheduler maintained two runqueues for each CPU, with a priority linked list for each priority level (140 total). Tasks are enqueued into the corresponding priority list. The scheduler only needs to look at the highest priority list to schedule the next task. Assigns timeslices for each task. Had to track sleep times, process interactivity, etc.
117
O(1) Background Two runqueues per CPU ... one active, one expired. If a process hasn't used its entire timeslice, it's on the active queue; if it has, it's expired. Tasks are swapped between the two as needed. Timeslice and priority are recalculated when a task is swapped. If the active queue is empty, they swap pointers, so the empty one is now the expired queue.
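The pointer-swap idea can be sketched in a few lines; this is purely illustrative (the real scheduler manipulates per-CPU priority arrays, not a single deque per queue).

```python
from collections import deque

# Sketch of the O(1) scheduler's two-queue idea: run tasks from the active
# queue; tasks that used their timeslice wait on the expired queue until
# the two queues are swapped in O(1).
active, expired = deque(["A", "B"]), deque()
order = []
for _ in range(4):
    if not active:
        active, expired = expired, active   # O(1) pointer swap
    task = active.popleft()
    order.append(task)                      # "run" the task for a timeslice
    expired.append(task)                    # used its full timeslice
print(order)  # ['A', 'B', 'A', 'B']
```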
118
O(1) Background The first 100 priority lists are for real-time tasks, the last 40 are for user tasks. User tasks can have their priorities dynamically adjusted, based on whether they are I/O-bound or CPU-bound.
119
Current Linux Scheduling
Linux has these scheduling classes: the real-time (RT) classes and the completely fair scheduler (CFS) class. Tasks in RT have higher precedence than tasks in CFS. We will start with a discussion of nice values, which are related to CFS.
120
CFS The Completely Fair Scheduler (CFS) is a significant departure from the traditional UNIX process scheduler. Integrated into Linux in Oct 2007. Runs the tasks with the "gravest need" and tries to guarantee fairness in CPU usage.
121
Nice Values A nice value is assigned to each task
Nice values range from -20 to +19. A lower nice value indicates a higher relative priority: tasks with lower nice values receive a higher proportion of CPU processing time than tasks with higher nice values. The default nice value is 0. The term nice comes from the idea that if a task increases its nice value, it is being nice to other tasks by lowering its priority.
122
CPU Scheduling as of Linux 2.6.23 Kernel: “Completely Fair Scheduler”
Goal: fairness in dividing processor time among tasks. A balanced (red-black) tree implements the ready queue, with O(log n) insert and delete time, where n is the number of nodes in the tree (which is not exactly the number of tasks). The queue is ordered by "virtual run time", and the task with the smallest value is picked for the CPU: small values mean the task has received less time on the CPU, and tasks blocked on I/O have smaller values. Execution time on the CPU is added to a task's value, and priorities cause the values to decay at different rates.
123
The Completely Fair Scheduler
CFS cuts out a lot of the things previous versions tracked – no timeslices, no sleep time tracking, no process type identification... Instead, CFS tries to model an “ideal, precise multitasking CPU” – one that could run multiple processes simultaneously, giving each equal processing power. Obviously, this is purely theoretical, so how can we model it?
124
CFS, continued We may not be able to have one CPU run things simultaneously, but we can measure how much runtime each task has had and try and ensure that everyone gets their fair share of time. This is held in the vruntime variable for each task, and is recorded at the nanosecond level. A lower vruntime indicates that the task has had less time to compute, and therefore has more need of the processor. Furthermore, instead of a queue, CFS uses a Red-Black tree to store, sort, and schedule tasks.
125
Priorities and more While CFS does not directly use priorities or priority queues, it does use them to modulate vruntime buildup. In this version, priority is inverse to its effect – a higher priority task will accumulate vruntime more slowly, since it needs more CPU time. Likewise, a low-priority task will have its vruntime increase more quickly, causing it to be preempted earlier. “Nice” value – lower value means higher priority. Relative priority, not absolute...
126
RB Trees A red-black tree is a binary search tree, which means that for each node, the left subtree only contains keys less than the node's key, and the right subtree contains keys greater than or equal to it. A red-black tree has further restrictions which guarantee that the longest root-leaf path is at most twice as long as the shortest root-leaf path. This bound on the height makes RB Trees more efficient than normal BSTs. Operations are in O(log n) time.
127
The CFS Tree The key for each node is the vruntime of the corresponding task. To pick the next task to run, simply take the leftmost node.
128
Modular scheduling Alongside the initial CFS release came the notion of “modular scheduling”, and scheduling classes. This allows various scheduling policies to be implemented, independent of the generic scheduler. sched.c contains that generic code. When schedule() is called, it will call pick_next_task(), which will look at the task's class and call the class-appropriate method. Let's look at the sched_class struct...(sched.h L976)
129
Scheduling classes! Two scheduling classes are currently implemented: sched_fair, and sched_rt. sched_fair is CFS, which I've been talking about this whole time. sched_rt handles real-time processes, and does not use CFS – it's basically the same as the previous scheduler. CFS is mainly used for non-real-time tasks.
130
CFS – Picking the next process
Pick the process with the minimum weighted runtime so far The virtual runtime (vruntime) of a task is its actual runtime weighted by its niceness The scheduler uses vruntime to determine the next process to run The process with the smallest vruntime is selected next
131
CFS – Virtual Runtime High nice values should result in less CPU time allocated to a process. This implies that vruntime cannot be the same as the real runtime.
132
CFS – Virtual Runtime Example: Assume a process runs for 200 milliseconds. Nice value of 0: vruntime will be 200 milliseconds. Nice value < 0: vruntime will be less than 200 milliseconds. Nice value > 0: vruntime will be greater than 200 milliseconds. Smaller nice values result in vruntime growing more slowly than it does for higher nice values. This means that, for the same real runtime, a lower-nice (higher-priority) process appears to have run less and is therefore scheduled sooner.
133
CFS – Calculating vruntime
Let t represent the amount of time a process spent using the CPU while it had it vruntime is incremented by t × weight_0 / weight_i where weight_0 is the weight of nice value 0 weight_i is the weight of nice value i We refer to weight_0 / weight_i as the decay factor Weights of nice values are precomputed to avoid runtime overhead
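The increment rule above can be written out directly. This sketch uses only the slide's five example weights (the real kernel has a full 40-entry table and uses scaled integer arithmetic rather than doubles):

```cpp
#include <cassert>
#include <map>

// Weights for the nice values used on these slides (nice 0 = 1024).
// Subset of the kernel's precomputed table, doubles for readability.
const std::map<int, double> kWeight = {
    {-5, 3121}, {-1, 1277}, {0, 1024}, {1, 820}, {5, 335},
};

// vruntime grows by t * weight_0 / weight_i -- the decay factor.
double vruntime_delta(double t_ms, int nice) {
    const double weight0 = kWeight.at(0);
    return t_ms * (weight0 / kWeight.at(nice));
}
```

At nice 0 the decay factor is exactly 1, so vruntime tracks real runtime; negative nice values slow its growth, positive ones speed it up.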
134
CFS Example Weights

Nice Value   Weight
    -5        3121
    -1        1277
     0        1024
     1         820
     5         335
135
CFS Example Decay Factors

Nice Value   Decay Factor
    -5       1024/3121 = 0.33
    -1       1024/1277 = 0.80
     0       1024/1024 = 1.00
     1       1024/820  = 1.24
     5       1024/335  = 3.05
136
CFS – Using vruntime CFS assigns each task a virtual runtime to account for how long it has run. Example: Assume two tasks t1 and t2, both with nice value 0. t1 runs for 200 milliseconds and t2 runs for 100 milliseconds, followed by a lot of other tasks. t2 (vruntime 100 ms) will be selected before t1 (vruntime 200 ms).
137
CFS – Using vruntime Example: Assume two tasks t1 and t2 with
nice values of 0 and 5 respectively vruntime1 and vruntime2 are initially zero Decay factors are 1 and 3.05 respectively Assume t1 runs for 200 milliseconds and t2 runs for 100 milliseconds, followed by a lot of other tasks vruntime1 is now 200 milliseconds and vruntime2 is 305 milliseconds, so t1 will be selected before t2 Say t1 runs again for 200 milliseconds; vruntime1 is now 400 milliseconds, so t2 will be selected before t1
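This two-task example can be checked with a small simulation. It uses the slide's rounded decay factors (1 for nice 0, 3.05 for nice 5); the type and helper names are illustrative:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of vruntime accounting for the slide's example.
struct SimTask {
    std::string name;
    double decay;         // weight_0 / weight_i
    double vruntime = 0;  // milliseconds of weighted runtime
    void run(double ms) { vruntime += ms * decay; }
};

// The next task to run is the one with the minimum vruntime.
SimTask* pick_min(std::vector<SimTask>& tasks) {
    SimTask* best = &tasks[0];
    for (auto& t : tasks)
        if (t.vruntime < best->vruntime) best = &t;
    return best;
}
```

Running the slide's schedule through it reproduces the stated selections: after the first round t1 leads (200 < 305), and after t1's second run t2 does (305 < 400).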
138
CFS Starvation Could a process starve?
No. Say a task t1 never gets the processor while t2 always does. Since t1's vruntime is never incremented, at some point t1 will have the smallest vruntime in the tree and must be selected.
139
CFS – Process Selection
CFS selects the process with the minimum virtual runtime, and avoids having run queues per priority level What data structure should hold the collection of tasks? A single queue would be slow to search Multiple queues make sense only when there is a relatively small number of them, but virtual runtime takes many distinct values, so CFS uses a red-black tree keyed on vruntime instead
140
CFS Task Selection What if multiple tasks have the same vruntime value? Store them in a list at the node, with a sequence number indicating each task's order in the list
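In a sketch built on `std::multimap`, the tie-breaking behaves like the slide's sequence number for free: since C++11, `std::multimap` is required to preserve insertion order among equal keys. A minimal illustration (names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Tasks with identical vruntime keys: std::multimap keeps equal keys
// in insertion order (guaranteed since C++11), so the earlier-queued
// task is the leftmost of the two -- same effect as a sequence number.
using TieQueue = std::multimap<uint64_t, std::string>;

std::string pick_leftmost(TieQueue& rq) {
    auto it = rq.begin();
    std::string name = it->second;
    rq.erase(it);
    return name;
}
```

The earlier-enqueued task of a tied pair is always dequeued first.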
141
CFS No static time slices The switching rate depends on the system load Each process receives a proportion of the processor's time, and the length of that share depends on how many other processes are running
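The proportional-share idea can be sketched as each task's weight divided by the total weight of all runnable tasks, applied to a scheduling period. This is loosely modeled on the kernel's approach; the 6 ms period, the function name, and the use of doubles are all illustrative assumptions, and the real numbers vary by kernel version and tuning:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// No fixed quantum: each runnable task gets a share of a scheduling
// period proportional to its weight. The 6 ms period is illustrative.
double time_slice_ms(double task_weight,
                     const std::vector<double>& all_weights,
                     double period_ms = 6.0) {
    double total = std::accumulate(all_weights.begin(),
                                   all_weights.end(), 0.0);
    return period_ms * task_weight / total;
}
```

With more runnable tasks the same period is split more ways, so each slice shrinks; that is why the switching rate depends on system load.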
142
CFS As a user you can assign nice values greater than zero
You need root to assign nice values less than zero Optimizing nice values for applications seems rather complex Yes, it can be
143
Other
144
Mac OS X Based on Mach and BSD Unix
Priorities are grouped into priority bands Normal: applications System high priority: processes with higher priority than Normal Kernel mode: reserved for kernel processes Real-time: processes that must be guaranteed a slice of CPU time by a particular deadline
145
Mac OS X Priorities change dynamically
Based on wait time and the amount of time the process has had the processor; a process stays within its priority band The scheduler reschedules every tenth of a second and recomputes priorities once every second A process relinquishes the CPU when its time quantum expires or when it must wait for an I/O completion This feedback prevents starvation
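The feedback described above can be modeled as a toy calculation: CPU use lowers priority, waiting raises it, and the result is clamped to the band. This is purely illustrative of the slide's description, not Apple's actual algorithm; the constants and names are invented:

```cpp
#include <algorithm>
#include <cassert>

// Toy model: priority falls with CPU use, rises with wait time, and
// is clamped so the process never leaves its band. Illustrative only.
struct BandedTask {
    int band_min, band_max;  // limits of the priority band
    int priority;
};

void recompute_priority(BandedTask& t, int cpu_ms_used, int wait_ms) {
    // Invented feedback rule: 1 priority point per 100 ms either way.
    int adjusted = t.priority - cpu_ms_used / 100 + wait_ms / 100;
    t.priority = std::clamp(adjusted, t.band_min, t.band_max);
}
```

A CPU-hungry task sinks toward the bottom of its band while waiting tasks climb, which is the starvation-prevention effect the slide describes.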
146
Android Designed for mobile devices; today it is the most commonly used platform
Uses the Linux kernel for device management, memory management, and process management
147
Summary We have examined scheduling in several contemporary operating systems
148
Summary The scheduler is responsible for multiplexing the CPU among a set of ready processes/threads It is invoked periodically by a timer interrupt, by system calls and other device interrupts, and whenever the running process terminates It selects from the ready list according to its scheduling policy, which may be non-preemptive or preemptive