Presentation on theme: "Ongoing Research Jiyong Park"— Presentation transcript:

1 Ongoing Research Jiyong Park 2007-03-20
Real-Time Operating Systems Laboratory, Seoul National University, Korea

2 Contents
New Interrupt Handling Mechanism for Flexible Priority Assignment
Bfair: Reducing the Number of Context Switches in the Pfair Scheduling Algorithm
Comparison Among LFU, Aged LFU, Window LFU, and PNFU

3 New Interrupt Handling Mechanism for Flexible Priority Assignment

4 Motivation (1) In most operating systems, ISRs are assigned higher priorities than tasks. Tasks are executed only when there are no ongoing or pending ISRs. A task may experience unbounded delay due to overly long ISRs. ISR writers are simply told to keep their ISRs short.

5 Motivation (2) A real example
A periodic task (a hard real-time task) is released by the timer ISR. The timer ISR is preempted by other ISRs. The periodic task cannot be executed until all of the ISRs have completed.
[Figure: timeline — the timer IRQ, IRQ x, and IRQ y nest their ISRs, delaying the schedule & context switch that runs the released task: an unbounded delay]

6 Existing Solutions Two solutions
Interrupt masking (or interrupt priority level)
Interrupt service task (IST)
But these are not complete solutions.

7 Existing Solutions – Interrupt Masking (1)
Description
When an ISR or a task is executed, a predefined set of IRQs is automatically masked. In effect, this mechanism increases the preemption threshold of an ISR or a task. As a result, an ISR or a task can limit the set of IRQs that may preempt it.
[Figure: timeline — while each ISR runs, the masked-IRQ set shrinks from {IRQ x, IRQ y, timer IRQ} to {IRQ x, IRQ y}, to {IRQ y}, to none, bounding the delay before the released task runs]

8 Existing Solutions – Interrupt Masking (2)
Limitations
There is still unbounded delay (due to interrupt nesting). The preemption threshold of a task may be higher than that of ISRs, but the priority of a task is still lower than that of ISRs. So, ISRs must be short.
[Figure: timeline — nested ISRs for the timer IRQ, IRQ x, and IRQ y; the masked-IRQ set changes over time (none, {IRQ y}, all, {IRQ y}, none, {IRQ x, IRQ y}), and the task still sees unbounded delay]

9 Existing Solutions – IST
Description
Each IRQ is handled by a dedicated task (interrupt service task; IST). The ISR is very simple; it only releases the corresponding IST. From the viewpoint of the scheduler, ISTs and normal tasks are equal, so the priority of a normal task can be higher than that of an IST.
Limitations
Performance overhead: at least two context switches are required at every IRQ
Space overhead: task management structures and task contexts

10 Comparison Between Existing Solutions (1)
t: a task; t.p: priority of task t
i: an ISR; i.p: priority of ISR i
x can preempt y only when x.p > y.p

11 Comparison Between Existing Solutions (2)
No interrupt masking (and no interrupt nesting)
For all t and i, i.p > t.p; for all i, i.p = c

can preempt?   ISR     Task
ISR            Never   Always
Task           Never   Yes

[Figure: priority bands — all ISRs at priority c on top, tasks below]

12 Comparison Between Existing Solutions (3)
No interrupt masking (but interrupt nesting is allowed)
i.p = c when i is being requested; i.p = c-1 when i is being serviced
For all t, t.p < c-1

can preempt?   ISR     Task
ISR            Yes     Always
Task           Never   Yes

[Figure: priority bands — requested ISRs at c, serviced ISRs at c-1, tasks below c-1]

13 Comparison Between Existing Solutions (4)
Interrupt masking
No restrictions between i.p and t.p
t.p = 0 when an IRQ is being serviced

can preempt?   ISR     Task
ISR            Yes     Yes
Task           Never   Yes

[Figure: ISR and task priority ranges overlap, but a task drops to priority 0 while an IRQ is being serviced]

14 Comparison Between Existing Solutions (5)
Interrupt service task (IST)
No restrictions between i.p and t.p

can preempt?   ISR     Task
ISR            Yes     Yes
Task           Yes     Yes

[Figure: ISR and task priority ranges fully overlap]

15 Comparison Between Existing Solutions (6)
Conclusion
The IST is flexible in priority assignment, but it incurs performance and space overhead.
Interrupt masking is also flexible in priority assignment and is efficient compared to the IST. However, it causes unnecessary delay, since a task cannot preempt an ISR.

16 Proposed Solution
Goal
Eliminate the overhead of the IST
Eliminate the unnecessary delay of interrupt masking
Basic idea
Based on the interrupt masking scheme, with a modification: when a task is released by an ISR, scheduling is performed, and if the released task has a higher priority than the current ISR, the context switch is performed IMMEDIATELY.
In existing OS implementations, by contrast, the context switch is delayed until nested_count == 0.

17 System Model (1)
[Figure: interrupt controller with pending, mask, and service registers sits between devices 0..n (IRQ/ACK lines) and the CPU's IRQ enable register, connected over the bus]
This represents the interrupt architecture of most embedded systems.

18 System Model (2) Interrupt controller
Pending register
n-th bit is read as 1: IRQ n is asserted by device n
writing 1 to the n-th bit: IRQ n is acknowledged
Mask register
writing 1 to the n-th bit: IRQ n is masked
writing 0 to the n-th bit: IRQ n is unmasked
Service register
Represents IRQs that are unmasked and pending: Service register = Pending register & ~Mask register
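The service-register relation can be sketched in C; register names follow the slide, while the 32-bit width and the demo values are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the interrupt-controller relation described above. */
static uint32_t service_register(uint32_t pending, uint32_t mask)
{
    /* Service register = Pending register & ~Mask register:
       an IRQ is serviced only if it is both asserted and unmasked. */
    return pending & ~mask;
}
```

For example, with IRQ 0, 1, and 3 pending and IRQ 1 masked, only IRQ 0 and IRQ 3 appear in the service register.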

19 System Model (3) CPU
IRQ enable register
Determines whether to accept IRQs or not
The value of the register is local to each task and ISR

20 Data Structures
t: a task
t.p: priority of t
cur: the currently executed task
i: an ISR
i.n: IRQ number of i
i.p: priority of i (depending on the system, i.p may be fixed by hardware)
i.p_saved: priority of the task or ISR that was preempted by i
i.handler: handler for i

21 Primitive Function mask(n)
Returns the value for the mask register at priority level n (IRQs with the same or lower priority than n are masked):
if i.n = x and i.p ≤ n, then IRQ x is masked
if i.n = x and i.p > n, then IRQ x is unmasked
[Figure: bit layout of mask(i.p), bit x down to bit 0]
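A minimal C sketch of mask(n), assuming a hypothetical 8-IRQ system whose priorities live in a software table (higher number = higher priority); on some systems, as the slides note, priorities are instead fixed by hardware:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_IRQS 8

/* Hypothetical per-IRQ priority table (an assumption for this sketch). */
static const int irq_priority[NUM_IRQS] = { 5, 3, 7, 3, 1, 6, 2, 4 };

/* mask(n): bit x is set (IRQ x masked) iff its priority is <= n,
   i.e. IRQs with the same or lower priority than n are masked. */
static uint32_t mask(int n)
{
    uint32_t m = 0;
    for (int x = 0; x < NUM_IRQS; x++)
        if (irq_priority[x] <= n)
            m |= 1u << x;
    return m;
}
```

With the table above, mask(3) masks IRQs 1, 3, 4, and 6, and mask(7) masks everything.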

22 Interrupt Handling Routine

irq_handler() {
    disable_irq();          // IRQ is disabled automatically by the CPU
    save_irq_context();     // save the interrupted context
    irq = get_irq();        // read the service register to find the highest-priority IRQ
    irq.p_saved = cur.p;
    cur.p = irq.p;
    set_mask(mask(cur.p));  // mask IRQs with the same or lower priority
    restore_irq();
    irq.handler();
    disable_irq();
    cur.p = irq.p_saved;    // return to the previous priority
    schedule();             // since cur.p has changed, schedule again
    set_mask(mask(cur.p));  // restore the previous mask
    restore_irq_context();
}

Note: interrupt nesting is not counted.

23 Scheduling and Context Switching Routines

schedule() {
    next = the ready task with the highest priority;
    if (next.p > cur.p)
        switch(next);
}

switch(next) {
    set_mask(mask(next.p));  // change the mask to reflect the priority of the next task
    cur = next;
    /* context save, stack change, and context restore */
}

Note: interrupt nesting is not counted.

24 Expected Result
No delay: if a released task has a higher priority, it is immediately executed.
[Figure: timeline — timer IRQ (priority 5), IRQ x (4), IRQ y (3); the timer ISR releases a task of priority 6, which is scheduled and switched to immediately; cur.p changes 3, 4, 5, 6, 5, 4, 3 over time]

25 Future Work
Implement the scheme in the eOS or HEART OS
Measure its real-time performance
Hopefully, a conference paper can be produced (can it be patented?)
Analyze the implementations of existing OSes
So far, I have found no OS that implements our scheme

26 Note
Two-level interrupt handling in Linux, Nucleus, …
Top half (IRQs are disabled; tasks cannot preempt top halves)
Bottom half (IRQs are enabled; tasks can preempt bottom halves)
Benefit
ISRs can be long (execute most of the code as a bottom half)
Reduced interrupt latency (a top half can preempt bottom halves)
Unbounded latency for tasks still exists

27 Reducing The Number of Context Switches in Pfair Scheduling Algorithm

28 Motivation (1) Pfair scheduling algorithms cause too many context switches (the algorithm runs at every tick). Too many context switches cause various side effects:
Inefficient use of cache memory
Frequent bus contention
Let's find a way to reduce the number of context switches.

29 Motivation (2) An example
3 processors, 6 identical tasks (period = deadline = 10, execution time = 5)
[Figure: schedule produced by PD2 — too many context switches]

30 Motivation (3) Can we do better? Yes!
An ideal schedule with the minimum number of context switches
[Figure: tasks 1–2 on processor 1, tasks 3–4 on processor 2, tasks 5–6 on processor 3, each pair splitting the interval [0, 10) at time 5]

31 Problem Analysis The Pfair algorithm is TOO fair
It tries to track the ideal processor share: at any time t, the accumulated allocation for task Ti is either ⌊t·wi⌋ or ⌈t·wi⌉ (wi: utilization of Ti). To do so, the lag is maintained within (-1, 1).
Do we need this much fairness? I think the range of the lag is too narrow. All we need is to guarantee that lag = 0 at the end of each period. If we allow a looser range of lag, the number of context switches will be reduced.
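The floor/ceiling condition can be checked with exact integer arithmetic. This is a sketch with assumed helper names, for a task with execution time e and period p (so wi = e/p):

```c
#include <assert.h>

/* Accumulated-allocation bounds for a Pfair schedule: at time t, a task
   with weight e/p must have received floor(t*e/p) or ceil(t*e/p) units. */
static int floor_alloc(int t, int e, int p) { return (t * e) / p; }
static int ceil_alloc(int t, int e, int p)  { return (t * e + p - 1) / p; }

static int is_pfair_alloc(int alloc, int t, int e, int p)
{
    return alloc == floor_alloc(t, e, p) || alloc == ceil_alloc(t, e, p);
}
```

For the example tasks above (e = 5, p = 10), at t = 3 the accumulated allocation must be 1 or 2; anything else violates Pfairness.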

32 Survey of Related Work (1)
Multiprocessor scheduling algorithms for periodic tasks
[Table: taxonomy by priority model — job-level fixed priority vs. job-level dynamic priority (<1,1>-, <1,2>-, <2,3>-, <3,3>-restricted) — and by approach (deadline driven, laxity/slack driven, fair share), with best known utilization bounds U = (M+1)/2 and U = M]

33 Survey of Related Work (2)

Approach               Algorithms                             Utilization bound
Deadline driven        EDF                                    U = 1
                       EDF-US                                 U = (M+1)/2
Laxity (slack) driven  LLF (Least Laxity First)               U = ? < M
                       EDZL (Earliest Deadline Zero Laxity)   U = ? < M
Fair share             Pfair: PF, PD, PD2, EPDF, ERPF, …      U = M
                       Bfair: BF                              U = M

34 Existing Solution: Bfair Scheduling
Someone had already solved the problem Referenced paper Dakai Zhu, Daniel Mosse, and Rami Melhem, “Multiple-Resource Periodic Scheduling Problem: how much fairness is necessary?,” RTSS’03

35 Basic Idea (1)
Pfair schedule (proportionate fair schedule): a periodic schedule that is fair at all times
Bfair schedule (boundary fair schedule): a periodic schedule that is fair only at period boundaries
Fair: the lag is between -1 and 1; in other words, the accumulated allocation for task Ti is either ⌊t·wi⌋ or ⌈t·wi⌉

36 Basic Idea (2) Comparison
T1 = (2,5), T2 = (3,15), T3 = (3,15), T4 = (2,6), T5 = (20,30), T6 = (6,30); U = 2
Pfair: 30 scheduling points, 60 context switches; Bfair: 11 scheduling points, 46 context switches
[Figure: the two schedules side by side; the dotted lines are period boundaries]

37 BF Algorithm
BF: an algorithm that generates a Bfair schedule for the given tasks
Characteristics of BF
Utilization bound = M
Same complexity as that of the Pfair algorithms (PD, …)
The number of context switches is reduced dramatically (a 75% reduction in their experiments)

38 Algorithm Description (1)
Maintain fairness at each boundary time bk (the k-th boundary time): until bk, Ti gets either ⌊bk·wi⌋ or ⌈bk·wi⌉ allocations
Boundary times: {b0, b1, …, bL} with bk < bk+1, b0 = 0, bL = LCM
For every k, bk is a multiple of some task's period pi

39 Algorithm Description (2)
Two-step allocation
1. Give mandatory time units to all tasks
2. If there are N remaining time units, give an optional time unit to each of the N highest-priority eligible tasks
Mandatory units for task Ti: mik+1 = max{0, ⌊RWik + wi·(bk+1 − bk)⌋}
RWik = remaining work for Ti before bk = wi·bk − (allocated units for Ti until bk)
Remaining time units: N = M·(bk+1 − bk) − Σi mik+1
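The mandatory-unit computation can be sketched in C, keeping RWik as an exact fraction over the period to avoid floating point (the function and parameter names are assumptions):

```c
#include <assert.h>

/* Mandatory units for one task over the window ending at b_next:
   m = max(0, floor(RW + w*(b_next - b_cur))). With w = e/p and
   RW = (e*b_cur - allocated*p)/p, the sum telescopes to
   (e*b_next - allocated*p)/p, so b_cur drops out. */
static int mandatory_units(int e, int p, int allocated, int b_next)
{
    int num = e * b_next - allocated * p;  /* numerator over denominator p */
    return num <= 0 ? 0 : num / p;         /* floor for non-negative values */
}
```

For T1 = (2,5) with nothing allocated yet, the window ending at b = 5 demands exactly 2 mandatory units; once those 2 units are allocated, the demand drops to 0.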

40 Algorithm Description (3)
Give mandatory units to all tasks
If there are remaining units, pick urgent tasks; each such task gets one optional unit
Update RW (remaining work) for bk+1
Generate the schedule for [bk, bk+1) using McNaughton's algorithm
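The last step, McNaughton's wrap-around rule, can be sketched as follows (array sizes and names are assumptions): the per-task allocations for [bk, bk+1) are laid end to end and wrap to the next processor every `len` time units.

```c
#include <assert.h>

#define MAX_LEN 64

/* Fill slot[proc][t] with the index of the task occupying processor
   `proc` at offset t within the window, following McNaughton's
   wrap-around algorithm. Valid when every alloc[i] <= len, so a task
   never overlaps itself across the wrap. */
static void mcnaughton(const int alloc[], int n_tasks, int len,
                       int slot[][MAX_LEN])
{
    int proc = 0, t = 0;
    for (int i = 0; i < n_tasks; i++)
        for (int u = 0; u < alloc[i]; u++) {
            slot[proc][t] = i;
            if (++t == len) { t = 0; proc++; }
        }
}
```

For allocations {3, 2, 3} in a window of length 4, task 1 is split across the wrap: it runs at offset 3 on processor 0 and at offset 0 on processor 1, which are different times, so there is no self-overlap.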

41 Algorithm Description (4)
How to determine urgent tasks: the urgency factor UFik
The minimal time needed for task Ti to collect enough work demand to receive one unit allocation and become punctual after bk+s:
UFik = {1 − (⌈bk+s·wi⌉ − bk+s·wi)}/wi

42 Algorithm Example
T1=(2,5), T2=(3,15), T3=(3,15), T4=(2,6), T5=(20,30), T6=(6,30) — (execution time, period = deadline); U = 2, M = 2
[Table: per-boundary demand values, e.g. 2, 1, 5/3, 10/3]

43 Evaluation
The number of scheduling points is reduced
Task sets are randomly generated
If the task periods are harmonic, the number of scheduling points for BF is LCM/min(pi).

44 Future Work
Efficient implementation on SMP
Support for synchronization
Support for non-migration

45 Comparison Among LFU, Aged LFU, Window LFU, and PNFU

46 LFU (1)
Description: the page with the least frequency becomes a candidate for the victim page
Definition of the frequency of a page: the accumulated number of references to the page from the beginning
[Figure: reference string a b a a a b c — frequency of page a: 4, page b: 2, page c: 1; page c is the candidate for the victim]

47 LFU (2)
Definition of a reference
It is IMPOSSIBLE to count EVERY reference to a page. So, LFU assumes that a page was referenced once if at least one reference was made during a certain period of time.
[Figure: per-interval sampling — referenced? No, Yes, Yes, Yes, No; frequency n, n, n+1, n+2, n+3, n+3]

48 LFU (3) Algorithm description

for each page, maintain a counter named count
at every time interval t, execute calculate_frequency

calculate_frequency:
    for each page p
        if (p was referenced during the past interval)
            p.count++
        reset p's reference bit

select_victim_page:
    return the page p with the smallest p.count

49 Aged LFU (1)
Description: the page with the least frequency becomes a candidate for the victim page (same as LFU)
Definition of the frequency of a page
W: window size in interval units
ref(p, t) = 1 if page p was referenced t intervals ago; ref(p, t) = 0 otherwise
frequency of page p = ref(p,1)·2^(W-1) + ref(p,2)·2^(W-2) + … + ref(p,W)·2^0
[Figure: the frequency as a W-bit value, MSB down to LSB]

50 Aged LFU (2) Definition of a reference (same as LFU)

51 Aged LFU (3) Algorithm description

for each page, maintain a counter named count (W bits wide); W is typically 8 or 16
at every time interval t, execute calculate_frequency

calculate_frequency:
    for each page p
        if (p was referenced during the past interval)
            p.count = (p.count >> 1) + 2^(W-1)
        else
            p.count = p.count >> 1
        reset p's reference bit

select_victim_page:
    return the page p with the smallest p.count

Note: Aged LFU is not an approximation of LFU; it is an enhanced version of LFU.
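The aging step is one right shift plus conditionally setting the most-significant bit (weight 2^(W-1) for a W-bit counter). A minimal C sketch, assuming W = 8:

```c
#include <assert.h>
#include <stdint.h>

#define W 8  /* counter width in bits; 8 or 16 are typical per the slide */

/* One Aged-LFU interval: halve every past contribution by shifting
   right, then set the MSB iff the page was referenced this interval. */
static uint32_t age(uint32_t count, int referenced)
{
    count >>= 1;
    if (referenced)
        count |= 1u << (W - 1);
    return count;
}
```

A page referenced in the most recent interval always outranks any page whose last reference is older, which is why this behaves differently from plain LFU's lifetime counts.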

52 Window LFU (1)
Description: the page with the least frequency becomes a candidate for the victim page (same as LFU); here, a page is a web page
Definition of the frequency of a page
W: window size (a natural number)
H: the history of the past W references
ex) H = P1,P2,P1,P3,P4,P5,P1,P  |H| = W = 8
Frequency of page p = the number of occurrences of p in H
ex) frequency of P1 = 3

53 Window LFU (2)
Definition of a reference
It is POSSIBLE to count EVERY reference to a page, because it is a web page. So, there is no notion of a 'time interval' in Window LFU.

54 Window LFU (3) Algorithm description

maintain a queue named history (with W elements)
for each page, maintain a variable named counter
at every reference to a page p, execute calculate_frequency(p)

calculate_frequency(page p):
    p_old = the first element in history
    p_old.counter--
    discard p_old from history
    make p the last element in history
    p.counter++

select_victim_page:
    return the page p with the smallest p.counter
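The history queue plus per-page counters can be sketched with a ring buffer (the sizes and names below are assumptions):

```c
#include <assert.h>

#define W 8           /* window size: the last W references */
#define NUM_PAGES 16  /* hypothetical page-id space */

static int history[W];          /* ring buffer of the last W page ids */
static int counter[NUM_PAGES];  /* occurrences of each page in history */
static int head = 0, filled = 0;

/* Record one reference: once the window is full, the oldest entry is
   evicted and its page's counter decremented before the new reference
   is appended. The victim is then the page with the smallest counter. */
static void reference(int page)
{
    if (filled == W)
        counter[history[head]]--;
    else
        filled++;
    history[head] = page;
    counter[page]++;
    head = (head + 1) % W;
}
```

Because counts reflect only the last W references, a page that was popular long ago but is no longer referenced quickly becomes a victim candidate.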

55 PNFU (1) Description
Categorize the pages into FAP and IFAP groups depending on the frequency of each page
In IFAP, pages are selected in LRU order; in FAP, pages are selected in FIFO order
When selecting a victim page, IFAP is inspected first, then FAP.

56 PNFU (2)
[Figure: a new page enters the FAP group (a FIFO queue); a page whose frequency drops below 1/W moves to the IFAP group; a referenced IFAP page (frequency > 1/W) returns to FAP; the victim page is taken from IFAP, or from FAP if IFAP is empty]
This makes the pages in IFAP ordered in LRU order (exactly the same as the stack LRU algorithm)

57 PNFU (3) Definition of the reference (same as LFU)
the time interval is set to the PERIOD of a task

58 PNFU (4) Definition of the frequency of a page
Count the number of time intervals in which the page was NOT referenced during the past W intervals
If a page is referenced, set the counter to 0; if a page is NOT referenced, increase the counter by 1
frequency = (W − counter)/W (if counter < W)
          = 0 (if counter ≥ W)
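A sketch of the per-page counter and frequency rule in C; to stay in integer arithmetic the frequency is returned as a numerator over W, and the counter ≥ W case is read as frequency 0, consistent with the "frequency < 1/W" threshold on the earlier slide (the names are assumptions):

```c
#include <assert.h>

#define W 4  /* window size in intervals, matching the slide's example */

/* PNFU counter update: a reference resets the count of unreferenced
   intervals; otherwise the count grows by one. */
static int update_counter(int counter, int referenced)
{
    return referenced ? 0 : counter + 1;
}

/* frequency = (W - counter)/W while counter < W, and 0 once the page
   has gone unreferenced for W or more intervals. Returned as the
   numerator over the denominator W. */
static int frequency_numerator(int counter)
{
    return counter < W ? W - counter : 0;
}
```

With W = 4, a just-referenced page has frequency 4/4 = 1, and each unreferenced interval lowers it by 1/4 until it bottoms out at 0.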

59 PNFU (5) The definition of the frequency has a problem
[Figure: W = 4; a timeline of per-interval counter values and the resulting frequencies (1, 3/4, 2/4, 1/4, …) across two reference bursts]
The frequency at the marked time should be calculated as 1/4; however, PNFU calculates it as 3/4.

60 PNFU (6)
Problems of PNFU
The use of a sliding window: not a new idea
The emulation of the reference bit: not a new idea
The calculation of the frequency is wrong
If almost all pages are frequently referenced, PNFU degenerates into FIFO (bad performance)
Good points of PNFU
If almost all pages are infrequently referenced, PNFU becomes LRU (my question: why don't we just use LRU in all cases?)

61 Thank You!!!

