Soft Timers Efficient Microsecond Software Timer Support for Network Processing MOHIT ARON and PETER DRUSCHEL Rice University Published in ACM Transactions.


1 Soft Timers: Efficient Microsecond Software Timer Support for Network Processing
MOHIT ARON and PETER DRUSCHEL, Rice University
Published in ACM Transactions on Computer Systems, vol. 18(3), pp. 197-228, 2000.
Presented by Glenn Diviney

2 What’s wrong with “Hard” timers?
- Polling vs. interrupts
  - Interrupts have low latency but high overhead
  - Polling has low overhead but high latency
- Interruption is expensive
  - The CPU pipeline is disrupted, and the cache and TLB are polluted
  - Generally not significant, as long as interrupts arrive at millisecond intervals
- Example: network interrupts can occur every few tens of microseconds
  - Gigabit Ethernet requires a packet transmission every 12 µs (with 1500-byte packets)!
  - This is a significant burden on the system if each transmission involves an interrupt and its context switch
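The 12 µs figure is plain serialization arithmetic (1500 bytes × 8 bits ÷ 10^9 bit/s). A minimal sketch that reproduces it, along with the 120 µs figure for 100 Mbps quoted on slide 18. The helper name is illustrative only, and Ethernet preamble, inter-frame gap, and header overhead are ignored, as on the slides:

```python
def packet_interval_us(packet_bytes: int, link_bps: float) -> float:
    """Serialization time of one packet on the link, in microseconds."""
    return packet_bytes * 8 / link_bps * 1e6


# 1500-byte packets; preamble, inter-frame gap and framing overhead are ignored.
for name, bps in (("Gigabit Ethernet", 1e9), ("100 Mbps Ethernet", 100e6)):
    print(f"{name}: one packet every {packet_interval_us(1500, bps):.1f} us")
```

At line rate on gigabit Ethernet, a per-packet timer or interrupt would therefore have to fire roughly every 12 µs.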

3 Interrupts
- Device interrupts have low latency but high overhead due to the added context switching
- The executing thread is preempted
- Interrupts can arrive at inopportune times and slow down other work: cache pollution, TLB pollution, and pipeline flushes add high indirect costs

4 Polling
- Polling has low overhead, but its latency depends on how often the poll runs:
  - The OS’s timer granularity is set by the frequency of timer interrupts, which is in turn limited by the overhead each interrupt incurs
  - The cache, TLB, and pipeline costs can be avoided if the polling is done at the right time

5 What’s a Soft Timer?
- “An operating system facility that allows efficient scheduling of software events at microsecond granularity.”
- Takes advantage of states in which handlers can be invoked at low cost: “trigger states”
  - For example, when the system has already entered the kernel, why not check whether other work can be done “while you’re in there?”
- Schedules future events probabilistically

6 How Soft Timers work: hardware
- Pentium-based PCs ship with a programmable timer, which can be told how often to interrupt the CPU
- These interrupts are usually given the highest priority in the OS, and servicing them causes TLB and cache misses
- Testing indicated a total cost of 4.45 µs per interrupt on a 300 MHz web server: insignificant at millisecond intervals, but prohibitive at 20 µs intervals
- The timer chip is therefore programmed to interrupt at millisecond intervals
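Why 4.45 µs is harmless at millisecond intervals but not at 20 µs: a periodic interrupt consumes roughly cost ÷ interval of the CPU. The sketch below assumes the 4.45 µs figure is the whole per-interrupt cost (direct plus indirect) and that it stays constant regardless of interrupt rate; both are simplifications, and the function name is illustrative.

```python
HW_INTERRUPT_COST_US = 4.45  # total per-interrupt cost measured on the 300 MHz server


def interrupt_overhead(interval_us: float, cost_us: float = HW_INTERRUPT_COST_US) -> float:
    """Fraction of CPU time consumed by a periodic timer interrupt every interval_us."""
    if interval_us <= 0:
        raise ValueError("interval must be positive")
    return cost_us / interval_us


for interval in (1000.0, 100.0, 20.0):
    print(f"interrupt every {interval:>6.0f} us -> {interrupt_overhead(interval):.1%} of the CPU")
```

At 1 ms intervals this is under 0.5% of the CPU; at 20 µs it is over 22%.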

7 How Soft Timers work: software
- At unpredictable intervals, the system arrives at “trigger states”:
  - End of a system call
  - End of an exception handler
  - End of an interrupt handler
  - CPU idle
- In these states, invoking an event handler costs only a function call
  - The TLB and cache are already “disturbed” by the triggering event, so no significant additional cost should be incurred
- In these states, the OS’s Soft Timer facility checks for pending events without incurring the cost of a hardware timer interrupt
  - It reads the clock (usually a CPU register) and compares it to the scheduled time of the earliest soft timer event
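The trigger-state check can be sketched as follows. This is an illustrative Python model, not FreeBSD code: `TriggerStateHook`, `schedule_at`, and `on_trigger_state` are hypothetical names, a min-heap is one assumed way to keep the earliest event at hand, and `time.perf_counter_ns` stands in for the CPU clock register. When nothing is due, the check is one clock read and one comparison.

```python
import heapq
import time
from typing import Callable, List, Tuple


class TriggerStateHook:
    """Hypothetical soft timer check run at each trigger state."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns) -> None:
        self._clock = clock  # stand-in for reading a CPU clock register
        # Min-heap of (due_time, sequence, handler); the earliest event is at index 0.
        self._heap: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = 0  # tie-breaker so handlers themselves are never compared

    def schedule_at(self, due: int, handler: Callable[[], None]) -> None:
        heapq.heappush(self._heap, (due, self._seq, handler))
        self._seq += 1

    def on_trigger_state(self) -> int:
        """Call at syscall exit, exception/interrupt return, or in the idle loop.

        Runs every handler whose due time is strictly in the past and returns
        how many ran. If nothing is due, this is one clock read and one compare.
        """
        now = self._clock()
        fired = 0
        while self._heap and self._heap[0][0] < now:
            _, _, handler = heapq.heappop(self._heap)
            handler()  # an ordinary function call; no extra interrupt is taken
            fired += 1
        return fired


# Example with a fake clock so the behavior is deterministic.
fake_now = [0]
hook = TriggerStateHook(clock=lambda: fake_now[0])
log: List[str] = []
hook.schedule_at(10, lambda: log.append("a"))
hook.schedule_at(30, lambda: log.append("b"))
fake_now[0] = 5
hook.on_trigger_state()   # nothing due yet
fake_now[0] = 15
hook.on_trigger_state()   # runs the first handler only
print(log)                # ['a']
```

The heap keeps the common-case cost independent of how many events are pending; a real kernel might use a different structure (for example, timing wheels).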

8 The catch
- Events may be delayed past their scheduled time
- Only the hardware timer interrupt is guaranteed to occur, which provides an upper bound on the delay
- Other trigger states appear as random events to the system, or may not occur at all between hardware interrupts

9 Implementation
- Soft timers provide the following operations:
  - measure_resolution(): returns a 64-bit value giving the clock resolution in hertz
  - measure_time(): returns a 64-bit value giving the current time, in units of the resolution returned by measure_resolution()
  - schedule_soft_event(T, handler): schedules “handler” to run “T” ticks in the future
  - interrupt_clock_resolution(): returns the resolution of the hardware interrupt clock, i.e., the coarsest granularity that is guaranteed
- When invoked at a trigger state, the Soft Timer facility runs every handler whose scheduled time is at least one tick earlier than the current value of measure_time()
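A sketch of the four operations as a Python class, driven by a simulated tick counter so the firing rule can be exercised directly. The operation names come from the slide; the class name, the constructor parameters, `run_due_events`, and the settable `now` attribute are assumptions made for illustration. `schedule_soft_event` converts the relative T into an absolute due time, and an event fires only once its due time is strictly in the past.

```python
import heapq
from typing import Callable, List, Tuple


class SoftTimerFacility:
    """Hypothetical model of the soft timer interface, using a simulated clock."""

    def __init__(self, clock_hz: int, hw_interrupt_period_ticks: int) -> None:
        self._hz = clock_hz
        self._hw_period = hw_interrupt_period_ticks
        self.now = 0  # current tick count; advanced by the caller in this model
        self._pending: List[Tuple[int, int, Callable[[int], None]]] = []
        self._seq = 0  # tie-breaker for events with equal due times

    def measure_resolution(self) -> int:
        return self._hz  # ticks per second

    def measure_time(self) -> int:
        return self.now  # a 64-bit tick count in the real facility

    def interrupt_clock_resolution(self) -> int:
        return self._hw_period  # X: period of the hardware timer interrupt, in ticks

    def schedule_soft_event(self, t: int, handler: Callable[[int], None]) -> None:
        """Schedule handler to run t ticks from now."""
        due = self.measure_time() + t
        heapq.heappush(self._pending, (due, self._seq, handler))
        self._seq += 1

    def run_due_events(self) -> None:
        """Invoked at each trigger state: run handlers whose due time is strictly past."""
        now = self.measure_time()
        while self._pending and self._pending[0][0] < now:
            _, _, handler = heapq.heappop(self._pending)
            handler(now)


st = SoftTimerFacility(clock_hz=300_000_000, hw_interrupt_period_ticks=300_000)
fired_at: List[int] = []
st.schedule_soft_event(3_000, fired_at.append)  # 3000 ticks = 10 us at 300 MHz
st.now = 3_000
st.run_due_events()   # not yet: the due time (3000) is not strictly in the past
st.now = 3_001
st.run_due_events()   # fires, passing the current time to the handler
print(fired_at)       # [3001]
```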

10 Implementation (cont.)
- If X is the resolution of the hardware interrupt clock and T is an event’s scheduled time, the actual event time is bounded by:
  T < Actual Event Time < T + X + 1
  - This only guarantees that the event will happen eventually, within one hardware interrupt period
- In general, the actual event time can be modeled as: Actual Event Time = T + d
  - “d” is the “random” delay until the next non-hardware trigger state
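The bound can be checked with a small simulation. The model is an assumption, not the paper’s measurement setup: time advances in integer ticks, the hardware interrupt fires at every multiple of X, and any other tick is a trigger state with a fixed probability. Because a hardware interrupt arrives within X ticks of any point, every event lands strictly between T and T + X + 1, while the typical delay d depends on how often trigger states occur.

```python
import random


def simulate_event_time(T: int, X: int, p_trigger: float, rng: random.Random) -> int:
    """Tick at which an event due at tick T actually runs (simplified model).

    Assumptions: a hardware timer interrupt occurs at every multiple of X; at any
    other tick a trigger state occurs with probability p_trigger. The event runs
    at the first trigger state whose clock reading is strictly greater than T.
    """
    t = T + 1
    while True:
        if t % X == 0 or rng.random() < p_trigger:
            return t
        t += 1


rng = random.Random(42)
X = 1000  # hardware interrupt period, in ticks
delays = []
for _ in range(10_000):
    T = rng.randrange(0, 100_000)
    actual = simulate_event_time(T, X, p_trigger=0.05, rng=rng)
    assert T < actual < T + X + 1  # the guaranteed bound
    delays.append(actual - T)      # d in "Actual Event Time = T + d"

print("mean d =", sum(delays) / len(delays), "ticks; max d =", max(delays))
```

With trigger states this frequent, the mean delay is far below X; the hardware interrupt only matters in the rare long gaps.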

11 Applications
- Rate-based clocking
  - Recall the 12 µs packet interval for gigabit Ethernet
  - The transmission rate becomes variable, but the protocol can track the average actual rate and adjust its scheduling to achieve a target rate
- Network polling
  - Pure polling reduces interrupts and their cache and TLB impact, but it can also increase latency by delaying packet processing
  - Soft timers are an attractive alternative to pure polling, or to a hybrid approach that uses a hardware network-poll timer
  - In the common case, soft timers show latency close to that of interrupt-driven processing
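A simplified model of soft-timer-driven network polling. The policy is assumed for illustration: each poll schedules the next one `poll_interval` ticks later, so it runs at the first trigger state strictly after that point, and a packet waits until the first poll at or after its arrival. The function names and the trigger-state times are hypothetical.

```python
from bisect import bisect_left
from typing import List


def soft_timer_poll_times(trigger_states: List[int], poll_interval: int) -> List[int]:
    """Ticks at which the NIC gets polled under the assumed policy."""
    polls: List[int] = []
    for t in sorted(trigger_states):
        # The first trigger state polls; later ones poll only once the soft
        # timer event for the next poll is strictly past due.
        if not polls or t - polls[-1] > poll_interval:
            polls.append(t)
    return polls


def poll_latencies(arrivals: List[int], polls: List[int]) -> List[int]:
    """Delay from each packet's arrival to the first poll at or after it."""
    latencies = []
    for a in arrivals:
        i = bisect_left(polls, a)  # polls is sorted by construction
        if i == len(polls):
            raise ValueError(f"packet arriving at {a} is never polled")
        latencies.append(polls[i] - a)
    return latencies


triggers = [0, 5, 12, 20, 31, 40]  # made-up trigger-state times, in ticks
polls = soft_timer_poll_times(triggers, poll_interval=10)
print(polls, poll_latencies([3, 12, 25], polls))  # [0, 12, 31] [9, 0, 6]
```

No interrupt is taken per packet; the cost is the extra latency between arrival and the next poll, which is small when trigger states are frequent.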

12 Base overhead test setup
- FreeBSD was extended to include the Soft Timer facility
- Support was also added for the on-chip APIC timer, in addition to the already supported off-chip 8253 timer
- “A number” of 300 to 500 MHz machines were connected to a 100 Mbps network
  - One acted as a web server
  - The others repeatedly requested a 6 KB file until the web server was saturated

13 Base overhead test results
- A “null handler” was used to measure the per-timer-event costs
- Of note:
  - The results suggest that the timer interrupt overhead does not scale with processor speed
  - Soft timers caused no observable cost

14 Base overhead test results
- What about TLB and cache misses? The handler:
  - Touched 50 data cache lines
  - Touched 2 instruction cache lines on 2 separate pages
  - Touched different lines on each invocation, with events scheduled at 10 µs and then 20 µs intervals
- Results for events scheduled every 10 µs could not be obtained with the 8253-based timer because of that timer’s high overhead

15 Base overhead test results
- The earlier reasoning that soft timers reduce TLB and cache misses is confirmed:
  - Data cache misses reduced by 20-31%
  - Instruction cache misses not reduced; the authors attribute this to only 2 lines being touched
  - TLB misses reduced by 7-13%

16 Different workload test setup
- Intended to induce variation in when trigger states occur, which is the Achilles’ heel of soft timers
- Measured the distribution of times between successive trigger states for various workloads on a 300 MHz Pentium II machine
- Mean interval in the tens of µs, with fewer than 6% of intervals over 100 µs
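Given timestamps recorded at each trigger state (for example, by reading a cycle counter at each one; the recording mechanism is assumed here), the statistics on this slide reduce to a few lines. The function name and the sample timestamps are made up for illustration and are not the paper’s data.

```python
from statistics import mean, median
from typing import Dict, List


def trigger_gap_stats(timestamps_us: List[float], tail_us: float = 100.0) -> Dict[str, float]:
    """Summarize the gaps between successive trigger states."""
    if len(timestamps_us) < 2:
        raise ValueError("need at least two trigger-state timestamps")
    ts = sorted(timestamps_us)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "mean_us": mean(gaps),
        "median_us": median(gaps),
        # Share of gaps longer than tail_us (the "over 100 us" tail).
        "frac_over_tail": sum(g > tail_us for g in gaps) / len(gaps),
    }


# Made-up timestamps for illustration only.
stats = trigger_gap_stats([0.0, 10.0, 30.0, 45.0, 160.0, 170.0])
print(stats)
```

For this sample the mean gap is 34 µs, the median is 15 µs, and one gap in five exceeds 100 µs, which shows how a single long gap pulls the mean well above the typical case.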

17 Stats on the distributions

18 Rate-Based Clocking: Timer Overhead
- Web server TCP implementation using soft timers vs. hardware timers
- At 100 Mbps, a 1500-byte packet takes 120 µs to transmit, so rate-based clocking has no observable impact on network performance at this speed
- Therefore, this test isolates the timer overhead; the possible benefits of rate-based clocking are not exposed
- Cache/TLB pollution is 4-8% lower with soft timers
- Average time between transmissions only slightly higher with soft timers
- Huge reduction in overhead

19 TCP: targeting an average transmission interval
- It was suggested that TCP could control its transmission interval by comparing the average time between transmissions with the requested interval and adjusting the next soft timer interval accordingly
- Two tests on a busy Apache web server (300 MHz Pentium II): one with a target of 40 µs, the other with a target of 60 µs
- In most cases the target was hit, although with more deviation than with hardware timers at the same rate
- At a line-speed interval of 12 µs:
  - 60 µs target: soft timers averaged 60 µs (std. dev. 35.9 µs) vs. 63 µs (std. dev. 27.7 µs) for the hardware timer
  - 40 µs target: soft timers averaged 40 µs (std. dev. 34.5 µs) vs. 43.6 µs (std. dev. 26.8 µs) for the hardware timer
- The hardware timer’s overshoot of the target is attributed to interrupt disabling in FreeBSD
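The slide does not give the exact adjustment rule. One simple scheme with the described effect, offered as an assumption rather than the authors’ method, is to schedule each transmission against an ideal timeline (k × target) instead of relative to the previous, possibly late, send. The simulation below models the random trigger-state delay d as 1 plus a floored exponential; the distribution, the function names, and the parameters are all illustrative.

```python
import random
from typing import List


def paced_send_times(target: int, n_packets: int, mean_extra_delay: float,
                     rng: random.Random, compensate: bool) -> List[int]:
    """Simulated transmit times (ticks) for a flow paced by soft timer events.

    Assumed model: each soft timer event runs d = 1 + floor(Exp(mean_extra_delay))
    ticks after it falls due, standing in for the wait until the next trigger state.
    """
    now = 0
    sends = []
    for k in range(1, n_packets + 1):
        if compensate:
            # Aim at the ideal timeline k * target so lateness is not carried
            # forward; if already behind, fire at the next trigger state.
            wait = max(1, k * target - now)
        else:
            # Naive pacing: always wait `target` ticks after the previous send.
            wait = target
        d = 1 + int(rng.expovariate(1.0 / mean_extra_delay))
        now += wait + d
        sends.append(now)
    return sends


def mean_interval(sends: List[int]) -> float:
    return sends[-1] / len(sends)  # the flow starts at tick 0


rng = random.Random(1)
naive = paced_send_times(40, 10_000, 10.0, rng, compensate=False)
tuned = paced_send_times(40, 10_000, 10.0, rng, compensate=True)
print(f"naive: {mean_interval(naive):.1f} ticks, compensated: {mean_interval(tuned):.1f} ticks")
```

Without compensation the average interval drifts to roughly the target plus the mean delay d; with compensation it settles at the target, while individual intervals still vary, consistent with the larger standard deviations reported for soft timers.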

20 Network Performance
- Substantial improvements in response time and throughput with rate-based clocking

21 Network polling
- Significant improvements across the board with soft timers

22 Using the on-chip timer (APIC) … defeats “the catch”
- Used to shorten the tail of the event-time distribution
- This timer can be scheduled and cancelled at very low cost
- Armed when a deadline is specified while scheduling the next soft timer event
- Provides an upper bound on the event delay with low overhead, because it is cancelled whenever the soft timer “beats it to the punch”
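A sketch of the backup-deadline scheme under a simplified model. The function and `Outcome` names are hypothetical; arming and cancelling the one-shot timer are treated as free, and the trigger-state times are supplied by the caller. The event runs at the first trigger state after its due time if one arrives by the deadline; otherwise the one-shot hardware timer fires at the deadline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Outcome:
    fired_at: int    # tick at which the handler ran
    by_backup: bool  # True if the one-shot hardware timer had to fire


def run_with_deadline(due: int, deadline: int, trigger_states: List[int]) -> Outcome:
    """Run a soft timer event due at `due`, backed by a one-shot timer at `deadline`."""
    if deadline <= due:
        raise ValueError("deadline must be later than the due time")
    # The one-shot timer is armed for `deadline` when the event is scheduled.
    for t in sorted(trigger_states):
        if t > deadline:
            break  # the backup timer fires first
        if t > due:
            # The soft timer "beats it to the punch": run the handler and
            # cancel the pending one-shot interrupt.
            return Outcome(fired_at=t, by_backup=False)
    # No usable trigger state by the deadline: the hardware interrupt runs it.
    return Outcome(fired_at=deadline, by_backup=True)


print(run_with_deadline(100, 150, [90, 120, 200]))  # handled by a trigger state at 120
print(run_with_deadline(100, 150, [90, 200]))       # handled by the backup at 150
```

In the common case the one-shot interrupt never fires, so its cost is only the arm/cancel pair, while the worst-case delay is capped at the deadline.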

23 Conclusions
- Soft timers provide fine granularity and low overhead compared to hardware timers
- Their useful range lies between the finest practical granularity of the hardware timer and the soft timer trigger interval (~10 µs to ~100 µs on 300 to 500 MHz CPUs)
- The useful range appears to widen roughly linearly as CPUs get faster
- They should be used for events that need this granularity and can tolerate probabilistic delays
- They can be combined with the on-chip APIC timer to provide fine-grained events with tight deadlines and low overhead
- When restricted to the appropriate class of problems, they consistently improved performance in these experiments

24 Q/A

