Soft Timers: Efficient Microsecond Software Timer Support for Network Processing. Mohit Aron and Peter Druschel, Rice University. Published in ACM Transactions.


Soft Timers: Efficient Microsecond Software Timer Support for Network Processing
Mohit Aron and Peter Druschel, Rice University
Published in ACM Transactions on Computer Systems, vol. 18(3), pp ,
Presented by Glenn Diviney

What’s wrong with “Hard” timers?
- Polling vs. interrupts
  - Interrupts have high overhead and low latency
  - Polling has high latency and low overhead
- Interruption is expensive
  - The CPU pipeline is disrupted, and the cache and TLB are polluted
  - Generally not significant, so long as interrupts occur at millisecond intervals
- Example: network interrupts can occur at intervals of tens of microseconds
  - Gigabit Ethernet requires a packet transmission every 12 µs (1500 bytes each)!
  - This amounts to a significant burden on the system if a context switch is involved each time
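The 12 µs figure follows directly from the packet size and the link rate. The short sketch below reproduces that arithmetic; the function name `wire_time_us` is hypothetical and introduced only for illustration.

```python
def wire_time_us(packet_bytes: int, link_bits_per_sec: float) -> float:
    """Microseconds needed to put one packet on the wire at line speed."""
    return packet_bytes * 8 * 1e6 / link_bits_per_sec


# 1500-byte packets on the two link speeds mentioned in the slides.
for name, rate in (("Gigabit Ethernet", 1e9), ("100 Mbps Ethernet", 100e6)):
    print(f"{name}: {wire_time_us(1500, rate):.0f} us per 1500-byte packet")
```

At 1 Gbps this gives 12 µs per packet, and at 100 Mbps it gives 120 µs, the value used later in the rate-based clocking test.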

Interrupts
- Device interrupts have low latency but high overhead due to the added context switching
  - The executing thread gets preempted
  - They can occur at inopportune times, slowing down other work through cache pollution, TLB pollution, and pipeline purges, which results in high indirect costs

Polling
- Polling has low overhead, but can have high latency, depending on how often the poll runs:
  - The OS’s timer granularity depends directly on the frequency of the timer interrupts, as well as on the overhead incurred by each interrupt
  - The cache, TLB, and pipeline costs can be avoided if the polling is done at the right time

What’s a Soft Timer?
- “An operating system facility that allows efficient scheduling of software events at microsecond granularity.”
- Takes advantage of states where handlers can be invoked at low cost: “trigger states”
  - For example, when the system has already switched into the kernel, why not see if other work can be done “while you’re in there?”
- Schedules future events probabilistically

How Soft Timers work: hardware
- Pentium-based systems usually ship with a programmable timer, which can be told how often to interrupt the CPU
- These interrupts are usually assigned the highest priority in the OS, which can lead to TLB and cache misses
- Testing indicated a total cost of 4.45 µs per interrupt on a 300 MHz web server, which is insignificant at ms intervals but terrible at 20 µs intervals
- The timer chip is programmed to interrupt at ms intervals
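To see why the same per-interrupt cost is negligible at millisecond intervals but prohibitive at 20 µs, it helps to express it as a share of CPU time. This is a back-of-the-envelope sketch using only the 4.45 µs figure quoted above; the helper name `cpu_share_lost` is an assumption made for illustration.

```python
INTERRUPT_COST_US = 4.45  # per-interrupt cost quoted on the slide (300 MHz machine)


def cpu_share_lost(interval_us: float, cost_us: float = INTERRUPT_COST_US) -> float:
    """Fraction of CPU time consumed by a timer interrupt fired every interval_us."""
    return cost_us / interval_us


for interval_us in (1000.0, 20.0):
    share = cpu_share_lost(interval_us)
    print(f"interrupt every {interval_us:g} us: {share:.3%} of the CPU")
```

With a 1 ms period the interrupt consumes under half a percent of the CPU; with a 20 µs period it consumes about 22%.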

How Soft Timers work: software
- At unpredictable intervals, the system arrives at “trigger states”:
  - End of a system call
  - End of an exception handler
  - End of an interrupt handler
  - CPU idle
- In these states, invoking an event handler costs no more than a function call
  - The TLB and cache are already “disturbed” by the triggering event, so no additional cost should be incurred
- In these states, the OS’s Soft Timer facility checks for pending events without incurring the cost of a hardware timer interrupt
  - It reads the clock (usually a CPU register) and compares it to the scheduled time of the earliest soft timer event

The catch
- Events might get delayed past their scheduled time
- Only the hardware interrupt is guaranteed to happen (providing an upper bound on when an event executes)
- Other trigger states appear as random events to the system, or may not happen at all between hardware interrupts

Implementation
- Soft timers provide the following operations:
  - measure_resolution(): returns a 64-bit value representing the clock resolution in hertz
  - measure_time(): returns a 64-bit value representing the current time, whose resolution is given by measure_resolution()
  - schedule_soft_event(T, handler): schedules “handler” to run “T” ticks in the future
  - interrupt_clock_resolution(): returns the minimal resolution, which is that of the hardware interrupt clock
- When invoked, the Soft Timer facility executes every handler whose scheduled time T is at least one tick earlier than the value returned by measure_time()
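The operations above can be sketched in user-space Python to show how the pieces fit together. This is a toy model under stated assumptions, not the FreeBSD implementation: the class `SoftTimerFacility`, the `FakeClock` helper, the `on_trigger_state()` hook, and the use of a binary heap for pending events are all hypothetical choices made for illustration. Only the names `measure_resolution`, `measure_time`, and `schedule_soft_event` come from the slide.

```python
import heapq
import itertools


class FakeClock:
    """Stand-in for the CPU cycle counter; the test code advances it by hand."""

    def __init__(self):
        self.now = 0

    def __call__(self):
        return self.now


class SoftTimerFacility:
    def __init__(self, clock, resolution_hz=1_000_000):
        self._clock = clock                  # callable returning the current tick
        self._resolution_hz = resolution_hz  # assumed: 1 tick = 1 microsecond
        self._pending = []                   # min-heap of (due_tick, seq, handler)
        self._seq = itertools.count()        # tie-breaker so handlers are never compared

    def measure_resolution(self):
        return self._resolution_hz

    def measure_time(self):
        return self._clock()

    def schedule_soft_event(self, ticks, handler):
        due = self.measure_time() + ticks
        heapq.heappush(self._pending, (due, next(self._seq), handler))

    def on_trigger_state(self):
        """Hypothetical hook, called at the end of a syscall, exception, or
        interrupt handler, or from the idle loop. Fires every event whose
        scheduled time is strictly earlier than the current time."""
        now = self.measure_time()
        fired = 0
        while self._pending and self._pending[0][0] < now:
            _, _, handler = heapq.heappop(self._pending)
            handler(now)
            fired += 1
        return fired


clock = FakeClock()
timers = SoftTimerFacility(clock)
fired_at = []
timers.schedule_soft_event(20, fired_at.append)  # due at tick 20

clock.now = 15
timers.on_trigger_state()   # too early, nothing fires
clock.now = 27
timers.on_trigger_state()   # first trigger state after the due time
print(fired_at)
```

The event scheduled for tick 20 runs at tick 27, the first trigger state after it became due. The check itself is a clock read and one comparison against the head of the heap, which is why it is cheap enough to perform at every trigger state.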

Implementation (cont)
- If X is the period of the hardware interrupt clock, measured in ticks of measure_time(), the event time is bounded by:
  T < Actual Event Time < T + X + 1
  - Just a reassurance that the event will happen eventually
- Generally, the assumption is that the event will happen at:
  Actual Event Time = T + d
  - “d” is the “random” time between non-hardware triggers
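The bound can be checked with a small simulation. The sketch below assumes that an event fires at the first check strictly after its due time T, where a check is either a trigger state or the periodic hardware tick with period X. The function name, the uniformly random trigger times, and the specific constants are illustrative assumptions, not measurements from the paper.

```python
import random


def actual_event_time(due: int, trigger_times, hw_period: int) -> int:
    """First check strictly after `due`: a trigger state or the next hardware tick."""
    next_hw_tick = (due // hw_period + 1) * hw_period  # always > due, at most due + X
    soft_checks = [t for t in trigger_times if t > due]
    return min(soft_checks + [next_hw_tick])


rng = random.Random(0)
X = 1000  # hardware interrupt period in ticks (assumed value)
triggers = sorted(rng.randrange(0, 100_000) for _ in range(2000))

delays = []
for _ in range(1000):
    T = rng.randrange(0, 90_000)
    fired = actual_event_time(T, triggers, X)
    assert T < fired < T + X + 1  # the bound from the slide
    delays.append(fired - T)      # this is "d"

print(f"mean d = {sum(delays) / len(delays):.1f} ticks, "
      f"max d = {max(delays)} ticks, X = {X}")
```

In this toy setup the typical delay d is set by how densely trigger states occur, while X only caps the worst case.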

Applications
- Rate-based clocking
  - Recall the 12 µs interval for gigabit Ethernet
  - The transmission rate becomes variable, but the protocol could maintain an average “actual” rate and adjust the scheduling accordingly to achieve a target rate
- Network polling
  - Pure polling reduces interrupts and the impact of memory access, but it can also induce latencies by delaying packet processing
  - Soft Timers are an alternative to pure polling or to a hybrid hardware approach with a network poll timer
  - Soft Timers show a latency close to that of interrupt-driven processing in the common case

Base overhead test setup
- FreeBSD was extended to include the Soft Timer facility
- Support was also added for the on-chip APIC timer, in addition to the already-supported off-chip 8253 timer
- “A number” of 300 to 500 MHz machines were connected to a 100 Mbps network
  - One acted as a web server
  - The others repeatedly requested a 6 KB file until the web server was saturated

Base overhead test results
- Used a “null handler” to measure the per-timer-event costs
- Of note:
  - The results suggest that the overhead does not scale with processor speed
  - Soft Timers caused no observable cost

Base overhead test results
- What about TLB and cache misses?
  - Each handler invocation touched 50 data cache lines
  - Each invocation touched 2 instruction cache lines on 2 separate pages
  - The lines touched were different each time, with events scheduled at 10 µs and then 20 µs intervals
- Results for events scheduled every 10 µs could not be obtained for 8253-based timers due to the high overhead of that facility

Base overhead test results
- The prior reasoning that Soft Timers reduce TLB and cache misses is confirmed
  - Data cache misses reduced by 20-31%
  - Instruction cache misses not reduced
    - The authors assume this is because only 2 lines were touched
  - TLB misses reduced by 7-13%

Different workload test setup
- Intended to induce variation in when the trigger events occur, which is the Achilles’ heel of Soft Timers
- Measured the distribution of times between successive trigger states for various workloads on a 300 MHz PII machine
- Mean granularity in the tens of µs, with less than 6% of intervals over 100 µs

Stats on the distributions

Rate-Based Clocking: Timer Overhead
- Web server TCP implementation using Soft Timers vs. hardware timers
- At 100 Mbps, a 1500-byte packet takes 120 µs to transmit, so rate-based clocking has no observable impact on the network
- Therefore, the metric isolated is the timer overhead; the possible benefits of rate-based clocking are not exposed
- Results:
  - Cache/TLB pollution is 4-8% better
  - Average time between transmissions is only slightly higher with Soft Timers
  - Huge reduction in overhead

TCP: targeting average transmission interval
- It was suggested that TCP could control transmission intervals by comparing the average time since transmitting with the requested transmission interval and adjusting the next Soft Timer interval accordingly
- Two tests on a busy Apache web server (300 MHz PII): one with a target of 40 µs, the other with a target of 60 µs
- In most cases the target rate was hit, although with more deviation than the same rate with the hardware timers
- At a line-speed interval of 12 µs:
  - For the 60 µs target, the Soft Timer transmit interval was 60 µs with a std dev of 35.9 µs vs. the hardware at 63 µs with a std dev of 27.7 µs
  - For the 40 µs target, the Soft Timer transmit interval was 40 µs with a std dev of 34.5 µs vs. the hardware at 43.6 µs with a std dev of 26.8 µs
- The hardware timer’s overshoot of the target interval is attributed to interrupt disabling in FreeBSD
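One way to picture the adjustment is to shorten each requested interval by however far the sender has fallen behind its ideal schedule. The sketch below is an illustrative model, not the algorithm from the paper: it assumes each soft timer event fires late by a uniformly random delay, and the names `send_times` and `mean_interval` are hypothetical.

```python
import random


def send_times(target_us, n_packets, max_delay_us, rng, compensate):
    """Simulated transmit times when every timer event fires late by a random delay."""
    times = []
    now = 0.0
    for i in range(1, n_packets + 1):
        if compensate:
            # Aim at the ideal schedule i * target, shortening the wait by the lag so far.
            wait = max(0.0, i * target_us - now)
        else:
            # Naive pacing: always request a full target interval.
            wait = target_us
        now += wait + rng.uniform(0.0, max_delay_us)  # soft timer fires late by d
        times.append(now)
    return times


def mean_interval(times):
    """Average spacing between transmissions, measured from time zero."""
    return times[-1] / len(times)


naive = send_times(40.0, 1000, 30.0, random.Random(1), compensate=False)
paced = send_times(40.0, 1000, 30.0, random.Random(1), compensate=True)
print(f"naive mean interval:       {mean_interval(naive):.2f} us")
print(f"compensated mean interval: {mean_interval(paced):.2f} us")
```

Without compensation every delay accumulates and the average interval drifts well above the 40 µs target. With compensation the average stays at the target while individual intervals still vary, which matches the pattern of an on-target mean with a larger standard deviation reported above.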

Network Performance
- Substantial improvements in response time and throughput with rate-based clocking

Network polling
- Significant improvements across the board with Soft Timers

Using the on-chip Timer (APIC) … defeats “the catch”
- Used to shorten the tail of the event-time distribution
- This timer can be scheduled and cancelled at very low cost
- It is armed when a deadline is specified while scheduling the next Soft Timer event
- This provides an upper bound on when the event executes, at low overhead, because the hardware timer gets cancelled when the Soft Timer “beats it to the punch”
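The deadline mechanism can be modelled as a race between the next trigger state and a one-shot hardware timer armed at the deadline. The sketch below is a simplified illustration under that assumption; `fire_with_backup` and its return convention are hypothetical and do not correspond to any real APIC programming interface.

```python
def fire_with_backup(due, deadline, trigger_times):
    """Return (fire_time, used_hardware_interrupt) for one soft timer event.

    The event fires at the first trigger state after `due`. If none occurs by
    `deadline`, the one-shot hardware timer armed at `deadline` fires it instead.
    """
    for t in sorted(trigger_times):
        if due < t <= deadline:
            return t, False      # soft timer won; the backup timer is cancelled
    return deadline, True        # no trigger state in time; pay for one interrupt


print(fire_with_backup(100, 150, [90, 120, 400]))  # trigger state at 120 wins
print(fire_with_backup(100, 150, [90, 400]))       # hardware backup at 150
```

The hardware interrupt cost is paid only in the second case, so the tail of the delay distribution is cut off at the deadline while the common case stays as cheap as a plain soft timer.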

Conclusions
- Soft timers allow fine granularity at low overhead compared to hardware timers
- But their useful range lies between the finest granularity of the hardware timer and the Soft Timer trigger interval (~10 µs to ~100 µs on 300 to 500 MHz CPUs)
  - The useful range appears to widen approximately linearly as CPUs get faster
- Should be used for events requiring this kind of granularity, provided they can tolerate probabilistic delays
- Can be integrated with the on-chip APIC timer to provide fine-grained events with tight deadlines and low overhead
- When restricted to the appropriate class of problems, they always seem to improve things

Q/A