EE384Y: Packet Switch Architectures Part II Sizing Router Buffers (Recent work by Guido Appenzeller) Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu http://www.stanford.edu/~nickm
How much Buffer does a Router need? Universally applied rule-of-thumb: a router needs a buffer of size $B = 2T \times C$, where $2T$ is the round-trip propagation time (or just 250ms) and $C$ is the capacity of the outgoing link.
Background: Mandated in backbone and edge routers. Appears in RFPs and IETF architectural guidelines. Has major consequences for router design. Comes from dynamics of TCP congestion control. Villamizar and Song: “High Performance TCP in ANSNET”, CCR, 1994. Based on 2 to 16 TCP flows at speeds of up to 40 Mb/s.
Example: 10Gb/s linecard or router. Requires 300Mbytes of buffering. Read and write a new packet every 32ns. Memory technologies: SRAM: requires 80 devices, 1kW, $2000. DRAM: requires 4 devices, but too slow. Problem gets harder at 40Gb/s. Hence RLDRAM, FCRAM, etc.
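The arithmetic behind the 10Gb/s example can be checked with a short Python sketch. It assumes the 250ms round-trip time from the rule-of-thumb and, as an assumption not stated on the slide, a minimum-size 40-byte packet for the per-packet time. The function names are illustrative only.

```python
def rule_of_thumb_buffer_bits(rtt_seconds, link_bps):
    """Rule-of-thumb buffer B = 2T x C, in bits (rtt_seconds plays the role of 2T)."""
    return rtt_seconds * link_bps


def packet_time_ns(packet_bytes, link_bps):
    """Time to send one packet on the link, in nanoseconds."""
    return packet_bytes * 8 / link_bps * 1e9


# 10 Gb/s link with a 250 ms round-trip time.
buffer_bits = rule_of_thumb_buffer_bits(0.250, 10e9)
buffer_mbytes = buffer_bits / 8 / 1e6

# Assumption: 40-byte minimum-size packets.
per_packet_ns = packet_time_ns(40, 10e9)

print(f"buffer: {buffer_bits / 1e9:.1f} Gbit = {buffer_mbytes:.1f} Mbytes")
print(f"one 40-byte packet every {per_packet_ns:.0f} ns")
```

Under these assumptions the buffer comes out at 2.5 Gbit, roughly the 300Mbytes quoted, and a 40-byte packet occupies the link for about 32ns.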
TCP: TCP adapts to congestion. Sender sends packets, receiver sends ACKs. Sending rate is controlled by window $W$: at any time, only $W$ unacknowledged packets may be outstanding. $W$ is adjusted for each packet (in congestion avoidance, CA, mode): if an ACK is received, $W = W + 1/W$ ($W = W + 1$ for each $W$ packets); if a packet is lost, $W = W/2$ ($W$ halved in case of loss). The sending rate of TCP is $R = W / RTT$.
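The window update rules can be sketched in a few lines of Python. This is a simplified illustration, not a TCP implementation: it assumes a loss happens whenever the window exceeds a fixed limit `w_max` (a hypothetical parameter standing in for the point where the buffer overflows) and ignores slow-start and timeouts.

```python
def aimd_window_trace(w_start, w_max, n_acks):
    """Trace the congestion window in congestion-avoidance mode.

    Assumption: a packet is lost as soon as the window exceeds w_max.
    """
    w = float(w_start)
    trace = []
    for _ in range(n_acks):
        if w > w_max:
            w = w / 2          # loss: halve the window
        else:
            w = w + 1.0 / w    # ACK: grow by 1/W (about +1 per W packets)
        trace.append(w)
    return trace


def sending_rate(window_packets, rtt_seconds):
    """Sending rate in packets per second: R = W / RTT."""
    return window_packets / rtt_seconds


trace = aimd_window_trace(w_start=10, w_max=20, n_acks=1000)
print(f"window ranges from {min(trace):.2f} to {max(trace):.2f} packets")
print(f"rate at W=10, RTT=100 ms: {sending_rate(10, 0.1):.0f} packets/s")
```

The trace produces the familiar sawtooth: the window climbs slowly to just above `w_max`, then drops to about half of that.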
Single TCP Flow: Router with large enough buffers for full link utilization. For every W ACKs received, send W+1 packets. [Figure: Source and Dest connected through a router with buffer B; links C’ > C and C; plots of window size, buffer size and RTT over time t]
Over-buffered Link
Under-buffered Link
Buffer = Rule-of-thumb (interval magnified on next slide)
Microscopic TCP Behavior: When sender pauses, buffer drains. [Figure labels: one RTT; Drop]
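Origin of rule-of-thumb: Before and after reducing window size, the sending rate of the TCP sender is the same: $R_{old} = R_{new}$. Inserting the rate equation $R = W/RTT$ we get $W_{old}/RTT_{old} = W_{new}/RTT_{new}$. The RTT is part propagation delay $2T$ and part queueing delay $B/C$. We know that after reducing the window, the queueing delay is zero, so $RTT_{old} = 2T + B/C$ and $RTT_{new} = 2T$. With $W_{new} = W_{old}/2$: $\frac{W_{old}}{2T + B/C} = \frac{W_{old}/2}{2T}$, which gives $B = 2T \times C$.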
Rule-of-thumb: Rule-of-thumb makes sense for one flow. Typical backbone link has > 20,000 flows. Does the rule-of-thumb still hold? Answer: If flows are perfectly synchronized, then yes. If flows are desynchronized, then no.
Buffer size is height of sawtooth
If flows are synchronized: Aggregate window has same dynamics. Therefore buffer occupancy has same dynamics. Rule-of-thumb still holds.
Two TCP Flows Two TCP flows can synchronize
If flows are not synchronized: Aggregate window has less variation. Therefore buffer occupancy has less variation. The more flows, the smaller the variation. Rule-of-thumb does not hold.
If flows are not synchronized [Figure: Probability Distribution; labels: B, Buffer Size]
Quantitative Model: Model the congestion window of a flow as a random variable. For many desynchronized flows, we assume congestion windows are independent and all congestion windows have the same probability distribution. The central limit theorem then gives us the queue length distribution.
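The effect of summing many independent windows can be illustrated with a small Monte Carlo sketch. As an assumption made only for this example, each flow's window is drawn uniformly between `w_max / 2` and `w_max`, which is what a sawtooth observed at a random instant looks like; the function name and parameters are hypothetical.

```python
import random
import statistics


def aggregate_window_cv(n_flows, samples=2000, w_max=60.0, seed=1):
    """Coefficient of variation (std / mean) of the sum of n_flows windows.

    Assumption: windows are independent and uniform on [w_max / 2, w_max].
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(samples):
        totals.append(sum(rng.uniform(w_max / 2, w_max) for _ in range(n_flows)))
    return statistics.pstdev(totals) / statistics.mean(totals)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(aggregate_window_cv(n), 4))
```

With independent windows the relative variation of the aggregate falls roughly as one over the square root of the number of flows, so 100 flows give about a tenth of the relative variation of a single flow.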
Required buffer size Simulation
Required buffer size [Figure; curve labels: 99.9%, 99.5%, 98.0%, 2×]
Small buffers help short flows: Average flow completion times of 14-packet flows that share a congested bottleneck link with long-lived flows.
Experiments with backbone router: GSR 12000, OC3 Line Card.
Columns: TCP Flows | Router Buffer (size, Pkts, RAM) | Link Utilization (Model, Sim, Exp).
100 flows: buffer 0.5x, 1x, 2x, 3x; Pkts 64, 129, 258, 387; RAM 1Mb, 2Mb, 4Mb, 8Mb; Model 96.9%, 99.9%, 100%; Sim 94.7%, 99.3%, 99.8%; Exp 94.9%, 98.1%, 99.7%.
400 flows: Pkts 32, 128, 192; RAM 512kb; utilization 99.2%, 99.5%.
Thanks: Experiments conducted by Paul Barford and Joel Sommers, U of Wisconsin.
What about Short Flows? So far we assumed long flows in congestion avoidance mode. What if traffic is mainly short flows in slow-start? Answer: Behavior is different, but in mixes of flows, long flows drive buffer requirements. Required buffer for short flows is independent of line speed and RTT (same for 1Mbit/s or 40 Gbit/s).
A single, short-lived TCP flow: Flow length 62 packets, RTT ~140 ms. [Figure: bursts of 2, 4, 8, 16 and 32 packets; labels: syn, fin ack received, RTT, Flow Completion Time (FCT)]
Modelling TCP Flows vs. independent bursts: Inter-burst arrival time is greater than buffer size. Therefore, we assume bursts are independent. TCP flows: Poisson arrivals of flows; arrivals of length $L_{flow}$ (the flow length in packets). Independent bursts: Poisson arrivals of bursts; four different Poisson arrival processes of lengths 2, 4, ...
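To make the burst view concrete, the following Python sketch splits a short flow into its slow-start bursts. It assumes, for illustration only, an initial burst of 2 packets that doubles every round trip with no losses; `slow_start_bursts` is a hypothetical helper name.

```python
def slow_start_bursts(flow_length, first_burst=2):
    """Split a flow of flow_length packets into slow-start bursts.

    Assumption: the burst size starts at first_burst and doubles each RTT,
    with no packet loss; the last burst carries whatever is left.
    """
    bursts = []
    remaining = flow_length
    size = first_burst
    while remaining > 0:
        sent = min(size, remaining)
        bursts.append(sent)
        remaining -= sent
        size *= 2
    return bursts


print(slow_start_bursts(62))  # [2, 4, 8, 16, 32]
print(slow_start_bursts(14))  # [2, 4, 8]
```

A 62-packet flow therefore shows up at the router as five bursts, and a 14-packet flow as three, which is the unit the burst arrival model works with.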
The M/G/1 Model: TCP traffic is modelled as an M/G/1 arrival process: Poisson arrivals of jobs at a given arrival rate. The model gives the average queue length in jobs, which in turn gives an average queue length in packets. Let's see if this works in practice...
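A minimal sketch of the kind of calculation involved, using the standard Pollaczek-Khinchine mean-value result for M/G/1 queues; it may differ in form from the expression used in the lecture. Here a job is a burst of $X$ packets, $\rho$ is the link load, and the mean backlog in packets is $\frac{\rho}{2(1-\rho)} \cdot \frac{E[X^2]}{E[X]}$. The burst mix (sizes 2, 4, 8, 16, 32 with equal probability) and the load of 0.8 are assumptions chosen for illustration.

```python
def mean_backlog_packets(load, burst_sizes, probabilities):
    """Mean M/G/1 backlog in packets from the Pollaczek-Khinchine formula.

    load:          link utilization rho, 0 <= rho < 1
    burst_sizes:   possible burst lengths in packets
    probabilities: probability of each burst length (must sum to 1)
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    mean_x = sum(p * x for x, p in zip(burst_sizes, probabilities))
    mean_x2 = sum(p * x * x for x, p in zip(burst_sizes, probabilities))
    return load / (2 * (1 - load)) * mean_x2 / mean_x


# Assumed burst mix: the five slow-start bursts of a 62-packet flow.
sizes = [2, 4, 8, 16, 32]
probs = [0.2] * 5
backlog = mean_backlog_packets(0.8, sizes, probs)
print(f"mean backlog at 80% load: {backlog:.1f} packets")
```

The result depends only on the load and the burst size distribution, not on the link rate, which matches the claim that the buffer needed for short flows is independent of line speed.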
Average Queue length
Queue Distribution: To determine the required buffer, we need the queue distribution, or at least the tail end of the queue distribution. [Figure labels: Buffer B, Q, Packet Loss, P(Q = x)] For M/G/1 queues there is no general solution for the queue distribution. We did two things (details are in the paper): Use M/G/1 processor sharing model (bad). Use Frank Kelly's effective bandwidth (good).
In Summary: Buffer size is dictated by long TCP flows.
10Gb/s linecard with 200,000 x 56kb/s flows: Rule-of-thumb: Buffer = 2.5Gbits (requires external, slow DRAM). Becomes: Buffer = 6Mbits (can use on-chip, fast SRAM). Completion time halved for short flows.
40Gb/s linecard with 40,000 x 1Mb/s flows: Rule-of-thumb: Buffer = 10Gbits. Becomes: Buffer = 50Mbits.
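The "Becomes" figures are consistent with dividing the rule-of-thumb buffer by the square root of the number of long flows, $B = 2T \times C / \sqrt{n}$, the result from Appenzeller's buffer sizing work. The Python sketch below reproduces both examples, assuming the 250ms round-trip time used for the rule-of-thumb; the function names are illustrative.

```python
import math


def rule_of_thumb_bits(rtt_seconds, link_bps):
    """Rule-of-thumb buffer: B = 2T x C."""
    return rtt_seconds * link_bps


def small_buffer_bits(rtt_seconds, link_bps, n_flows):
    """Buffer for n desynchronized long flows: B = 2T x C / sqrt(n)."""
    return rule_of_thumb_bits(rtt_seconds, link_bps) / math.sqrt(n_flows)


# Assumption: 2T = 250 ms for both linecards.
for link_bps, n_flows in ((10e9, 200_000), (40e9, 40_000)):
    old = rule_of_thumb_bits(0.250, link_bps)
    new = small_buffer_bits(0.250, link_bps, n_flows)
    print(f"{link_bps / 1e9:.0f} Gb/s, {n_flows} flows: "
          f"{old / 1e9:.1f} Gbit -> {new / 1e6:.1f} Mbit")
```

For the 10Gb/s case this gives about 5.6 Mbit, which the summary rounds to 6Mbits, and for the 40Gb/s case it gives 50 Mbit.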