EE384x: Packet Switch Architectures I (Winter 2006)
Parallel Packet Buffers
Nick McKeown, Professor of Electrical Engineering and Computer Science


1 EE384x: Packet Switch Architectures I (Winter 2006)
Parallel Packet Buffers
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/~nickm

2 The Problem
- All packet switches (e.g., Internet routers, ATM switches) require packet buffers for periods of congestion.
- Size: For TCP to work well, the buffers need to hold one RTT (about 0.25 s) of data.
- Speed: The buffer needs to store (retrieve) packets as fast as they arrive (depart).
[Figure: buffer memories attached to ports 1 to N, each running at line rate R]

3 An Example
Packet buffers for a 40 Gb/s router linecard:
- Write rate, R: one 40 B packet every 8 ns.
- Read rate, R: one 40 B packet every 8 ns.
- Buffer memory: 10 Gbits.
- The buffer manager must serve unpredictable scheduler requests.
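The figures on this slide follow from the line rate and the one-RTT sizing rule. A minimal Python sketch of that arithmetic is shown here; the constant names are illustrative and do not come from the lecture.

```python
# Buffer sizing arithmetic for the 40 Gb/s linecard example.
# Constant names are illustrative; the values are the ones quoted on the slides.
LINE_RATE_BPS = 40e9      # line rate R, in bits per second
RTT_SECONDS = 0.25        # one round-trip time of buffering
MIN_PACKET_BYTES = 40     # minimum-size packet

# Buffer size = R x RTT.
buffer_bits = LINE_RATE_BPS * RTT_SECONDS

# Time to receive (or transmit) one minimum-size packet at line rate.
packet_time_ns = MIN_PACKET_BYTES * 8 / LINE_RATE_BPS * 1e9

print(f"buffer size: {buffer_bits / 1e9:.0f} Gbits")
print(f"one {MIN_PACKET_BYTES} B packet every {packet_time_ns:.0f} ns")
```

Because the buffer is written at rate R and read at rate R, the memory has to absorb both a write and a read within each 8 ns packet time.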

4 Memory Technology
- Use SRAM?
  + Random access time is fast enough, but
  - Density is too low to store 10 Gbits of data.
- Use DRAM?
  + High density means we can store the data, but
  - It cannot meet the required random access time.

5 Can't we just use lots of DRAMs in parallel?
- Write rate, R: one 40 B packet every 8 ns. Read rate, R: one 40 B packet every 8 ns.
- The buffer manager reads or writes one 320 B block every 32 ns across the parallel buffer memories.
- Each 320 B block is striped across the memories in 40 B slices: bytes 0-39, 40-79, ..., 280-319.
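The 320 B block size is consistent with a simple rate-matching argument: if each access to the parallel memories takes 32 ns, and accesses must cover both writes at rate R and reads at rate R, then each access has to move 2 x R x 32 ns of data. The sketch below works through that arithmetic. Treating 32 ns as the per-access time and the function name `block_bytes` are assumptions made for illustration.

```python
def block_bytes(line_rate_bps: float, access_time_s: float) -> float:
    """Bytes moved per memory access so that accesses keep up with
    writes at the line rate plus reads at the line rate."""
    return 2 * line_rate_bps * access_time_s / 8

# Values from the slide: R = 40 Gb/s, one block access every 32 ns.
block = block_bytes(40e9, 32e-9)

# Each memory contributes one 40 B slice of the block.
slices = block / 40

print(f"block size: {block:.0f} B, striped over {slices:.0f} memories")
```

With these numbers the block is 320 B, striped over eight 40 B slices, which matches the byte ranges 0-39 through 280-319 in the figure.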

6 Works fine if there is only one FIFO
- Write rate, R: one 40 B packet every 8 ns. Read rate, R: one 40 B packet every 8 ns.
- The buffer manager gathers arriving 40 B packets into 320 B blocks and writes each block to the buffer memory.
- On the read side, it fetches 320 B blocks and hands out 40 B packets.

7 Works fine if there is only one FIFO (variable-length packets)
- Write rate, R: one 40 B packet every 8 ns. Read rate, R: one 40 B packet every 8 ns.
- Packets of variable length (?B) are still packed into 320 B blocks on the way into the buffer memory and unpacked on the way out.

8 In practice, buffer holds many FIFOs
- The buffer holds Q FIFOs (1, 2, ..., Q), e.g.:
  - In an IP router, Q might be 200.
  - In an ATM switch, Q might be 10^6.
- Write rate, R: one 40 B packet every 8 ns. Read rate, R: one 40 B packet every 8 ns.
- How can we write multiple variable-length packets into different queues?

9 Problems
1. A 320 B block will contain packets for different queues, which cannot be written to, or read from, the same location.
2. If instead a different address is used for each memory, and the packets in the 320 B block are written to different locations, how do we know the memory will be available for reading when we need to retrieve the packet?

10 Hybrid Memory Hierarchy
- Arriving packets (rate R) enter a small tail SRAM, which acts as a cache for the FIFO tails.
- A large DRAM memory holds the body of the FIFOs (queues 1 to Q). Data is written to DRAM b bytes at a time and read from DRAM b bytes at a time.
- A small head SRAM acts as a cache for the FIFO heads. Departing packets (rate R) are read from it in response to unpredictable scheduler requests.

11 Some Thoughts
1. What is the minimum SRAM needed to guarantee that a byte is always available in SRAM when requested?
2. What algorithm should we use to manage the replenishment of the SRAM "cache" memory?

12 An Example: Q = 5, w = 9+, b = 6
[Figure: bytes held in the SRAM cache at t = 0 through t = 7, with a replenish event]

13 An Example (continued): Q = 5, w = 9+, b = 6
[Figure: bytes held in the SRAM cache at t = 8 through t = 13, t = 19, and t = 23, with replenish and read events]

14 The size of the SRAM cache
- Necessity: How large does the SRAM cache need to be under any memory management algorithm (MMA)?
  Theorem: Qw > Q(b - 1)(2 + ln Q)
- Sufficiency: For a specific MMA, and for any pattern of arrivals, what is the smallest SRAM cache needed so that a byte is always available when requested?
  For one particular algorithm: Qw = Qb(2 + ln Q)
[Figure: SRAM cache of Q queues, each w bytes deep]
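To get a feel for how close the two bounds are, the sketch below evaluates both expressions for two parameter sets (b = 640, Q = 128 and b = 2560, Q = 512). The function names are illustrative; the formulas are the ones stated on this slide.

```python
import math

def necessary_sram_bytes(Q: int, b: int) -> float:
    """Lower bound from the slide: any MMA needs more than Q(b - 1)(2 + ln Q) bytes."""
    return Q * (b - 1) * (2 + math.log(Q))

def sufficient_sram_bytes(Q: int, b: int) -> float:
    """Size quoted on the slide for one particular algorithm: Qb(2 + ln Q) bytes."""
    return Q * b * (2 + math.log(Q))

for Q, b in [(128, 640), (512, 2560)]:
    lower = necessary_sram_bytes(Q, b)
    upper = sufficient_sram_bytes(Q, b)
    print(f"Q={Q}, b={b}: necessary > {lower / 1e3:.0f} kB, "
          f"sufficient = {upper / 1e3:.0f} kB, gap = {upper - lower:.0f} B")
```

The gap between the two expressions is Q(2 + ln Q) bytes, which is small compared with either bound when b is large.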

15 Some Definitions
- Occupancy, X(q,t): the number of bytes in FIFO q (in SRAM) at time t.
- Deficit, D(q,t) = w - X(q,t).
[Figure: Q queues of depth w, showing occupancy and deficit]

16 Smallest SRAM cache: Necessity

17 Smallest SRAM cache: Necessity
- In addition, each queue needs room for (b - 1) extra bytes, in case it is replenished with b bytes when only 1 byte has been removed.
- Therefore, the SRAM size must satisfy: Qw > Q(b - 1)(2 + ln Q).

18 Most Deficit Queue First MMA: Sufficiency
- Algorithm: Every b timeslots, MDQF-MMA replenishes the queue with the largest deficit.
- Theorem: With MDQF-MMA, an SRAM cache of size Qw > Qb(2 + ln Q) is sufficient.
- Examples:
  1. 40 Gb/s linecard, b = 640, Q = 128: SRAM = 560 kBytes
  2. 160 Gb/s linecard, b = 2560, Q = 512: SRAM = 10 MBytes
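The replenishment rule can be exercised with a small deficit-counter simulation. The sketch below is illustrative only: it tracks a deficit per queue (one byte read per timeslot, b bytes replenished every b timeslots), and drives it with a simple hand-written adversary that keeps its target queues level and abandons each queue once it has been replenished. The adversary strategy, the tie-breaking rules, and the function name are assumptions, not part of the lecture.

```python
import math

def mdqf_peak_deficit(Q: int, b: int, steps: int) -> int:
    """Simulate MDQF replenishment and return the largest deficit observed.

    A negative deficit means the queue holds more than w bytes, which is
    the headroom of up to b - 1 bytes per queue.
    """
    deficit = [0] * Q
    active = set(range(Q))   # queues the (assumed) adversary is still draining
    peak = 0
    for t in range(1, steps + 1):
        # Adversary: read one byte from the least-drained target queue.
        q = min(active, key=lambda i: (deficit[i], i))
        deficit[q] += 1
        peak = max(peak, deficit[q])
        if t % b == 0:
            # MDQF: replenish the queue with the largest deficit
            # (ties broken towards the lowest queue index).
            worst = max(range(Q), key=lambda i: (deficit[i], -i))
            deficit[worst] -= b
            active.discard(worst)
            if not active:
                active = set(range(Q))
    return peak

Q, b = 8, 4
peak = mdqf_peak_deficit(Q, b, steps=Q * b)
bound = b * (2 + math.log(Q))
print(f"peak deficit {peak} bytes, per-queue size from the theorem {bound:.1f} bytes")
```

In this run the peak deficit stays below b(2 + ln Q), the per-queue depth implied by the theorem. A single run against one adversary is only a sanity check, not a proof of the bound.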

19 Reducing the size of the SRAM
Intuition:
- If we use a lookahead buffer to peek at the requests "in advance", we can replenish the SRAM cache only when needed.
- This increases the latency from when a request is made until the byte is available.
- But because it is a pipeline, the issue rate is the same.

20 The ECQF-MMA Algorithm
1. Lookahead: The next Q(b - 1) + 1 arbiter requests are known.
2. Compute: Determine which queue will run into "trouble" soonest (green, in the figure).
3. Replenish: Fetch b bytes for the "troubled" queue.
[Figure: Q queues of b - 1 bytes each, and a lookahead buffer holding Q(b - 1) + 1 requests]
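The "compute" step can be sketched as a single scan of the lookahead buffer: replay the known requests in order, and the first queue whose pending requests exceed the bytes it currently holds in SRAM is the one that runs into trouble soonest. The function name, the data layout, and the sample values below are assumptions chosen for illustration; the replenish step and its timing are not modelled.

```python
from typing import Optional, Sequence

def earliest_critical_queue(occupancy: Sequence[int],
                            lookahead: Sequence[int]) -> Optional[int]:
    """Return the queue that would run out of SRAM bytes first.

    occupancy[q] is the number of bytes queue q holds in SRAM now.
    lookahead is the list of queue indices requested, in service order.
    Returns None if no queue runs dry within the lookahead window.
    """
    pending = [0] * len(occupancy)
    for q in lookahead:
        pending[q] += 1
        if pending[q] > occupancy[q]:
            return q
    return None

# Q = 4 queues, b = 4, so the lookahead holds Q(b - 1) + 1 = 13 requests.
occupancy = [1, 2, 0, 3]
lookahead = [0, 1, 3, 1, 0, 2, 3, 3, 1, 0, 2, 1, 3]
critical = earliest_critical_queue(occupancy, lookahead)
print(f"earliest critical queue: {critical}")
```

Here queue 0 holds one byte but is requested twice within the first five requests, so it becomes critical before queue 2, which is empty but is not requested until later.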

21 Example of ECQF-MMA: Q = 4, b = 4
[Figure: queues and requests in the lookahead buffer at t = 0 through t = 8. Green is critical at t = 0, blue at t = 4, and red at t = 8.]

22 Theorem
- Patient Arbiter: An SRAM cache of size Q(b - 1) bytes is sufficient to guarantee that a requested byte is available within Q(b - 1) + 1 request times.
- The algorithm is called ECQF-MMA (Earliest Critical Queue First).
- Example: 160 Gb/s linecard, b = 2560, Q = 512: SRAM = 1.3 MBytes, delay bound is 65 µs (equivalent to 13 miles of fiber).
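The SRAM size and delay bound in the example can be reproduced with a few lines of arithmetic. The sketch below assumes that one "request time" is the time to send one byte at the line rate; that interpretation and the variable names are assumptions, chosen because they reproduce the quoted figures.

```python
Q, b = 512, 2560
LINE_RATE_BPS = 160e9

# SRAM cache size from the theorem: Q(b - 1) bytes.
sram_bytes = Q * (b - 1)

# Latency bound from the theorem: Q(b - 1) + 1 request times.
request_times = Q * (b - 1) + 1

# ASSUMPTION: one request time equals one byte time at the line rate.
byte_time_s = 8 / LINE_RATE_BPS
delay_us = request_times * byte_time_s * 1e6

print(f"SRAM: {sram_bytes / 1e6:.2f} MBytes, delay bound: {delay_us:.1f} microseconds")
```

Compared with the roughly 10 MBytes quoted for MDQF-MMA with the same b and Q, the lookahead trades about 65 µs of pipeline delay for a much smaller cache.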

23 Maximum Deficit Queue First with Latency (MDQFL-MMA)
- What if the application can only tolerate a latency l_max < Q(b - 1) + 1 timeslots?
- Algorithm: Maximum Deficit Queue First with Latency (MDQFL-MMA) services a queue once every b timeslots, in the following order:
  1. If there is an earliest critical queue, replenish it.
  2. If not, replenish the queue that will have the most deficit l_max timeslots in the future.
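The two-rule selection can be sketched as one function. This is a hedged illustration, assuming the lookahead buffer holds exactly the next l_max requests, that a queue's future deficit is its current deficit plus its requests in the lookahead, and that ties go to the lowest queue index. The function name and sample values are hypothetical.

```python
from typing import Sequence

def mdqfl_pick(occupancy: Sequence[int], w: int,
               lookahead: Sequence[int]) -> int:
    """Choose the queue to replenish, given the next l_max requests."""
    n = len(occupancy)
    pending = [0] * n
    # Rule 1: if some queue runs dry within the lookahead, pick the earliest.
    for q in lookahead:
        pending[q] += 1
        if pending[q] > occupancy[q]:
            return q
    # Rule 2: otherwise pick the queue with the largest deficit once all
    # lookahead requests have been served: (w - X(q,t)) + pending requests.
    future_deficit = [w - occupancy[q] + pending[q] for q in range(n)]
    return max(range(n), key=lambda q: (future_deficit[q], -q))

w = 6
# No queue is critical within the lookahead, so rule 2 applies.
by_deficit = mdqfl_pick([5, 2, 4], w, lookahead=[0, 1, 0])
# Queue 0 is requested twice but holds one byte, so rule 1 applies.
by_critical = mdqfl_pick([1, 2, 4], w, lookahead=[0, 0, 1])
print(by_deficit, by_critical)
```

With a short lookahead, rule 1 rarely fires and the choice falls back to deficits, so the algorithm behaves like MDQF; with the full Q(b - 1) + 1 lookahead it behaves like ECQF.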

24 Queue length vs. pipeline depth (Q = 1000, b = 10)
[Figure: SRAM size plotted against pipeline latency x, marking the queue length for zero latency and the queue length for maximum latency]

