1 Techniques for Fast Packet Buffers
Sundar Iyer, Ramana Rao, Nick McKeown
(sundaes, ramana, nickm)@stanford.edu
Departments of Electrical Engineering & Computer Science, Stanford University

2 Stanford University
Problem Statement
Motivation: to design an extremely high-speed packet buffer.

3 Problem Statement (1)
[Figure: a single buffer memory on an OC-768 (40 Gb/s) line carrying 64-byte cells, written at rate R (one cell every 12.8 ns) and read at rate R (one cell every 12.8 ns).]
How do we design a buffer with an access time of 6.4 ns?

4 Problem Statement (2)
[Figure: a buffer memory holding many queued cells.]
How do we create a buffer of size 10 Gbit?
R = 40 Gb/s, RTT = 0.25 s, so RTT × R = 10 Gbit.
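The two numbers on slides 3 and 4 follow from simple arithmetic. A quick sketch in Python, using only the constants quoted on the slides (40 Gb/s line rate, 64-byte cells, 0.25 s RTT):

```python
# Back-of-the-envelope check of slides 3-4 (constants from the slides).
LINE_RATE = 40e9        # OC-768: 40 Gb/s, in bits per second
CELL_BITS = 64 * 8      # one 64-byte cell
RTT = 0.25              # round-trip time in seconds

cell_time = CELL_BITS / LINE_RATE   # one cell arrives every 12.8 ns
access_time = cell_time / 2         # one write + one read per cell time: 6.4 ns
buffer_bits = RTT * LINE_RATE       # RTT * R rule of thumb: 10 Gbit

print(cell_time * 1e9, "ns per cell")
print(access_time * 1e9, "ns required access time")
print(buffer_bits / 1e9, "Gbit of buffering")
```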

5 System Design Parameters
Main parameters:
– SRAM size
– Latency faced by a cell
System parameters:
– I/O bandwidth

6 Packet Buffer Design
[Figure: cells arriving at rate R enter an ingress SRAM buffer, are transferred to DRAMs in blocks of b cells, and depart through an egress SRAM buffer at rate R. Each SRAM buffer has area A and holds Q queues; a memory management algorithm controls the transfers.]

7 Today's Talk…
Optimize the main parameters:
– Minimize SRAM size at the cost of latency (earlier work)
– Minimize latency at the cost of SRAM size
Assumptions on system parameters:
– No speedup on I/O, i.e. I/O = 2R
– Simple address architecture: use a single address from every DRAM

8 Symmetry Argument
The analysis and operation of the ingress and egress buffer architectures are similar, so we analyze only the egress buffer architecture.

9 Importance of Lookahead
[Figure: without lookahead, transfers between DRAM and SRAM are mistimed: DRAM bandwidth is wasted, and cells must be buffered unnecessarily, increasing the required SRAM size.]

10 Definitions
Latency: the time from a new arbiter request to the time when the cell is in SRAM.
Lookahead: the number of future time slots for which the scheduler's grant pattern is known.
Latency is sufficient to create lookahead, but not necessary.

11 Latency Giving Lookahead
[Figure: a timeline showing how latency translates into lookahead over time; Lt = amount of lookahead available to the MMA.]

12 More Definitions
Replenishment List (RL): a time-ordered list of grants received from the arbiter for which a corresponding cell has not yet been read from DRAM.
Critical queue: a queue that has b grants in the RL.
Most critical queue: the earliest critical queue.

13 Most Critical Queue First Algorithm (MCQFA)
– Lookahead L = q(b-1) + 1
– Always picks the most critical queue and services it
– Buffer size required = 2qb

14 Assumptions
Lookahead is given:
– i.e., a small number of future cell requests is known
No additional latency:
– when a request is actually made, the cell must be present in the SRAM
Packed set of requests:
– i.e., every cell slot has exactly one request made by the arbiter
– later on, we shall relax this assumption

15 Architecture of SRAM
[Figure: the SRAM is split into a rectangular buffer of q rows of b cells each, plus a bag of cells.]

16 Why a Bag?
Bag of cells: a buffer shared among cells belonging to different queues.
Why a bag?
– A queue can turn critical more than once
– Prefetching more than b cells from the same queue requires a bag of cells

17 Analysis of MCQFA
|RL(t)| >= Lookahead (trivial)
|RL(t)| >= q(b-1) + 1, because Lookahead = q(b-1) + 1
|RL(t)| >= q(b-1) + 1 implies at least one critical queue:
– proof by the pigeonhole principle: q(b-1) + 1 grants spread over q queues force some queue to hold at least b grants
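The pigeonhole step can be checked exhaustively for a small case (the values q = 3, b = 2 are my own illustration, not from the slides):

```python
from collections import Counter
from itertools import product

q, b = 3, 2
L = q * (b - 1) + 1            # 4 grants in the replenishment list

# Every possible way to spread L grants over q queues leaves at least
# one queue with b or more grants, i.e., at least one critical queue.
assert all(max(Counter(p).values()) >= b
           for p in product(range(q), repeat=L))

# With one grant fewer, a round-robin pattern avoids criticality:
assert max(Counter([0, 1, 2][:L - 1]).values()) < b
```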

18 Theorem
A lookahead of q(b-1)+1 guarantees that a cell is present in the SRAM to satisfy every request.
Proof: for a queue's SRAM to be empty, at least b+1 cells must have been requested from that queue (hence it is critical).
The time taken for this queue to get refreshed is pb, where p is the number of more critical queues.

19 Theorem (contd.)
The DRAM takes no more than pb time to refresh the other (more critical) queues, so it will have refreshed this queue by time (p+1)b.
But the (b+1)-th request could have arrived at the earliest by time (p+1)b + 1. Hence the cell to grant the (b+1)-th request is already present.

20 Theorem (contd.)
Therefore, with a lookahead of q(b-1)+1 and a buffer of size 2qb, we guarantee that a cell is always present in the SRAM for any request.

21 When Requests Are Not Packed
A critical queue might not exist within q(b-1)+1 slots.
Up to b requests for any queue can be satisfied by the rectangular SRAM.
The queue will then turn critical, and the corresponding cells are brought from DRAM.

22 A Dose of Reality
Typical values:
– b is typically <= 10
– Q = Np, where N = number of ports and p = number of classes per port (for VOQ)
Implementations:
– VOQ: N = 32, p = 1, Q = 2^5, b = 2^3, SRAM = 20 KBytes, latency = 3.2 usec
– DiffServ: N = 32, p = 16, Q = 2^9, b = 2^3, SRAM = 320 KBytes, latency = 25.6 usec

23 Necessity Traffic Pattern
[Figure, animated across slides 23-28: an MMA scheduler serving q = 6 queues with b = 2 and a lookahead of x = 4, stepped through iterations 1, 2, and 3 of an adversarial traffic pattern.]

29 Necessity Traffic Pattern
Analysis:

Iteration No. | Start time     | Refreshed queues | Queues remaining
0             | 0              | 0                | Q
1             | Q-x            | (Q-x)/b          | Q1 = Q - (Q-x)/b
2             | Q-x+Q1         | Q1/b             | Q2 = Q1 - Q1/b
3             | Q-x+Q1+Q2      | Q2/b             | Q3 = Q2 - Q2/b

30 Necessity Traffic Pattern
In the 1st iteration: Q1((b-1)/b) queues with deficit 2
In the 2nd iteration: Q1((b-1)/b)^2 queues with deficit 3
In the x-th iteration: Q1((b-1)/b)^x = 1 queues with deficit x+1
Solving for x:
x = log_{b/(b-1)} Q1 = ln Q1 / ln(1 + 1/(b-1))
  ~= (b-1) ln Q1        (using ln(1+x) ~= x)
where Q1 = Q - (Q-x)/b
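The closed form on this slide can be sanity-checked numerically. The values b = 8 and Q1 = 496 below are illustrative choices of mine, not from the slides:

```python
import math

b, Q1 = 8, 496.0
# x solves Q1 * ((b-1)/b)**x = 1: the iteration count that leaves one queue.
x_exact = math.log(Q1) / math.log(b / (b - 1))
# Slide 30's approximation via ln(1+x) ~= x, with x = 1/(b-1):
x_approx = (b - 1) * math.log(Q1)

remaining = Q1 * ((b - 1) / b) ** x_exact   # should be ~1 queue left
print(x_exact, x_approx, remaining)
```

For b = 8 the approximation is within about 7% of the exact iteration count, and it always slightly underestimates it, since ln(1+x) < x.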

31 Future Work
– Analysis of the space-latency continuum
– Analysis of other parameters: relaxing the I/O and address constraints
Still a long way to go…

