
Slide 1: Statistical Analysis of Packet Buffer Architectures
Gireesh Shrimali, Isaac Keslassy, Nick McKeown
E-mail: gireesh@stanford.edu

Slide 2: Packet Buffering
- Big: for TCP to work well, the buffer needs to hold one RTT (about 0.25 s) of data.
- Fast: the buffer needs to store (retrieve) packets as fast as they arrive (depart).
[Figure: an input or output line card with a shared memory buffer at line rate R; N memories per line card, each managed by a scheduler.]

Slide 3: An Example
Packet buffers for a 40 Gb/s line card.
[Figure: a 10 Gbit buffer memory behind a buffer manager; write rate R of one 40 B packet every 8 ns and read rate R of one 40 B packet every 8 ns.]
- The problem is solved if a memory can be randomly accessed every 4 ns and can store 10 Gb of data.
- Scheduler requests cause random accesses.
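As a quick sanity check of these numbers, here is a minimal arithmetic sketch (the variable names are mine, not from the slides):

```python
# Back-of-the-envelope check of the numbers on this slide (illustrative only).
line_rate = 40e9          # 40 Gb/s line rate
packet_bits = 40 * 8      # one 40-byte packet = 320 bits
rtt = 0.25                # ~one round-trip time of buffering for TCP

packet_time = packet_bits / line_rate   # 8e-9 s -> one packet every 8 ns
buffer_bits = rtt * line_rate           # 1e10 bits = 10 Gbit of buffering
access_time = packet_time / 2           # one write and one read per 8 ns -> 4 ns per access

print(f"packet every {packet_time*1e9:.0f} ns, "
      f"buffer {buffer_bits/1e9:.0f} Gbit, "
      f"random access every {access_time*1e9:.0f} ns")
```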

Slide 4: Key Question
How can we design high-speed packet buffers from commodity memories?

Slide 5: Available Memory Technology
- Use SRAM? + Fast enough random access time, but - too low density to store 10 Gbit of data.
- Use DRAM? + High enough density to store the data, but - can't meet the random access time.

Slide 6: Can't we just use lots of DRAMs in parallel?
[Figure: eight DRAM buffer memories behind one buffer manager; write rate R and read rate R of one 40 B packet every 8 ns; 40 B packets are aggregated into 320 B blocks that are written to and read from the DRAMs in parallel, driven by scheduler requests.]

Slide 7: Works fine if there is only one FIFO queue
Aggregate 320 B for the queue in fast SRAM, then read and write to all DRAMs in parallel.
[Figure: buffer manager (on-chip SRAM) between the 40 B every 8 ns write and read streams and the DRAMs; data moves to and from DRAM in 320 B blocks in response to scheduler requests.]
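A minimal sketch of the single-FIFO idea on this slide, assuming B = 8 banks of 40 B cells (my own illustration, not the paper's code): cells are collected in fast SRAM until a full 320 B block exists, and then one cell is written to each DRAM bank in a single parallel access.

```python
# Aggregate B cells in SRAM, then write one cell to each of B DRAM banks in
# parallel, so each bank sees only one access per B arriving cells.

B = 8  # degree of parallelism: 8 banks x 40 B cells = 320 B per DRAM access

class SingleFifoBuffer:
    def __init__(self, num_banks=B):
        self.tail_sram = []                           # cells waiting to form a full batch
        self.dram_banks = [[] for _ in range(num_banks)]

    def write(self, cell):
        """Called once per arriving 40 B cell."""
        self.tail_sram.append(cell)
        if len(self.tail_sram) == len(self.dram_banks):
            # One parallel access: each bank absorbs one cell of the 320 B block.
            for bank, c in zip(self.dram_banks, self.tail_sram):
                bank.append(c)
            self.tail_sram.clear()
```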

Slide 8: In practice, the buffer holds many FIFOs
- In an IP router, Q might be 200.
- In an ATM switch, Q might be 10^6.
We don't know which head-of-line packet the scheduler will request next.
[Figure: Q FIFOs share the buffer manager (on-chip SRAM); 40 B arrivals every 8 ns, 320 B DRAM transfers, and per-queue aggregates of unknown size (?B), driven by scheduler requests.]

Slide 9: Parallel Packet Buffer - Hybrid Memory Hierarchy
[Figure: arriving packets at rate R enter a small tail SRAM cache for the FIFO tails; a large DRAM memory holds the bodies of the Q FIFOs; a small head SRAM cache for the FIFO heads serves departing packets at rate R in response to scheduler requests (ASIC with on-chip SRAM). Cells move between SRAM and DRAM in batches: writing B cells and reading B cells per DRAM access, where B is the degree of parallelism.]

Slide 10: Objective
- We would like to minimize the size of the SRAM while providing reasonable guarantees.
- So we ask the following question: if the designer is willing to tolerate a certain drop probability, how small can the SRAM get?

Slide 11: Memory Management Algorithm
- Algorithm: at every service opportunity, serve a FIFO from the set of FIFOs with occupancy greater than or equal to B (see the sketch below).
- B-work conserving, and thus minimizes the SRAM size.
- Round-robin performs as well as largest-FIFO-first.
- Some definitions:
  - FIFO occupancy counter: L(i,t)
  - Sum of occupancies: L(t)
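A minimal sketch of the B-work-conserving policy described above (my own illustration; `write_batch_to_dram` is a hypothetical placeholder): at each service opportunity, pick a FIFO whose SRAM occupancy is at least B and move B of its cells to DRAM, using round-robin among the eligible FIFOs.

```python
from collections import deque

def serve_one_opportunity(fifos, B, rr_pointer):
    """fifos: list of deques of cells (one per queue).
    Returns the index served (or None) and the updated round-robin pointer."""
    Q = len(fifos)
    for step in range(Q):
        i = (rr_pointer + step) % Q
        if len(fifos[i]) >= B:                 # eligible: a full batch is waiting
            batch = [fifos[i].popleft() for _ in range(B)]
            write_batch_to_dram(i, batch)      # one parallel DRAM access
            return i, (i + 1) % Q
    return None, rr_pointer                    # no FIFO has B cells: stay idle

def write_batch_to_dram(queue_id, batch):
    pass  # placeholder: in hardware this is one access across B parallel DRAMs

# Example: 3 FIFOs, B = 4; only FIFO 1 has a full batch ready.
fifos = [deque(range(2)), deque(range(6)), deque(range(3))]
served, ptr = serve_one_opportunity(fifos, B=4, rr_pointer=0)
print(served)   # -> 1
```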

Slide 12: Model
[Figure: Q arrival processes A(1,t), ..., A(Q,t) merge into A(t), which feeds a single queue with departure process D(t).]
- Model the SRAM as a queue.
- Arrival process A(t) is the superposition of Q sources A(i,t) with rates λ(i).
- Deterministic service at rate 1.
- The queue is stable, i.e., the total arrival rate Σ λ(i) is less than 1.
- Approach: assume the A(i,t) are independent of each other.
  - Step 1: analyze for IID sources.
  - Step 2: show that the IID case is the worst case.
- Tools used:
  - Analysis in the continuous-time domain.
  - Use L(t).

Slide 13: Fixed Batch Decomposition
[Figure: each arrival process A(i,t) is decomposed into a quotient part B*MA(i,t) (complete batches of B cells) and a remainder part R(i,t) (fewer than B cells). Correspondingly, the total workload L(t) splits into the quotient workload B*ML(t), with departures B*MD(t), and the remainder workload R(t).]
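A minimal sketch of this decomposition (my illustration): each FIFO's occupancy splits into full batches of B cells (quotient) and a leftover of fewer than B cells (remainder), and summing over the FIFOs gives the two workloads.

```python
def decompose(occupancies, B):
    """Split per-FIFO occupancies L(i,t) into quotient and remainder workloads."""
    quotients, remainders = [], []
    for L_i in occupancies:
        m, r = divmod(L_i, B)        # L(i,t) = B*m + r with 0 <= r < B
        quotients.append(B * m)
        remainders.append(r)
    return sum(quotients), sum(remainders)   # quotient workload, remainder workload

# Example: Q = 4 FIFOs and B = 4.
print(decompose([5, 3, 9, 2], 4))   # -> (12, 7); total L(t) = 19
```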

Slide 14: Assumptions
The A(i,t) are:
1. independent of each other,
2. stationary and ergodic,
3. simple point processes.

Slide 15: PDF of SRAM Occupancy
- Theorem: the quotient workload and the remainder workload are independent of each other.
- Thus, the distribution of SRAM occupancy is the convolution of the distributions of the quotient and remainder workloads.

Slide 16: PDF of Remainder Workload
- Theorem: for large Q, the PDF of the remainder workload approaches a Gaussian distribution with mean Q(B-1)/2 and variance Q(B^2-1)/12.
- Intuition: application of the central limit theorem; each FIFO contributes a remainder of roughly uniform size between 0 and B-1 cells, and the sum of Q such terms is approximately Gaussian.
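A minimal numerical sketch of this approximation (my illustration), comparing a sum of Q per-FIFO remainders, taken here as uniform on {0, ..., B-1}, against the stated Gaussian moments for the Q = 1024, B = 4 parameters used later in the talk:

```python
import numpy as np

Q, B = 1024, 4
mean = Q * (B - 1) / 2            # 1536
var = Q * (B**2 - 1) / 12         # 1280

rng = np.random.default_rng(0)
samples = rng.integers(0, B, size=(10_000, Q)).sum(axis=1)

print(f"theory:  mean={mean:.1f}, std={var**0.5:.2f}")
print(f"sampled: mean={samples.mean():.1f}, std={samples.std():.2f}")
```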

Slide 17: PDF of Quotient Workload
- Theorem [Cao, Ramanan, INFOCOM 2002]: for large Q, the behavior of the quotient FIFO approaches that of an M/D/1 queue with the same load.
- Numerical solution through recurrence relations (a sketch follows below):
  - depends only on the load,
  - is independent of Q and B,
  - is close to an impulse at low loads.
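The slide only says that the M/D/1 distribution can be computed by a recurrence; one standard choice (my own, not necessarily the paper's exact recurrence) is the M/G/1 embedded-Markov-chain recursion with deterministic service, where a_k is the Poisson probability of k arrivals during one service time:

```python
import math

def md1_queue_length_pmf(load, n_max):
    """Stationary P(n customers) in an M/D/1 queue, n = 0..n_max, for load < 1."""
    a = [math.exp(-load) * load**k / math.factorial(k) for k in range(n_max + 2)]
    p = [0.0] * (n_max + 1)
    p[0] = 1.0 - load
    for n in range(n_max):
        s = p[0] * a[n] + sum(p[k] * a[n - k + 1] for k in range(1, n + 1))
        p[n + 1] = (p[n] - s) / a[0]
    return p

pmf = md1_queue_length_pmf(load=0.9, n_max=50)
print(sum(pmf))   # close to 1 once n_max is large enough
print(pmf[:5])    # depends only on the load, not on Q or B
```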

Slide 18: PDF of Buffer Occupancy
[Plot of the SRAM occupancy PDF for Q = 1024, B = 4; the remainder-workload mean is Q(B-1)/2 = 1536.]
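A minimal sketch (my own composition, hedged) of the convolution from slide 15 for these parameters: SRAM occupancy = quotient workload + remainder workload, taking the quotient workload as B cells per M/D/1 customer (slide 13 writes it as B*ML(t)) and the remainder workload as the Gaussian of slide 16. It reuses `md1_queue_length_pmf()` from the sketch above.

```python
import numpy as np

Q, B, load = 1024, 4, 0.9

# Quotient workload in cells: B cells per customer of the M/D/1 queue.
md1 = np.array(md1_queue_length_pmf(load, n_max=100))
quotient = np.zeros(100 * B + 1)
quotient[::B] = md1                      # probability mass at 0, B, 2B, ... cells

# Remainder workload: Gaussian with mean Q(B-1)/2 and variance Q(B^2-1)/12,
# discretized onto integer cell counts and renormalized.
mean, var = Q * (B - 1) / 2, Q * (B**2 - 1) / 12
cells = np.arange(0, Q * (B - 1) + 1)
remainder = np.exp(-(cells - mean) ** 2 / (2 * var))
remainder /= remainder.sum()

occupancy_pdf = np.convolve(quotient, remainder)   # PDF of total SRAM occupancy
print(occupancy_pdf.argmax())                      # mode close to Q(B-1)/2 = 1536
```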

Slide 19: Simulations (load = 0.9)
[Plot: complementary CDF of SRAM occupancy for Q = 1024, B = 4, load = 0.9.]
- Theory upper-bounds the simulations.

Slide 20: Conclusions
- Established exact bounds relating the drop probability to the SRAM size.
- The model may be applicable to many queueing systems with batch service.
- Compared to deterministic guarantees ([Iyer, McKeown, HPSR 2001]), an improvement of at most a factor of two.
- O(QB) is a hard lower bound for this architecture.

