1 Statistical Analysis of Packet Buffer Architectures
Gireesh Shrimali, Isaac Keslassy, Nick McKeown

2 Packet Buffering
• Big: For TCP to work well, the buffer needs to hold one RTT (about 0.25 s) of data.
• Fast: The buffer must store (retrieve) packets as fast as they arrive (depart).
[Figure: an input or output line card and a shared-memory buffer, each with memory and a scheduler; line rate R, ports 1 to N.]

3 An Example
Packet buffers for a 40 Gb/s line card:
• Write rate R: one 40 B packet every 8 ns.
• Read rate R: one 40 B packet every 8 ns.
• Buffer memory: 10 Gbit, managed by a buffer manager.
• Scheduler requests cause random access.
• The problem is solved if a memory can be randomly accessed every 4 ns and can store 10 Gbit of data.
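
The figures on this slide follow from the line rate, the RTT and the packet size. A minimal sketch of that arithmetic is given here; the variable names are illustrative, and the halving of the packet time assumes one write plus one read per packet time, as the slide implies.

```python
# Back-of-the-envelope check of the 40 Gb/s example (values from the slide).
LINE_RATE_BPS = 40e9   # line rate R, in bits per second
RTT_S = 0.25           # round-trip time the buffer must cover, in seconds
PACKET_BYTES = 40      # minimum-size packet, in bytes

# Buffer size is R x RTT.
buffer_bits = LINE_RATE_BPS * RTT_S

# Time to receive (or send) one 40 B packet at the line rate.
packet_time_ns = PACKET_BYTES * 8 / LINE_RATE_BPS * 1e9

# Assumption: one write and one read must both fit in each packet time.
access_time_ns = packet_time_ns / 2

print(f"buffer size: {buffer_bits / 1e9:.0f} Gbit")
print(f"packet time: {packet_time_ns:.0f} ns")
print(f"required random access time: {access_time_ns:.0f} ns")
```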

4 Key Question
How can we design high-speed packet buffers from commodity memories?

5 Available Memory Technology
• Use SRAM?
  + Fast enough random access time, but
  - Density too low to store 10 Gbit of data.
• Use DRAM?
  + High density means we can store the data, but
  - Can’t meet the random access time.

6 Can’t we just use lots of DRAMs in parallel?
[Figure: a buffer manager stripes data across eight buffer memories in parallel. 40 B packets arrive and depart every 8 ns (write rate R, read rate R) in response to scheduler requests, while the memories are written and read 320 B at a time.]

7 Works fine if there is only one FIFO queue
• Aggregate 320 B for the queue in fast SRAM, then read and write to all DRAMs in parallel.
[Figure: a buffer manager (on-chip SRAM) sits between the write side and the read side, each carrying one 40 B packet every 8 ns at rate R; data moves to and from the DRAMs in 320 B blocks in response to scheduler requests.]

8 In practice, the buffer holds many FIFOs
• e.g., in an IP router, Q might be 200.
• In an ATM switch, Q might be (value missing from the transcript).
• We don’t know which head-of-line packet the scheduler will request next.
[Figure: FIFOs 1, 2, ..., Q behind a buffer manager (on-chip SRAM); 40 B packets are written and read every 8 ns at rate R, DRAM transfers are 320 B, and blocks marked "?B" are only partially filled when the scheduler requests them.]

9 Parallel Packet Buffer: Hybrid Memory Hierarchy
• Small tail SRAM: cache for the FIFO tails (queues 1 to Q).
• Large DRAM memory holds the body of the FIFOs; cells are written to and read from DRAM B at a time.
• Small head SRAM: cache for the FIFO heads (queues 1 to Q).
• The buffer manager is an ASIC with on-chip SRAM. Packets arrive at rate R and depart at rate R in response to scheduler requests.
• B = degree of parallelism.

10 Objective
• We would like to minimize the size of the SRAM while providing reasonable guarantees.
• So we ask the following question: if the designer is willing to tolerate a certain drop probability, how small can the SRAM get?

11 Memory Management Algorithm
• Algorithm: at every service opportunity, serve a FIFO from the set of FIFOs with occupancy greater than or equal to B.
  - B-work conserving, and thus minimizes SRAM size.
  - Round-robin performs as well as largest-FIFO-first.
• Some definitions:
  - FIFO occupancy counter: L(i,t)
  - Sum of occupancies: L(t)
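
The service rule above can be sketched as a small slotted simulation. This is an illustration under stated assumptions, not the authors' simulator: the function name `simulate`, the arrival model (at most one cell per slot, sent to a uniformly random FIFO) and the choice of one service opportunity every B slots are all hypothetical. Only the rule "serve, in round-robin order, a FIFO whose occupancy is at least B" comes from the slide.

```python
import random

def simulate(Q=8, B=4, load=0.9, slots=10_000, seed=1):
    """Round-robin, B-work-conserving service of Q FIFO occupancy counters.

    Returns the final per-FIFO occupancies and the largest total occupancy
    observed at the end of any slot.
    """
    rng = random.Random(seed)
    L = [0] * Q        # L[i] plays the role of the counter L(i,t)
    pointer = 0        # round-robin position
    peak_total = 0     # largest observed L(t) = sum of all L[i]
    for t in range(slots):
        # Assumed arrival model: at most one cell per slot, to a random FIFO.
        if rng.random() < load:
            L[rng.randrange(Q)] += 1
        # Assumed timing: one service opportunity every B slots, which
        # removes B cells, so the service rate is one cell per slot.
        if t % B == B - 1:
            for k in range(Q):
                i = (pointer + k) % Q
                if L[i] >= B:              # only FIFOs holding >= B cells are eligible
                    L[i] -= B
                    pointer = (i + 1) % Q  # resume after the FIFO just served
                    break
        peak_total = max(peak_total, sum(L))
    return L, peak_total

if __name__ == "__main__":
    final, peak = simulate()
    print("final occupancies:", final)
    print("peak total occupancy:", peak)
```

Swapping the inner loop for a "pick the largest L[i]" rule gives a largest-FIFO-first variant that can be compared against the round-robin one.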

12 Model
• Model the SRAM as a queue with arrivals A(t) and departures D(t).
• Arrival process A(t): superposition of Q sources A(1,t), ..., A(Q,t), each with its own rate.
• Deterministic service at rate 1.
• Queue is stable, i.e., the total arrival rate is less than 1.
• Approach: assume the A(i,t) are independent of each other.
  - Step 1: Analyze for IID sources.
  - Step 2: Show that the IID case is the worst case.
• Tools used:
  - Analysis in the continuous time domain.
  - Use L(t).

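13 Fixed Batch Decomposition
[Figure: the arrivals A(1,t), ..., A(Q,t) are shown alongside batched arrivals B*MA(1,t), ..., B*MA(Q,t) and remainders R(1,t), ..., R(Q,t). The occupancy L(t) is split into a quotient workload B*ML(t), with batched departures B*MD(t), and a remainder workload R(t).]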

14 Assumptions
The A(i,t) are
1. independent of each other,
2. stationary and ergodic,
3. simple point processes.

15 PDF of SRAM Occupancy
• Theorem: The quotient workload and the remainder workload are independent of each other.
• Thus the distribution of SRAM occupancy is the convolution of the distributions of the quotient and remainder workloads.
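
The convolution step works as follows: if two independent integer-valued workloads have probability mass functions p and q, their sum has the discrete convolution of p and q. The sketch below shows this with a hypothetical helper `convolve`; the two input distributions are made-up numbers chosen only to make the arithmetic easy to follow, not values from the talk.

```python
def convolve(p, q):
    """PMF of X + Y for independent X ~ p and Y ~ q (index = value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj   # P(X = i) * P(Y = j) contributes to X + Y = i + j
    return out

# Made-up example distributions over the values 0 and 1.
p = [0.5, 0.5]
q = [0.25, 0.75]
total = convolve(p, q)
print(total)        # PMF over the values 0, 1, 2
print(sum(total))   # probabilities still sum to 1
```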

16 PDF of Remainder Workload
• Theorem: For large Q, the PDF of the remainder workload approaches a Gaussian distribution with mean Q(B-1)/2 and variance Q(B^2-1)/12.
• Intuition: application of the central limit theorem.
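
The stated mean and variance are what one gets by summing Q independent remainders, each uniform on {0, 1, ..., B-1}. The slide does not spell out that per-FIFO distribution, so treat it as an assumption of this sketch. The helper name `uniform_moments` is illustrative, and exact fractions are used so the comparison with the closed forms is not blurred by rounding.

```python
from fractions import Fraction

def uniform_moments(B):
    """Exact mean and variance of a uniform variable on {0, 1, ..., B-1}."""
    values = range(B)
    mean = Fraction(sum(values), B)
    var = Fraction(sum(v * v for v in values), B) - mean ** 2
    return mean, var

Q, B = 1024, 4
mean, var = uniform_moments(B)

# Per-FIFO closed forms: (B-1)/2 and (B^2-1)/12.
assert mean == Fraction(B - 1, 2)
assert var == Fraction(B * B - 1, 12)

# Assumption: the Q remainders are independent, so means and variances add.
total_mean = Q * mean
total_var = Q * var
print(total_mean, total_var)
```

For Q = 1024 and B = 4 this gives a mean of 1536 and a variance of 1280.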

17 PDF of Quotient Workload
• Theorem [Cao, Ramanan INFOCOM 2002]: For large Q, the behavior of the quotient FIFO approaches the behavior of an M/D/1 queue with the same load.
• Numerical solution through recurrence relations.
• Depends only on load.
• Independent of Q and B.
• Close to an impulse at low loads.
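
One way to obtain an M/D/1 queue-length distribution numerically is the standard embedded-Markov-chain recurrence from queueing textbooks. The transcript does not say which recurrence the authors used, so the sketch below is an assumption. It also assumes unit service time, so that the load rho equals the Poisson arrival rate. The function name `md1_distribution` is illustrative.

```python
import math

def md1_distribution(rho, n_max):
    """Stationary P(N = n), n = 0..n_max, for an M/D/1 queue with load rho < 1."""
    # a[k]: probability of k Poisson arrivals during one unit service time.
    a = [math.exp(-rho) * rho ** k / math.factorial(k) for k in range(n_max + 1)]
    pi = [1.0 - rho]   # the server is idle with probability 1 - rho
    for n in range(n_max):
        # Balance equation pi[n] = pi[0]*a[n] + sum_{k=1}^{n+1} pi[k]*a[n-k+1],
        # solved for the single unknown pi[n+1].
        s = pi[n] - pi[0] * a[n] - sum(pi[k] * a[n - k + 1] for k in range(1, n + 1))
        pi.append(s / a[0])
    return pi

if __name__ == "__main__":
    for rho in (0.1, 0.9):
        head = md1_distribution(rho, 5)
        print(rho, [round(x, 4) for x in head])
```

At low load nearly all of the probability mass sits at zero, which matches the "close to an impulse" remark. The recurrence involves subtractions, so it is best kept to moderate n_max.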

18 PDF of Buffer Occupancy
• Q = 1024; B = 4; mean remainder workload Q(B-1)/2 = 1536

19 Simulations (load = 0.9)
• Complementary CDF for Q = 1024; B = 4; load = 0.9
• The theoretical curve upper-bounds the simulation results.

20 Conclusions
• Established exact bounds relating the drop probability to the SRAM size.
• The model may be applicable to many queueing systems with batch service.
• Compared to deterministic guarantees ([Iyer, McKeown HPSR 2001]), an improvement by at most a factor of two.
• O(QB) is a hard lower bound for this architecture.