Buffered Crossbars With Performance Guarantees
Shang-Tse (Da) Chuang, Cisco Systems
EE384Y, Thursday, April 27, 2006



2 Motivation
- Network operators want performance guarantees
  - Throughput guarantee
  - Delay guarantee
- High performance routers use crossbars
- Hard to build crossbar-based routers with guarantees
- My talk:
  - How a crossbar with a small amount of internal buffering can give guarantees

3 Contents
- Throughput Guarantees
  - Buffered Crossbar – 100% Throughput
  - Buffered Crossbar – Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

4 Generic Crossbar-Based Architecture
[Figure: crossbar with a speedup of S, a scheduler, and VOQs]

5 Admissible Traffic
- Traffic matrix
- Traffic is admissible if no input and no output is oversubscribed
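A standard way to write this condition is sketched below. It assumes an $N \times N$ switch where $\lambda_{ij}$ is the average arrival rate of cells from input $i$ to output $j$, normalized to the line rate. This notation is an assumption and is not taken from the slide.

```latex
\Lambda = [\lambda_{ij}], \qquad
\sum_{j=1}^{N} \lambda_{ij} < 1 \quad \forall i, \qquad
\sum_{i=1}^{N} \lambda_{ij} < 1 \quad \forall j
```

The first sum bounds the load offered by each input and the second bounds the load destined to each output.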

6 Throughput Guarantee
- 100% Throughput
  - An algorithm delivers 100% throughput if, for any admissible traffic, the average backlog is finite
[Figure: crossbar with a speedup of S and a scheduler]

7 Previous Work
- Heuristics
  - Wave Front Arbiter [Tamir]
  - Parallel Iterative Matching [Anderson et al.]
  - iSLIP [McKeown]
- Theoretically Proven
  - Longest Port First [Mekkittikul et al.]
  - Maximum Weight Matching [McKeown et al.]
  - Maximal Matching, S=2 [Dai, Prabhakar]

8 Maximal Matching Has Become Hard
- TTX Switch Fabric
  - Uses maximal matching
  - Speedup less than 2
  - Consumes up to 8 kW
  - Limited to ~2.5 Tb/s
  - No 100% throughput guarantee

9 Traditional Crossbar
- Crossbar Requirements
  - An input can send at most one cell
  - An output can receive at most one cell
- Scheduling Problem
  - Must overcome two constraints simultaneously
- New Crossbar
  - Relieve contention
  - Remove dependency between inputs and outputs

10 Contents
- Throughput Guarantees
  - Buffered Crossbar – 100% Throughput
  - Buffered Crossbar – Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

11 Buffered Crossbar
- Arrival Phase
- Scheduling Phases – Speedup of 2
- Departure Phase

12 Scheduling Phase
- Input Schedule
  - Each input selects in parallel a cell for an empty crosspoint
- Output Schedule
  - Each output selects in parallel a cell from a full crosspoint

13 Example of Input/Output Scheduling
- Round-robin Policy
  - Each input schedules in a round-robin order
  - Each output schedules in a round-robin order
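The input and output schedules with a round-robin policy can be sketched as follows. This is a minimal illustration, not the talk's implementation: it assumes one cell of buffering per crosspoint, and the class name `BufferedCrossbar`, its fields, and the pointer-update rule (advance past the port just served) are all hypothetical choices. The loops run sequentially here, but each input touches only its own row of crosspoints and each output only its own column, which is why hardware can run them in parallel.

```python
from collections import deque


class BufferedCrossbar:
    """N x N crossbar with one-cell crosspoint buffers (illustrative sketch)."""

    def __init__(self, n):
        self.n = n
        # voq[i][j]: cells waiting at input i for output j
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        # xpoint[i][j]: cell held at crosspoint (i, j), or None if empty
        self.xpoint = [[None] * n for _ in range(n)]
        self.in_ptr = [0] * n   # round-robin pointer per input
        self.out_ptr = [0] * n  # round-robin pointer per output

    def input_schedule(self):
        """Each input moves one cell into an empty crosspoint, if it can."""
        for i in range(self.n):
            for k in range(self.n):
                j = (self.in_ptr[i] + k) % self.n
                if self.voq[i][j] and self.xpoint[i][j] is None:
                    self.xpoint[i][j] = self.voq[i][j].popleft()
                    self.in_ptr[i] = (j + 1) % self.n
                    break

    def output_schedule(self):
        """Each output removes one cell from a full crosspoint, if any."""
        delivered = [None] * self.n
        for j in range(self.n):
            for k in range(self.n):
                i = (self.out_ptr[j] + k) % self.n
                if self.xpoint[i][j] is not None:
                    delivered[j] = self.xpoint[i][j]
                    self.xpoint[i][j] = None
                    self.out_ptr[j] = (i + 1) % self.n
                    break
        return delivered


xb = BufferedCrossbar(2)
xb.voq[0][1].append("a")
xb.voq[1][1].append("b")
xb.input_schedule()          # both inputs fill their own crosspoint for output 1
print(xb.output_schedule())  # [None, 'a']
print(xb.output_schedule())  # [None, 'b']
```

Both inputs can send a cell for output 1 in the same phase because each has its own crosspoint buffer; the contention is resolved later by the output schedule alone.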

14 Previous Work
- Buffered Crossbar Simulations [Rojas-Cessa et al. 2001]
  - 32x32 switch, Uniform Bernoulli Traffic, Round-Robin, S=1

15 100% Throughput
- Theorem 1
  - A buffered crossbar with speedup of 2 delivers 100% throughput for any admissible Bernoulli i.i.d. traffic using any work-conserving input/output schedules.

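16 Intuition of Proof
[Figure: a backlogged flow of rate ε competes with traffic of rate < 1-ε at its input and at its output; (1-ε) + (1-ε) + ε = 2-ε]
- When a flow is backed up, the service available to its backlog exceeds the arrivals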

17 Intuition of Proof
- $Q_{ij}$ = Queue Length
- $B_{ij} = \begin{cases} 0 & \text{if buffer empty} \\ 1 & \text{if buffer full} \end{cases}$

18 Intuition of Proof
- Recall
- If $Q_{ij} > 0$, then for $X_{ij}$,
  - Expected increase is 2
  - Expected decrease
    - If $B_{ij} = 1$, then in the output schedule one $B_{*j}$ will decrease
    - If $B_{ij} = 0$, then in the input schedule one $Q_{i*}$ will decrease
  - Thus expected decrease is 2

19 Contents
- Throughput Guarantees
  - Buffered Crossbar – 100% Throughput
  - Buffered Crossbar – Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

20 Work Conservation
- Work-conserving Property
  - If there is a cell for a given output in the system, that output is busy.
[Figure: Output Queued (OQ) Switch]

21 Emulating an OQ switch
- Under identical inputs, the departure time of every cell from both switches is identical

22 Input Priority List
- Label each cell with its corresponding departure time
- Arrange input cells into an input priority list
- Output selects the crosspoint with the earliest departure time

23 Input Priority List
[Figure labels: good guy, bad guys]
- Label each cell with its corresponding departure time
- Arrange input cells into an input priority list
- Output selects the crosspoint with the earliest departure time

24 Definitions
- Output Margin – cells at its output with earlier departure time
- Input Margin – cells ahead in input priority list destined to different outputs
- Total Margin – Output Margin minus Input Margin
[Figure labels: good guys, bad guys]
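The three margins can be computed directly from these definitions. The sketch below assumes each cell is described by its destination output and its departure time in the emulated OQ switch; the `Cell` type and the function names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    output: int     # destination output port
    departure: int  # departure time in the emulated OQ switch


def output_margin(cell, output_cells):
    """Number of cells already at the cell's output that depart earlier."""
    return sum(1 for c in output_cells if c.departure < cell.departure)


def input_margin(cell, priority_list):
    """Number of cells ahead of `cell` in its input priority list
    that are destined to a different output."""
    ahead = priority_list[:priority_list.index(cell)]
    return sum(1 for c in ahead if c.output != cell.output)


def total_margin(cell, priority_list, output_cells):
    """Output margin minus input margin."""
    return output_margin(cell, output_cells) - input_margin(cell, priority_list)


# Input priority list, head first; the cell of interest is last.
plist = [Cell(1, 3), Cell(2, 5), Cell(0, 8)]
# Cells already waiting at output 0.
at_output = [Cell(0, 6), Cell(0, 7)]
print(total_margin(plist[2], plist, at_output))  # 2 - 2 = 0
```

In this example two cells ahead of the cell at its input are headed elsewhere, and two cells at its output depart before it, so the total margin is zero.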

25 Emulation of FIFO OQ Switch
- Scheduling Phase
  - Crosspoint is full – Output Margin will increase by one
  - Crosspoint is empty – Input Margin will decrease by one
- Total Margin increases by two

26 Emulation of FIFO OQ Switch
- Arrival Phase
  - Input Margin might increase by one
- Departure Phase
  - Output Margin will decrease by one
- Total Margin decreases by at most two

27 Emulation of FIFO OQ Switch
- Lemma 1
  - For every time slot, total margin does not decrease

28 FIFO Insertion Policy
- Arrival Phase
  - Cell for non-empty VOQ, insert behind cells for same output
  - Cell for empty VOQ, insert at head of input priority list
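The insertion rule can be sketched as a small list operation. The sketch assumes the input priority list is a Python list with the head at index 0, that cells are `(output, cell_id)` tuples, and that "behind cells for same output" means directly after the last queued cell for that output. These representations, and the name `fifo_insert`, are assumptions made for illustration.

```python
def fifo_insert(priority_list, cell):
    """Insert `cell`, an (output, cell_id) tuple, into the input priority list in place."""
    output = cell[0]
    last_same = -1
    for idx, (out, _) in enumerate(priority_list):
        if out == output:
            last_same = idx
    # If no cell for this output is queued (empty VOQ), last_same stays -1
    # and the cell goes to index 0, the head of the list.
    priority_list.insert(last_same + 1, cell)
    return priority_list


plist = [(2, "a"), (0, "b"), (1, "c")]
fifo_insert(plist, (0, "d"))  # VOQ for output 0 is non-empty: goes behind "b"
fifo_insert(plist, (3, "e"))  # VOQ for output 3 is empty: goes to the head
print(plist)  # [(3, 'e'), (2, 'a'), (0, 'b'), (0, 'd'), (1, 'c')]
```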

29 FIFO Insertion Policy
- Lemma 2
  - An arriving cell will have a non-negative total margin

30 Emulation of FIFO OQ Switch
- Theorem 2
  - A buffered crossbar with speedup of 2 can exactly emulate a FIFO OQ switch.
- Result was shown independently
  - B. Magill, C. Rohrs, R. Stevenson, “Output-Queued Switch Emulation by Fabrics With Limited Memory”, in IEEE Journal on Selected Areas in Communications, pp , May
- Theorem 3
  - A buffered crossbar with speedup of 2 can be work-conserving with a distributed algorithm.

31 Contents
- Throughput Guarantees
  - Buffered Crossbar – 100% Throughput
  - Buffered Crossbar – Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

32 Delay Guarantees
[Figure: constrained traffic enters one output with many logical FIFO queues (1 to m); weighted fair queueing sorts packets]
- PIFO models
  - Weighted Fair Queueing
  - Weighted Round Robin
  - Strict priority, etc.
[Figure: constrained traffic is pushed into one output with a single Push In First Out (PIFO) queue]
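A PIFO queue lets an arriving packet be pushed into any position, while departures always come from the head. The sketch below is one minimal way to model that, assuming each packet carries a numeric `rank` (a hypothetical name) that fixes its position; ties keep arrival order. It demonstrates strict priority, where the rank is simply the priority class. For weighted fair queueing the rank would instead be something like a computed finish time, which is not shown here.

```python
import bisect
import itertools


class PIFO:
    """Push-In First-Out queue: push anywhere by rank, pop only from the head."""

    def __init__(self):
        self._items = []               # kept sorted as (rank, seq, packet)
        self._seq = itertools.count()  # tie-breaker so equal ranks stay FIFO

    def push(self, rank, packet):
        bisect.insort(self._items, (rank, next(self._seq), packet))

    def pop(self):
        _, _, packet = self._items.pop(0)
        return packet

    def __len__(self):
        return len(self._items)


# Strict priority as a PIFO: rank 0 is the high-priority class.
q = PIFO()
q.push(1, "low-1")
q.push(0, "high-1")
q.push(1, "low-2")
q.push(0, "high-2")
print([q.pop() for _ in range(len(q))])
# ['high-1', 'high-2', 'low-1', 'low-2']
```

Once a packet is in the queue, its order relative to the packets already there never changes; only later arrivals can be pushed in ahead of it.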

33 Achieving Delay Guarantees in Crossbars
- Theorem 4
  - A crossbar switch with a speedup of 2 can exactly emulate an OQ switch which provides delay guarantees.
- Theorem 5
  - A speedup of 2-1/N is necessary and sufficient for a crossbar switch to exactly emulate an NxN FIFO OQ switch.

34 Contents
- Throughput Guarantees
  - Buffered Crossbar – 100% Throughput
  - Buffered Crossbar – Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

35 Emulation of PIFO OQ Switch
- Crosspoint Blocking
  - A cell in the crosspoint has a larger departure time
- Swap Phase
  - If an arriving cell has a smaller departure time than the cell in the crosspoint, swap the two cells
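The swap rule reduces to a single comparison. The sketch below assumes both cells belong to the same input/output pair, that cells are `(departure_time, cell_id)` tuples, and that an empty crosspoint is represented by `None`; the function name `swap_phase` and these representations are hypothetical.

```python
def swap_phase(crosspoint_cell, arriving_cell):
    """Return (cell kept in the crosspoint, cell left at the input).

    Cells are (departure_time, cell_id) tuples; crosspoint_cell may be None.
    """
    if crosspoint_cell is None:
        # Nothing to swap with; the arriving cell waits for the input schedule.
        return None, arriving_cell
    if arriving_cell[0] < crosspoint_cell[0]:
        # The arriving cell departs earlier, so it takes the crosspoint.
        return arriving_cell, crosspoint_cell
    return crosspoint_cell, arriving_cell


print(swap_phase((9, "x"), (4, "y")))  # ((4, 'y'), (9, 'x')): swapped
print(swap_phase((4, "x"), (9, "y")))  # ((4, 'x'), (9, 'y')): unchanged
```

After the swap, the crosspoint always holds the cell for that input/output pair with the earliest departure time, which removes the crosspoint blocking described above.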

36 PIFO Insertion Policy
- Arrival Phase
  - Insert cell directly behind cell with departure time just earlier
  - If cell has earliest departure time, then insert at head of input priority list

37 PIFO Emulation
- Theorem 6
  - A buffered crossbar with speedup of 3 can exactly emulate an OQ switch with delay guarantees.

38 Header Scheduling Architecture
[Figure: input linecard, buffered crossbar with header scheduler, and output linecard; headers flow to the header scheduler and grants flow back]

39 Header Scheduling
- Schedule headers instead of cells
- Headers are converted into grants in output schedule
- Grants are sent back to the input

40 Grant Stream
[Figure: input linecard with grant FIFO, buffered crossbar with header scheduler, and output linecard; headers and grants]
- Input can receive N grants in one scheduling phase
- Bounded to p+N-1 grants over p consecutive phases

41 Modified Buffered Crossbar
- Modified Buffered Crossbar
  - N cells per crosspoint – requires $N^3$ cell buffers
  - N cells per output – requires $N^2$ cell buffers
- Theorem 7
  - A modified buffered crossbar with speedup of 2 can emulate an OQ switch with delay guarantees with a fixed delay of N scheduling phases.
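The two buffer counts follow from simple counting, assuming an $N \times N$ switch with one crosspoint per input/output pair. The value $N = 32$ in the comment is only an illustration.

```latex
\underbrace{N^2}_{\text{crosspoints}} \times N \ \text{cells} = N^3,
\qquad
\underbrace{N}_{\text{outputs}} \times N \ \text{cells} = N^2
% Example: N = 32 gives 32^3 = 32768 versus 32^2 = 1024 cell buffers.
```

Sharing the N cells across all crosspoints of an output, rather than providing them at every crosspoint, reduces the buffer count by a factor of N.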

42 Summary
- Buffered crossbars
  - Use crosspoints to relieve contention
  - Inputs and outputs schedule independently and in parallel
- Performance guarantees
  - Throughput – any work-conserving input/output schedule
  - Work Conservation – simple insertion policy
  - Delay – header scheduling

43 Relevant Papers
- Crossbars
  - Shang-Tse Chuang, Ashish Goel, Nick McKeown, Balaji Prabhakar, “Matching Output Queuing with a Combined Input Output Queued Switch,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp , Dec
- Buffered Crossbars
  - Shang-Tse Chuang, Sundar Iyer, Nick McKeown, “Practical Algorithms for Performance Guarantees in Buffered Crossbars,” Proceedings of IEEE INFOCOM 2005, Miami, Florida, Mar