Packet-Mode Emulation of Output-Queued Switches David Hay, CS, Technion Joint work with Hagit Attiya (CS) and Isaac Keslassy (EE)

Outline
- Cell-Mode Scheduling vs. Packet-Mode Scheduling
- Impossibility of an Exact Emulation
- Speedup-RQD Tradeoff
  - Emulation with S ≈ 4
  - Emulation with S ≈ 2
- Emulation of OQ switch w/ bounded buffer
- Simulation Results

CIOQ Switches

Cell-Mode Scheduling

Trend towards Packet-Mode
- Cell-mode scheduling is getting too hard
  - Fragmentation and reassembly must work very fast, at the external rate
  - Extra header for each cell ⇒ loss of bandwidth
- For optical switches, such fragmentation and reassembly are prohibitive
- Cell-mode schedulers are packet-oblivious
  - Degradation of the overall performance

Packet-Mode Scheduling

- No need for fragmentation and reassembly
- Must ensure contiguous packet delivery over the fabric
  - While input i delivers a packet to output j, neither input i nor output j can handle other packets.
- Can packet-mode schedulers provide performance guarantees similar to those of cell-mode schedulers? [Marsan et al., 2002] [Ganjali et al., 2003] [Turner, 2006]

Output Queuing Emulation
- OQ switches are considered optimal with respect to queuing delay and throughput
  - But too hard to implement in practice…
- Emulation: same input traffic ⇒ same output traffic
- How hard is it for a cell-mode / packet-mode CIOQ switch to emulate an OQ switch?

Cell-Mode Emulation is Possible
- Easy with speedup S=N
  - N scheduling decisions every time-slot:
    - In the 1st decision, forward the cell of input 1
    - In the 2nd decision, forward the cell of input 2
    - ⋮
    - In the Nth decision, forward the cell of input N

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
- 1st Key Concept: Slackness of a cell (on the input side): L(C) = OC(C) - IT(C)
  - Input Thread IT(C) (“bad guys”): how many cells precede C in its input-port buffer?
  - Output Cushion OC(C) (“good guys”): how many cells are queued in the output buffer of C’s destination and should leave the OQ switch before C?
- Slackness may decrease by at most 2 in every time-slot
  - A cell leaves the destination of C ⇒ OC--
  - A cell arrives at the input and is queued before C ⇒ IT++
- Initial slackness can be made non-negative
  - When C arrives, insert it in the OC(C)-th place of its input buffer.
- Plan: ensure that slackness always increases by 2
  - Slackness is never negative ⇒ all cells are delivered on time
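The slackness definition above can be sketched in a few lines of Python. This is only an illustration: the `Cell` record, its `oq_departure` field (the time-slot at which the shadow OQ switch would send the cell), and the list-based buffers are assumptions made for the example, not structures from the talk.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    ident: str
    output: int
    oq_departure: int  # hypothetical field: departure slot in the shadow OQ switch


def input_thread(input_queue, cell):
    """IT(C): number of cells queued ahead of `cell` in its input-port buffer."""
    return input_queue.index(cell)


def output_cushion(output_buffer, cell):
    """OC(C): cells already at C's output that the OQ switch sends before C."""
    return sum(1 for c in output_buffer if c.oq_departure < cell.oq_departure)


def slackness(input_queue, output_buffer, cell):
    """L(C) = OC(C) - IT(C)."""
    return output_cushion(output_buffer, cell) - input_thread(input_queue, cell)
```

A cell with two earlier-departing cells at its output and one cell ahead of it at its input has slackness 2 - 1 = 1.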

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
- Stable Marriage (stable matching): given two equal-size sets M, W and preference lists from every m ∈ M, w ∈ W, find a matching in which there are no two pairs (m,w), (m’,w’) s.t.
  - m prefers w’ over w
  - w’ prefers m over m’
- Classical problem in CS
  - A stable marriage always exists
  - Many algorithms…
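One of the many algorithms for this problem is Gale-Shapley (deferred acceptance). The sketch below is a generic textbook version with complete preference lists, not the specific procedure used in the talk; the dictionary-based input format and the `is_stable` checker are assumptions chosen for the example.

```python
def stable_matching(m_prefs, w_prefs):
    """Gale-Shapley; m_prefs[m] and w_prefs[w] list all partners, best first."""
    rank = {w: {m: r for r, m in enumerate(prefs)} for w, prefs in w_prefs.items()}
    next_choice = {m: 0 for m in m_prefs}  # index of the next w that m proposes to
    partner = {}  # w -> m
    free = list(m_prefs)
    while free:
        m = free.pop()
        w = m_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in partner:
            partner[w] = m
        elif rank[w][m] < rank[w][partner[w]]:
            free.append(partner[w])  # w trades up, her old partner is free again
            partner[w] = m
        else:
            free.append(m)  # w rejects m, who will propose to his next choice
    return {m: w for w, m in partner.items()}


def is_stable(matching, m_prefs, w_prefs):
    """True if no pair (m, w') prefers each other over their assigned partners."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in m_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # every remaining w is less preferred than m's partner
            if w_prefs[w].index(m) < w_prefs[w].index(husband[w]):
                return False
    return True
```

For example, with `m_prefs = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}` and `w_prefs = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}`, the result pairs m2 with w1 and m1 with w2.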

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
- The Critical Cell First (CCF) algorithm performs stable marriage at each decision:
  - M is the set of inputs, W is the set of outputs
  - i prefers o_1 over o_2 if there is a cell for o_1 that is queued before all cells for o_2
  - o prefers i_1 over i_2 if there is a cell from i_1 that should leave before all cells from i_2
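The two preference rules can be sketched as follows. The representation is an assumption made for illustration: each input queue is a list of `(output, oq_departure)` pairs in buffer order, where `oq_departure` is the slot at which the cell should leave the shadow OQ switch.

```python
def input_preferences(input_queue):
    """Outputs ordered by the position of their first cell in the input buffer."""
    prefs = []
    for output, _departure in input_queue:
        if output not in prefs:
            prefs.append(output)
    return prefs


def output_preferences(input_queues, output):
    """Inputs ordered by the earliest OQ departure among their cells for `output`."""
    earliest = {}
    for inp, queue in input_queues.items():
        times = [dep for out, dep in queue if out == output]
        if times:
            earliest[inp] = min(times)
    return sorted(earliest, key=earliest.get)
```

An input or output with no relevant cells simply gets a shorter list, so the matching computed from these lists may leave some ports unmatched.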

Cell-Mode Emulation w/ S=2 [Chuang et al., 1999]
- For each cell C from input port i to output port j, and each scheduling decision, at least one of the following holds:
  - C is forwarded (and we don’t care about it)
  - C’ was forwarded from i, and i preferred to forward it ⇒ IT--
  - C’ was forwarded to j, and j preferred to receive it ⇒ OC++
- Two scheduling decisions every time-slot ⇒ slackness always increases by 2

Cell-Mode Emulation
- Easy with speedup S=N
- Possible with speedup S=2 (w/ CCF)
  - Lower bound: S ≥ 2-1/N is required [Chuang et al., 1999]
- What is the speedup required for packet-mode emulation?

Packet-Mode Emulation is Impossible
- Regardless of speedup
  - Even with speedup S=N

Emulation w/ Relative Queuing Delay
- The CIOQ switch is allowed a bounded lag behind the shadow OQ switch
  - Exactly the same behavior as the optimal OQ switch, but with some extra delay
  - Called relative queuing delay (RQD)
- Can we provide packet-mode OQ emulation with bounded RQD and small speedup?
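As a rough illustration of the definition, the RQD of a run can be measured by comparing per-cell departure times in the two switches. The dictionaries mapping a cell identifier to its departure slot are an assumed representation, and the sketch only checks the lag, not that the output order matches.

```python
def relative_queuing_delay(oq_departure, cioq_departure):
    """Largest lag, in time-slots, of the CIOQ switch behind the shadow OQ switch."""
    return max(cioq_departure[c] - oq_departure[c] for c in oq_departure)


def within_rqd(oq_departure, cioq_departure, bound):
    """True if every cell leaves the CIOQ switch at most `bound` slots late."""
    return relative_queuing_delay(oq_departure, cioq_departure) <= bound
```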

Our Results: Speedup-RQD tradeoff
[Figure: speedup vs. RQD plot, with axis marks 2, 4, and 2L_max]
- Lower bound on RQD (even with infinite speedup)
- Lower bound on the speedup (from cell-mode scheduling)
- Generalization of cell-mode scheduling with S=2: taking each packet of size ≤ L_max as one huge cell
- L_max = maximum packet size (known value)

Intuition for Emulation Algorithms
[Diagram: Packet-Mode CIOQ; Packet-Mode OQ; Cell-Mode CIOQ w/ S=2]

PIFO Cell-Mode OQ Switch
- FIFO = First-In First-Out
- PIFO = Push-In First-Out
  ⇒ A FIFO packet-mode OQ switch is a PIFO cell-mode switch
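To see why a FIFO packet-mode output queue behaves like a PIFO cell queue, consider the sketch below. It rests on an assumed reading of the slide: packets are served in the order their first cells arrive, and each later cell is pushed in directly behind the cells of its own packet (or at the head, if those have already departed). The class and method names are hypothetical.

```python
class PacketModeOutputQueue:
    """FIFO packet-mode output queue expressed as a PIFO queue of cells."""

    def __init__(self):
        self.cells = []  # (packet_id, cell_index) pairs, head of the queue at index 0

    def push_in(self, packet_id, cell_index):
        # First cell of a packet joins the tail; a later cell defaults to the head
        # and is moved behind the last queued cell of the same packet, if any.
        pos = len(self.cells) if cell_index == 0 else 0
        for k, (pid, _) in enumerate(self.cells):
            if pid == packet_id:
                pos = k + 1
        self.cells.insert(pos, (packet_id, cell_index))

    def pop_first(self):
        """Cells always leave from the head, as in any PIFO queue."""
        return self.cells.pop(0)
```

Pushing never reorders cells that are already queued, which is the defining PIFO property; interleaved arrivals A0, B0, A1, B1, A2 leave as A0, A1, A2, B0, B1.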

Underlying CCF Algorithm
- Cell-mode CIOQ w/ CCF (and speedup S=2) emulates any PIFO cell-mode OQ switch [Chuang et al., 1999]
  - But CCF does not maintain contiguous packet forwarding over the fabric!
[Diagram: Packet-Mode CIOQ; Packet-Mode OQ = PIFO Cell-Mode OQ; Cell-Mode CIOQ w/ S=2]

Intuition for Emulation Algorithms
[Diagram: Packet-Mode CIOQ; Packet-Mode OQ; Cell-Mode CIOQ w/ S=2]
Two sub-steps:
1. Framing
2. Contiguous Decomposition

Frame-Based Schedulers
- Work in a pipelined, frame-based manner
- Within each frame:
  - Build a demand matrix for this frame
  - Schedule the demand matrix of the previous frame

Building the Demand Matrix
- In each frame of size T, CCF forwards at most 2T cells from each input and to each output.
[Figure: demand matrix; each entry is the number of cells CCF sent from one input to one output in the last frame (e.g., from input 1 to output 1); row and column sums are marked ≤ 2T]
- Problem: a packet may span several frames.

Building the Demand Matrix
- Count only packets whose last cell is forwarded by CCF in the frame
- Each row/column in the matrix is bounded by 2T + N(L_max - 1)
  - For each input-output pair, only cells of one additional packet can be added.
- Translates into RQD of 2T + (L_max - 2).
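The counting rule can be sketched as below. The event format is an assumption made for the example: each entry records one cell that the underlying CCF run forwarded during the frame, as `(input, output, packet_len, is_last_cell)`.

```python
def build_demand_matrix(n, ccf_events):
    """Demand in cells per (input, output), counting a packet only in the frame
    in which CCF forwards its last cell; the whole packet is charged at once."""
    demand = [[0] * n for _ in range(n)]
    for inp, out, packet_len, is_last_cell in ccf_events:
        if is_last_cell:
            demand[inp][out] += packet_len
    return demand
```

A packet whose earlier cells were forwarded in a previous frame still contributes its full length here, which is where the extra N(L_max - 1) term in the row/column bound comes from.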

Decomposing the Demand Matrix
- Challenge: decompose the matrix into permutations while maintaining contiguous packet delivery.
  - Each permutation dictates a scheduling decision.
- First try: optimal Birkhoff-von Neumann decomposition results in 2T + N(L_max - 1) permutations.

Contiguous Greedy Decomposition
- To maintain contiguous packet delivery:
  - If (i,j) was matched in iteration t-1 and there are more (i,j) cells to schedule ⇒ keep it for iteration t.
  - Find a greedy matching for the rest of the matrix.
- Speedup: [expression missing]
- RQD: 2T + L_max - 1
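The two rules above can be sketched as follows. This is a minimal illustration operating on a matrix of cell counts; the first-fit scan used for the greedy part is an assumption, since the slide does not say how the greedy matching is chosen.

```python
def contiguous_greedy_decomposition(demand):
    """Split a demand matrix (in cells) into a list of matchings {input: output}.
    A pair matched in one iteration stays matched while it still has cells."""
    n = len(demand)
    remaining = [row[:] for row in demand]
    schedule = []
    previous = {}
    while any(any(row) for row in remaining):
        # Rule 1: keep pairs from the previous iteration that still have cells.
        match = {i: j for i, j in previous.items() if remaining[i][j] > 0}
        used_outputs = set(match.values())
        # Rule 2: greedy (first-fit) matching over the rest of the matrix.
        for i in range(n):
            if i in match:
                continue
            for j in range(n):
                if j not in used_outputs and remaining[i][j] > 0:
                    match[i] = j
                    used_outputs.add(j)
                    break
        for i, j in match.items():
            remaining[i][j] -= 1
        schedule.append(match)
        previous = match
    return schedule
```

Because a pair is released only once all of its cells are served, the iterations that serve any given (i,j) form one unbroken run, so packets are never interleaved on the fabric.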

Our Results: Speedup-RQD tradeoff
[Figure: speedup vs. RQD plot, with axis marks 2, 4, and 2L_max]
- S = 4 + N(L_max - 1)/T, RQD = 2T + L_max - 1
- Next…

Emulation w/ S ≈ 2 - Framing
- Keep a separate demand matrix for every possible packet size
- Example: possible packet sizes are 3, 4, 6
[Figure: three matrices, holding the # of size 3 packets, # of size 4 packets, and # of size 6 packets]

Emulation w/ S ≈ 2 - Framing
- Concatenate packets of the same size into mega-packets of size k = LCM(1,…,L_max)
- Leftover matrix for each size m
[Figure: the size 3, size 4, and size 6 matrices are split into a Mega Packets matrix (of size k=12) and leftover matrices for sizes 3, 4, and 6]
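The split into mega-packets and leftovers can be sketched as below. Two assumptions are worth flagging: the input is a dictionary from packet size m to a matrix of packet counts, and k is taken as the LCM of the sizes actually present (which gives k = 12 for sizes 3, 4, 6, as in the example), whereas the slide defines k = LCM(1,…,L_max).

```python
from functools import reduce
from math import gcd


def lcm_all(values):
    """Least common multiple of a non-empty collection of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)


def split_into_mega_packets(counts_by_size):
    """Return (k, mega, leftovers): mega[i][j] counts mega-packets of k cells,
    leftovers[m][i][j] counts size-m packets that do not fill a mega-packet."""
    k = lcm_all(list(counts_by_size))
    n = len(next(iter(counts_by_size.values())))
    mega = [[0] * n for _ in range(n)]
    leftovers = {}
    for m, counts in counts_by_size.items():
        per_mega = k // m  # number of size-m packets in one mega-packet
        leftovers[m] = [[c % per_mega for c in row] for row in counts]
        for i in range(n):
            for j in range(n):
                mega[i][j] += counts[i][j] // per_mega
    return k, mega, leftovers
```

Every leftover entry for size m is below k/m by construction, and the total number of cells is preserved: k times the mega-packet count plus m times each leftover count equals the original demand.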

Emulation w/ S ≈ 2 - Framing
- Sum of each row/column is bounded
  - For the mega-packets matrix: ≤ (2T + N(L_max - 1))/k
  - For each leftover matrix of size m: ≤ N(k - 1)/m
[Figure: Mega Packets matrix (of size 12) and leftover matrices for sizes 3, 4, and 6, labeled < 12/3, < 12/4, and < 12/6]

Emulation w/ S ≈ 2 - Decomposition
- Optimally decompose (w/ Birkhoff-von Neumann) the mega-packets matrix and then the leftover matrices
  - Bound on the mega-packets matrix
- Hold each permutation k times for contiguous (mega-)packet delivery
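The speedup that results from this decomposition can be worked out from the row/column bounds in the framing step. The derivation below is a sketch under stated assumptions: a matrix whose row and column sums are at most B decomposes into at most B permutations, each mega-packet permutation is held for k slots, each size-m leftover permutation is held for m slots, and there are at most L_max distinct packet sizes.

```latex
\begin{align*}
\text{mega-packets:}\quad
  & k \cdot \frac{2T + N(L_{\max}-1)}{k} = 2T + N(L_{\max}-1) \\
\text{leftovers of one size } m:\quad
  & m \cdot \frac{N(k-1)}{m} = N(k-1) \\
\text{leftovers of all sizes:}\quad
  & \le L_{\max} \cdot N(k-1) \\
\text{total slots per frame:}\quad
  & 2T + N(L_{\max}-1) + N(k-1)L_{\max} = 2T + N(kL_{\max}-1) \\
\text{speedup:}\quad
  & S = \frac{2T + N(kL_{\max}-1)}{T} = 2 + \frac{N(kL_{\max}-1)}{T}
\end{align*}
```

For a frame size T that is large relative to N k L_max, the second term becomes small and S approaches 2.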

Our Results: Speedup-RQD tradeoff
[Figure: speedup vs. RQD plot, with axis marks 2, 4, and 2L_max]
- S = 4 + N(L_max - 1)/T, RQD = 2T + L_max - 1
- S = 2 + N(kL_max - 1)/T, RQD = 2T + L_max - 1

Wrap-up
- Packet-mode scheduling can be done with the same speedup as cell-mode scheduling
  - At the price of a bounded RQD
- Future work: lower bounds?

Thank You!