Buffered Crossbars With Performance Guarantees Shang-Tse (Da) Chuang Cisco Systems EE384Y Thursday, April 27, 2006.

1 Buffered Crossbars With Performance Guarantees Shang-Tse (Da) Chuang Cisco Systems dachuang@cisco.com EE384Y Thursday, April 27, 2006

2 Motivation
- Network operators want performance guarantees
  - Throughput guarantee
  - Delay guarantee
- High-performance routers use crossbars
- It is hard to build crossbar-based routers with guarantees
- My talk: how a crossbar with a small amount of internal buffering can give guarantees

3 Contents
- Throughput Guarantees
  - Buffered Crossbar - 100% Throughput
  - Buffered Crossbar - Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

4 Generic Crossbar-Based Architecture
[Figure: inputs with VOQs, a crossbar with speedup of S, and a scheduler]

5 Admissible Traffic
- Traffic matrix
- Traffic is admissible if [condition not preserved in this transcript]
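The admissibility condition on this slide did not survive in the transcript. The sketch below assumes the standard definition used in switching theory: with λ_ij the arrival rate from input i to output j, no input and no output is oversubscribed (every row sum and every column sum of the traffic matrix is below 1). The function name `is_admissible` and the matrix layout are illustrative assumptions, not taken from the talk.

```python
from typing import Sequence


def is_admissible(rates: Sequence[Sequence[float]]) -> bool:
    """Check a square traffic matrix for admissibility.

    rates[i][j] is the assumed arrival rate from input i to output j,
    in cells per time slot. Assumption: admissible means every row sum
    (load offered by an input) and every column sum (load offered to an
    output) is strictly below 1.
    """
    n = len(rates)
    rows_ok = all(sum(row) < 1.0 for row in rates)
    cols_ok = all(sum(rates[i][j] for i in range(n)) < 1.0 for j in range(n))
    return rows_ok and cols_ok


# Each input and each output carries a load of 0.9: admissible.
balanced = [[0.5, 0.4], [0.4, 0.5]]
# Output 0 is offered 1.2 cells per slot: not admissible.
hotspot = [[0.6, 0.3], [0.6, 0.3]]
```

With these example matrices, `is_admissible(balanced)` holds and `is_admissible(hotspot)` does not, because the first column of `hotspot` sums to more than one cell per slot.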

6 Throughput Guarantee
- 100% Throughput
  - An algorithm delivers 100% throughput if, for any admissible traffic, the average backlog is finite
[Figure: crossbar with speedup of S and a scheduler]

7 Previous Work
[Figure: timeline from 1985 to 2005; legend: Heuristics, Theoretically Proven]
- Wave Front Arbiter [Tamir]
- Parallel Iterative Matching [Anderson et al.]
- iSLIP [McKeown]
- Longest Port First [Mekkittikul et al.]
- Maximum Weight Matching [McKeown et al.]
- Maximal Matching, S=2 [Dai, Prabhakar]

8 Maximal Matching Has Become Hard
- TTX Switch Fabric
  - Uses maximal matching
  - Speedup less than 2
  - Consumes up to 8 kW
  - Limited to ~2.5 Tb/s
  - No 100% throughput guarantee

9 Traditional Crossbar
- Crossbar Requirements
  - An input can send at most one cell
  - An output can receive at most one cell
- Scheduling Problem
  - Must overcome two constraints simultaneously
- New Crossbar
  - Relieve contention
  - Remove dependency between inputs and outputs

10 Contents
- Throughput Guarantees
  - Buffered Crossbar - 100% Throughput
  - Buffered Crossbar - Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

11 Buffered Crossbar
- Arrival Phase
- Scheduling Phases – Speedup of 2
- Departure Phase

12 Scheduling Phase
- Input Schedule
  - Each input selects, in parallel, a cell for an empty crosspoint
- Output Schedule
  - Each output selects, in parallel, a cell from a full crosspoint

13 Example of Input/Output Scheduling
- Round-robin Policy
  - Each input schedules in a round-robin order
  - Each output schedules in a round-robin order
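The input and output schedules with a round-robin policy can be sketched as follows. This is a minimal illustration, assuming one cell of buffering per crosspoint and a round-robin pointer that advances past the last served position; the class name `BufferedCrossbar`, its fields, and the pointer-update rule are assumptions for illustration, not the talk's implementation.

```python
from collections import deque


class BufferedCrossbar:
    """Toy N x N buffered crossbar with one-cell crosspoint buffers."""

    def __init__(self, n):
        self.n = n
        # voq[i][j]: cells waiting at input i for output j.
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        # xpt[i][j]: the cell held at crosspoint (i, j), or None if empty.
        self.xpt = [[None] * n for _ in range(n)]
        self.in_ptr = [0] * n   # round-robin pointer per input
        self.out_ptr = [0] * n  # round-robin pointer per output

    def input_schedule(self):
        """Each input independently moves one cell to an empty crosspoint."""
        for i in range(self.n):
            for k in range(self.n):
                j = (self.in_ptr[i] + k) % self.n
                if self.voq[i][j] and self.xpt[i][j] is None:
                    self.xpt[i][j] = self.voq[i][j].popleft()
                    self.in_ptr[i] = (j + 1) % self.n
                    break

    def output_schedule(self):
        """Each output independently takes one cell from a full crosspoint."""
        delivered = [None] * self.n
        for j in range(self.n):
            for k in range(self.n):
                i = (self.out_ptr[j] + k) % self.n
                if self.xpt[i][j] is not None:
                    delivered[j] = self.xpt[i][j]
                    self.xpt[i][j] = None
                    self.out_ptr[j] = (i + 1) % self.n
                    break
        return delivered


switch = BufferedCrossbar(2)
switch.voq[0][0].append("a")
switch.voq[1][0].append("b")
switch.voq[1][1].append("c")

switch.input_schedule()
first = switch.output_schedule()   # output 0 takes "a"; output 1 is idle
switch.input_schedule()
second = switch.output_schedule()  # output 0 takes "b"; output 1 takes "c"
```

Note that no input consults any other input, and no output consults any other output: the crosspoint buffers are what decouple the two decisions.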

14 Previous Work
- Buffered Crossbar Simulations [Rojas-Cessa et al. 2001]
  - 32x32 switch, Uniform Bernoulli Traffic, Round-Robin, S=1

15 100% Throughput
- Theorem 1
  - A buffered crossbar with speedup of 2 delivers 100% throughput for any admissible Bernoulli i.i.d. traffic using any work-conserving input/output schedules.

16 Intuition of Proof
[Figure: rate labels ε, < 1-ε, and 1-ε on a flow through the switch, with the annotation "... + ε = 2 - ε"]
- When a flow is backed up, the service for its backlog exceeds the arrivals

17 Intuition of Proof
- $Q_{ij}$ = queue length
- $B_{ij} = 0$ if the buffer is empty, $B_{ij} = 1$ if the buffer is full

18 Intuition of Proof
- Recall
- If $Q_{ij} > 0$, then for $X_{ij}$:
  - Expected increase is 2
  - Expected decrease
    - If $B_{ij} = 1$, then in the output schedule one $B_{*j}$ will decrease
    - If $B_{ij} = 0$, then in the input schedule one $Q_{i*}$ will decrease
  - Thus expected decrease is 2

19 Contents
- Throughput Guarantees
  - Buffered Crossbar - 100% Throughput
  - Buffered Crossbar - Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

20 Work Conservation
- Work-conserving Property
  - If there is a cell for a given output in the system, that output is busy.
[Figure: Output Queued (OQ) Switch]

21 Emulating an OQ Switch
- Under identical inputs, the departure time of every cell from both switches is identical

22 Input Priority List
[Figure: example input priority lists and crosspoints, with cells labeled by departure time]
- Label each cell with its corresponding departure time
- Arrange input cells into an input priority list
- Output selects crosspoint with earliest departure time

23 Input Priority List
[Figure: example input priority lists, with cells marked "good guy" and "bad guys"]
- Label each cell with its corresponding departure time
- Arrange input cells into an input priority list
- Output selects crosspoint with earliest departure time

24 Definitions
[Figure: example cell with its "good guys" and "bad guys" marked]
- Output Margin – number of cells at the cell's output with an earlier departure time
- Input Margin – number of cells ahead of it in the input priority list destined to different outputs
- Total Margin – Output Margin minus Input Margin
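The three margin definitions can be made concrete with a small sketch. It assumes each cell is identified by its destination output and its departure time in the emulated OQ switch, that the input priority list is ordered head first, and that `output_queues` maps an output to the cells already waiting there. The names `Cell`, `output_margin`, `input_margin`, and `total_margin` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    output: int     # destination output port
    departure: int  # departure time in the emulated OQ switch


def output_margin(cell, output_queues):
    """Cells already at the cell's output that depart earlier than it."""
    queued = output_queues.get(cell.output, [])
    return sum(1 for c in queued if c.departure < cell.departure)


def input_margin(cell, priority_list):
    """Cells ahead in the input priority list bound for other outputs."""
    position = priority_list.index(cell)
    return sum(1 for c in priority_list[:position] if c.output != cell.output)


def total_margin(cell, priority_list, output_queues):
    """Output margin minus input margin."""
    return output_margin(cell, output_queues) - input_margin(cell, priority_list)


# Hypothetical state: head of the input priority list is index 0.
priority_list = [Cell(1, 3), Cell(0, 5), Cell(2, 2), Cell(0, 6)]
output_queues = {0: [Cell(0, 1), Cell(0, 2), Cell(0, 4)]}

late_cell = Cell(0, 6)
margin = total_margin(late_cell, priority_list, output_queues)
```

For `late_cell`, three cells at output 0 depart earlier and two cells ahead of it at the input go to other outputs, so its total margin is 3 - 2 = 1.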

25 Emulation of FIFO OQ Switch
- Scheduling Phase
  - Crosspoint is full – Output Margin will increase by one
  - Crosspoint is empty – Input Margin will decrease by one
  - Total Margin increases by two

26 Emulation of FIFO OQ Switch
- Arrival Phase
  - Input Margin might increase by one
- Departure Phase
  - Output Margin will decrease by one
- Total Margin decreases by at most two

27 Emulation of FIFO OQ Switch
- Lemma 1
  - For every time slot, total margin does not decrease

28 FIFO Insertion Policy
- Arrival Phase
  - Cell for non-empty VOQ: insert behind cells for same output
  - Cell for empty VOQ: insert at head of input priority list
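The insertion rule above can be sketched as a single list operation. The sketch assumes cells are `(output, departure_time)` tuples, that the head of the input priority list is index 0, and that "behind cells for same output" means directly after the last queued cell for that output; that reading, and the name `fifo_insert`, are assumptions.

```python
def fifo_insert(priority_list, cell):
    """Insert an arriving cell into an input priority list in place.

    priority_list: list of (output, departure_time) tuples, head first.
    cell: the arriving (output, departure_time) tuple.
    """
    output = cell[0]
    last_same = None
    for index, (queued_output, _) in enumerate(priority_list):
        if queued_output == output:
            last_same = index

    if last_same is None:
        # The VOQ for this output is empty: the cell goes to the head.
        priority_list.insert(0, cell)
    else:
        # Non-empty VOQ: the cell goes directly behind its predecessors.
        priority_list.insert(last_same + 1, cell)


plist = [(1, 3), (0, 5), (2, 2)]
fifo_insert(plist, (0, 6))  # output 0 already has a cell queued
fifo_insert(plist, (3, 1))  # output 3 has none, so this goes to the head
```

After the two arrivals the list reads `[(3, 1), (1, 3), (0, 5), (0, 6), (2, 2)]`: the cell for the empty VOQ has no cells ahead of it, which is what keeps its input margin at zero on arrival.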

29 FIFO Insertion Policy
- Lemma 2
  - An arriving cell will have a non-negative total margin

30 Emulation of FIFO OQ Switch
- Theorem 2
  - A buffered crossbar with speedup of 2 can exactly emulate a FIFO OQ switch.
- Result was shown independently
  - B. Magill, C. Rohrs, R. Stevenson, "Output-Queued Switch Emulation by Fabrics With Limited Memory," IEEE Journal on Selected Areas in Communications, pp. 606-615, May 2003.
- Theorem 3
  - A buffered crossbar with speedup of 2 can be work-conserving with a distributed algorithm.

31 Contents
- Throughput Guarantees
  - Buffered Crossbar - 100% Throughput
  - Buffered Crossbar - Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

32 Delay Guarantees
[Figure: one output with many logical FIFO queues (1 to m); weighted fair queueing sorts packets of constrained traffic]
[Figure: one output with a single Push In First Out (PIFO) queue; constrained traffic is pushed into the queue]
- PIFO models
  - Weighted Fair Queueing
  - Weighted Round Robin
  - Strict priority, etc.
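A Push In First Out queue lets an arriving packet be pushed into any position, while departures always come from the head. The sketch below is a minimal illustration, assuming each packet carries a numeric rank (for example a priority class or a fair-queueing finish time) and that ties are broken by arrival order; the class name `PIFO` and the rank convention are assumptions rather than the talk's definitions.

```python
import bisect
import itertools


class PIFO:
    """Push In First Out queue: push anywhere by rank, pop from the head."""

    def __init__(self):
        self._items = []                # kept sorted by (rank, arrival order)
        self._arrival = itertools.count()

    def push(self, rank, packet):
        # The arrival counter breaks ties, so equal ranks stay FIFO and
        # packets themselves are never compared.
        bisect.insort(self._items, (rank, next(self._arrival), packet))

    def pop(self):
        # Departures always leave from the head of the queue.
        return self._items.pop(0)[2]

    def __len__(self):
        return len(self._items)


# Strict priority as a PIFO: rank 1 is high priority, rank 2 is low.
queue = PIFO()
queue.push(2, "low-1")
queue.push(1, "high-1")
queue.push(2, "low-2")
queue.push(1, "high-2")
order = [queue.pop() for _ in range(len(queue))]
```

Here both high-priority packets leave before either low-priority one, and packets of the same class keep their arrival order. Other disciplines from the list above would differ only in how the rank is computed.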

33 Achieving Delay Guarantees in Crossbars
- Theorem 4
  - A crossbar switch with a speedup of 2 can exactly emulate an OQ switch that provides delay guarantees.
- Theorem 5
  - A speedup of 2-1/N is necessary and sufficient for a crossbar switch to exactly emulate an NxN FIFO OQ switch.

34 Contents
- Throughput Guarantees
  - Buffered Crossbar - 100% Throughput
  - Buffered Crossbar - Work Conservation
- Delay Guarantees
  - Traditional Crossbar – Emulating an OQ Switch
  - Buffered Crossbar – Emulating an OQ Switch

35 Emulation of PIFO OQ Switch
- Crosspoint Blocking
  - A cell in the crosspoint has a larger departure time
- Swap Phase
  - If an arriving cell has a smaller departure time than the cell in the crosspoint, swap the two cells
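The swap rule can be written as a small comparison. This sketch assumes cells are represented only by their departure times (integers) and that an empty crosspoint is `None`; the function name `swap_phase` and the return convention are assumptions for illustration.

```python
def swap_phase(arriving, crosspoint):
    """Apply the swap rule to one arriving cell and one crosspoint.

    arriving: departure time of the arriving cell.
    crosspoint: departure time of the cell held in the crosspoint,
        or None if the crosspoint is empty.
    Returns (cell_in_crosspoint, cell_at_input) after the swap phase.
    """
    if crosspoint is not None and arriving < crosspoint:
        # The more urgent arriving cell takes the crosspoint; the blocked
        # cell returns to the input.
        return arriving, crosspoint
    # No swap: the crosspoint is empty or already holds the earlier cell.
    return crosspoint, arriving


urgent = swap_phase(3, 7)     # arriving cell departs first, so swap
patient = swap_phase(8, 7)    # crosspoint cell departs first, so no swap
untouched = swap_phase(5, None)
```

In this sketch an empty crosspoint is left empty by the swap phase; filling it is the job of the input schedule.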

36 PIFO Insertion Policy
- Arrival Phase
  - Insert cell directly behind the cell with the departure time just earlier
  - If the cell has the earliest departure time, insert it at the head of the input priority list

37 PIFO Emulation
- Theorem 6
  - A buffered crossbar with speedup of 3 can exactly emulate an OQ switch with delay guarantees.

38 Header Scheduling Architecture
[Figure: input linecard, buffered crossbar, and output linecard; headers flow to a header scheduler and grants flow back]

39 Header Scheduling
- Schedule headers instead of cells
- Headers are converted into grants in the output schedule
- Grants are sent back to the input

40 Grant Stream
[Figure: input linecard, buffered crossbar, and output linecard, with headers, grants, a grant FIFO, and the header scheduler]
- Input can receive N grants in one scheduling phase
- Bounded to p+N-1 grants over p consecutive phases

41 Modified Buffered Crossbar
- Modified Buffered Crossbar
  - N cells per crosspoint – requires $N^3$ cell buffers
  - N cells per output – requires $N^2$ cell buffers
- Theorem 7
  - A modified buffered crossbar with speedup of 2 can emulate an OQ switch with delay guarantees with a fixed delay of N scheduling phases.

42 Summary
- Buffered crossbars
  - Use crosspoints to relieve contention
  - Inputs and outputs schedule independently and in parallel
- Performance guarantees
  - Throughput – any work-conserving input/output schedule
  - Work Conservation – simple insertion policy
  - Delay – header scheduling

43 Relevant Papers
- Crossbars
  - Shang-Tse Chuang, Ashish Goel, Nick McKeown, Balaji Prabhakar, "Matching Output Queuing with a Combined Input Output Queued Switch," IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1030-1039, Dec. 1999.
- Buffered Crossbars
  - Shang-Tse Chuang, Sundar Iyer, Nick McKeown, "Practical Algorithms for Performance Guarantees in Buffered Crossbars," Proceedings of IEEE INFOCOM 2005, Miami, Florida, Mar. 2005.
