
1 048866: Packet Switch Architectures
Dr. Isaac Keslassy, Electrical Engineering, Technion
isaac@ee.technion.ac.il
http://comnet.technion.ac.il/~isaac/
Scaling

2 Achieving 100% throughput
1. Switch model
2. Uniform traffic
   - Technique: Uniform schedule (easy)
3. Non-uniform traffic, but known traffic matrix
   - Technique: Non-uniform schedule (Birkhoff-von Neumann)
4. Unknown traffic matrix
   - Technique: Lyapunov functions (MWM)
5. Faster scheduling algorithms
   - Technique: Speedup (maximal matchings)
   - Technique: Memory and randomization (Tassiulas)
   - Technique: Twist architecture (buffered crossbar)
6. Accelerate scheduling algorithm
   - Technique: Pipelining
   - Technique: Envelopes
   - Technique: Slicing
7. No scheduling algorithm
   - Technique: Load-balanced router

3 Outline
Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.
Today we'll talk about the implementation of the crossbar switch fabric itself. How are crossbars built, how do they scale, and what limits their capacity?

4 Crossbar switch: Limiting factors
1. N^2 crosspoints per chip, or N (N-to-1) multiplexors.
2. It's not obvious how to build a crossbar from multiple chips.
3. Capacity of the I/Os per chip.
   - State of the art: about 300 pins, each operating at 3.125 Gb/s, i.e. roughly 1 Tb/s per chip.
   - About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
   - Crossbar chips today are limited by I/O capacity.
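For concreteness, the slide's numbers work out as follows (a quick back-of-the-envelope calculation added here, not part of the deck):

```python
# Raw pin bandwidth per chip and the fraction left after overhead and speedup,
# using the figures quoted on the slide.

pins, rate_gbps = 300, 3.125
raw = pins * rate_gbps                  # 937.5 Gb/s, i.e. ~1 Tb/s per chip
usable = (raw / 3, raw / 2)             # roughly 1/3 to 1/2 of the raw capacity
print(f"raw ≈ {raw:.0f} Gb/s, usable ≈ {usable[0]:.0f} to {usable[1]:.0f} Gb/s")
```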

5 Scaling
1. Scaling Line Rate
   - Bit-slicing
   - Time-slicing
2. Scaling Time (Scheduling Speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling Number of Ports
   - Naïve approach
   - Clos networks
   - Benes networks

6 Bit-sliced parallelism
[Figure: at each input linecard, a cell is "striped" across k identical switch planes (1..k), all controlled by one scheduler.]
- The cell is "striped" across k identical planes.
- The scheduler makes the same decision for all slices.
- However, this doesn't decrease the required scheduling speed.
- Other problem(s)?
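To make the striping concrete, here is a minimal Python sketch (not from the slides; the stripe/reassemble helpers are hypothetical) of how a cell could be split across k planes and put back together at the egress linecard:

```python
# Hypothetical helpers: stripe a cell across k planes byte-by-byte,
# then re-interleave the slices at the egress linecard.

def stripe(cell, k):
    """Split a cell into k slices; byte i goes to plane i % k."""
    return [cell[i::k] for i in range(k)]

def reassemble(slices):
    """Inverse of stripe(): interleave the k slices back into one cell."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for plane, s in enumerate(slices):
        out[plane::k] = s
    return bytes(out)

cell = bytes(range(64))        # a 64-byte cell
planes = stripe(cell, k=8)     # each of the 8 planes carries 8 bytes
assert reassemble(planes) == cell
```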

7 Time-sliced parallelism
[Figure: each cell is carried whole by one of the k planes, taking k cell times; one centralized scheduler serves the planes in turn.]
- A cell is carried by one plane and takes k cell times.
- The centralized scheduler is unchanged: it works for each slice in turn.
- Problem: same scheduling speed.

8 Scaling
1. Scaling Line Rate
   - Bit-slicing
   - Time-slicing
2. Scaling Time (Scheduling Speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling Number of Ports
   - Naïve approach
   - Clos networks
   - Benes networks

9 Time-sliced parallelism with parallel scheduling
[Figure: k planes, each with its own slow scheduler; cells from each input linecard are assigned to the planes in turn.]
- Now scheduling is distributed to each slice.
- Each (slow) scheduler has k cell times to compute its schedule.
- Problem(s)?
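A small Python sketch of this organization (an assumed setup, not the course's code): cells arriving in cell time t are handed to plane t mod k, so each plane's scheduler only has to produce a matching once every k cell times; a greedy maximal matching stands in for the per-plane slow scheduler.

```python
from collections import deque

class Plane:
    def __init__(self):
        self.voqs = {}                          # (input, output) -> deque of cells

    def enqueue(self, inp, out, cell):
        self.voqs.setdefault((inp, out), deque()).append(cell)

    def schedule(self):
        """Slow per-plane scheduler: greedy maximal matching over this plane's VOQs."""
        used_in, used_out, match = set(), set(), []
        for (inp, out), q in self.voqs.items():
            if q and inp not in used_in and out not in used_out:
                match.append((inp, out))
                used_in.add(inp)
                used_out.add(out)
        return match

k = 4
planes = [Plane() for _ in range(k)]

def dispatch(t, inp, out, cell):
    """Time-slice: the plane is chosen purely by the arrival cell time."""
    planes[t % k].enqueue(inp, out, cell)
```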

10 Envelopes
[Figure: at each VOQ on the linecard, cells are grouped into envelopes before being offered to the slow scheduler.]
- Envelopes of k cells [Kar et al., 2000].
- Problem: "Should I stay or should I go now?"
  - Waiting → starvation ("Waiting for Godot").
  - Timeouts → loss of throughput.
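A sketch of the trade-off (the EnvelopeVOQ class and its timeout parameter are hypothetical, not from the slides): the VOQ accumulates k cells per envelope but releases a partial envelope after a timeout, trading some throughput for bounded waiting.

```python
class EnvelopeVOQ:
    def __init__(self, k, timeout):
        self.k, self.timeout = k, timeout
        self.cells, self.age = [], 0

    def enqueue(self, cell):
        self.cells.append(cell)

    def tick(self):
        """Called once per cell time; returns an envelope when one is ready."""
        if not self.cells:
            self.age = 0
            return None
        self.age += 1
        if len(self.cells) >= self.k or self.age >= self.timeout:
            env = self.cells[:self.k]       # may be partial if the timeout fired
            self.cells = self.cells[self.k:]
            self.age = 0
            return env
        return None
```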

11 Frames for scheduling
[Figure: the slow scheduler serves the linecard VOQs once per frame of k cell times.]
- The slow scheduler simply takes its decision every k cell times and holds it for k cell times.
- Often associated with pipelining.
- Note: pipelined MWM is still stable (intuitively: the weight doesn't change much).
- Possible problem(s)?
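A minimal sketch of frame-based operation (assumed interfaces, not from the slides): the matching is recomputed only at frame boundaries and then held for the next k cell times, so a scheduler that is k times too slow still keeps up.

```python
def run_frames(compute_matching, voq_occupancy, k, num_cell_times):
    """voq_occupancy[i][j] = number of cells queued from input i to output j."""
    current = []                                  # matching held for the frame
    for t in range(num_cell_times):
        if t % k == 0:                            # start of a new frame
            current = compute_matching(voq_occupancy)
        for (inp, out) in current:                # serve the held matching
            if voq_occupancy[inp][out] > 0:
                voq_occupancy[inp][out] -= 1
    return voq_occupancy
```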

12 Scaling a crossbar
Conclusion:
- Scaling the line rate is relatively straightforward (although the chip count and power may become a problem).
- Scaling the scheduling decision is more difficult, and often comes at the expense of packet delay.
What if we want to increase the number of ports?
- Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
- If so, what properties should it have?

13 Scaling
1. Scaling Line Rate
   - Bit-slicing
   - Time-slicing
2. Scaling Time (Scheduling Speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling Number of Ports
   - Naïve approach
   - Clos networks
   - Benes networks

14 Scaling number of outputs: Naïve approach
[Figure: the building block is a crossbar with 4 inputs and 4 outputs; tiling such blocks into a 16x16 crossbar means each block needs eight inputs and eight outputs!]

15 3-stage Clos network
[Figure: m first-stage switches of size n x k, k middle-stage switches of size m x m, and m third-stage switches of size k x n; every first-stage switch connects to every middle-stage switch, and every middle-stage switch to every third-stage switch; N = n x m, k ≥ n.]
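As a rough comparison (my own arithmetic, not on the slide), the crosspoint count of a three-stage Clos network C(n, m, k) versus a single N x N crossbar:

```python
def clos_crosspoints(n, m, k):
    first  = m * (n * k)      # m first-stage switches of size n x k
    middle = k * (m * m)      # k middle-stage switches of size m x m
    third  = m * (k * n)      # m third-stage switches of size k x n
    return first + middle + third

N = 256
n = m = 16                    # N = n * m
for k, label in [(n, "rearrangeably non-blocking"), (2 * n - 1, "strictly non-blocking")]:
    print(f"k={k:2d} ({label}): {clos_crosspoints(n, m, k)} crosspoints "
          f"vs {N * N} for a full crossbar")
```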

16 With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,4), (3,3), (4,2).

17 With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,2), (4,4), (5,3), …
By rearranging existing matches, the new connections could be added.
Q: Is this Clos network "rearrangeably non-blocking"?

18 With k = n, a Clos network is rearrangeably non-blocking
- Route matching is equivalent to edge-coloring in a bipartite multigraph.
- Each vertex corresponds to an n x k or k x n switch; each edge to a requested connection, e.g. (1,1), (2,4), (3,3), (4,2).
- Colors correspond to middle-stage switches: no two edges at a vertex may be colored the same.
- Vizing '64: a bipartite graph of maximum degree D can be edge-colored with D colors (remember the Birkhoff-von Neumann decomposition theorem).
- Therefore, if k = n, a Clos network is rearrangeably non-blocking (and can therefore perform any permutation).

19 How complex is the rearrangement?
- Method 1: Find a maximum-size bipartite matching for each of the D colors in turn, O(D·N^2.5).
  - Why does it work?
- Method 2: Partition the graph into Euler sets, O(N log D) [Cole et al. '00].
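An illustrative Python sketch of Method 1 (my own illustration, assuming a D-regular bipartite multigraph; pad with dummy edges otherwise): peel off one perfect matching per color using Kuhn's augmenting-path algorithm, so the total cost is on the order of D maximum matchings.

```python
def max_matching(adj, n_left, n_right):
    """Kuhn's augmenting-path algorithm; adj[u] lists right neighbours of left vertex u."""
    match_right = [-1] * n_right              # right vertex -> matched left vertex

    def try_augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    for u in range(n_left):
        try_augment(u, set())
    return match_right

def edge_color_by_matching(edges, n, D):
    """edges: list of (input_switch, output_switch) requests; every vertex is assumed
    to have degree exactly D. Returns (edge, color) pairs; a color is a middle switch."""
    remaining = list(edges)
    colored = []
    for color in range(D):
        adj = [[] for _ in range(n)]
        for (a, b) in remaining:
            adj[a].append(b)
        match_right = max_matching(adj, n, n)
        matched = set()
        for b, a in enumerate(match_right):
            if a != -1:
                colored.append(((a, b), color))
                matched.add((a, b))
        # remove exactly one copy of each matched edge (multigraph-aware)
        still = []
        for e in remaining:
            if e in matched:
                matched.remove(e)
            else:
                still.append(e)
        remaining = still
    return colored
```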

20 Euler partition of a graph
Euler partition of graph G:
1. Each odd-degree vertex is at the end of one open path.
2. Each even-degree vertex is at the end of no open path.

21 Euler split of a graph
Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.
[Figure: G split into G1 and G2.]

22 Edge-coloring using Euler sets
- Assume for simplicity that
  - the graph is regular (all vertices have the same degree D), and
  - D = 2^i.
- Perform i Euler splits and 1-color each resulting graph. This is log D split operations, each of cost O(E).
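A sketch of this procedure in Python (my own illustration, assuming, as the slide does, a D-regular bipartite multigraph with D a power of two): each Euler split walks the trails of an Euler partition and sends alternate edges to the two halves; after log2(D) levels of splitting every remaining subgraph is a perfect matching, i.e. one color.

```python
from collections import defaultdict

def euler_split(edges):
    """Split an even-regular bipartite edge list into two halves with equal
    degree at every vertex, by alternating edges along Euler trails."""
    adj = defaultdict(list)                       # vertex -> list of edge ids
    for i, (u, v) in enumerate(edges):
        adj[u].append(i)
        adj[v].append(i)
    used = [False] * len(edges)
    g1, g2 = [], []
    for start in list(adj):
        while adj[start]:                         # trails starting at this vertex
            u, side = start, 0
            while adj[u]:
                i = adj[u].pop()
                if used[i]:
                    continue                      # stale entry: walked from the other end
                used[i] = True
                (g1 if side == 0 else g2).append(edges[i])
                side ^= 1
                a, b = edges[i]
                u = b if u == a else a            # step to the other endpoint
    return g1, g2

def edge_color(edges, D):
    """D must be a power of two; returns D perfect matchings (one per color)."""
    if D == 1:
        return [edges]
    g1, g2 = euler_split(edges)
    return edge_color(g1, D // 2) + edge_color(g2, D // 2)
```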

23 Implementation
[Figure: the scheduler turns the request graph into a permutation; a "route connections" block then maps that permutation to paths through the middle-stage switches.]

24 Implementation
Pros:
- A rearrangeably non-blocking switch can perform any permutation.
- A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons:
- Rearrangement algorithms are complex (and run in addition to the scheduler).
Can we eliminate the need to rearrange?

25 Strictly non-blocking Clos network
Clos' theorem: If k ≥ 2n – 1, then a new connection can always be added without rearrangement.

26 Clos theorem
[Figure: first-stage switches I1…Im (each n x k), middle-stage switches M1…Mk (each m x m), third-stage switches O1…Om (each k x n); N = n x m, k ≥ 2n – 1.]

27 Clos theorem
[Figure: input switch Ia and output switch Ob, each with n – 1 connections already in use; the n-th connection must find a free middle-stage switch.]
1. Consider adding the n-th connection between 1st-stage switch Ia and 3rd-stage switch Ob.
2. We need to ensure that there is always some center-stage switch M available.
3. At most n – 1 middle switches are already in use by Ia's other connections, and at most n – 1 by Ob's. If k > (n – 1) + (n – 1), there is always an M available, i.e. we need k ≥ 2n – 1.
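A toy check of the counting argument (the helper function below is hypothetical): with k ≥ 2n – 1 middle switches, the at most n – 1 middles busy at Ia and the at most n – 1 busy at Ob cannot block all of them.

```python
def free_middle_switch(used_by_input, used_by_output, k):
    """Return some middle-stage switch not used by either side, or None."""
    blocked = set(used_by_input) | set(used_by_output)
    for m in range(k):
        if m not in blocked:
            return m
    return None

n = 4
k = 2 * n - 1                                  # Clos' bound
used_in  = set(range(n - 1))                   # worst case: n-1 distinct middles at Ia
used_out = set(range(n - 1, 2 * (n - 1)))      # n-1 different middles at Ob
assert free_middle_switch(used_in, used_out, k) is not None
```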

28 Benes networks: Recursive construction

29 Benes networks: Recursive construction
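The recursion itself is not spelled out on these slides; the standard textbook construction (stated here as an assumption) builds an N-port Benes network from an input column and an output column of N/2 2x2 switches around two N/2-port Benes networks. A quick sketch of the resulting element count:

```python
def benes_switch_count(N):
    """Number of 2x2 switches in an N-port Benes network (N a power of two)."""
    if N == 2:
        return 1
    # N/2 input switches + two recursive halves + N/2 output switches
    return N // 2 + 2 * benes_switch_count(N // 2) + N // 2

for N in (2, 4, 8, 16):
    # matches the closed form (N/2) * (2*log2(N) - 1)
    print(N, benes_switch_count(N))
```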

30 Scaling crossbars: Summary
- Scaling the bit-rate through parallelism is easy.
- Scaling the scheduler is hard.
- Scaling the number of ports is harder.
- Clos network:
  - Rearrangeably non-blocking with k = n, but routing is complicated.
  - Strictly non-blocking with k ≥ 2n – 1, so routing is simple, but more bisection bandwidth is required.
- Benes network: scaling with small components.

