EE384x Packet Switch Architectures, Spring 2012. Nick McKeown. Lecture 4: Parallelizing an OQ Switch
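Scaling an OQ Switch. [Figure: two panels, "one output" and "many outputs", each built from memories 1 to k; the many-outputs panel has N ports.] Annotations on the figure: "Not so clear." and "Work-conserving if memory b/w >= R(N+1)."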
Parallel OQ Switch: may not be work-conserving. Constant-size packets. At most two memory operations per time slot: 1 write and 1 read. [Figure: example with N = 3 ports (A, B, C) and k = 3 memories, shown at time slots 1, 2, and 3, with packets A5, A6, A7, A8, B5, B6, C5, C6.]
Problem: How can we design a work-conserving parallel OQ switch from slower parallel memories?
Work Conserving Theorem (sufficiency): A parallel output-queued switch is work-conserving with 3N - 1 memories, each able to perform at most one memory operation per time slot.
Re-stating the Problem
1. There are K cages, each of which can contain an infinite number of pigeons.
2. Assume that time is slotted, and in any one time slot:
a. At most N pigeons can arrive and at most N can depart.
b. At most 1 pigeon can enter or leave a cage via a pigeon hole.
c. The time slot at which arriving pigeons will depart is known.
3. For any switch: what is the minimum K such that all N pigeons can be immediately placed in a cage when they arrive, and can depart at the right time?
Intuition for Theorem
Only one packet can enter a memory at time t.
Only one packet can enter or leave a memory at time t.
Only one packet can enter or leave a memory at any time.
[Figure: a memory at Time = t, holding packets labeled with departure times DT = t and DT = t + X.]
Proof of Theorem
When a packet arrives in a time slot, it must choose a memory not chosen by:
1. The N - 1 other packets that arrive in that time slot.
2. The N other packets that depart in that time slot.
3. The N - 1 other packets that can depart at the same time as this packet departs (in the future).
Proof: These constraints rule out at most (N - 1) + N + (N - 1) = 3N - 2 memories. By the pigeon-hole principle, the switch can be work-conserving if there are 3N - 1 memories, each able to perform at most one memory operation per time slot.
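The three constraints in the proof can be checked with a small simulation. This is only a sketch under stated assumptions, not the lecture's own code: the names `place` and `simulate` are hypothetical, every port receives one packet per time slot for a random output, and departure times are assigned first-come-first-served per output (one departure per output per time slot).

```python
import random


def place(memories, now, departure, written_now):
    """Pick a memory for a packet arriving at `now` and leaving at `departure`.

    memories[m] is the set of time slots in which memory m will be read.
    Returns the chosen memory index, or None if every memory conflicts.
    """
    for m, read_slots in enumerate(memories):
        if m in written_now:         # constraint 1: written by another arrival this slot
            continue
        if now in read_slots:        # constraint 2: read by a departing packet this slot
            continue
        if departure in read_slots:  # constraint 3: already read at our departure slot
            continue
        read_slots.add(departure)
        written_now.add(m)
        return m
    return None


def simulate(n_ports, n_memories, n_slots, seed=0):
    """Return the number of arriving packets that could not be placed."""
    rng = random.Random(seed)
    memories = [set() for _ in range(n_memories)]
    next_free = [0] * n_ports        # next unused departure slot per output
    failures = 0
    for now in range(n_slots):
        written_now = set()
        for _ in range(n_ports):     # assumed full load: N arrivals per slot
            out = rng.randrange(n_ports)
            departure = max(next_free[out], now + 1)
            next_free[out] = departure + 1
            if place(memories, now, departure, written_now) is None:
                failures += 1
        for read_slots in memories:  # reads for this slot are done
            read_slots.discard(now)
    return failures
```

Calling `simulate(4, 3 * 4 - 1, 300)` exercises the 3N - 1 case; because at most 3N - 2 memories can be excluded for any arrival, `place` always finds a free memory there. With far fewer memories (for example a single one) placements fail immediately.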
A Parallel Shared Memory Switch. Arriving packets (line rate R) from ports A, B, and C are written into memories 1 to K, with K = 8; departing packets are read from them. At most one operation, a write or a read, per memory per time slot. From Theorem 1, k = 7 memories don't suffice, but 8 memories do. [Figure: packets A1, A3, A4, A5, B1, B3, C1, C3 spread across the memories.]
Distributed Shared Memory Switch. The central memories are distributed to the line cards and shared. Memory and line cards can be added incrementally. From Theorem 1, the switch is work-conserving if we have a total of 3N - 1 memories, each able to perform one operation per time slot, i.e. a total memory bandwidth of roughly 3NR. [Figure: line cards 1 to N, each with line rate R and local memories, connected by a switch fabric.]
Switch bandwidth
What switch bandwidth does the DSM switch need in order to be work-conserving?
Theorem (sufficiency): A switch bandwidth of 4NR is sufficient for a distributed shared memory switch to be work-conserving.
Proof: Each line card has a maximum of 3 memory accesses and 1 external line access per time slot.
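The resource counts for the DSM switch can be tallied as a quick arithmetic check. This is a sketch under two assumptions not spelled out on the slides: one memory operation per time slot corresponds to a bandwidth of R, and the "3 memory accesses and 1 external line access" in the proof are counted per line card. The function name `dsm_budget` is hypothetical.

```python
def dsm_budget(n_ports, line_rate):
    """Return (memories, total memory bandwidth, switch bandwidth)."""
    memories = 3 * n_ports - 1                  # sufficiency bound from Theorem 1
    memory_bw = memories * line_rate            # (3N - 1)R, just under 3NR
    switch_bw = n_ports * (3 + 1) * line_rate   # N * (3 memory + 1 line) * R = 4NR
    return memories, memory_bw, switch_bw
```

For N = 3 this gives 8 memories, matching the K = 8 used in the parallel shared memory example.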
Switch Algorithm
What switching algorithm allows the DSM switch to be work-conserving?
1. Shared bus: no algorithm needed.
2. Crossbar switch: an algorithm is needed because only permutations are allowed.
Theorem: An edge-coloring algorithm can switch packets for a work-conserving distributed shared memory switch.
Proof: König's theorem: any bipartite graph with maximum degree Δ has an edge coloring with Δ colors.
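König's theorem is constructive, and one standard way to obtain such a coloring is the alternating-path method sketched below. This is an illustration, not necessarily the algorithm used in the lecture. The assumed model: each edge (u, v) is one packet transfer from line card u to line card v, parallel edges are allowed, and each color corresponds to one crossbar configuration. The function name `edge_color_bipartite` is hypothetical.

```python
def edge_color_bipartite(edges, n_left, n_right):
    """Color the edges of a bipartite multigraph with max-degree many colors.

    edges: list of (u, v), u in range(n_left), v in range(n_right).
    Returns (colors, delta), where colors[i] is the color of edges[i].
    """
    deg_l, deg_r = [0] * n_left, [0] * n_right
    for u, v in edges:
        deg_l[u] += 1
        deg_r[v] += 1
    delta = max(deg_l + deg_r, default=0)

    at_left = [dict() for _ in range(n_left)]    # color -> edge index at left vertex
    at_right = [dict() for _ in range(n_right)]  # color -> edge index at right vertex
    colors = [None] * len(edges)

    for e, (u, v) in enumerate(edges):
        a = next(c for c in range(delta) if c not in at_left[u])   # free at u
        b = next(c for c in range(delta) if c not in at_right[v])  # free at v
        if a in at_right[v]:
            # Walk the path from v whose edges alternate colors a, b, a, ...
            # In a bipartite graph this path never reaches u.
            path, node, on_right, want = [], v, True, a
            while True:
                table = at_right[node] if on_right else at_left[node]
                if want not in table:
                    break
                f = table[want]
                path.append(f)
                node = edges[f][0] if on_right else edges[f][1]
                on_right = not on_right
                want = b if want == a else a
            # Swap a and b along the path, which frees color a at v.
            for f in path:
                fu, fv = edges[f]
                del at_left[fu][colors[f]]
                del at_right[fv][colors[f]]
            for f in path:
                fu, fv = edges[f]
                colors[f] = b if colors[f] == a else a
                at_left[fu][colors[f]] = f
                at_right[fv][colors[f]] = f
        colors[e] = a
        at_left[u][a] = e
        at_right[v][a] = e
    return colors, delta
```

Under the assumed model, all edges sharing a color form a matching, so each color class can be served by one crossbar permutation, and Δ configurations serve every transfer.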
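Summary - Switches with 100% throughput
Columns: Name | Fabric | # Mem. | Mem. BW | Total Mem BW | Switch BW | Switch Algorithm
(An empty cell is one whose value or row label did not survive extraction; a value next to an empty cell may have spanned both columns on the slide.)
OQ | Bus | N | (N+1)R | N(N+1)R | NR | None
Shared Mem. | Bus | 1 | | | 2NR | None
IQ | Crossbar | N | 2R | 2NR | NR | MWM
 | | 2N | 3R | 6NR | 2NR | Maximal
 | Crossbar | 2N | 3R | 6NR | 3NR | Time Reserve *
PPS - OQ | Clos | Nk | 2R(N+1)/k | 2N(N+1)R | 4NR | C. Sets
 | Clos | Nk | 4NR/k | | 4NR | C. Sets
 | Bus | k | 3NR/k | | 3NR | C. Sets
 | | Nk | 2NR/k | | 2NR | None
 | Xbar | N | 3R | 3NR | 4NR | Edge Color
 | | N | 3R | 3NR | 6NR | C. Sets
 | | N | 4R | | 4NR | C. Sets
Row labels detached from their rows by extraction, in source order: PSM, PPS, DSM, Juniper M-series, CIOQ, Cisco GSR.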