CMPE 252A: Computer Networks. Chen Qian, Computer Engineering, UCSC Baskin Engineering. Lecture 12
Jellyfish: Networking Data Centers Randomly. Paper by Ankit Singla et al., NSDI 2012. Some figures are from slides presented by Chi-Yao Hong, UIUC.
Facebook, Amazon: ‘Add capacity on a DAILY BASIS.’ Data centers are useful in the real world, and they keep growing. http://news.netcraft.com/archives/2013/05/20/amazon-web-services-growth-unrelenting.html
Fat-Tree Topology Incremental growth??
Structured networks: Hypercube, Fat-Tree
Fat tree: Structure VS Limit. A 3-level fat tree built from k-port switches uses N_switches = 5k^2/4 switches and supports k^3/4 hosts:
24-port switches: 3456 hosts
32-port switches: 8192 hosts
48-port switches: 27648 hosts
What about 10000 hosts? Over-utilize, or leave ports unused? Over-utilizing (attaching extra hosts at the leaves of the tree, i.e., modifying the structure) -> congestion. Leaving ports unused -> huge cost.
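The host counts above follow directly from the fat-tree formulas. A minimal sketch that reproduces them and answers the 10000-host question, assuming the textbook three-level layout and even port counts only (the function names are illustrative):

```python
def fat_tree_size(k):
    """Hosts and switches in a 3-level fat tree built from k-port switches."""
    if k % 2:
        raise ValueError("k must be even")
    hosts = k ** 3 // 4
    switches = 5 * k ** 2 // 4
    return hosts, switches


def smallest_port_count(target_hosts):
    """Smallest even k whose fat tree has room for target_hosts."""
    k = 2
    while fat_tree_size(k)[0] < target_hosts:
        k += 2
    return k


for k in (24, 32, 48):
    print(k, fat_tree_size(k))

k = smallest_port_count(10000)
hosts, switches = fat_tree_size(k)
print(k, hosts, hosts - 10000)  # 36 11664 1664
```

With 34-port switches the tree holds only 9826 hosts, so the next valid size is 36 ports, leaving 1664 host slots unused: the "leave unused ports" cost from the slide.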
No structure = no restriction. Goals: Bandwidth & Capacity (Better VM Placement: Reduce Traffic; Better Topology: Avoid Bottleneck); Robustness (Failure resistance); Flexibility (Incremental Expansion: Easy to add VM, Easy to remove VM). Let's consider this: what are the goals when building a data center? Jellyfish is based on the intuition that if the topology has no strict structure, we do not need to follow any structural rules when modifying the network, which gives high flexibility. Later slides show that this unstructured topology also provides high bandwidth.
Jellyfish: no structure. Proposed topology of Jellyfish: the Jellyfish approach is to construct a random regular graph at the top-of-rack switch layer. Servers are connected at the border of the random graph.
Visualized Jellyfish topology. Each switch has 12 neighbor switches. What does it look like? (click) A large core, and antennae. Topology of a Jellyfish network with 432 servers, 180 switches, degree = 12.
Random graph. Regular graph RG(n, r): n vertices, each with the same degree r. Random regular graph: a graph sampled uniformly at random from all RG(n, r). Hard to generate. Question: how to generate one? Before introducing the Jellyfish details, we first introduce some concepts. (click) Namely RG(n, r). A random regular graph means a graph randomly selected from all RG(n, r)s; sampling one exactly uniformly is computationally hard in practice.
Not-so-uniform random RG(n, r): RRG(n, r). Procedure to modify RRG(n-1, r) into RRG(n, r), e.g., r = 3: RRG(4, 3) -> RRG(5, 3). We actually use a not-so-uniform random regular graph, a little different from the paper. In the paper's notation each switch has k ports, r of which connect to other switches and k - r to servers (demonstrate…). RRG(n, r) requires at least r + 1 nodes; with exactly r + 1 nodes it is the complete graph.
Goals: Bandwidth & Capacity (Better VM Placement: Reduce Traffic; Better Topology: Avoid Bottleneck); Incremental Expansion (Easy to add VM, Easy to remove VM). Next, we use some examples to show that Jellyfish is better.
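The expansion step can be sketched as link splitting: remove a random existing link (x, y) and connect the new switch to both x and y, repeating until its inter-switch ports are used. This is a minimal sketch in the spirit of the Jellyfish construction, not the paper's exact procedure; the integer switch labels, the starting complete graph K_{r+1}, and the function names are assumptions for illustration. If r is odd, one port is left free (no r-regular graph exists when n·r is odd, e.g. n = 5, r = 3).

```python
import random


def complete_graph(n):
    """Adjacency sets of K_n, the smallest r-regular graph (n = r + 1)."""
    return {u: {v for v in range(n) if v != u} for u in range(n)}


def add_switch(adj, new, r, rng):
    """Attach switch `new` with r inter-switch ports by splitting random links.

    Each step removes a random link (x, y) whose endpoints are not yet
    neighbors of `new`, then connects `new` to both x and y. x and y keep
    their degree; `new` gains two links per step.
    """
    adj[new] = set()
    while r - len(adj[new]) >= 2:
        links = [(x, y) for x in adj for y in adj[x]
                 if x < y and new not in (x, y)
                 and x not in adj[new] and y not in adj[new]]
        if not links:
            break  # nothing valid to split; leave the remaining ports free
        x, y = rng.choice(links)
        adj[x].remove(y)
        adj[y].remove(x)
        adj[x].add(new)
        adj[y].add(new)
        adj[new].update((x, y))
    return adj


rng = random.Random(1)
r = 4
graph = complete_graph(r + 1)          # RRG(5, 4) is K_5
for new in range(r + 1, 12):           # grow to 12 switches, one at a time
    add_switch(graph, new, r, rng)
print(sorted(len(nbrs) for nbrs in graph.values()))  # every switch keeps degree 4
```

Existing switches never change degree, so no cables beyond the two split links need to move when a switch is added.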
About the Evaluation. Bisection bandwidth: theoretical calculation for RRG, using Bollobás' theoretical lower bound. Throughput: random permutation traffic; each host chooses one destination to send to (at full speed).
Jellyfish vs. LEGUP. LEGUP attempts to maximize bandwidth; it optimizes for bisection bandwidth. The drop … occurs because the number of servers increases in that step.
Vs. FatTree, bisection bandwidth: Jellyfish achieves larger bisection bandwidth using the same number of switches and servers; Jellyfish supports more servers with the same bisection bandwidth and number of switches.
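Random permutation traffic can be generated as a random one-to-one mapping of hosts to destinations. A minimal sketch, assuming hosts never send to themselves (so the mapping is a derangement, obtained here by resampling a uniform shuffle); the helper name is hypothetical:

```python
import random


def random_permutation_traffic(hosts, rng):
    """Each host sends to exactly one destination and receives exactly one flow.

    Resamples a uniform random permutation until no host maps to itself;
    excluding self-traffic is an assumption of this sketch.
    """
    while True:
        dst = hosts[:]
        rng.shuffle(dst)
        if all(s != d for s, d in zip(hosts, dst)):
            return dict(zip(hosts, dst))


flows = random_permutation_traffic(list(range(8)), random.Random(7))
print(flows)
```

Because every host is both a sender and a receiver exactly once, no server link is oversubscribed at the edge, so any throughput loss comes from the switch-to-switch network.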
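A commonly cited form of Bollobás' bound, which we assume here, says that with high probability every bisection of a random r-regular graph on n vertices is crossed by at least n(r/4 - sqrt(r ln 2)/2) edges. A minimal sketch that normalizes this bound by the line rate of the servers on one side (n(k - r)/2 unit links), so that a value of 1 corresponds to a fat tree's full bisection bandwidth; unit-capacity links and the 100-switch, 24-port example are assumptions:

```python
import math


def bollobas_bisection_lower_bound(n, r):
    """Lower bound (in links) on the bisection of a random r-regular graph."""
    return n * (r / 4 - math.sqrt(r * math.log(2)) / 2)


def normalized_bisection(n, k, r):
    """Bound divided by the server line rate on one side of the bisection."""
    servers = n * (k - r)
    return bollobas_bisection_lower_bound(n, r) / (servers / 2)


# Hypothetical example: 100 switches with 24 ports, varying r
for r in (12, 16, 20):
    print(r, round(normalized_bisection(100, 24, r), 3))
```

Giving more ports to the switch layer (larger r) raises the guaranteed bisection per server, at the cost of fewer servers per switch.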
Lower cost
Better failure resilience
Larger Throughput. The plot shows the number of servers supported at full throughput, assuming optimal routing; these values are calculated, not simulated.
Jellyfish vs. Small World. Small world = grid + random links: Small World Ring (2 regular + 4 random), Small World 2D Torus (4 regular + 2 random), Small World 3D Hexagon Torus (5 regular + 1 random).
Jellyfish has higher capacity than the small world data center topologies [41] built with the same equipment, using a ring, a 2D torus, and a 3D hex torus as the underlying lattice. Results are averaged over 10 runs. The paper does not clearly state what the value 1 means, but I think this normalized throughput is the ratio of Jellyfish throughput to the total server bandwidth.
Reasons for better performance
redundancy
Better than Jellyfish??? More hosts using the same number of switches? Connect more switches, each with the same number of ports, while limiting the diameter. How many switches can be connected with 3 switch-to-switch ports each and switch-to-switch path length <= 2? Petersen graph. Having seen that Jellyfish has less redundancy than a fat tree while not producing congestion, we might ask: can we do better? Can we connect more hosts using the same number of switches? That is, we try to build a larger topology in which each switch has the same number of ports as in Jellyfish, and the routing cost, indicated by the diameter of the graph, does not increase. Made specific, this becomes a combinatorics question; the answer is 10.
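The answer 10 matches the Moore bound: with degree d and diameter 2, a switch reaches at most d switches in one hop and d(d - 1) more in two, so at most 1 + d + d(d - 1) = d^2 + 1 switches in total; for d = 3 that is 10, and the Petersen graph attains it. A small check with breadth-first search; the outer-cycle/inner-pentagram labeling is one standard way to build the Petersen graph, chosen here for illustration:

```python
from collections import deque


def petersen_graph():
    """Outer 5-cycle 0-4, inner pentagram 5-9, spokes i -- i+5."""
    adj = {v: set() for v in range(10)}
    for i in range(5):
        for u, v in ((i, (i + 1) % 5),           # outer cycle
                     (5 + i, 5 + (i + 2) % 5),   # inner pentagram
                     (i, 5 + i)):                # spoke
            adj[u].add(v)
            adj[v].add(u)
    return adj


def diameter(adj):
    """Longest shortest-path length, via BFS from every vertex."""
    best = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best


def moore_bound(d, k):
    """Max vertices of a degree-d graph with diameter k: 1 + d * sum((d-1)^i)."""
    return 1 + d * sum((d - 1) ** i for i in range(k))


g = petersen_graph()
print(len(g), {len(n) for n in g.values()}, diameter(g), moore_bound(3, 2))
# 10 {3} 2 10
```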
Degree-diameter-graph Generating a large delta-
Degree-diameter graphs have (nearly) the highest throughput; Jellyfish is only a little worse.
But… practical constraints: routing / congestion control, and cabling.
Routing & Congestion Control. How to utilize the capacity without structure (no layers!)? Routing: ECMP fails to provide large path diversity; k-shortest-path routing. Congestion control: TCP / multipath TCP. If all available capacity is fully utilized,
K-shortest path. Different paths: S-e1-e2-e3-…ex…-en-T vs. S-e1-e2-e3-…ey…-em-T.
Algorithm to find the 2nd-shortest path:
1. Find a shortest path P from S to T in G.
2. For each edge e in P: remove e from G; calculate the shortest path on G, namely SP(e); add e back to G.
3. Return min(SP(e)).
Complexity for k shortest paths: O(k^2 N * ShortestPath(N)).
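The edge-removal idea works because the second-shortest simple path must avoid at least one edge of P. A minimal sketch of the 2nd-shortest-path step, assuming unweighted links (so BFS serves as the shortest-path routine) and a toy graph made up for illustration; Yen's algorithm generalizes this idea to k paths:

```python
from collections import deque


def bfs_path(adj, s, t, banned=None):
    """Fewest-hop path from s to t, ignoring the undirected edge `banned`."""
    prev = {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        if u == t:
            break
        for v in adj[u]:
            if banned is not None and {u, v} == set(banned):
                continue
            if v not in prev:
                prev[v] = u
                q.append(v)
    if t not in prev:
        return None
    path, node = [], t
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def second_shortest_path(adj, s, t):
    """Remove each edge of the shortest path P in turn; keep the best detour."""
    p = bfs_path(adj, s, t)
    if p is None:
        return None, None
    best = None
    for e in zip(p, p[1:]):
        q = bfs_path(adj, s, t, banned=e)  # e is "added back" on the next loop
        if q is not None and (best is None or len(q) < len(best)):
            best = q
    return p, best


adj = {"S": ["A", "B"], "A": ["S", "T"], "B": ["S", "C"],
       "C": ["B", "T"], "T": ["A", "C"]}
print(second_shortest_path(adj, "S", "T"))
# (['S', 'A', 'T'], ['S', 'B', 'C', 'T'])
```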
K-shortest path forwarding. Shortest paths (S,T): SAB1C1DT, SAB2C2DT, SAB3C2DT.
[Figure: example topology with switches S, A, B1 to B4, C1, C2, D, T and per-switch routing tables, e.g. (S,T) -> A; (A,T) -> B1, B2, B3]
A switch chooses a random next hop from its routing table to forward, e.g., A. Interestingly, a packet from S to T, when it goes through B, can be forwarded backward to A. This is because the example graph is not well connected; a random regular graph is theoretically well connected, and its degrees are much larger. Thick edges mean that the edge (S,A) is on three paths, while C2-D is on two. That is, if an inter-switch link can be used by different sessions (different source-destination pairs share the same link), the link is utilized more.
About 1000 links are each on at most one routing path. Inter-switch links' path counts under ECMP and k-shortest-path routing for random permutation traffic at the server level on a typical Jellyfish of 686 servers. For each link, we count the number of distinct paths it is on.
Multipath TCP (MPTCP) http://blogs.citrix.com/2013/08/23/networking-beyond-tcp-the-mptcp-way/
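Per-hop random forwarding over the k shortest paths can be sketched by turning the path list into routing tables. This sketch assumes tables keyed by (current switch, destination), as the (S,T) -> A and (A,T) -> B1, B2, B3 entries suggest; that keying and the function names are assumptions. Because entries are merged per destination, tables built from many source-destination pairs can contain next hops that lead backward, as the slide notes.

```python
import random
from collections import defaultdict


def build_tables(paths):
    """Routing table: (current switch, destination) -> set of next hops."""
    table = defaultdict(set)
    for path in paths:
        dst = path[-1]
        for here, nxt in zip(path, path[1:]):
            table[(here, dst)].add(nxt)
    return table


def forward(table, src, dst, rng):
    """Hop-by-hop forwarding: at each switch pick a random next hop."""
    hops = [src]
    while hops[-1] != dst:
        hops.append(rng.choice(sorted(table[(hops[-1], dst)])))
    return hops


paths = [["S", "A", "B1", "C1", "D", "T"],
         ["S", "A", "B2", "C2", "D", "T"],
         ["S", "A", "B3", "C2", "D", "T"]]
table = build_tables(paths)
print(sorted(table[("A", "T")]))        # ['B1', 'B2', 'B3']
print(forward(table, "S", "T", random.Random(3)))
```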
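The per-link path count behind this plot can be computed by counting, for each undirected link, how many distinct paths use it. A minimal sketch reusing the three example paths from the forwarding slide (the helper name is hypothetical); it reproduces the counts of three for S-A and two for C2-D:

```python
from collections import Counter


def link_path_counts(paths):
    """For each undirected link, count the distinct paths that use it."""
    counts = Counter()
    for path in {tuple(p) for p in paths}:          # distinct paths only
        links = {frozenset(e) for e in zip(path, path[1:])}
        counts.update(links)                         # a link counts once per path
    return counts


paths = [["S", "A", "B1", "C1", "D", "T"],
         ["S", "A", "B2", "C2", "D", "T"],
         ["S", "A", "B3", "C2", "D", "T"]]
counts = link_path_counts(paths)
print(counts[frozenset(("S", "A"))], counts[frozenset(("C2", "D"))])  # 3 2
print(sorted(Counter(counts.values()).items()))  # [(1, 7), (2, 1), (3, 2)]
```

The second line is the same kind of histogram as the plot: how many links lie on 1, 2, 3, … paths.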
Packet simulation results for different routing and congestion control protocols
Cabling: Jellyfish uses 20% fewer cables.
Cabling in large data centers: the topology is generated automatically, but cables are connected manually (10% of cost). Error detection: link-layer discovery protocol.
Jellyfish of Jellyfish: restrict some connections to within a pod. Result: a 2-layer random graph.
Cables between pods can be aggregated
Conclusion. Bandwidth & capacity: enough, even higher than fat-tree. Incremental expansion. Lower cost. Limitations: slow to compute forwarding paths; large forwarding tables.
Space Shuffle: A Scalable, Flexible, and High-Bandwidth Data Center Network Ye Yu and Chen Qian
Motivation: Goals of Data Center Design. High bandwidth: data center applications generate heavy internal and external communication. Flexibility: adding servers and expanding network bandwidth incrementally. Scalability: routing and forwarding should rely on small forwarding state.
Motivation: Existing Data Center Architectures
Architecture | Network Bandwidth | Incremental Growth (Flexibility) | Forwarding State per Switch (Scalability)
FatTree [SIGCOMM’08] | Good | No | Fixed
SWDC [SOCC’11] | Fair | Yes | Constant
Jellyfish [NSDI’12] | Better than FatTree & SWDC | Yes | Large and grows fast
SWDC (greedy routing): no shortest paths; does not support multipath well.
Jellyfish (random interconnection): k-shortest-path routing is inefficient; big forwarding state.
Motivation: Goal of Space Shuffle (S2). How to build a flexible data center architecture that achieves high throughput and scalability? Approach: greedy routing on a random interconnection. Challenges: how to build a random interconnection that enables greedy routing? How can the greedy routing protocol achieve high throughput and near-optimal path length?
Outline Motivation Space Shuffle Data Center Topology The Routing Protocol in Space Shuffle Data Center Discussion & Evaluation
S2 Topology Construction -Assign Servers. Servers and Top-Of-Rack switches: uniformly assign servers to switches and connect the servers to their switches. The remaining ports are used for inter-switch connections.
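Uniform assignment means every switch gets either floor(H/N) or ceil(H/N) servers, and whatever ports are left become inter-switch ports. A minimal sketch; the function name and the 14-port switch size are hypothetical:

```python
def assign_servers(num_servers, num_switches, ports):
    """Spread servers as evenly as possible over the switches.

    Returns a list of (servers, inter_switch_ports) per switch.
    """
    base, extra = divmod(num_servers, num_switches)
    plan = []
    for i in range(num_switches):
        servers = base + (1 if i < extra else 0)
        if servers > ports:
            raise ValueError("not enough ports for the servers")
        plan.append((servers, ports - servers))
    return plan


plan = assign_servers(500, 250, 14)   # hypothetical 14-port switches
print(plan[0], plan[-1])              # (2, 12) (2, 12)
print(assign_servers(10, 4, 6))       # [(3, 3), (3, 3), (2, 4), (2, 4)]
```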
S2 Topology Construction: -Virtual Coordinates
S2 Topology Construction: -Virtual Spaces
Switch ID | Coor. 1 | Coor. 2
A | 0.05 | 0.17
B | 0.13 | 0.62
C | 0.23 | 0.91
D | 0.36 | 0.42
E | 0.53
F | 0.51 | 0.58
G | 0.63 | 0.73
H | 0.78 | 0.26
I | 0.97
[Figure: switches A to I placed on two circles, Space 1 and Space 2, according to their coordinates]
S2 Topology Construction: -Connect the switches
[Figure: rings of switches A to I in Space 1 and Space 2]
A switch is physically connected to the switches that are adjacent to it in at least one space.
S2 Topology Construction: -Connect the switches
[Figure: resulting physical topology of switches A to I]
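The connection rule can be sketched by sorting the switches by each coordinate and linking circle neighbors. This sketch uses only the seven switches whose two coordinates are both legible in the table (A, B, C, D, F, G, H); the function name is hypothetical:

```python
def ring_neighbors(coords):
    """Physical links: switches adjacent on the circle of at least one space.

    coords maps switch -> tuple of coordinates in [0, 1), one per space.
    """
    spaces = len(next(iter(coords.values())))
    links = set()
    for s in range(spaces):
        ring = sorted(coords, key=lambda sw: coords[sw][s])
        for i, sw in enumerate(ring):
            nxt = ring[(i + 1) % len(ring)]   # wrap around the circle
            links.add(frozenset((sw, nxt)))
    return links


coords = {"A": (0.05, 0.17), "B": (0.13, 0.62), "C": (0.23, 0.91),
          "D": (0.36, 0.42), "F": (0.51, 0.58), "G": (0.63, 0.73),
          "H": (0.78, 0.26)}
links = ring_neighbors(coords)
print(len(links), sorted(n for l in links if "A" in l for n in l if n != "A"))
# 12 ['B', 'C', 'H']
```

Each switch uses at most 2 ports per space; pairs adjacent in both spaces (here A-H and D-F) collapse into a single physical link, which is why 2 spaces of 7 links each give 12 links.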
S2 Topology Construction: -Deploy-as-a-whole Construction Step 1 Assign hosts / switches Step 2 Generate coordinates (randomly) Step 3 Wire the network according to the coordinates.
S2 Topology Construction: -Incremental Construction. Add a new switch T into an existing S2 network: assign coordinates for T. For each space: place T on the circle; find the switches SL and SR on the left/right side of T; disconnect SL and SR; connect T to SL and T to SR.
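The per-space insertion can be sketched with a sorted list per circle and binary search to find SL and SR. One consequence of the "adjacent in at least one space" rule: the physical SL-SR cable only goes away if the two switches are no longer adjacent in any other space. This sketch reuses the seven fully legible switches from the table and a hypothetical new switch T at (0.30, 0.50); the function names are assumptions:

```python
import bisect


def add_switch(rings, name, coord):
    """Insert switch `name` into every space's circle.

    rings[s] is a list of (coordinate, switch) sorted by coordinate.
    Returns, per space, the (SL, SR) pair whose adjacency is replaced by
    (name, SL) and (name, SR).
    """
    replaced = []
    for s, x in enumerate(coord):
        ring = rings[s]
        i = bisect.bisect(ring, (x, name))
        sl = ring[i - 1][1]            # left neighbor; i == 0 wraps to the last entry
        sr = ring[i % len(ring)][1]    # right neighbor; i == len wraps to the first
        ring.insert(i, (x, name))
        replaced.append((sl, sr))
    return replaced


def links_from_rings(rings):
    """A physical link exists while two switches are adjacent in some space."""
    links = set()
    for ring in rings:
        for i in range(len(ring)):
            links.add(frozenset((ring[i][1], ring[(i + 1) % len(ring)][1])))
    return links


coords = {"A": (0.05, 0.17), "B": (0.13, 0.62), "C": (0.23, 0.91),
          "D": (0.36, 0.42), "F": (0.51, 0.58), "G": (0.63, 0.73),
          "H": (0.78, 0.26)}
rings = [sorted((c[s], sw) for sw, c in coords.items()) for s in range(2)]
print(add_switch(rings, "T", (0.30, 0.50)))   # [('C', 'D'), ('D', 'F')]
links = links_from_rings(rings)
```

Here C-D disappears, but D-F survives because D and F are still neighbors in Space 1.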
Outline Motivation Space Shuffle Data Center Topology The Routing Protocol in Space Shuffle Data Center Discussion & Evaluation
Routing Protocol in S2: -Routable Address (Step 1, Step 2). S2 uses greedy routing and greedy forwarding: a switch decides which port to forward a packet to by estimating the distance between the destination and each possible next-hop switch. The key questions of greedy routing are how to represent the destination and how to estimate the distance to it. The routable address of a packet destined to host h is defined as a pair (x_h, id_h), where x_h refers to the switch connected to h. The routing protocol of S2 is defined as follows: 1. greedily route the packet to …; 2. the …
Routing Protocol in S2 -Definition of Distance. Circular distance in one space: CD(x, y) = min(|x - y|, 1 - |x - y|). MCD_L(A, C) is the minimum CD between A and C over the L spaces.
Switch | Coor. 1 | Coor. 2
A | 0.05 | 0.17
C | 0.23 | 0.91
CD(0.05, 0.23) = |0.23 - 0.05| = 0.18
CD(0.17, 0.91) = 0.17 + (1 - 0.91) = 0.26
MCD_2(A, C) = min(0.18, 0.26) = 0.18
Routing Protocol in S2 -Forwarding Decision using MCD
Switch | MCD to the destination
H | 0.35
A | 0.18
D | 0.13
G | 0.19
I | 0.06
The switch with the minimum MCD to the destination (here I, 0.06) gets the packet. Minimum of minimum CD: greediest.
Routing Protocol in S2 -Multipath. Next-hop candidates: all neighbor switches with a smaller MCD to the destination than the current switch. Doing this selection only at the first switch of the path provides enough path diversity.
Switch | MCD to the destination
Current | 0.3
Neighbor 1 | 0.5
Neighbor 2 | 0.1
Neighbor 3 | 0.2
Neighbor 4 | 0.4
The packet reaches the destination as long as the MCD keeps decreasing.
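The distance definitions translate directly into code. A minimal sketch that reproduces the A-C example; the optional `spaces` argument (restricting MCD to the first r spaces, as MCD_r does) and the function names are assumptions:

```python
def cd(x, y):
    """Circular distance between two coordinates on a circle of circumference 1."""
    d = abs(x - y)
    return min(d, 1 - d)


def mcd(a, b, spaces=None):
    """Minimum circular distance over the first `spaces` spaces (all if None)."""
    pairs = list(zip(a, b))[:spaces]
    return min(cd(x, y) for x, y in pairs)


A, C = (0.05, 0.17), (0.23, 0.91)
print(round(cd(0.05, 0.23), 2), round(cd(0.17, 0.91), 2), round(mcd(A, C), 2))
# 0.18 0.26 0.18
```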
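Greediest forwarding can be sketched end to end on the seven fully legible switches from the coordinate table. Assuming distinct coordinates, the MCD strictly shrinks at every hop: in the space where the current switch is closest to the destination, its circle neighbor on the destination's side is strictly closer (or is the destination). The sketch checks that invariant rather than assuming it; all function names are hypothetical:

```python
def cd(x, y):
    d = abs(x - y)
    return min(d, 1 - d)


def mcd(a, b):
    return min(cd(x, y) for x, y in zip(a, b))


def links_from_coords(coords):
    """Adjacent-on-some-circle rule from the topology slides."""
    nbrs = {sw: set() for sw in coords}
    for s in range(len(next(iter(coords.values())))):
        ring = sorted(coords, key=lambda sw: coords[sw][s])
        for i, sw in enumerate(ring):
            nxt = ring[(i + 1) % len(ring)]
            nbrs[sw].add(nxt)
            nbrs[nxt].add(sw)
    return nbrs


def greediest_route(coords, nbrs, src, dst):
    """Forward to the neighbor with the minimum MCD to the destination."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        nxt = min(sorted(nbrs[here]), key=lambda n: mcd(coords[n], coords[dst]))
        # The MCD should strictly shrink at every hop; fail loudly if not.
        if mcd(coords[nxt], coords[dst]) >= mcd(coords[here], coords[dst]):
            raise RuntimeError("no neighbor is closer to the destination")
        path.append(nxt)
    return path


coords = {"A": (0.05, 0.17), "B": (0.13, 0.62), "C": (0.23, 0.91),
          "D": (0.36, 0.42), "F": (0.51, 0.58), "G": (0.63, 0.73),
          "H": (0.78, 0.26)}
nbrs = links_from_coords(coords)
print(greediest_route(coords, nbrs, "A", "G"))   # ['A', 'B', 'G']
```

Only the neighbors' coordinates are needed to make each decision, which is why the forwarding state stays constant-size.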
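The candidate rule can be sketched with the table's values. How the first switch picks among candidates is not specified on the slide; this sketch assumes a random choice (a per-flow hash would keep each flow on one path), after which the packet continues with greediest forwarding. Names are hypothetical:

```python
import random


def next_hop_candidates(current_mcd, neighbor_mcd):
    """All neighbors strictly closer (in MCD) to the destination."""
    return sorted(n for n, d in neighbor_mcd.items() if d < current_mcd)


table = {"Neighbor 1": 0.5, "Neighbor 2": 0.1,
         "Neighbor 3": 0.2, "Neighbor 4": 0.4}
cands = next_hop_candidates(0.3, table)
print(cands)                           # ['Neighbor 2', 'Neighbor 3']

# First hop only: spread flows over the candidates (random choice is an assumption)
first_hop = random.Random(0).choice(cands)
```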
Routing Protocol in S2: -Balanced Random Coordinates. Links with small end-to-end MCD values carry more traffic, so uniformly distributed coordinates improve load balancing. A purely random generator may produce crowded coordinates, which may lead to heavily loaded links and hurt network performance. The Balanced Random Coordinate Generator avoids crowded coordinates and provides better link fairness and better network performance.
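The slide does not spell out how the balanced generator works. One simple way to avoid crowded coordinates is best-candidate sampling: for each new switch, draw several random candidates and keep the one farthest from all coordinates chosen so far. The sketch below uses it as an illustrative stand-in, not as S2's actual generator; the candidate count of 10 is arbitrary:

```python
import random


def circ(x, y):
    d = abs(x - y)
    return min(d, 1 - d)


def min_gap(xs):
    """Smallest circular distance between any two coordinates."""
    return min(circ(a, b) for i, a in enumerate(xs) for b in xs[i + 1:])


def balanced_coordinates(n, rng, candidates=10):
    """Best-candidate sampling on the unit circle (illustrative stand-in)."""
    xs = [rng.random()]
    while len(xs) < n:
        best = max((rng.random() for _ in range(candidates)),
                   key=lambda c: min(circ(c, x) for x in xs))
        xs.append(best)
    return xs


rng = random.Random(42)
pure = [rng.random() for _ in range(50)]
balanced = balanced_coordinates(50, random.Random(42))
print(round(min_gap(pure), 4), round(min_gap(balanced), 4))
```

Comparing the two minimum gaps shows the crowding that a purely random generator can produce.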
Outline Motivation Space Shuffle Data Center Topology The Routing Protocol in Space Shuffle Data Center Discussion & Evaluation
Evaluation Topology property Routing efficiency Practical throughput
Evaluation -Topology Property. S2 and Jellyfish: flexible; FatTree: fixed. [Figure: bisection bandwidth vs. # of switches] S2 & Jellyfish topologies share similar theoretical throughput, better than FatTree.
Evaluation -Routing Table Length (10 inter-switch ports)
Evaluation -Routing Path Length (12 inter-switch ports). SWDC: long routing paths, lower throughput. S2: near-optimal routing paths. Jellyfish: optimal paths, but expensive.
Evaluation -Practical Throughput (250-switch, 500-host network). Greedy routing of S2 exploits the path diversity. S2 achieves near-Jellyfish throughput. S2 & Jellyfish both outperform SWDC.
Comparing S2 with Jellyfish
Aspect | S2 | Jellyfish
Construction | Coordinates, ring topology | Generate an ‘almost’ random regular graph
Routing | Greediest | K-shortest path
It is hard to fit a Jellyfish topology into a routable coordinate space.
Key-based Routing: -Definition. Key-value stores. https://www.facebook.com/photo.php?fbid= 677700648959984 Key-based routing: route to the destination using the key of the content (no need to know the IP). IP-based routing: route using the IP of the destination.
Key-based Routing: -Delivery Guarantee. For any destination coordinate X, greediest routing will route the packet to a switch S that is closest to X in at least one space. Solution: keep one replica in each of the first r spaces and route using MCD_r, r <= L. For data a with key K_a, use a global hash function H to calculate the destination coordinate X = H(K_a). In each of the r spaces, the access switch of the server storing a is selected using H(K_a).
S2 Topology Construction -Overview. H servers and N Top-Of-Rack switches. Uniformly assign servers to switches. Generate the virtual coordinates (x1, x2, …) of the switches. Connect the switches according to the coordinates, using the remaining ports (those not used by servers) for inter-switch connections.
Summary. High bandwidth: S2 demonstrates high bandwidth and high network throughput. Flexibility: S2 supports incremental construction. Scalability: greedy routing in S2 only requires a constant-size routing state.
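Replica placement can be sketched by hashing the key to one coordinate per space and picking, in each of the first r spaces, the switch closest to that coordinate. The slide names a global hash H but not which one; SHA-256, salting by space index, the key string "photo:42", and the function names are all assumptions. The seven fully legible switches from the coordinate table serve as the example network:

```python
import hashlib


def key_coordinate(key, space):
    """Hypothetical global hash H: map (key, space index) to [0, 1)."""
    digest = hashlib.sha256(f"{space}:{key}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53  # 53 bits, exact


def cd(x, y):
    d = abs(x - y)
    return min(d, 1 - d)


def replica_switches(key, coords, r):
    """In each of the first r spaces, pick the switch closest to H(key)."""
    out = []
    for s in range(r):
        x = key_coordinate(key, s)
        out.append(min(sorted(coords), key=lambda sw: cd(coords[sw][s], x)))
    return out


coords = {"A": (0.05, 0.17), "B": (0.13, 0.62), "C": (0.23, 0.91),
          "D": (0.36, 0.42), "F": (0.51, 0.58), "G": (0.63, 0.73),
          "H": (0.78, 0.26)}
print(replica_switches("photo:42", coords, r=2))
```

Any switch can recompute the same placement from the key alone, so lookups need no directory of IP addresses.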
Thank you! Q & A