Chuanxiong Guo, et al, Microsoft Research Asia, SIGCOMM 2008

1 DCell: A Scalable and Fault-Tolerant Network Structure for Data Centers
Chuanxiong Guo et al., Microsoft Research Asia, SIGCOMM 2008. Presented by Jinyoung Han, 2009/03/26

2 Outline
DCN motivation; DCell; routing in DCell; experimental results; conclusion

3 Data Center Networking (DCN)
Data centers provide increasingly popular online application services, such as search, e-mail, web 2.0, and gaming.
Data center network: the network infrastructure inside a data center, which connects a large number of servers via high-speed links and switches.

4 Observations for DCN
Ever-increasing scale: Google had 450,000 servers (in 30 data centers) in 2006, and Microsoft doubles its number of servers in 14 months, an expansion rate that exceeds Moore's Law. When data centers scale, failures become the norm, and using high-end switches/routers to scale up is costly.
Network capacity: many bandwidth-hungry, data-centric applications, such as MapReduce (Google) and file replication in GFS (Google File System).

5 Design goals for DCN
Scalable to a large number of servers; fault tolerant; high network capacity.

6 Interconnection Structure for Data Centers
The existing tree structure does not scale: it needs expensive high-end switches to scale up, and it has a single point of failure and a bandwidth bottleneck.
New network structure -> DCell

7 DCell Ideas
#1: Use mini-switches to scale out.
#2: Let servers be part of the routing infrastructure: servers have multiple ports and need to forward packets.
#3: Use recursion to scale, and build complete graphs to increase capacity.

8 DCell structure
DCell is a recursively defined structure.
DCell_0: n servers and a mini-switch; all servers are connected to the switch.
DCell_1: n+1 DCell_0s. Each server is identified by a 2-tuple [a1, a0], where a1 is the level-1 index (which DCell_0) and a0 is the level-0 index (which server inside it). Two servers with 2-tuples [i, j − 1] and [j, i] are connected with a link for every i and every j > i.
DCell_k: recursively defined.
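The DCell_1 wiring rule can be checked mechanically. The sketch below, with hypothetical function names, lists the level-1 links for a given n and confirms two consequences of the rule: every server ends up with exactly one level-1 link, and every pair of DCell_0s is joined by exactly one link (the "complete graph" idea from slide 7). Links through the mini-switch are left out.

```python
from itertools import combinations


def build_dcell1_links(n):
    """List the server-to-server links of a DCell_1 made of n + 1 DCell_0s.

    Hypothetical helper. A server is the 2-tuple (a1, a0): a1 selects the
    DCell_0 and a0 the server inside it. Mini-switch links are not listed.
    """
    links = []
    for i, j in combinations(range(n + 1), 2):  # all pairs with i < j
        links.append(((i, j - 1), (j, i)))  # [i, j-1] -- [j, i]
    return links


def check_dcell1(n):
    """True if each server has one level-1 link and each DCell_0 pair one link."""
    links = build_dcell1_links(n)
    endpoints = [server for link in links for server in link]
    # n * (n + 1) servers in total, each appearing as an endpoint exactly once
    one_link_per_server = len(endpoints) == len(set(endpoints)) == n * (n + 1)
    # the DCell_0s themselves form a complete graph
    pairs = {frozenset((a[0], b[0])) for a, b in links}
    complete_graph = pairs == {frozenset(p) for p in combinations(range(n + 1), 2)}
    return one_link_per_server and complete_graph


if __name__ == "__main__":
    print(len(build_dcell1_links(4)), check_dcell1(4))  # 10 True
```

With n = 4 this gives the 5 DCell_0s and 10 level-1 links of a 20-server DCell_1; each server then needs only two ports, one to its mini-switch and one level-1 link.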

9 DCell: the Construction
[Figure: construction for n=2: a DCell_0 (k=0, n servers on one mini-switch), a DCell_1 (k=1), and a DCell_2 (k=2)]

10 DCell: The Properties
Scalability: the number of servers scales doubly exponentially as the node degree increases. For example, if a DCell_0 has 8 servers (n=8) and each server has 4 ports (i.e., k=3), then N = 27,630,792.
Fault tolerance: the bisection width is larger than $\frac{N}{4\log_n N}$. Bisection width: the minimal number of links that must be removed to partition a network into two parts of equal size. A large bisection width implies high network capacity and fault tolerance.
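The N = 27,630,792 figure follows from the recursion: a DCell_l is built from t_{l-1} + 1 copies of DCell_{l-1}, so t_l = t_{l-1}(t_{l-1} + 1), and each server has k + 1 ports (one to its mini-switch, one per level). The sketch below, with hypothetical function names, computes t_k and checks it against closed-form doubly exponential bounds that follow from the same recurrence.

```python
from fractions import Fraction


def dcell_servers(n, k):
    """t_k, the number of servers in a DCell_k whose DCell_0s hold n servers.

    Hypothetical helper. A DCell_l joins g_l = t_{l-1} + 1 copies of
    DCell_{l-1}, so t_l = g_l * t_{l-1} = t_{l-1} * (t_{l-1} + 1).
    """
    t = n
    for _ in range(k):
        t = t * (t + 1)
    return t


def doubly_exponential_bounds(n, k):
    """Bounds (n + 1/2)^(2^k) - 1/2 < t_k < (n + 1)^(2^k) - 1, for k >= 1.

    They follow from the recurrence: t_l + 1/2 > (t_{l-1} + 1/2)^2 and
    t_l + 1 < (t_{l-1} + 1)^2. Fractions keep the comparison exact.
    """
    e = 2 ** k
    lower = Fraction(2 * n + 1, 2) ** e - Fraction(1, 2)
    upper = (n + 1) ** e - 1
    return lower, upper


if __name__ == "__main__":
    # n = 8 servers per DCell_0, k = 3 levels -> k + 1 = 4 ports per server
    print(dcell_servers(8, 3))  # 27630792
```

The same function reproduces the N column of the path-length table on slide 12 (420, 930, and 1,806 for k = 2; 176,820, 865,830, and 3,263,442 for k = 3).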

11 DCell Routing
[Figure: routing path src -> n1 -> n2 -> dst]
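The src -> n1 -> n2 -> dst picture reflects the recursive idea behind DCellRouting: find the one link (n1, n2) that joins the two sub-DCells holding src and dst, then route src -> n1 and n2 -> dst the same way. The sketch below is a minimal illustration under stated assumptions: servers are addressed by tuples (a_k, ..., a_0), the index of a server inside a DCell_{l-1} is a_0 + Σ a_j·t_{j-1}, and the DCell_1 rule [i, j-1] -- [j, i] is applied at every level using that index. Function names are hypothetical, and failures are ignored.

```python
def dcell_servers(n, level):
    """t_level: servers in a DCell_level built from n-server DCell_0s."""
    t = n
    for _ in range(level):
        t = t * (t + 1)
    return t


def uid_to_tuple(uid, n, level):
    """Map a server's index inside a DCell_level to its tuple (a_level, ..., a_0)."""
    digits = []
    for l in range(level, 0, -1):
        sub = dcell_servers(n, l - 1)  # servers per DCell_{l-1}
        digits.append(uid // sub)
        uid %= sub
    digits.append(uid)
    return tuple(digits)


def dcell_route(src, dst, n):
    """Recursive DCellRouting sketch (hypothetical); returns the server path."""
    k = len(src) - 1
    m = 0  # length of the common address prefix
    while m < k and src[m] == dst[m]:
        m += 1
    if m == k:  # same DCell_0: one hop through the mini-switch
        return [src] if src == dst else [src, dst]
    level = k - m  # src and dst lie in different DCell_{level-1}s
    i, j = src[m], dst[m]
    # Assumed rule: DCell_{level-1}s a < b are joined by [a, b-1] -- [b, a].
    uid1, uid2 = (j - 1, i) if i < j else (j, i - 1)
    n1 = src[:m] + (i,) + uid_to_tuple(uid1, n, level - 1)
    n2 = src[:m] + (j,) + uid_to_tuple(uid2, n, level - 1)
    # Route src -> n1, cross the link (n1, n2), then route n2 -> dst.
    return dcell_route(src, n1, n) + dcell_route(n2, dst, n)


if __name__ == "__main__":
    # DCell_1 with n = 4: server [0, 1] to server [2, 3]
    print(dcell_route((0, 1), (2, 3), 4))  # [(0, 1), (2, 0), (2, 3)]
```

Because each half of the path stays inside one DCell_{l-1}, the hop count obeys L(k) ≤ 2L(k-1) + 1 with L(0) = 1, which is where the $2^{k+1}-1$ bound on the next slide comes from.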

12 DCell Routing (cont.)
Network diameter: the maximum path length using DCellRouting in a DCell_k is at most $2^{k+1}-1$.
But: DCellRouting is NOT shortest-path routing, and $2^{k+1}-1$ is NOT a tight diameter bound for DCell.
Mean and max path lengths of shortest-path routing and DCellRouting:

n   k   N            Shortest-path     DCellRouting
                     Mean    Max       Mean    Max
4   2   420          4.87    7         5.16    7
5   2   930          5.22    7         5.50    7
6   2   1,806        5.48    7         5.73    7
4   3   176,820      9.96    15        11.29   15
5   3   865,830      10.74   15        11.98   15
6   3   3,263,442    11.31   15        12.46   15

Yet: DCellRouting is close to shortest-path routing.
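The shortest-path columns of the table can in principle be computed by breadth-first search over an explicit DCell_k. The sketch below builds the topology recursively, assuming the DCell_1 rule generalizes to every level (a DCell_l is t_{l-1} + 1 copies of DCell_{l-1}, and server [i, j-1] links to [j, i] for i < j, where the second number is the server's index inside its DCell_{l-1}). It also assumes a hop through the mini-switch counts as one hop, modeled as a clique of the n servers. Function names are hypothetical, and the printed statistics are not claimed to match the table exactly.

```python
from collections import deque


def dcell_servers(n, level):
    t = n
    for _ in range(level):
        t = t * (t + 1)
    return t


def build_dcell(n, k):
    """Adjacency sets for a DCell_k; servers numbered 0 .. t_k - 1 (hypothetical)."""
    adj = [set() for _ in range(dcell_servers(n, k))]

    def build(offset, level):
        if level == 0:  # mini-switch treated as a one-hop crossbar (clique)
            for a in range(offset, offset + n):
                for b in range(offset, offset + n):
                    if a != b:
                        adj[a].add(b)
            return
        t_prev = dcell_servers(n, level - 1)
        for i in range(t_prev + 1):  # g_level = t_{level-1} + 1 sub-DCells
            build(offset + i * t_prev, level - 1)
        for i in range(t_prev + 1):
            for j in range(i + 1, t_prev + 1):
                u = offset + i * t_prev + (j - 1)  # server [i, j-1]
                v = offset + j * t_prev + i  # server [j, i]
                adj[u].add(v)
                adj[v].add(u)

    build(0, k)
    return adj


def shortest_path_stats(adj):
    """Mean and max BFS hop count over all ordered pairs of distinct servers."""
    total, pairs, longest = 0, 0, 0
    for src in range(len(adj)):
        dist = [-1] * len(adj)
        dist[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist:
            if d > 0:
                total += d
                pairs += 1
                longest = max(longest, d)
    return total / pairs, longest


if __name__ == "__main__":
    adj = build_dcell(4, 2)
    mean, longest = shortest_path_stats(adj)
    print(len(adj), round(mean, 2), longest)
```

A quick sanity check on the construction is the node degree: every server should have n - 1 neighbors through its mini-switch plus exactly one link per level, i.e. n - 1 + k.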

13 DFR: DCell Fault-tolerant Routing
Design goals: support millions of servers; take advantage of DCellRouting and the DCell topology.

14 DFR: DCell Fault-tolerant Routing
[Figure: DFR example with src, dst, two DCell_b blocks, and proxy servers. Servers in the same DCell_b share local link-state.]
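The caption notes that servers in the same DCell_b share local link-state. As an illustration of that piece only, the sketch below assumes a server knows which links inside its DCell_1 are down and runs breadth-first search over the links that remain. It does not model DFR's proxy selection or any of its other mechanisms, and the helper names are hypothetical.

```python
from collections import deque


def dcell1_adjacency(n):
    """Adjacency of a DCell_1 (hypothetical): mini-switch as a clique + level-1 links."""
    adj = {(a, b): set() for a in range(n + 1) for b in range(n)}
    for a in range(n + 1):
        for b in range(n):
            for c in range(n):
                if b != c:
                    adj[(a, b)].add((a, c))
    for i in range(n + 1):
        for j in range(i + 1, n + 1):
            u, v = (i, j - 1), (j, i)  # level-1 link [i, j-1] -- [j, i]
            adj[u].add(v)
            adj[v].add(u)
    return adj


def reroute(adj, src, dst, failed):
    """BFS shortest path that skips links known to be down (local link-state view)."""
    down = {frozenset(link) for link in failed}
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in down:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None  # dst unreachable with the current link-state
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    adj = dcell1_adjacency(4)
    failed = [((0, 0), (1, 0))]  # direct level-1 link between DCell_0 0 and 1
    print(reroute(adj, (0, 0), (1, 0), failed))  # a 5-hop detour
```

When the direct link fails, the detour must leave DCell_0 0 through another level-1 link and re-enter DCell_0 1 through a third DCell_0, which is why it takes 5 hops instead of 1.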

15 Experimental Results: Testbed
DCell_1: 20 servers in 5 DCell_0s; DCell_0: 4 servers. Ethernet wires; 8-port mini-switches, $50 each.
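The testbed numbers match the DCell_1 construction with n = 4: n + 1 = 5 DCell_0s and n(n + 1) = 20 servers. The sketch below derives the rest of the parts list from the construction alone, assuming no extra uplinks or spare wiring; the wire count and total switch cost are derived here, not figures from the slide, and the function name is hypothetical.

```python
def dcell1_inventory(n, switch_ports=8, switch_price_usd=50):
    """Parts list for a DCell_1 of n + 1 DCell_0s with n servers each (hypothetical)."""
    if n > switch_ports:
        raise ValueError("each DCell_0 needs n ports on its mini-switch")
    dcell0s = n + 1
    servers = n * dcell0s
    level1_links = dcell0s * (dcell0s - 1) // 2  # one link per DCell_0 pair
    return {
        "servers": servers,
        "mini_switches": dcell0s,
        "wires": servers + level1_links,  # server-to-switch + server-to-server
        "switch_ports_used": n,
        "ports_per_server": 2,  # one to the mini-switch, one level-1 link
        "switch_cost_usd": dcell0s * switch_price_usd,
    }


if __name__ == "__main__":
    print(dcell1_inventory(4))
```

Under these assumptions the 20-server testbed needs 30 Ethernet wires and $250 worth of mini-switches, and each 8-port switch uses only 4 of its ports.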

16 Experimental Results: Fault Tolerance
DCell fault-tolerant routing can handle various failures: link failure, server/switch failure, and rack failure.
[Figure panels: link failure; server shutdown]

17 Experimental Results: Network Capacity
All-to-all traffic: each server sends a 5 GB file to every other server.
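Assuming the all-to-all run used the 20-server DCell_1 testbed from slide 15 (the slide does not state this explicitly), the workload size works out as below. The function name is hypothetical; the arithmetic is just one file per ordered pair of distinct servers.

```python
def all_to_all_volume(num_servers, file_gb):
    """Flows, per-server outbound GB, and total GB for an all-to-all exchange."""
    flows = num_servers * (num_servers - 1)  # every server -> every other server
    per_server_out_gb = (num_servers - 1) * file_gb
    return flows, per_server_out_gb, flows * file_gb


if __name__ == "__main__":
    # assumption: 20 servers, as in the testbed
    print(all_to_all_volume(20, 5))  # (380, 95, 1900)
```

That is 380 concurrent transfers moving 1,900 GB in total, with each server sending 95 GB, which is why all-to-all traffic is a demanding test of aggregate network capacity.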

18 Summary
DCell: uses commodity mini-switches to scale out; lets servers be part of the routing infrastructure; uses recursion to reduce the node degree and complete graphs to increase network capacity.
Benefits: scales doubly exponentially; high aggregate bandwidth capacity; fault tolerance; cost saving.
