Network Support for Cloud Services Lixin Gao, UMass Amherst.


1 Network Support for Cloud Services Lixin Gao, UMass Amherst

2 Outline Data center networking – Design issues – Resource sharing Asynchronous computation model

3 Conventional Data Center Networks Hierarchical tree structure High speed core switches are expensive Hard to scale

4 Data Center Network Design Commodity Hardware – Server – Switch Scalable Fat tree, DCell, BCube, VL2, …

5 DPillar Structure Devices – All servers have dual-port – All switches have n-port Server and switch columns – k columns Server naming – (col, label), label Connecting rule – Servers in and, their labels differ at only

6 Design Issues Inexpensive Scale to a large number of servers Fault Tolerant Routing Load Balancing

7 Network Resource Sharing within Data Center Virtualization of CPU (Xen), memory (DiffEng), storage (SAN) Network resource can become bottleneck – Sorting and shuffling of MapReduce – Sync among tasks slows down computation – Backup of VMs Bandwidth sharing – Granularity: point-to-point or group based – Fair share: centralized vs. distributed – Privacy: public cloud vs. private cloud
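The slide does not say which notion of fair share is meant. As one possible illustration of the centralized case, the sketch below assumes max-min fairness on a single shared link, computed by a central allocator that knows every demand. The function name, the flow names, and the numbers are hypothetical.

```python
def max_min_fair_share(capacity, demands):
    """Split `capacity` among flows so that no flow gets more than it asks for
    and unused share is redistributed to the flows that still want more."""
    allocation = {}
    remaining = float(capacity)
    # Serve the smallest demands first; they are the first to be fully satisfied.
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        share = remaining / len(pending)   # equal split of what is left
        flow, demand = pending[0]
        if demand <= share:
            # This flow needs no more than an equal share: grant it in full.
            allocation[flow] = float(demand)
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining flow wants more than an equal share: cap them all.
            for name, _ in pending:
                allocation[name] = share
            break
    return allocation


# Hypothetical demands (in Gbit/s) on a 10 Gbit/s link.
shares = max_min_fair_share(10, {"a": 2, "b": 4, "c": 8})
```

A distributed variant would have to reach the same allocation without any single component seeing all demands, which is where the centralized vs. distributed trade-off on the slide comes in.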

8 MapReduce Model Map: generate key value pairs Reduce: aggregate values for a key from multiple sources Shuffle and sort
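The three phases on this slide can be sketched in a single process. This is a minimal illustration, not a distributed implementation; the helper names and the word-count job are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Apply map_fn to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle_and_sort(pairs):
    """Group values by key and return the groups in sorted key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reduce_fn):
    """Aggregate the values collected for each key."""
    return {key: reduce_fn(key, values) for key, values in groups}


def run_mapreduce(records, map_fn, reduce_fn):
    return reduce_phase(shuffle_and_sort(map_phase(records, map_fn)), reduce_fn)


# Hypothetical job: count word occurrences.
def word_count_map(line):
    return [(word, 1) for word in line.split()]


def word_count_reduce(key, values):
    return sum(values)


counts = run_mapreduce(["a b a", "b c"], word_count_map, word_count_reduce)
```

In a cluster, the shuffle-and-sort step is the one that moves data between machines, which is why it appears as a network bottleneck on the earlier slide.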

9 Iterative Computations PageRank Clustering BFS Youtube video suggestion Pattern Recognition

10 Synchronous Model Ease of MapReduce implementation However, – Overhead of sync operation, sorting – Slow convergence, waste of CPU, network resources – Many iterative computations can be performed asynchronously PageRank, shortest path, adsorption, link proximity estimation, belief propagation, …

11 Shortest Paths [Figure: weighted graph with the source at distance 0 and all other nodes at ∞; map and reduce steps]

12 Shortest Paths [Figure: the same graph with tentative distances after parallel execution of map and reduce]

13 Shortest Paths [Figure: the same graph with tentative distances after parallel execution of map and reduce]
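The synchronous computation pictured on the shortest-path slides can be sketched as repeated map/reduce rounds with a barrier between them. The edge weights on the slides are not recoverable, so the small graph below is hypothetical, as are the function names.

```python
import math
from collections import defaultdict


def sssp_map(node, dist, edges):
    """Emit the node's own distance plus a candidate distance for each neighbor."""
    yield node, dist
    if dist < math.inf:
        for neighbor, weight in edges.get(node, []):
            yield neighbor, dist + weight


def sssp_reduce(values):
    """Keep the smallest candidate distance seen for a node."""
    return min(values)


def synchronous_sssp(edges, nodes, source):
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    rounds = 0
    while True:
        # Map: every node runs in the same round, using last round's distances.
        grouped = defaultdict(list)
        for node in nodes:
            for key, value in sssp_map(node, dist[node], edges):
                grouped[key].append(value)
        # Reduce: runs only after all map output has been collected (the barrier).
        new_dist = {n: sssp_reduce(grouped[n]) for n in nodes}
        rounds += 1
        if new_dist == dist:
            return dist, rounds
        dist = new_dist


# Hypothetical directed graph: node -> [(neighbor, weight), ...]
edges = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
distances, rounds = synchronous_sssp(edges, ["s", "a", "b", "c"], "s")
```

Every round touches every node, including nodes whose distance cannot change, and the final round only confirms convergence.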

14 An Asynchronous Model A general framework – Eliminate synchronization – Scheduling policy Prove correctness for a wide range of applications – PageRank, Personalized PageRank – Link Proximity Estimation Commute time, Katz metric, shortest path – Bayesian Inference Scheduling policies – Top-k query
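To make the contrast with the synchronous rounds concrete, here is a minimal single-process sketch of asynchronous shortest-path updates. It is not the framework described on the slide: the scheduling policy (smallest tentative distance first), the graph, and all names are assumptions for illustration.

```python
import heapq
import math


def asynchronous_sssp(edges, nodes, source):
    """Apply each distance update as soon as it is produced; no global rounds."""
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    queue = [(0, source)]          # pending work, ordered by the scheduling policy
    updates = 0
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue               # stale entry: a better distance already arrived
        for neighbor, weight in edges.get(node, []):
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate      # visible to later work immediately
                updates += 1
                heapq.heappush(queue, (candidate, neighbor))
    return dist, updates


# Hypothetical directed graph: node -> [(neighbor, weight), ...]
edges = {"s": [("a", 1), ("b", 4)], "a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
distances, updates = asynchronous_sssp(edges, ["s", "a", "b", "c"], "s")
```

Only nodes whose distance actually changed generate further work, so the choice of scheduling policy determines how many wasted updates are performed.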

15 Shortest Path [Figure panels: Facebook dataset; SSSP-m dataset]

16 PageRank [Figure panels: Google webgraph; PageRank-m webgraph]

17 Conclusions Network design within data center – Design based on commodity hardware – Network resource sharing Asynchronous computation framework – Reduced bandwidth requirement – Efficient computation

18 An Example of Outage planet02.csc.ncsu.edu experiences packet loss on July 30, 2005

19 Causes of Outages Most lost packets are caused by routing outages Failure type | Lost packets | Fraction: unknown 145720.2; Routing dynamics 581110.8

20 Towards 5 Nines Reliability Exploiting redundancy on Internet Path – Multiple routing instances to ensure consistency Exploiting multiple sites within a cloud – Site selection through route monitoring – Deliver through private WAN
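For scale, "five nines" means 99.999% availability. The short calculation below converts a number of nines into the downtime it allows per year; the function name and the 365-day year are assumptions made for the example.

```python
def allowed_downtime_minutes(nines, days_per_year=365):
    """Minutes of downtime per year permitted at an availability of `nines` nines."""
    unavailability = 10 ** (-nines)            # e.g. 5 nines -> 0.00001
    return unavailability * days_per_year * 24 * 60


five_nines = allowed_downtime_minutes(5)    # roughly 5.3 minutes per year
three_nines = allowed_downtime_minutes(3)   # roughly 8.8 hours per year
```

At five nines, a handful of routing failover events lasting a minute or two each would use up the whole yearly budget.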

21 Packet Loss due to Routing Failures Failover events: 76% packets lost Recovery events: 26% packets lost [Figure panels: Failover; Recovery]

22 Round-trip Delay Failover events have a significant impact on packet round-trip delays. In the worst case, round-trip delays can exceed 900 ms. [Figure panels: Failover; Recovery]

23 Reordering during Failover Events The number of reordered packets is small. However, the offset of reordered packets is large. Real-time applications therefore need larger buffers.
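The link between reordering offset and buffer size can be illustrated with a small sketch. The definitions used here are assumptions, since the slide does not define them: a packet counts as reordered if it arrives after a packet with a higher sequence number, and its offset is how many positions late it arrives. The trace is hypothetical and assumes sequence numbers start at 0 with no loss.

```python
def reordering_stats(arrivals):
    """Return (number of reordered packets, largest offset) for an arrival trace."""
    max_seen = -1
    reordered = 0
    max_offset = 0
    for position, seq in enumerate(arrivals):
        if seq < max_seen:
            # Arrived after a later packet: late by (position - seq) slots.
            reordered += 1
            max_offset = max(max_offset, position - seq)
        else:
            max_seen = seq
    return reordered, max_offset


# Hypothetical trace: packet 2 is overtaken by packets 3 through 6.
reordered, max_offset = reordering_stats([0, 1, 3, 4, 5, 6, 2, 7])
```

In this trace only one packet is reordered, yet a receiver that delivers in order has to hold four packets until it arrives, so the buffer requirement follows the offset rather than the count.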

