
1 MicroTE: Fine Grained Traffic Engineering for Data Centers  Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang+  *University of Wisconsin-Madison  +Microsoft Research

2 Why are Data Centers Important?  Host a variety of applications  Web portals  Web services  Multimedia applications  Chat/IM  DC performance → app performance  Congestion impacts app latency

3 Why are Data Centers Important?  Poor performance → loss of revenue  Traffic engineering is crucial

4 Contributions  Studied the effectiveness of current approaches  Developed guidelines for an ideal approach  Designed and implemented MicroTE  Showed its effectiveness in mitigating congestion

5 Road Map  Background  Traffic Engineering in data centers  Design goals for ideal TE  MicroTE  Evaluation  Conclusion

6 Options for TE in Data Centers?  Currently supported techniques  Equal Cost MultiPath (ECMP)  Spanning Tree Protocol (STP)  Proposed  Fat-Tree, VL2  Other existing WAN techniques  COPE, …, OSPF link tuning
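
ECMP, the scheme most of the later comparisons target, pins each flow to one of several equal-cost paths by hashing its packet headers, which keeps packets in order but never rebalances colliding flows. A minimal sketch of that per-flow hashing, assuming a CRC-style hash over the 5-tuple (illustrative only, not any particular switch's implementation):

```python
import zlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    """Pick one of num_paths equal-cost paths by hashing the 5-tuple.

    Every packet of a flow hashes to the same path (no reordering),
    but two heavy flows can collide on one path and ECMP will never
    move them apart -- the burstiness problem the later slides measure.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return zlib.crc32(key) % num_paths

# Example: this flow is pinned to one of 4 uplinks for its lifetime.
print(ecmp_path("10.0.1.5", "10.0.2.9", 43211, 80, 6, num_paths=4))
```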

7 How do we evaluate TE?  Simulator  Input: traffic matrix, topology, TE scheme  Output: link utilization  Optimal TE  Route traffic using knowledge of future TM  Data center traces  Cloud data center (CLD)  MapReduce app  ~1500 servers  University data center (UNV)  3-Tier Web apps  ~500 servers
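
The slide gives only the simulator's interface; a minimal sketch of such a core, assuming each TE scheme has already been reduced to a pair-to-path assignment (the data structures are assumptions, not the authors' simulator):

```python
from collections import defaultdict

def max_link_utilization(traffic_matrix, paths, capacity):
    """traffic_matrix: {(src_tor, dst_tor): demand in bytes/s}
    paths: {(src_tor, dst_tor): [link, ...]} chosen by the TE scheme
    capacity: {link: bytes/s}
    Returns the worst link utilization, the metric the slides compare.
    """
    load = defaultdict(float)
    for pair, demand in traffic_matrix.items():
        for link in paths[pair]:      # charge this pair's demand to
            load[link] += demand      # every link on its assigned path
    return max(load[link] / capacity[link] for link in load)
```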

8 Drawbacks of Existing TE  STP does not use multiple paths  40% worse than optimal  ECMP does not adapt to burstiness  15% worse than optimal

9 Drawbacks of Proposed TE  Fat-Tree  Rehashes flows  Local opt. != global opt.  VL2  ECMP on a VLB architecture  Coarse-grained flow assignment  VL2 & Fat-Tree do not adapt to burstiness

10 Drawbacks of Other Approaches  COPE, …, OSPF link tuning  Require hours or days of predictable traffic  Traffic is unpredictable beyond 100 secs (Maltz et al.)

11 Design Goals for Ideal TE

12 Design Requirements for TE  Calculate paths & reconfigure network  Use all network paths  Use a global view  Avoid local optima  Must react quickly  React to burstiness  How predictable is traffic? ….

13 Is Data Center Traffic Predictable?  YES! 27% or more of the traffic matrix is predictable  Manage predictable traffic more intelligently

14 How Long is Traffic Predictable?  Different patterns of predictability  1 second of historical data is able to predict the future (figure annotations: 1.5 – 5.0, 1.6 – 2.5)

15 MicroTE

16 MicroTE: Architecture  Global view:  Created by network controller  React to predictable traffic:  Routing component tracks demand history  All N/W paths:  Routing component creates routes using all paths (diagram: network controller comprising a monitoring component and a routing component)

17 Architectural Questions  Efficiently gather network state?  Determine predictable traffic?  Generate and calculate new routes?  Install network state?

19 Monitoring Component  Efficiently gathering the TM  Only one server per ToR monitors traffic  Transfer only the changed portion of the TM  Compress data  Tracking predictability  Calculate an EWMA over the TM (every second)  Empirically derived alpha of 0.2  Use time bins of 0.1 seconds
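
A minimal sketch of this tracking logic, using the slide's alpha of 0.2 and ten 0.1-second bins per update; the closeness test that declares an entry predictable (staying within 20% of the EWMA) is an illustrative assumption, not a value from the deck:

```python
class TorPairTracker:
    """Tracks one ToR-to-ToR entry of the traffic matrix."""

    ALPHA = 0.2      # EWMA smoothing factor, from the slides
    TOLERANCE = 0.2  # hypothetical "close to the EWMA" threshold

    def __init__(self):
        self.ewma = None
        self.predictable = False

    def update(self, bins):
        """bins: ten byte counts, one per 0.1 s time bin of the last second."""
        avg = sum(bins) / len(bins)
        if self.ewma is None:
            self.ewma = avg          # bootstrap on the first report
        else:
            self.predictable = (self.ewma > 0 and
                                abs(avg - self.ewma) <= self.TOLERANCE * self.ewma)
            # Fold the new second into the running average.
            self.ewma = self.ALPHA * avg + (1 - self.ALPHA) * self.ewma
        return self.predictable
```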

20 Routing Component  New global view  Determine predictable ToRs  Calculate network routes for predictable traffic  Set ECMP for unpredictable traffic  Install routes
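
Read in that order, one control round might look like the sketch below; the function and parameter names are placeholders for the routing component's steps, not the controller's real API:

```python
def control_round(tm, predictable_pairs, k_paths, route_predictable,
                  route_ecmp, install):
    """One pass of the routing component, per the slide's pipeline.

    tm: {(src_tor, dst_tor): predicted demand} from the new global view
    predictable_pairs: pairs the monitoring component flagged
    k_paths: {pair: candidate equal-length paths}
    route_predictable / route_ecmp / install: callbacks for the
    remaining pipeline stages (placeholders; see the next slide).
    """
    pred = {p: d for p, d in tm.items() if p in predictable_pairs}
    rest = {p: d for p, d in tm.items() if p not in predictable_pairs}
    routes = route_predictable(pred, k_paths)  # LP or bin-packing
    routes.update(route_ecmp(rest, k_paths))   # hash-based fallback
    install(routes)                            # push rules to switches
```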

21 Routing Predictable Traffic  LP formulation  Constraints  Flow conservation  Capacity constraint  Use K equal-length paths  Objective  Minimize the maximum link utilization  Bin-packing heuristic  Sort flows in decreasing order  Place each flow on the path with the greatest spare capacity
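
In symbols the LP might read as below, with f_{k,p} the fraction of pair k's predicted demand D_k placed on candidate path p, P_k its set of K equal-length paths, c_e the capacity of link e, and U the maximum link utilization; the notation is an assumption, since the deck names only the constraints:

```latex
\begin{aligned}
\min\ & U \\
\text{s.t.}\quad & \textstyle\sum_{p \in P_k} f_{k,p} = 1 \quad \forall k
  && \text{(flow conservation)} \\
& \textstyle\sum_{k} \sum_{p \in P_k : e \in p} f_{k,p}\, D_k \le U\, c_e \quad \forall e
  && \text{(capacity)} \\
& f_{k,p} \ge 0
\end{aligned}
```

And a minimal sketch of the bin-packing heuristic under the same assumed data structures:

```python
def bin_pack(demands, k_paths, spare):
    """Greedy stand-in for the LP: sort flows in decreasing order and
    place each on the candidate path with the most bottleneck headroom.

    demands: {(src_tor, dst_tor): predicted rate}
    k_paths: {pair: list of candidate equal-length paths}
    spare:   {link: remaining capacity}, decremented as flows land
    """
    routes = {}
    for pair in sorted(demands, key=demands.get, reverse=True):
        best = max(k_paths[pair],
                   key=lambda path: min(spare[l] for l in path))
        for link in best:
            spare[link] -= demands[pair]
        routes[pair] = best
    return routes
```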

22 Implementation  Changes to the data center  Switches  Install OpenFlow firmware  End hosts  Add a kernel module  New component  Network controller  C++ NOX modules

23 Evaluation

24 Evaluation: Motivating Questions  How does MicroTE compare to optimal?  How does MicroTE perform under varying levels of predictability?  How does MicroTE scale to large DCNs?  What overhead does MicroTE impose?

26 How do we evaluate TE?  Simulator  Input: traffic matrix, topology, TE scheme  Output: link utilization  Optimal TE  Route traffic using knowledge of future TM  Data center traces  Cloud data center (CLD)  MapReduce app  ~1500 servers  University data center (UNV)  3-Tier Web apps  ~400 servers

27 Performance Under Realistic Traffic  Significantly outperforms ECMP  Slightly worse than optimal (1%-5%)  Bin-packing and LP perform comparably

29 Performance Versus Predictability  Low predictability → performance is similar to ECMP  High predictability → performance is comparable to optimal  MicroTE adjusts according to predictability

30 Scaling to Large DC  Reaction time to traffic changes  Gathering network state  Bounded by network transfer time  Route computation  Flow installation  Bounded by switch installation time

Route computation time:
DC size (# servers)   LP      Heuristic
2K                    0.04s   0.007s
4K                    0.6s    0.055s
8K                    6.85s   0.25s
16K                   -       0.98s

31 Other Related Work  SPAIN/MultiPath TCP  Use multiple paths → better DC utilization  Eliminating congestion isn’t their goal  Helios/Flyways  Require augmenting the existing DC  Hedera/Fat-Tree  Require a forklift upgrade: full bisection bandwidth

32 Conclusion  Studied existing TE  Found it lacking (15-40% worse than optimal)  Studied data center traffic  Discovered traffic predictability (27% for 2 secs)  Developed guidelines for ideal TE  Designed and implemented MicroTE  Brings the state of the art within 1-5% of optimal  Efficiently scales to large DCs (16K servers)

33 Thank You  MicroTE adapts to low-level properties of traffic.  Can we develop techniques that cooperate with applications?

