Presentation is loading. Please wait.

Presentation is loading. Please wait.

TeXCP: Protecting Providers’ Networks from Unexpected Failures & Traffic Spikes Dina Katabi MIT - CSAIL nms.csail.mit.edu/~dina.

Similar presentations


Presentation on theme: "TeXCP: Protecting Providers’ Networks from Unexpected Failures & Traffic Spikes Dina Katabi MIT - CSAIL nms.csail.mit.edu/~dina."— Presentation transcript:

1 TeXCP: Protecting Providers’ Networks from Unexpected Failures & Traffic Spikes Dina Katabi MIT - CSAIL dk@mit.edu nms.csail.mit.edu/~dina

2 Crash Course in XCP TeXCP is TE with an XCP-Like Protocol

3 Problem Addressed by XCP: TCP has trouble providing a few Gb/s per-flow throughput

4 Gigabit Links Need to increase faster

5 Kb/s Link How much faster? Need to increase faster  Need Explicit Feedback

6 Need to increase faster Need to coordinate connections! Scalability Constraint: Routers should not maintain per-connection state Scalability Constraint: Routers should not maintain per-connection state  Need Explicit Feedback How much faster?

7 XCP

8 Feedback Round Trip Time Throughput Congestion Header Feedback Round Trip Time Throughput How does XCP Work? Feedback = + 1 packet/sec

9 Round Trip Time Throughput Feedback = - 3 packet/sec How does XCP Work?

10 Rate = Rate + Feedback Explicit Feedback Make senders react according to the amount of spare capacity Explicit Feedback Make senders react according to the amount of spare capacity How does XCP Work?

11 How Does an XCP Router Compute the Feedback? 1. Efficiency Controller 2. Fairness Controller Router makes decisions every control interval D

12 How Does an XCP Router Compute the Feedback? Efficiency Controller Fairness Controller Goal: Divides  between connections to converge to fairness Algorithm: If  > 0  Divide  equally between connections If  < 0  Divide  between connections proportionally to their current rates (shown to achieve Fairness [Jain]) Goal: Matches input traffic to link capacity & drains the queue Algorithm: Aggregate traffic changes by   ~ Spare Capacity  ~ - Queue Size So,  =  Spare -  Queue/D MIMD AIMD

13  =  d avg Spare -  Queue Theorem: System converges to optimal utilization (i.e., stable) for any link capacity, delay, number of sources if: (Proof based on Nyquist Criterion) Getting the devil out of the details … Efficiency Controller Fairness Controller No Parameter Tuning No Parameter Tuning Algorithm: If  > 0  Divide  equally between connections If  < 0  Divide  between connections proportionally to their current rates Need to estimate number of connections N D: Control/Counting Interval No Per-Flow State

14 Avg. Utilization XCP Remains Efficient as Bandwidth or Delay Increases Bottleneck Capacity (Mb/s)Round Trip Delay (sec) Utilization vs. Delay XCP increases proportionally to spare capacity  and  chosen to make XCP robust to delay Utilization vs. Capacity

15 TeXCP: Intra-Domain Traffic Engineering with XCP with Srikanth Kandula, Asfandyar Qureshi, Shan Sinha

16 What is the problem? Provider Network (AT&T, Sprint, BBN, ….)

17 What is the problem? L.A. Traffic Engineering routes the traffic demands of IE pairs to achieve good network performance? Boston Ingress Egress

18 Good performance means: Network is robust to unexpected events: o Link failures o Traffic Spikes Support as much demands as possible given capacities Minimize the maximum link utilization (i.e., load balancing) Minimize the maximum link utilization (i.e., load balancing)

19 To minimize Max U  Multi paths L.A. Boston Two issues: - How much traffic to put on each path? - How does TCP interact with multi-path routing? Two issues: - How much traffic to put on each path? - How does TCP interact with multi-path routing? Two issues: - How much traffic to put on each path? - How does TCP interact with multipath routing? Two issues: - How much traffic to put on each path? - How does TCP interact with multipath routing?

20 How much traffic to put on each path? Repeatedly o Measure the utilization of all paths between L.A. and Boston o Move some traffic from the highly utilized path to the underutilized path Do it in the network!

21 How to divide traffic between available paths to minimize max U? Simplistic solution: give all information to a centralized computer and solve it as a linear programming problem But that wouldn’t be realtime and wouldn’t react to failures and attack traffic Need a distributed, in network, realtime solution

22 Feedback Delays Challenge 1:

23 Feedback Delays Challenge 1: L.A. Boston Queue Utilization ? Utilization feedback might be obsolete

24 Coordination Without Global Knowledge Challenge 2:

25 Coordination Without Global Knowledge Challenge 2: L.A. Boston SF NYC

26 Coordination Without Global Knowledge Challenge 2: L.A. Boston SF NYC Actions of uncoordinated ingress nodes might cause undesirable effect

27 Use experience from congestion control Congestion control & TE are close o CC: single path; want 100% utilization o TE: multi-paths; want balanced utilization XCP tells us how fast each path can change its utilization so that system is stable despite delay XCP tells us how to coordinate multiple senders who share the same path Solution Idea: TeXCP: traffic engineering with an XCP- like protocol

28 TeXCP TeXCP agent knows the paths between IE, which are computed offline Paths are pinned A TeXCP agent per IE, at ingress node L.A. Boston TeXCP Agent

29 TeXCP divides the IE traffic between available paths to Minimize the Max U? A TeXCP agent 3 components o Probe path utilization o Load balancer o Per-path XCP controller

30 Periodically, send probes on each path to learn Max utilization along the path Probe Path Utilization Component 1: Ingress 1 Egress 1 x U1 = 0.4 U2 = 0.7 U1 = 0.4 U2 = 0.7 Probes are sent along the slow path like ICMP messages  implementation is easy

31 Load Balancer Component 2: Objective: Balance utilization across IE paths How? move traffic from overutilized paths to under-utilized paths o Continuously estimate demands L(t) o Assume x p is fraction of traffic that should be sent on path p o Periodically, update x p o Where r p is the sending rate on path p and u p is its utilization

32 Per-Path XCP Controller Component 3: Load Balancer tells us how much to change utilization, but o Can’t increase/decrease immediately because of delays o Need to coordinate the increase/decrease from all ingress nodes Run a lightweight XCP on each path o Replace congestion header with probes along the slow path o Change the slow path in core routers to answer probes

33 TeXCP Performance

34 Simple Topology Traffic demands per IE pair are modeled using 100 Pareto on-off sources, or Poisson arrivals Also, 50% of the traffic is uncontrollable by TeXCP Ingress 1 Ingress 2 Ingress 3 Egress 1 Egress 2 Egress 3 L1 L2 L3

35 TeXCP balances the load across available paths, and reacts to demands change in realtime Time (sec) Utilization Decrease in Uncontrollable Traffic Started at t = 0 with Load1 =1.3 Load2=0.8 Load3=0.8

36 TeXCP is More Accurate than Previous Approaches Time (sec) Utilization Time (sec) TeXCP MATE

37 Real Traffic and Provider Topology

38 OSPF Optimizer The utilization of all Abilene links as function of time TeXCP Improves Robustness Against Link Failures Link Down Link Up With TeXCP Link Down Link Up

39 TeXCP Improves Robustness Against Link Failures Without TeXCP With TeXCP Link Down Link Up TeXCP prevents congestion caused by link failures Max Utilization

40 When multiple solutions results in the same Max U, TeXCP prefers solutions with shorter paths

41 TeXCP Prefers Shorter Paths Ingress2 Ingress1 TeXCP prefers low delay paths while balancing utilization

42 How about TCP? Multi-paths must NOT reorder TCP packets Problem: Can you take traffic at any backbone router and accurately split it between multiple paths without reordering TCP packets? Problem: Can you take traffic at any backbone router and accurately split it between multiple paths without reordering TCP packets?

43 Simplistic Solution Assign TCP flows to each path proportionally to the desirable split 1. Flows are not all equal: Elephants & Mice 2. So, estimate the rate of each TCP 3. But rates change over with time 4. Too complex

44 Our Approach If pkt enters network after the previous pkt from the flow has left  We can reassign the flow to a different path Use UDP traffic to correct for errors TCP flow 2 1

45 Our Approach Each TeXCP agent keeps a Hash Table If (now-last_seen) > Max delay, we can change the assigned path Assign a TCP flow to a new path with a probability equals to the fraction of traffic you want on this path Last Seen Path 9920.2659 3

46 OC12 traffic Split it between 2 paths, with desirable splitting changing with time as a sinusoidal wave Path1 Fraction Path2 Fraction

47 OC12 traffic Split it between 2 paths, with desirable splitting changing with time as a sinusoidal wave Path1 Fraction Path2 Fraction Path1 UDP Path2 UDP

48 Very, Very Cheap. Edge routers maintain a hash table of 2 10 entries (10KB).

49 Conclusion TeXCP o Adaptive multipath routing protocol o Makes the network more robust against link failures and traffic spikes o Note TeXCP does not assume XCP Multipath routing can be easily and effectively implemented without causing TCP packet reordering

50 More Information at: http://nms.lcs.mit.edu/~dina/texcp.html Questions?

51 TeXCP Reacts in Real-Time to Traffic Spikes Max Utilization Time (sec) No TeXCP Optimal TeXCP Spike begins

52 Comparison with MATE Ingress 1 Ingress 2 Ingress 3 Egress 1 Egress 2 Egress 3 Mate Minimize delay Conservative increase

53 Link load / capacity Comparison with MATE Link load / capacity Time (sec) TeXCP’s utilization is more balanced Avg. drop rate in MATE is 2% while in TeXCP is 0% TeXCP’s utilization is more balanced Avg. drop rate in MATE is 2% while in TeXCP is 0% decrease in cross traffic TeXCP MATE Time (sec)

54 XCP Deals Well with Short Web-Like Flows Arrivals of Short Flows/sec AverageUtilization AverageQueue Drops

55  =  d avg Spare -  Queue Theorem: System converges to optimal utilization (i.e., stable) for any link bandwidth, delay, number of sources if: (Proof based on Nyquist Criterion) Efficiency Controller Fairness Controller Stability Properties Stability Properties Algorithm: If  > 0  Divide  equally between flows If  < 0  Divide  between flows proportionally to their current rates Need to estimate number of connections N D: Counting Interval No Per-Connection State Traffic rate change by  every d avg Rate r(t) changes per time unit by


Download ppt "TeXCP: Protecting Providers’ Networks from Unexpected Failures & Traffic Spikes Dina Katabi MIT - CSAIL nms.csail.mit.edu/~dina."

Similar presentations


Ads by Google