Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
Tsung-i (Mark) Huang, Jaspal Subhlok
University of Houston
GAN'05 / May 10, 2005


1 Fast Pattern-Based Throughput Prediction for TCP Bulk Transfers
Tsung-i (Mark) Huang, Jaspal Subhlok
University of Houston
GAN'05 / May 10, 2005

2 Outline (TMH - GAN'05, 05/10/2005)
• Background
• Problem Description
• Methodology
• Experiments and Results
• Conclusion and Future Work

3 “Are we there yet?” When do you need throughput prediction?
• File download: "xx minutes left" estimates (MS IE vs. Mozilla)
• Mirror site selection, e.g. Knoppix: Florida State Univ. (fsu.edu) or TU Ilmenau, Germany (tu-ilmenau.de)
• Resource selection in a grid environment
• Cache selection for web content delivery services

4 Which site will give the best throughput?
Current approaches and tools:
• Geographical distance
• Ping (ICMP)
• Download 512 KBytes (fixed size): NWS / iperf
• Download for 10 seconds (fixed duration): iperf
The last two approaches are the most accurate, but:
• How much data should be downloaded, or for how long? Is “Bandwidth * Delay” the answer? Does one size fit all?
• “All or nothing”: no result is available until the end of the transmission

5 Problem Description
Predicted future throughput can be used in mirror/replica site selection.
Predict the throughput of a TCP bulk transfer:
• Single TCP stream
• Input: time series of (arrival time, bytes received)
• Output: predicted future throughput
• Make a prediction of future throughput after 10 ~ 100 RTTs
• Utilize knowledge of TCP flow patterns
• Assume TCP flow patterns will repeat later in the same TCP stream

6 TCP Flow Patterns
Textbook examples: (a) Rate Control, (b) Congestion Control
In reality: (c) Rate Control with delay, (d) Mixed Congestion Control
[Figure: four throughput traces, panels (a) to (d)]

7 Approach to Throughput Prediction
Analyze the time series (TS1) of (arrival time, bytes received) to get a meaningful throughput time series.
• Possible solutions:
  - Instant throughput: throughput since the previous TCP segment
  - Fixed-interval throughput: average throughput over a fixed time period
  - Per-RTT throughput: partition using the fixed SYN-ACK RTT
• Idea: TCP sends a window full of data segments every RTT
Partition the time series (TS1) with the fixed SYN-ACK RTT to get the per-RTT throughput (TS2).
Analyze the per-RTT throughput time series (TS2) to predict future throughput.
Compare different prediction methods across all traces.
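
The partitioning step from TS1 to TS2 can be sketched as follows. This is a minimal illustration and not the authors' code (the slides say the traces were analyzed with Perl scripts). The function name `per_rtt_throughput` is hypothetical, and the sketch assumes the samples are already sorted by arrival time and that bins are aligned to the first sample.

```python
from typing import List, Tuple


def per_rtt_throughput(samples: List[Tuple[float, int]], rtt: float) -> List[float]:
    """Turn (arrival_time_sec, bytes_received) samples into bytes/sec per RTT bin.

    Assumptions (not from the slides): samples are time-ordered, and the first
    bin starts at the arrival time of the first sample.
    """
    if not samples or rtt <= 0:
        return []
    start = samples[0][0]
    last = samples[-1][0]
    n_bins = int((last - start) / rtt) + 1
    totals = [0] * n_bins
    for arrival, nbytes in samples:
        # Clamp to guard against floating-point rounding at the last bin edge.
        idx = min(int((arrival - start) / rtt), n_bins - 1)
        totals[idx] += nbytes
    # Every bin is exactly one RTT wide, so throughput is bytes / rtt.
    return [total / rtt for total in totals]
```

An RTT bin with no arriving segments yields a throughput of 0, which keeps the series evenly spaced in time.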

8 TCP Segment Partitioning (1)
• Instant throughput shows a wide range of fluctuation.
• Fixed-interval throughput shows less fluctuation.
[Figure: three panels of the same trace. Instant throughput (log scale; annotations: over 1 GBytes/sec, about 220 Bytes/sec); fixed-interval throughput (fixed interval of 100 ms; annotations: 121 KB/sec, 40 KB/sec); per-RTT throughput (SYN-ACK RTT = 176 ms)]

9 TCP Segment Partitioning (2)
RTT estimation:
• Use fixed SYN-ACK RTT
• Simple and effective
Partition TCP segments into a per-RTT throughput time series.
[Figure: SYN / ACK exchange marking the RTT]

10 Throughput Prediction (1)
TCP patterns:
• Rate Control limited (RC)
• Congestion Control limited (CC)
Identify basic elements:
• Flat regions
• Exponential climb regions
• Linear climb regions
• Drop points
[Figure: trace annotated with drop points, flat, linear climb, and exponential climb regions]
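
One way to picture the basic elements is to label each step between consecutive per-RTT throughput samples. The sketch below is only an illustration: the slides do not say how regions are detected, so the function name `label_transitions` and the thresholds `flat_tol` and `exp_ratio` are assumptions chosen for readability.

```python
from typing import List, Sequence


def label_transitions(series: Sequence[float],
                      flat_tol: float = 0.1,
                      exp_ratio: float = 1.5) -> List[str]:
    """Label each step between consecutive per-RTT throughput samples.

    Assumed thresholds (hypothetical, not from the slides):
      - a change within +/- flat_tol of the previous value is "flat"
      - a decrease beyond flat_tol is a "drop"
      - growth by a factor of exp_ratio or more is an "exp_climb"
      - any smaller growth is a "linear_climb"
    """
    labels = []
    for prev, cur in zip(series, series[1:]):
        if prev <= 0:
            # No meaningful ratio from an empty RTT; treat any data as a climb.
            labels.append("exp_climb" if cur > 0 else "flat")
            continue
        ratio = cur / prev
        if ratio < 1.0 - flat_tol:
            labels.append("drop")
        elif ratio <= 1.0 + flat_tol:
            labels.append("flat")
        elif ratio >= exp_ratio:
            labels.append("exp_climb")
        else:
            labels.append("linear_climb")
    return labels
```

Runs of identical labels then correspond to the regions named on the slide, and isolated "drop" labels to drop points.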

11 Throughput Prediction (2)
Peak of slow start:
• Data points up to the end of the 1st slow start are ignored for prediction, because the initial slow start does not repeat
RC-based prediction:
• Use flat regions
CC-based prediction:
• Use complete CC cycles
Window-based prediction:
• Used if no clear pattern is observed
[Figure: trace with the peak of slow start marked]
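
A rough sketch of two of these rules (discarding the initial slow start, then RC-based prediction with a window-based fallback) is given below. It is not the authors' algorithm: the slides give no detection criteria, so the peak heuristic, the "mostly flat" test, the default parameters, and the names `slow_start_peak` and `predict_rc_or_window` are all assumptions. CC-based prediction from complete cycles is deliberately left out.

```python
from typing import Optional, Sequence


def slow_start_peak(series: Sequence[float]) -> int:
    """Index of the first sample followed by a lower one.

    Assumption: the initial slow start ends at the first decrease.
    """
    for i in range(len(series) - 1):
        if series[i + 1] < series[i]:
            return i
    return len(series) - 1


def predict_rc_or_window(series: Sequence[float],
                         flat_tol: float = 0.1,
                         window: int = 10) -> Optional[float]:
    """Predict throughput from per-RTT samples, ignoring the first slow start."""
    usable = list(series[slow_start_peak(series) + 1:])
    if not usable:
        return None  # nothing left after the slow start
    mean = sum(usable) / len(usable)
    # Samples close to the mean are treated as belonging to flat regions.
    flat = [x for x in usable if abs(x - mean) <= flat_tol * mean]
    if len(flat) >= len(usable) / 2:
        # Mostly flat: rate-control style prediction from the flat samples.
        return sum(flat) / len(flat)
    # No clear pattern: fall back to the average of the most recent window.
    tail = usable[-window:]
    return sum(tail) / len(tail)
```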

12 Experiments (1) - Setup
Download data files from 290 web sites (Debian/Gentoo mirrors):
• Use tcpdump to capture the receiver's traffic
• Record SYN-ACK RTTs
• Include retransmitted packets (0.09%)
• Average file size is 30 MBytes
461 traces collected at Univ. of Houston.
Traces are analyzed using Perl scripts.

13 Experiments (2) - Prediction Methods
Prediction methods compared:
• Moving Average (MA): average throughput of the previous 10 RTTs
• Exponential Weighted Moving Average (EWMA)
• Aggregate throughput: average of all past throughput (same as cumulative average), used as the predicted throughput
• TCP Pattern prediction
Average error in predicted future throughput:
  error = (predicted throughput − measured throughput) / measured throughput × 100%
• The error is cut off at 100% if it is larger, in case the measured future throughput is very small
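
The three baseline predictors and the error metric can be sketched in a few lines. The 10-RTT window and the 100% cap come from the slide; the EWMA weight `alpha = 0.5`, the use of the absolute value in the error, and all function names are assumptions, since the slides do not specify them.

```python
from typing import Sequence


def moving_average(series: Sequence[float], window: int = 10) -> float:
    """Average of the most recent `window` per-RTT samples (10 on the slide)."""
    tail = series[-window:]
    return sum(tail) / len(tail)


def ewma(series: Sequence[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; alpha is an assumed weight."""
    estimate = series[0]
    for value in series[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def aggregate(series: Sequence[float]) -> float:
    """Cumulative average of all past per-RTT samples."""
    return sum(series) / len(series)


def percent_error(predicted: float, measured: float, cap: float = 100.0) -> float:
    """Relative error in percent, cut off at `cap`.

    Assumption: the magnitude of the error is used; the slide text does not
    show whether the error is signed.
    """
    if measured == 0:
        return cap  # avoid division by zero when nothing was received
    return min(abs(predicted - measured) / measured * 100.0, cap)
```

All three predictors take the per-RTT throughput series observed so far and return a single predicted throughput, so they can be compared with the same error function.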

14 Illustration of Prediction (1)
Make a prediction for the next 200 RTTs:
• Prediction at 25th RTT:
  - TCP throughput prediction: average throughput of RTTs 9~25 (RC-based prediction)
  - Aggregate throughput prediction: average throughput of RTTs 0~25
• Prediction at 40th RTT:
  - TCP throughput prediction: uses Window-based prediction after the 27th RTT (a significant drop)
[Figure: per-RTT throughput vs. window size (in RTTs) with Aggregate and TCP Pattern predictions; marked points: peak of slow start, drop at 27th RTT, 25th RTT, 40th RTT]

15 Illustration of Prediction (2)
Make a prediction for the next 200 RTTs:
• Average error is computed against the measured future throughput of the next 200 RTTs (for example, at the 20th RTT, the average throughput of RTTs 21~220 is used)
• The closer to 0, the better the prediction
[Figure: per-RTT throughput and prediction error vs. window size (in RTTs); legend: per-RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]
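
The evaluation described here, predicting at the k-th RTT and comparing against the average of the following 200 RTTs, might look like the sketch below. The function name `error_at`, the `predictor` callback, and the use of the absolute error are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def error_at(series: Sequence[float],
             k: int,
             predictor: Callable[[Sequence[float]], float],
             horizon: int = 200,
             cap: float = 100.0) -> Optional[float]:
    """Percent error of a prediction made after the k-th RTT.

    `series` holds per-RTT throughput; with 1-based RTT numbering, the
    predictor sees RTTs 1..k and is scored on RTTs k+1..k+horizon.
    Returns None when the trace is too short to score the prediction.
    """
    past = series[:k]
    future = series[k:k + horizon]
    if not past or len(future) < horizon:
        return None
    measured = sum(future) / len(future)
    if measured == 0:
        return cap  # measured throughput too small to normalize by
    predicted = predictor(past)
    # Assumption: absolute error, cut off at `cap` percent.
    return min(abs(predicted - measured) / measured * 100.0, cap)
```

With k = 20 and the default horizon, the measured value is the average of RTTs 21~220, matching the example on the slide.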

16 Illustration of Prediction (3)
Make a prediction for the next 200 RTTs:
• Throughput prediction using Congestion-Control based patterns
• Prediction made at the 65th RTT using 3 complete CC cycles
• The closer to 0, the better the prediction
[Figure: per-RTT throughput and prediction error; one complete CC cycle is marked; legend: per-RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]

17 Results (1) - predict next 200 RTTs at different times
• Aggregate is not accurate for small window sizes (< 30 RTTs)
• MA / EWMA are generally not as accurate
[Figure: prediction error for Aggregate, Moving Average, EWMA, and TCP Pattern; the 30th RTT is marked]

18 Results (2) - predict at 15th RTT for different times in the future
• When only limited data is available, MA performs best; TCP Pattern is close
• Aggregate is not accurate
[Figure: prediction error for Aggregate, Moving Average, EWMA, and TCP Pattern]

19 Results (3) - predict at 25th RTT for different times in the future
• With more data available, TCP Pattern performs best; MA is close
• Aggregate performs better
[Figure: prediction error for Aggregate, Moving Average, EWMA, and TCP Pattern]

20 Results (4) - predict at 50th RTT for different times in the future
• With even more data available, MA now performs worse, due to the dynamics of TCP flows
• TCP Pattern is best and Aggregate is close
[Figure: prediction error for Aggregate, Moving Average, EWMA, and TCP Pattern]

21 Summary of Results
• Aggregate is accurate with sufficient data, but not with only a few RTTs of data
• MA performs very well with a few RTTs of data
• EWMA is not a good predictor
• TCP Pattern generally performs better than or as well as the other methods

22 Summary of Results (table view)
Methods        | Small # of RTTs of data | Large # of RTTs of data
Aggregate      | Worse (3)               | Better (2)
Moving Average | Best (1)                | Worse (3)
EWMA           | Worst (4) (single entry covering both columns)
TCP Pattern    | Better (2)              | Best (1)

23 Conclusion and Future Work
• TCP-pattern based throughput prediction is as good as or better than other methods
• Good predictions within 25 RTTs (or ~ 5 sec)
• Patterns observed: 65% Rate Control, few Congestion Control
• Methods using Aggregate (e.g. NWS) cannot be expected to work well for small test files
What’s next?
• Identify more patterns
• Add a degree of confidence for each prediction
• Multiple TCP streams

24 That’s all, folks! Thank You!

25 Supplement Slides

26 Characteristics of collected traces (1)
Terms                                  | Values     | Comments
Number of traces                       | 461        |
Downloaded file size                   | 26-34 MB   | Avg: 30 MB
Unique web sites                       | 290        | Debian/Gentoo
Avg # segments per trace               | 24,062     | (min/max/median) = (17,025/69,866/24,412)
Retransmitted segments                 | 0.09%      | 97 out of 461 traces
Avg # retransmitted segments per trace | 103.6      | (min/max/median) = (0/2,672/4)
Avg SYN-ACK RTT                        | 0.1696 sec | (min/max/median) = (0.02/2.91/0.155)
Avg # RTTs per trace                   | 2,589      | (min/max/median) = (143/110,673/662)

27 Characteristics of collected traces (2)
Type                        | # traces | %       | Comments
Rate Control                | 301      | 65.29%  | 35 traces (7.59%) have big gaps (> 10 RTTs)
Congestion Avoidance        | 30       | 6.51%   |
Mixed or Congestion Control | 130      | 28.20%  | 51 traces (11.06%) are very low in volume (up to 8~12 pkts/RTT, vs. ~44 pkts/RTT)
Total                       | 461      | 100.00% |
Classification: a trace is assigned a type when over 50% of the trace presents that type of pattern.

28 Some Trace Patterns (300 RTTs)
[Figure: sample trace patterns; one panel is labeled "Under-estimated RTT; 100 RTTs"]

29 Results (0.5) - predict next 100 RTTs at different times
[Figure: prediction error; legend: per-RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]

30 Results (1.5) - predict next 400 RTTs at different times
[Figure: prediction error; legend: per-RTT throughput, Aggregate, Moving Average, EWMA, TCP Pattern]

31 Bandwidth
Bandwidth:
• The amount of data that can be pushed through a link in unit time. Usually measured in bits or bytes per second.
Bottleneck Bandwidth (BB), Available Bandwidth (AB), Throughput (T):
  T ≤ AB ≤ BB

