Slide 1: Root Cause Analysis of TCP Throughput: Methodology, Techniques, and Applications
Matti Siekkinen, Ph.D. Defense, October 30, 2006
Institut Eurecom, Sophia Antipolis, France

Slide 2: Outline
- Introduction and Motivation
  - Root cause analysis of TCP throughput: what and why?
- Part 1: Methodology
  - InTraBase: Integrated Traffic Analysis Based on an Object-Relational DBMS
- Part 2: Root cause analysis techniques
  - Taxonomy of TCP rate limitation causes
  - Our approach to inferring limitation causes
- Part 3: Case study on performance analysis of ADSL clients
- Conclusions
  - Contributions
  - Future work

Slide 3: The Internet over the last 5 years
- Traffic volumes and the number of users have skyrocketed
- Access link capacities have multiplied
- Dominance has shifted from Web+FTP to peer-to-peer applications
- TCP is still the dominant transport protocol
  - Carries over 90% of traffic

Slide 4: The Internet: questions raised
- ISPs would like to know how their clients are doing
- What performance limitations are Internet applications facing?
- Why does a client with 4 Mbit/s ADSL access obtain a total download rate of only a few KB/s with eDonkey?
- Why, after upgrading my link, do I see no improvement in throughput?
- The Internet does not directly provide answers
  - The network is dumb!
  => We need techniques for traffic measurement and analysis

Slide 5: Root cause analysis of TCP throughput
What?
- Analysis and inference of the reasons that prevent a given TCP connection from achieving a higher throughput
  - These reasons are called limitation causes
Why TCP?
- TCP typically carries over 90% of all traffic

Slide 6: Background
- TCP Rate Analysis Tool (T-RAT) by Zhang et al. (SIGCOMM 2002)
  - Pioneering research work
    - Ground-breaking insights
    - It is not all congestion!
    - Opened up many questions
  - We implemented and tested it
    - Results are way off too often
    - Fundamental assumptions do not hold
  - T-RAT analyzes unidirectional traffic
    - Passively collected measurements
    - Usable in more cases (asymmetric paths)
    - The source of the problems

Slide 7: Our approach
- We analyze only passive traffic measurements
  - Capture and store all TCP/IP headers, analyze later off-line
- Observe traffic at a single measurement point
  - Applicable in diverse situations
  - E.g. at the edge of an ISP's network: know all about clients' downloads and uploads
- Bidirectional packet traces
- Connection-level analysis

Slide 8: Challenges (1/3)
- Single measurement point anywhere along the path
  - We cannot (and do not want to) control it
  - This complicates estimation of parameters (RTT and cwnd)
[Figure: sender A, measurement point, receiver B, with one-way delays d1..d4. Measuring the RTT toward A is easy (RTT ~ d1, "piece of cake"). Toward B, RTT ~ d3+d4, but how do we get d4? Did ack2 actually trigger data2?]
A sketch of this estimation follows below.
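The following minimal sketch illustrates the idea behind mid-path RTT estimation. The packet-record format, the cumulative-ACK matching, and the "last ACK triggered the next data segment" heuristic are all assumptions for illustration; the thesis combines several estimation methods (slide 10 mentions five).

    from statistics import median

    def estimate_rtt(packets):
        """Estimate the RTT of one connection seen at a mid-path probe.
        packets: (ts, kind, seq, length, ack_no) tuples sorted by capture
        time ts; kind is 'data' (sender side) or 'ack' (receiver side)."""
        d_rcv = []            # probe -> receiver -> probe samples
        d_snd = []            # probe -> sender -> probe samples
        pending = {}          # expected cumulative ack -> data timestamp
        last_ack_ts = None
        for ts, kind, seq, length, ack_no in packets:
            if kind == 'data':
                pending[seq + length] = ts
                if last_ack_ts is not None:
                    # The ambiguous step from the slide: assume the most
                    # recent ACK triggered this data segment (d4 ~ ts - ack ts).
                    d_snd.append(ts - last_ack_ts)
                    last_ack_ts = None
            else:
                if ack_no in pending:
                    d_rcv.append(ts - pending.pop(ack_no))
                last_ack_ts = ts
        if d_rcv and d_snd:
            return median(d_rcv) + median(d_snd)
        return None

    # Toy usage: two data/ack exchanges.
    trace = [(0.00, 'data', 0, 1460, 0), (0.04, 'ack', 0, 0, 1460),
             (0.05, 'data', 1460, 1460, 0), (0.09, 'ack', 0, 0, 2920)]
    print(estimate_rtt(trace))  # ~0.05: 0.04 (d_rcv) + 0.01 (d_snd)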

Slide 9: Challenges (2/3)
- A lot of data to analyze
  - Potentially millions of connections per trace
- Deep analysis
  - For each connection of each trace:
    - Compute a lot of metrics
    - Divide connections into pieces, analyze the pieces separately, and compute more metrics
    - Keep track of everything

Slide 10: Challenges (3/3)
- Find the right metrics to characterize all limitations
  - Not too many
  - Requires gathering a lot of experience
- Get it right!
  - Several methods exist for computing a particular metric
    - Choose the "best" one for the situation
    - Try to maximize the correctness of results
    - E.g. 5 ways to estimate RTTs
  - Careful validation
    - Benchmark against a lot of reference traces
    - Cross-validate metrics

Slide 11: Outline (repeated; transition to Part 1: Methodology)

Slide 12: Why did we need InTraBase?
- First try: ad-hoc scripts and specialized software tools (tcptrace et al.)
- Problems:
  1. Management of data, metadata, and tools: we got lost among files containing data and ad-hoc scripts, with a lot of metrics to compute and combine
  2. Cumbersome analysis process: the analysis is iterative, and the data loses semantics and structure along the way
  3. Scalability: cannot analyze large enough data sets
[Figure: the analysis cycle: filter, process, combine, store, interpret]

Slide 13: Our InTraBase approach
[Figure: raw base data files (tcpdump traces from a network link, application logs, Web100 data) are preprocessed and uploaded into the database system, which holds base data, metadata, and results and is driven by queries and functions.]
- Store traffic measurements in files as base data
- Upload the base data into the DB and process it within the DB
  - Issue SQL queries
  - Object-relational DBMS: create functions for advanced processing (a usage sketch follows below)
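As a concrete illustration of this pattern, here is a sketch of a thin Python client pushing work into the database. The table name cnxs and its throughput column come from the example query on slide 15; the psycopg2 driver and the connection string are assumptions, not part of the thesis.

    import psycopg2

    # Connection parameters are illustrative.
    conn = psycopg2.connect("dbname=intrabase")
    cur = conn.cursor()

    # Find the fastest connection; the scan and aggregation run inside
    # the DBMS, and only the tiny result crosses to the client.
    cur.execute("""
        SELECT cnxid, throughput
        FROM cnxs
        ORDER BY throughput DESC
        LIMIT 1;
    """)
    cnxid, tput = cur.fetchone()
    print(f"fastest connection: {cnxid} at {tput} bytes/s")
    cur.close()
    conn.close()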

Slide 14: Benefits of a DBMS-based approach
- Organize and manage data, related metadata, analysis results, and tools
- Data becomes structured and has semantics
  - Processing and updating data is easier
    - Tools "understand" the data, enabling higher-level programming
  - Searching is more efficient (indexes)
  - Reusable intermediate results can be stored
  - It is easier to combine different data sources
    - E.g. across OSI layers

Slide 15: Example query: histogram of the packet inter-arrival times of the fastest connection

    SELECT plot_ts_hist('SELECT * FROM iat(t2.cnxid, t2.reverse, "packets")', 'histogram.pdf')
    FROM (SELECT cnxid, reverse
          FROM cnxs, (SELECT max(throughput) FROM cnxs) AS t1
          WHERE cnxs.throughput = t1.max) AS t2;

[Figure: the query draws on the connections table cnxs (connection id, bytes, packets, tput, ...) and the packets table (connection id, timestamp, start/end sequence numbers, flags, ...); iat(...) computes inter-arrival times and plot_ts_hist() writes histogram.pdf.]

Slide 16: Outline (repeated; transition to Part 2: Root cause analysis techniques)

Slide 17: Scope
- Study long-lived TCP connections
  - Short connections are another topic (dominated by slow start?)
- Assume FIFO scheduling
  - Necessary for link capacity estimation with packet dispersion techniques (a sketch follows below)
  - A reasonable assumption for most traffic
  - May not hold for cable modem and 802.11 access networks
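To see why FIFO matters, here is a sketch of the packet-pair dispersion idea that the capacity estimation relies on: two packets sent back to back leave a bottleneck of capacity C spaced by S/C seconds, so C can be recovered from observed spacings. The function and its inputs are illustrative, not the thesis implementation; non-FIFO schedulers break the spacing argument.

    def capacity_estimate(pairs):
        """pairs: (packet_size_bytes, dispersion_seconds) tuples for packet
        pairs observed back to back downstream of the bottleneck.
        Returns a capacity estimate in bits per second (median over pairs,
        since cross traffic perturbs individual samples)."""
        samples = sorted(8 * size / gap for size, gap in pairs if gap > 0)
        return samples[len(samples) // 2]

    # 1500-byte packets spaced ~12 ms apart imply a ~1 Mbit/s bottleneck.
    print(capacity_estimate([(1500, 0.0120), (1500, 0.0121), (1500, 0.0119)]))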

Slide 18: Limitation causes for TCP throughput
- Application
- Transport layer
  - TCP receiver: receiver window limitation
  - TCP protocol: slow start, ...
- Network layer
  - Bottleneck link

Slide 19: Example: an application that sends large bursts separated by idle periods
- E.g. BitTorrent, HTTP/1.1 (persistent connections)
[Figure: a time series alternating between transfer periods and idle periods during which only keep-alive messages are sent]

Slide 20: Limitation causes: application
[Figure: sender and receiver stacks (application, TCP, network) with the sender-side buffer B1 between the application and TCP.]
- The application does not even attempt to use all network resources
- TCP connections are partitioned into two kinds of periods:
  - Bulk Transfer Period (BTP): the application constantly provides data to transfer; buffer B1 never runs out of data
  - Application Limited Period (ALP): the opposite of a BTP; TCP has to wait for data because B1 is empty

Slide 21: Limitation causes: TCP receiver
[Figure: sender and receiver stacks with the receiver-side buffer B2 between TCP and the application.]
- The receiver advertised window limits the rate
  - Maximum amount of outstanding bytes = min(cwnd, rwnd)
  - The sender sits idle waiting for ACKs to arrive
- Flow control
  - The sending application overruns the receiving application
  - Buffer B2 is full
- Configuration problem (unintentional)
  - The default receiver advertised window is set too low
  - Window scaling is not enabled

Slide 22: Limitation causes: network
- The limitation is due to congestion at a bottleneck link
  - Shared bottleneck: the connection obtains only a fraction of its capacity
  - Non-shared bottleneck: the connection obtains all of its capacity

Slide 23: Our approach to root cause analysis
- Divide and conquer:
  1. Partition connections into BTPs and ALPs, filtering out the application impact
  2. Analyze the bulk transfer periods for limitation by the TCP receiver, the TCP protocol, and the network
- The methods are based on metrics computed from packet headers

Slide 24: Why filter out the application effect?
- Many TCP/IP-level traffic studies do not account for the application effect
  - On RTTs, burstiness, ...
  - They try to study network properties but end up measuring the application effect instead!

Slide 25: Distinguishing BTPs from ALPs: the Isolate & Merge algorithm
- Phase 1: Isolate
  - Fact: TCP always tries to send MSS-sized packets
  - Consequence: small packets (size < MSS) and idle time indicate application limitation (the buffer between the application and TCP is empty)
[Figure: a packet time series; a period with an idle time > RTT and a large fraction of packets smaller than the MSS is marked as an ALP.]

Slide 26: Distinguishing BTPs from ALPs: the Isolate & Merge algorithm
- Phase 2: Merge
  - Why? After Isolate, BTPs may be separated by very short ALPs; we also want to analyze the impact of the application (how much do ALPs decrease overall throughput?)
  - How? Merge subsequent transfer periods separated by an ALP to create a new BTP; mergers are controlled by a drop parameter; iterate until all possible mergers are performed (a combined sketch of both phases follows below)
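A simplified sketch of both phases, under stated assumptions: packets are (timestamp, size) tuples for one direction of one connection, the MSS and RTT are known constants, and the drop-parameter test below (merge only if the merged throughput stays within a fraction drop of the slower neighboring BTP) is one plausible reading of the slide, not the thesis's exact criterion.

    MSS = 1460    # assumed maximum segment size, bytes
    RTT = 0.1     # assumed round-trip time, seconds

    def isolate(pkts, small_frac=0.5):
        """Phase 1: cut the packet stream at idle gaps longer than an RTT,
        then label a period an ALP if a large fraction of its packets are
        smaller than the MSS (the application starved TCP's send buffer)."""
        cuts = [0]
        cuts += [i for i in range(1, len(pkts))
                 if pkts[i][0] - pkts[i - 1][0] > RTT]
        cuts.append(len(pkts))
        periods = []
        for a, b in zip(cuts, cuts[1:]):
            chunk = pkts[a:b]
            if not chunk:
                continue
            small = sum(1 for _, size in chunk if size < MSS) / len(chunk)
            periods.append(("ALP" if small > small_frac else "BTP", chunk))
        return periods

    def _tput(chunks):
        pkts = [p for c in chunks for p in c]
        span = pkts[-1][0] - pkts[0][0]
        return sum(size for _, size in pkts) / span if span > 0 else 0.0

    def merge(periods, drop=0.1):
        """Phase 2: merge BTP-ALP-BTP triples into one BTP while the merged
        throughput stays within `drop` of the slower neighbor; iterate."""
        changed = True
        while changed:
            changed = False
            for i in range(len(periods) - 2):
                (k1, c1), (k2, c2), (k3, c3) = periods[i:i + 3]
                if (k1, k2, k3) != ("BTP", "ALP", "BTP"):
                    continue
                if _tput([c1, c2, c3]) >= (1 - drop) * min(_tput([c1]), _tput([c3])):
                    periods[i:i + 3] = [("BTP", c1 + c2 + c3)]
                    changed = True
                    break
        return periods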

Slide 27: BTP analysis
1. Compute limitation scores for each BTP
   - 4 quantitative scores, each in [0,1]
   - We use retransmission rates, inter-arrival time patterns, path capacity, RTT, etc.
2. Classify BTPs into limitation causes
   - Map a (combination of) limitation scores into a cause
   - Threshold-based scheme

Slide 28: Classification scheme
- 4 thresholds need to be set, one per score
[Figure: a decision tree over the b-score, dispersion score, retransmission score, and receiver window limitation score. A sketch of the scheme as code follows below.]
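One plausible reading of the scheme as code follows. The ordering of the tests and the uniform 0.25 thresholds are assumptions for illustration (slide 78 shows 0.25 emerging from calibration for one threshold); the thesis's actual decision tree and threshold values are set empirically.

    def classify_btp(rwnd_score, b_score, retx_score, dispersion_score,
                     th_rwnd=0.25, th_b=0.25, th_retx=0.25, th_disp=0.25):
        """Map a BTP's four limitation scores (each in [0, 1]) to a cause."""
        if rwnd_score > th_rwnd:
            return "receiver window limited"
        if b_score > th_b:
            # High burstiness: the sender idles waiting for ACKs (slide 76).
            return "TCP end-point limited"
        if retx_score > th_retx:
            return "network limited: losses at a shared bottleneck"
        if dispersion_score < th_disp:
            # Throughput close to path capacity (slide 75).
            return "network limited: non-shared (saturated) bottleneck"
        return "unknown"

    print(classify_btp(0.9, 0.1, 0.0, 0.8))  # -> receiver window limited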

Slide 29: Classification: calibrating the thresholds
- A difficult task: diversity vs. control
  - Reference data needs to be representative and diverse enough (no simulations)
  - We also need to control the experiments in some way to get what we want
- Reference data from partially controlled experiments
  - Try to generate transfers limited by a specific cause
  - FTP downloads from Fedora Core mirror sites (232 sites covering all continents)
  - Artificial bottleneck links with rshaper: network limitation
  - Nistnet to add delay: receiver limitation (W_r/RTT < bandwidth)
  - Control the number of simultaneous downloads: unshared vs. shared bottleneck
[Figure: downloads from mirrors in Australia, Japan, Finland, and the USA reach Eurecom over the Internet, passing through rshaper and Nistnet.]

Slide 30: Classification: calibrating the thresholds, an example
[Figure: score distributions with the bottleneck set at 1 Mbit/s and 1 download at a time; threshold th1 is set where the distributions separate.]

Slide 31: Outline (repeated; transition to Part 3: Case study on performance analysis of ADSL clients)

Slide 32: Motivation
- A stress test for our techniques
  - Do we learn useful things?
- Knowing throughput limitations (= performance) is useful
  - ISPs want satisfied clients
  - They need to know what's going on before things can be improved
- We installed InTraBase at France Telecom to study traffic in their ADSL access network
  - The root cause analysis techniques are implemented within InTraBase

Slide 33: Measurement setup
[Figure: two pcap probes placed between the Internet and the collect/access networks.]
- 24 hours of traffic on March 10, 2006
- 290 GB of TCP traffic
  - 64% downstream, 36% upstream
- Observed packets from ~3000 clients; analyzed only 1335
  - Excluded clients did not generate enough traffic for RCA

Slide 34: Warming up...
- Connections
  - Size distribution is highly skewed
  - Only 1% of them are used for RCA; they represent > 85% of all traffic
- Clients
  - Heavy hitters: 15% of clients generate 85-90% of traffic (up and down)
  - Low access link utilization. Why?

Slide 35: Results of the limitation analysis
- Striking result
  - The application limits the performance of over 80% of clients
  - What's going on?

Slide 36: Application analysis: application limited traffic
- Quite stable and symmetric volumes
  - Over 80% of all traffic
- eDonkey and "other" dominate
[Figure: traffic breakdown among P2P, eDonkey, and other applications.]

Slide 37: Application analysis: saturated access link
- No recognized P2P
- Asymmetric port 80/8080 downstream traffic
  - Real Web traffic?

Slide 38: Connecting the evidence
- Most clients' performance is limited by applications
- Very low link utilization for application limited traffic
- Most of the application limited traffic seems to be P2P
  - Peers often have asymmetric uplink and downlink capacities
  - P2P applications/users enforce upload rate limits
  => Most clients' download performance seems to suffer from P2P clients drastically limiting their upload rates
[Figure: a downloading client with low link utilization fetches from uploading clients that have low-capacity uplinks plus rate limiters.]

Slide 39: Outline (repeated; transition to Conclusions)

Slide 40: Conclusions: claims and contributions (Parts 1-3)
1. DBMSs provide a powerful infrastructure for the analysis of passive traffic measurements
   - Performance is good
2. We can infer root causes for TCP throughput using bidirectional packet traces at a single measurement point located anywhere on the TCP/IP path
3. Today's Internet applications interact with TCP in diverse ways
   - They bias TCP/IP path analysis, so filter out their effects first
4. TCP root cause analysis techniques with DBMS-based analysis enable:
   - performance evaluation of applications,
   - evaluation of network utilization, and
   - identification of TCP configuration problems

Slide 41: The case is not yet closed
- Short connections
  - Challenge previous "old" results with RCA
  - What about persistent connections?
- Wireless traffic
  - Non-FIFO scheduling
  - Link-layer issues
- Extended case study on ADSL clients
  - We saw one day; what about a week?
  - Trends, consistency

Slide 42: Thank you! Questions?

Slide 43: Backup slides

Slide 44: Thesis claims (identical to the claims and contributions on slide 40)

Slide 45: TCP rate limiting factors (causes), part 1
- Congestion: the sender's window is adjusted according to TCP's congestion control algorithm in response to detected packet loss.
- Bandwidth: the sender is limited by the bandwidth of the bottleneck link. It fully utilizes that bandwidth without competing with any other flows on the bottleneck link (e.g. a user connected via a modem line).
- Transport: the flow has left slow start and is doing congestion avoidance, but does not experience any loss and is not limited by the receiver, sender, or bandwidth (typically a flow that is just a little bigger than a "mouse").

Slide 46: TCP rate limiting factors, part 2
- Receiver: the flow is limited by the receiver's maximum advertised window.
- Sender: the flow is limited by the sending buffer, which limits the amount of unacknowledged data that can be outstanding at any time.
- Opportunity: the application has a limited amount of data to send and never leaves slow start (i.e. the flow is very short/small).
- Application limited: the application does not produce data fast enough to be limited by either the transport layer or the network bandwidth (e.g. a telnet session where the user types).

Slide 47: Connections and flights
[Figure: a trace on a time axis split into connections; a connection begins with its first, SYN-flagged packet and ends with its last, FIN-flagged packet; within a connection, packets group into flights spaced roughly one RTT apart.]

Slide 48: The T-RAT algorithm
1. Input: a trace (tcpdump text format, TCP packets only)
2. Split it up into connections
3. Split connections up into flights, trying candidate RTTs between 3 ms and 3 s (27 candidates)
4. Categorize flights into the states "slow start", "congestion avoidance", or "unknown"
5. Estimate the round-trip time
6. Run the rate limitation tests

Slide 49: The way we measure and analyze
- Internet measurements
  - Passive measurement techniques: SNMP/RMON, packet monitoring (pcap, DAG), flow measurements (NetFlow)
  - Active measuring (probing): link capacity/bandwidth estimation, traceroute, ping, ...
- Offline analysis of measurements
  - Of passive measurements: specialized software tools (tcptrace), scripts (perl, awk, ...), toolkits (CoralReef), InTraBase
  - Of active measurements: scripts
- Plotting/graphing of results: Matlab, R, ...

Slide 50: Benefits of a DBMS-based approach
- The database consists of reusable components
  - Performing new analyses is less laborious and error-prone

Slide 51: Processing a tcpdump file
1. Store an annotation about the trace into the traces table
2. Create the packets table
3. Copy packets from a file into the packets table: a modified tcpdump or dagdump enforces structure and adds connection ids to the pcap or DAG trace file, and psql copy loads the result into the database
4. Create connection-level statistics into the connections table
5. Insert the unique 4-tuple-to-cnxid mapping into the cid2tuple table
A sketch of the bulk-load step follows below.
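A sketch of the bulk-load step in Python. It assumes the preprocessor has already produced a tab-separated text file whose columns match the packets table; the psycopg2 driver, the connection string, and the column names are illustrative rather than the thesis schema.

    import psycopg2

    conn = psycopg2.connect("dbname=intrabase")
    cur = conn.cursor()

    # Step 2: create the packets table (illustrative columns).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS trace1_packets (
            connection_id integer,
            ts            double precision,
            seq_start     bigint,
            seq_end       bigint,
            flags         text
        );
    """)

    # Step 3: stream the preprocessed trace through PostgreSQL's COPY,
    # the same bulk-load path as the "psql copy" on the slide.
    with open("trace1_packets.tsv") as f:
        cur.copy_expert("COPY trace1_packets FROM STDIN", f)

    conn.commit()
    cur.close()
    conn.close()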

Slide 52: Prototype base table layout
[Figure: schema diagram of the base tables.]

Slide 53: Prototype evaluation and performance optimization
- Prototype on PostgreSQL
  - Analyzes TCP traffic from packet traces
- A typical off-the-shelf DBMS is not optimized for scientific data management
  - Scientists complain about performance
  - Configuration matters: the default is generally no good
- Optimizations for frequent I/O operations are mandatory
  - Processing time drops in the best case from days to minutes!
- We observed good enough performance
  - Processing time scales linearly up to 10 GB traces (at least)
  - A 10 GB trace is easily processed overnight
  - 50% overhead in disk space consumption: the price to pay for structured data

Slide 54: Evaluating the prototype
- Feasibility analysis
  - Processing time
  - Disk space consumption
- Test data:
  - BitTorrent traffic files: few large connections
  - Mixed Internet traffic files: lots of small connections
- Tests run on Linux 2.6.3, dual Intel Xeon 2.2 GHz, SCSI RAID

Slide 55: Prototype feasibility evaluation: loading and processing a trace
- Prototype on PostgreSQL
  - Analyzes TCP traffic from packet traces
- Good enough processing time
  - Scales linearly up to 10 GB (at least)
  - Not supposed to be real time!
- 50% overhead in disk space consumption
  - Usually not an issue
  - The price to pay for structured data

Slide 56: Optimizing the performance
- A typical off-the-shelf DBMS is not optimized for scientific data management
  - Configuration matters: the default is generally no good
- Performance depends on the characteristics of the data
  - In the analysis of packet-level traffic measurements, one specific query is often very popular
  - Optimize performance for this query

Slide 57: Optimization of the typical analysis task
- Focus on minimizing the I/O of the c-query
  - Indexing = fast lookup
  - Clustering = physical grouping of data
- The result set size is typically very small compared to the number of packets queried, so the result set is rarely the bottleneck
- The optimization is analysis-task specific: there are no generic solutions

Slide 58: Performance measurements
- Setup
  - Two 5 GB traces of packet headers:
    1. A trace containing mixed Internet traffic (a lot of parallel connections)
    2. A trace consisting of only BitTorrent traffic from a single client (few parallel connections)
  - Execute the c-query for connections of different sizes, with and without I/O optimizations
- Main results
  - Total time to query all the connections drops:
    - mixed trace: from 8 days to 27 minutes!
    - BitTorrent trace: from 1.5 days to 32 minutes

Slide 59: Optimizing the performance: tuning the DBMS
- Typical workload (based on experience with InTraBase)
  - Few users, low query arrival rate, queries touch large amounts of data, rarely parallel query processing
  - A single most common data-intensive query (the c-query):
    SELECT * FROM trace1_packets WHERE connection_id=x ORDER BY timestamp;
- Consequent configuration
  - Tune up buffer sizes: sorting (the c-query's ORDER BY) and caching (c-query results)
  - Minimize the impact of write-ahead logging (atomicity)

Slide 60: Minimizing the I/O cost of the c-query
- Indexes
  - Fast lookup of specific rows from tables
  - Logical: an index on the connection id (helps when a lot of connections run in parallel in the trace)
- Clustering
  - Groups data together on the hard disk
  - Physical: cluster on the connection id (with few parallel connections the data is already nearly contiguous)
A sketch of this setup follows below.
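A sketch of the two optimizations applied to the c-query's table. The table name trace1_packets appears on the slides; the index name, the column name ts, and the session-level work_mem tuning (in the spirit of slide 59) are assumptions. Note that the CLUSTER ... USING syntax is the modern PostgreSQL form; the version used in the thesis predates it.

    import psycopg2

    conn = psycopg2.connect("dbname=intrabase")
    cur = conn.cursor()

    # Give the c-query's ORDER BY sort room to run in memory (slide 59).
    cur.execute("SET work_mem = '256MB';")

    # Logical optimization: fast lookup of one connection's rows.
    cur.execute("CREATE INDEX IF NOT EXISTS trace1_packets_cid_idx "
                "ON trace1_packets (connection_id);")

    # Physical optimization: lay the rows out on disk in index order, so a
    # connection's packets are read sequentially rather than scattered.
    cur.execute("CLUSTER trace1_packets USING trace1_packets_cid_idx;")
    conn.commit()

    # The c-query itself now touches a contiguous slice of the table.
    cur.execute("SELECT * FROM trace1_packets WHERE connection_id = %s "
                "ORDER BY ts;", (42,))
    rows = cur.fetchall()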

Slide 61: More results
- Indexes are not always beneficial unless the data is clustered
  - When querying a large number of packets, the data is scattered over the disk
- Total time to query all the connections drops:
  - mixed trace: from 8 days to 27 minutes
  - BitTorrent trace: from 1.5 days to 32 minutes
- Caching and parallel I/O with RAID striping
  - Negligible effect on total execution time if the data is indexed and clustered (13% reduction at most)
  - These results are machine specific, not generic: parallel I/O and caching could help a lot with faster or more CPUs, or slower disks

Slide 62: Indexing and clustering: total execution time
[Figure: total execution time for the Gigabit (mixed) and BitTorrent traces; the index becomes useless without clustering.]
- Total time to query all the connections drops:
  - mixed trace: from 8 days to 27 minutes
  - BitTorrent trace: from 1.5 days to 32 minutes

Slide 63: Indexing and clustering: CPU IOWAIT time
[Figure: CPU IOWAIT time for the Gigabit and BitTorrent traces.]
- After indexing and clustering, almost all of the time is spent on CPU activity other than waiting for I/O.

Slide 64: Other results (caching and parallel I/O with RAID striping: a significant effect on IOWAIT time, but otherwise the same findings as on slide 61)

Slide 65: Other results
- If the CPU is no longer waiting for I/O, what is it doing?
    SELECT * FROM trace1_packets WHERE connection_id=x ORDER BY timestamp;
  - Execution time is highly dependent on the width of the query result set, pointing to internal DBMS operations related to handling result set tuples
  - It is not sorting

Slide 66: Conclusions
- A DBMS-based approach to the analysis of passive packet-level measurements is attractive
  - It provides structured and semantic data
  - Performance remains one of the major challenges
- To get the most out of the database system:
  - The DBMS should be correctly tuned (the default is generally no good)
  - The database design needs to take the data characteristics into account (common-query optimization; the potential gain is very large)

Slide 67: Minimizing the I/O cost of the c-query
- Caching
  - OS-level caching: cache I/O results
  - DBMS-level caching: cache query results
- Parallel I/O
  - High level: a parallel database system (overhead due to distributed transaction handling)
  - Low level: RAID striping (maximum I/O parallelism when combined with clustering)

Slide 68: An example xplot time-sequence diagram
[Figure legend: sent data packets; received acknowledgments; retransmitted data; the receiver advertised window limit; a pushed data packet is marked with a diamond; the gap between the data and acknowledgment lines shows the outstanding bytes, and the gap up to the window line shows the size of the receiver advertised window.]

Slide 69: Limitation causes: application
- Five types of applications:
  1. Constant application-defined rate: streaming applications, Skype, rate-limited P2P (eDonkey, BitTorrent); connections are a single ALP
  2. User-dependent transmission rate: telnet, instant messengers (MSN, ICQ); connections are typically a single ALP
  3. Transmission bursts separated by idle periods: Web browsing with persistent connections, BitTorrent; connections are BTPs interspersed with ALPs
  4. Transmit all data at once: FTP; connections are a single BTP
  5. A mixture of 1 and 3: a rate-limited BitTorrent client

Slide 70: [Figure: time series of an application that sends small amounts of data at a constant rate.]

Slide 71: TCP end points: receiver window limitation
[Figure: the maximum amount of outstanding bytes = min(cwnd, rwnd).]

Slide 72: Inferring the measurement point
[Figure only.]

Slide 73: Limitation scores
- Receiver window limitation score: the fraction of time that the amount of outstanding bytes is limited by the receiver advertised window
- Retransmission score: the fraction of retransmitted bytes
- Dispersion score: measures how close the achieved throughput is to the capacity of the path (formula not reproduced here)
- b-score: measures the "burstiness" of the transfer; relates to the time the sending TCP is idle waiting for ACKs (receiver window limitation); computed from packet inter-arrival times
Sketches of these scores follow below.
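Sketches of three of the four scores, under stated assumptions: the inputs are plain Python lists, the 5% slack in the receiver-window test and the 1 - throughput/capacity form of the dispersion score are illustrative readings of slides 74-75 (the slide omits the exact dispersion formula), and the b-score is left out because its inter-arrival computation is not specified here.

    def rwnd_limitation_score(outstanding, rwnd, slack=0.05):
        """Fraction of per-RTT samples where the outstanding bytes O sit
        within a small slack of the receiver advertised window R (slide 74
        outputs 1 when R ~ O and averages)."""
        hits = sum(1 for o, r in zip(outstanding, rwnd) if o >= (1 - slack) * r)
        return hits / len(outstanding)

    def retransmission_score(retransmitted_bytes, total_bytes):
        """Fraction of bytes that were retransmitted."""
        return retransmitted_bytes / total_bytes

    def dispersion_score(throughput, path_capacity):
        """Near zero when throughput ~ path capacity (non-shared bottleneck,
        slide 75); grows as the flow gets only a fraction (shared)."""
        return 1 - throughput / path_capacity

    print(rwnd_limitation_score([60000, 64000, 63000], [65536, 65536, 65536]))
    print(retransmission_score(10_000, 1_000_000))   # 0.01
    print(dispersion_score(900_000, 1_000_000))      # ~0.1 -> non-shared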

Slide 74: Receiver window limitation score
- Uses two time series, computed over RTT-long intervals:
  - outstanding bytes (O)
  - receiver advertised window (R)
- Compute R - O for each pair of values
  - Indicates how close the sender is to the limit set by the receiver advertised window
  - Output 1 if R ~ O, and 0 otherwise
- The limitation score is the average value of the R - O comparison
  - Indicates the fraction of time the sender is limited by the receiver advertised window

Slide 75: Network limitation scores
- Retransmission score
  - The fraction of bytes retransmitted
- Dispersion score
  - Assesses the impact of the bottleneck on the throughput
  - If D_S is close to zero, throughput ~ capacity: a non-shared bottleneck link
  - Otherwise: a shared bottleneck link

Slide 76: b-score
- The b-score relates to the time the sending TCP is idle waiting for ACKs to arrive
  - A high b-score means TCP receiver limited
  - A low b-score means network limited
[Figure: a time-sequence plot showing the sender waiting for ACKs.]

Slide 77: Classification scheme
- 4 thresholds need to be set
[Figure: the decision tree annotated with ordering constraints: one step must come before two others, the mutual order of those two is irrelevant, and one step can be performed at any stage.]

Slide 78: Classification: calibrating the thresholds
[Figure: a threshold of 0.25 gives the maximum separation between the reference distributions.]

Slide 79: Classification scheme with the thresholds empirically set
[Figure: the decision tree of slide 28 with concrete threshold values.]

Slide 80: Classifying the reference data
[Figure: results for 1 download at a time, in BTPs and in bytes.]

Slide 81: Classifying the reference data
[Figure: results for 10 downloads at a time, in BTPs and in bytes.]

Slide 82: Classifying the reference data
[Figure: results for 10 downloads at a time with added delay, in BTPs and in bytes.]

Slide 83: Adapting InTraBase for RCA
- PL functions
  - Implement the RCA algorithms
  - Populate the RCA tables

Slide 84: Warming up... (backup detail of slide 34)
- Applications
  - Port-based identification
[Figure: applications contributing > 5% of traffic each.]
- Connections
  - Size distribution is highly skewed
  - Only 1% of them are used for RCA; they represent > 85% of all traffic
- Clients
  - Heavy hitters: 15% of clients generate 85-90% of traffic (up and down)
  - Low access link utilization. Why?

Slide 85: Client-level root cause analysis (extending the InTraBase framework)
Bytes from the connection-level RCA are associated with client-level limitation causes:
1. Application <- ALPs
2. Saturated access link <- network limited BTPs during which utilization > 90%
3. Network limitation due to a distant bottleneck link <- network limited BTPs during which utilization < 90%
4. TCP configuration (download = local problem) <- BTPs limited by the TCP layer

Slide 86: Results of the limitation analysis
- Few active clients overall
- Application limitation dominates
- Network limitation by a distant bottleneck is also experienced
[Figure: per-client limitation causes; one category contains most bytes, another some bytes.]

Slide 87: Application analysis: distant bottleneck link
- A diverse mixture of applications
  - The cause is not necessarily due to the client's own behavior

Slide 88: Impact of limitation causes
- How far from optimal (access link saturation) are we?
- Main observations
  - Uplink utilization < 50% during most of the application and network limited uploads
  - Very low downlink utilization for application limited traffic: utilization < 20% during 65% of ALPs

Slide 89: Impact of limitation causes
- How far from optimal (access link saturation) are we?
- Very low downlink utilization for application limited traffic
[Figure: upstream and downstream utilization distributions.]

Slide 90: Conclusions: claims and contributions (Parts 1-3)
1. DBMSs help solve the problems of management and a suboptimal analysis-process cycle in the passive analysis of packet traces
   - Performance is feasible
   - I/O optimizations (indexing, clustering) can improve performance a lot
2. We can infer root causes for TCP throughput using bidirectional packet traces from a single measurement point located anywhere on the TCP/IP path
3. Today's Internet applications interact with TCP in diverse ways
   - They introduce bias into TCP/IP path studies
   - Filter out their effects first
4. TCP root cause analysis techniques together with the DBMS-based approach enable:
   - performance evaluation of Internet application protocols,
   - evaluation of network utilization, and
   - identification of certain TCP configuration problems

Publications:
- M. Siekkinen et al. InTraBase: Integrated Traffic Analysis Based on a Database Management System. In E2EMON 2005.
- M. Siekkinen et al. Object-Relational DBMS for Packet-Level Traffic Analysis: Case Study on Performance Optimization. In E2EMON 2006.
- M. Siekkinen et al. Root Cause Analysis for Long-Lived TCP Connections. In CoNEXT 2005.
- M. Siekkinen et al. On the Interaction Between Internet Applications and TCP. Under submission.
- M. Siekkinen et al. Performance Limitations of ADSL Users: A Case Study. Under submission.

