
1 Server-based Characterization and Inference of Internet Performance
Venkat Padmanabhan, Lili Qiu, Helen Wang
Microsoft Research
UCLA/IPAM Workshop, March 2002

2 Outline
Overview
Server-based characterization of performance
Server-based inference of performance
– Passive Network Tomography
Summary and future work

3 Overview
Goals
– characterize end-to-end performance
– infer characteristics of interior links
Approach: server-based monitoring
– passive monitoring → relatively inexpensive
– enables large-scale measurements
– diversity of network paths

4 [Diagram: a Web server exchanging DATA packets and ACKs with many clients]

5 Research Questions
Server-based characterization of end-to-end performance
– correlation with topological metrics
– spatial locality
– temporal stability
Server-based inference of internal link characteristics
– identification of lossy links

6 Related Work
Server-based passive measurement
– 1996 Olympics Web server study (Berkeley, 1997 & 1998)
– characterization of TCP properties (Allman 2000)
Active measurement
– NPD (Paxson 1997)
– stationarity of Internet path properties (Zhang et al. 2001)

7 Experiment Setting
Packet sniffer at microsoft.com
– 550 MHz Pentium III
– sits on the spanning port of a Cisco Catalyst 6509
– packet drop rate < 0.3%
– traces up to 2+ hours long, 20-125 million packets, 50-950K clients
Traceroute source
– sits on a separate Microsoft network, but all external hops are shared
– runs infrequently and in the background

8 Topological Metrics and Loss Rate
Topological distance is a poor predictor of packet loss rate.
All links are not equal → need to identify the lossy links

9 Spatial Locality
Spatial locality → there may be a shared cause for packet loss
Do clients in the same cluster see similar loss rates?
Loss rate is quantized into buckets (a sketch of the bucketing follows below)
– 0-0.5%, 0.5-2%, 2-5%, 5-10%, 10-20%, 20+%
– buckets suggested by Zhang et al. (IMW 2002)
Focus on lossy clusters
– average loss rate > 5%
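The bucketing above is straightforward to implement. Here is a minimal sketch in Python; treating each boundary value (e.g., exactly 0.5%) as belonging to the next-higher bucket is an assumption the slide does not specify:

```python
import bisect

# Upper edges (in percent) of the buckets listed on the slide:
# 0-0.5%, 0.5-2%, 2-5%, 5-10%, 10-20%, 20+%
BUCKET_EDGES = [0.5, 2.0, 5.0, 10.0, 20.0]

def loss_bucket(loss_pct):
    """Map a loss rate (in percent) to a bucket index 0..5."""
    return bisect.bisect_right(BUCKET_EDGES, loss_pct)

assert loss_bucket(0.3) == 0   # falls in 0-0.5%
assert loss_bucket(7.0) == 3   # falls in 5-10%
assert loss_bucket(35.0) == 5  # falls in 20+%
```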

10 Temporal Stability
Loss rate is again quantized into buckets
Metric of interest: stability period, i.e., the time until transition into a new bucket (a sketch of the computation follows below)
Median stability period ≈ 10 minutes
Consistent with previous findings based on active measurements
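A sketch of the stability-period computation: given a series of bucket indices for one client sampled at a fixed interval, it returns the lengths of the runs between bucket transitions. The fixed sampling interval is an assumption; the deck does not say how loss rates were sampled over time.

```python
def stability_periods(bucket_series, interval_sec):
    """Return run lengths (in seconds) during which the loss-rate
    bucket stayed unchanged; each run ends at a bucket transition."""
    if not bucket_series:
        return []
    periods, run = [], 1
    for prev, cur in zip(bucket_series, bucket_series[1:]):
        if cur == prev:
            run += 1
        else:
            periods.append(run * interval_sec)
            run = 1
    periods.append(run * interval_sec)  # final (possibly censored) run
    return periods

# The median of these periods is the slide's headline statistic
# (reported as roughly 10 minutes).
```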

11 Putting it all together
All links are not equal → need to identify the lossy links
Spatial locality of packet loss rate → lossy links may well be shared
Temporal stability → worthwhile to try to identify the lossy links

12 Passive Network Tomography
Goal: determine characteristics of internal network links using end-to-end, passive measurements
We focus on the link loss rate metric
– primary goal: identifying lossy links
Why is this interesting?
– locating trouble spots in the network
– keeping tabs on your ISP
– server placement and server selection

13 [Diagram: a Web server reaching clients across multiple ISPs (Sprint, AT&T, UUNET, C&W, Qwest, AOL, Earthlink); a client wonders: "Darn, it's slow! Why is it so slow?"]

14 Related Work
MINC (Caceres et al. 1999)
– multicast-based active probing
Striped unicast (Duffield et al. 2001)
– unicast-based active probing
Passive measurement (Coates et al. 2002)
– look for back-to-back packets
Shared bottleneck detection
– Padmanabhan 1999, Rubenstein et al. 2000, Katabi et al. 2001

15 Active Network Tomography
[Diagrams: a source S sending multicast probes down a tree to receivers A and B, and the same topology probed with striped unicast probes]

16 Problem Formulation
[Diagram: a tree rooted at the server, with links l1-l8 leading down to five clients; client j observes end-to-end loss rate pj]
Collapse linear chains into virtual links
(1-l1)*(1-l2)*(1-l4) = (1-p1)
(1-l1)*(1-l2)*(1-l5) = (1-p2)
…
(1-l1)*(1-l3)*(1-l8) = (1-p5)
Under-constrained system of equations (see the sketch below)
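To make the under-constrained system concrete, here is a small sketch. The tree shape is reconstructed from the slide's first and last equations (l1 at the root, l2/l3 below it, leaf links l4-l8); the middle paths are an assumption:

```python
import numpy as np

# In the log domain the products become sums:
#   sum of L_i over links on path j  =  P_j,
# with L_i = -log(1 - l_i) and P_j = -log(1 - p_j).
paths = {
    0: [0, 1, 3],  # p1 crosses l1, l2, l4
    1: [0, 1, 4],  # p2 crosses l1, l2, l5
    2: [0, 2, 5],  # p3 crosses l1, l3, l6 (assumed)
    3: [0, 2, 6],  # p4 crosses l1, l3, l7 (assumed)
    4: [0, 2, 7],  # p5 crosses l1, l3, l8
}
n_links = 8
A = np.zeros((len(paths), n_links))
for j, links in paths.items():
    A[j, links] = 1.0

# 5 independent equations for 8 unknowns: infinitely many solutions.
print(np.linalg.matrix_rank(A), "equations,", n_links, "unknowns")
```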

17 #1: Random Sampling
Randomly sample the solution space
Repeat this several times
Draw conclusions based on overall statistics
How to do random sampling? (a sketch follows below)
– determine the loss rate bound for each link using its best downstream client
– iterate over all links:
  pick a loss rate at random within the bounds
  update the bounds for the other links
Problem: little tolerance for estimation error
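A minimal sketch of one sampling pass. Each link's loss rate is bounded by the residual transmission rate of the best (least lossy) path through it; the order in which links are visited is an assumption:

```python
import random

def sample_link_losses(paths, path_loss, n_links):
    """One random point in the solution space.  paths maps client j to
    the list of link indices on its path; path_loss[j] is p_j < 1."""
    # residual[j]: transmission rate of path j not yet attributed
    residual = {j: 1.0 - path_loss[j] for j in paths}
    loss = [0.0] * n_links
    for i in range(n_links):
        crossing = [j for j in paths if i in paths[j]]
        if not crossing:
            continue
        # bound from the best (least lossy) downstream client
        bound = 1.0 - max(residual[j] for j in crossing)
        loss[i] = random.uniform(0.0, bound)
        # tighten the bounds of every path that crosses link i
        for j in crossing:
            residual[j] /= 1.0 - loss[i]
    return loss
```

Repeating this many times and flagging links whose sampled rate is high in most samples gives the overall statistics the slide draws conclusions from.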

18 #2: Linear Optimization
Goals
– parsimonious explanation
– robust to estimation error
Li = log(1/(1-li)), Pj = log(1/(1-pj))
minimize Σi Li + Σj |Sj|
L1+L2+L4+S1 = P1
L1+L2+L5+S2 = P2
…
L1+L3+L8+S5 = P5
Li >= 0
Can be turned into a linear program (see the sketch below)
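A sketch of the linear program using scipy, with each slack split as Sj = Sp_j - Sn_j (both nonnegative) so that |Sj| becomes linear. The relative weight w between the two objective terms is an assumption; the slide weights them equally:

```python
import numpy as np
from scipy.optimize import linprog

def lp_infer(A, p, w=1.0):
    """A: 0/1 routing matrix (paths x links); p: array of observed
    path loss rates.  Returns inferred per-link loss rates."""
    P = -np.log(1.0 - p)                  # P_j = log(1/(1-p_j))
    m, n = A.shape
    # decision vector x = [L (n), Sp (m), Sn (m)], all >= 0
    c = np.concatenate([np.ones(n), w * np.ones(2 * m)])
    A_eq = np.hstack([A, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=P, bounds=[(0, None)] * (n + 2 * m))
    L = res.x[:n]
    return 1.0 - np.exp(-L)               # back to loss: l_i = 1 - e^{-L_i}
```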

19 #3: Bayesian Inference
Basics
– D: observed data
  sj: # packets successfully sent to client j
  fj: # packets that client j fails to receive
– Θ: unknown model parameters
  li: packet loss rate of link i
– goal: determine the posterior P(Θ|D)
– inference is based on loss events, not loss rates
Bayes' theorem
– P(Θ|D) = P(D|Θ)P(Θ) / ∫P(D|Θ)P(Θ)dΘ
– hard to compute since Θ is multidimensional

20 Gibbs Sampling
Markov Chain Monte Carlo (MCMC)
– construct a Markov chain whose stationary distribution is P(Θ|D)
Gibbs sampling defines the transition kernel
– start with an arbitrary initial assignment of li
– consider each link i in turn
– compute P(li|D) assuming lj is fixed for j ≠ i
– draw a sample from P(li|D) and update li
– after a burn-in period, we obtain samples from the posterior P(Θ|D)

21 Gibbs Sampling Algorithm
1) Initialize link loss rates arbitrarily
2) For j = 1 : burn-in
   for each link i, compute P(li | D, {li'}), where li is the loss rate of link i and {li'} = {lj : j ≠ i}
3) For j = 1 : realSamples
   for each link i, compute P(li | D, {li'})
Use all the samples obtained at step 3 to approximate P(Θ|D)
(A runnable sketch follows below.)
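A runnable sketch, with two hedges: the likelihood below treats each client's packet outcomes (the sj, fj counts from slide 19) as an independent binomial, a simplification that ignores the correlation shared links introduce; and the per-link conditional P(li | D, {li'}) is sampled with an independence Metropolis step (uniform proposal, flat prior), which leaves the same stationary distribution as exact Gibbs sampling:

```python
import math
import random

def log_likelihood(loss, paths, obs):
    """obs[j] = (s_j, f_j).  Client j's success probability is
    t_j = product over links i on its path of (1 - l_i)."""
    ll = 0.0
    for j, links in paths.items():
        t = 1.0
        for i in links:
            t *= 1.0 - loss[i]
        t = min(max(t, 1e-12), 1.0 - 1e-12)   # avoid log(0)
        s, f = obs[j]
        ll += s * math.log(t) + f * math.log(1.0 - t)
    return ll

def gibbs_sample(paths, obs, n_links, burn_in=1000, n_samples=1000):
    """Sweep over links as on the slide, resampling one l_i at a
    time; keep samples only after the burn-in period."""
    loss = [0.01] * n_links
    ll = log_likelihood(loss, paths, obs)
    samples = []
    for sweep in range(burn_in + n_samples):
        for i in range(n_links):
            old = loss[i]
            loss[i] = random.uniform(0.0, 1.0)   # propose a new l_i
            new_ll = log_likelihood(loss, paths, obs)
            if math.log(random.random()) < new_ll - ll:
                ll = new_ll                       # accept
            else:
                loss[i] = old                     # reject
        if sweep >= burn_in:
            samples.append(list(loss))
    return samples   # approximate draws from P(Theta | D)
```

After burn-in, the fraction of samples in which li exceeds a lossiness threshold estimates the posterior probability that link i is lossy.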

22 Experimental Evaluation
Simulation experiments
Internet traffic traces

23 Simulation Experiments
Advantage: no uncertainty about link loss rates
Methodology (a sketch of the loss models follows below)
– topologies used:
  randomly generated: 20-3000 nodes, max degree = 5-50
  real topology obtained by tracing paths to microsoft.com clients
– randomly generated packet loss events at each link
  a fraction f of the links are good; the rest are "bad"
  LM1: good links 0-1%, bad links 5-10%
  LM2: good links 0-1%, bad links 1-100%
Goodness metrics
– coverage: # correctly inferred lossy links
– false positives: # incorrectly inferred lossy links
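A minimal sketch of the loss-model setup described above, with a fraction f of good links:

```python
import random

def assign_link_losses(n_links, f, model="LM1"):
    """LM1: good links lose 0-1%, bad links 5-10%.
    LM2: good links lose 0-1%, bad links 1-100%."""
    bad_lo, bad_hi = (0.05, 0.10) if model == "LM1" else (0.01, 1.00)
    return [random.uniform(0.0, 0.01) if random.random() < f
            else random.uniform(bad_lo, bad_hi)
            for _ in range(n_links)]
```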

24 Simulation Results

25 Simulation Results

26 Simulation Results
High confidence in the top few inferences

27 Trade-off

Technique         Coverage  False positives  Computation
Random sampling   High      High             Low
LP                Medium    Low              Medium
Gibbs sampling    High      Low              High

28 Internet Traffic Traces
Challenge: validation (a sketch follows below)
– divide the client traces into two sets: a tomography set and a validation set
– tomography set → loss inference
– validation set → check whether clients downstream of the inferred lossy links experience high loss
Results
– false positive rate is between 5-30%
– likely candidates for lossy links:
  links crossing an inter-AS boundary
  links having a large delay (e.g., transcontinental links)
  links that terminate at clients
– example lossy links:
  San Francisco (AT&T) → Indonesia (Indo.net)
  Sprint → PacBell in California
  Moscow → Tyumen, Siberia (Sovam Teleport)
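A sketch of the validation procedure; the 50/50 split and the 5% confirmation threshold (mirroring the lossy-cluster cutoff earlier in the deck) are assumptions:

```python
import random

def split_clients(clients, frac=0.5, seed=0):
    """Divide clients into a tomography set and a validation set."""
    rng = random.Random(seed)
    shuffled = list(clients)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * frac)
    return shuffled[:cut], shuffled[cut:]

def false_positive_rate(lossy_links, val_paths, val_loss, thresh=0.05):
    """An inferred lossy link counts as confirmed if some validation
    client whose path crosses it sees loss above the threshold."""
    if not lossy_links:
        return 0.0
    confirmed = sum(
        1 for link in lossy_links
        if any(link in path and val_loss[j] > thresh
               for j, path in val_paths.items())
    )
    return 1.0 - confirmed / len(lossy_links)
```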

29 Summary
Poor correlation between topological metrics and performance
Significant spatial locality and temporal stability
Passive network tomography is feasible
Trade-off between computational cost and accuracy
Future directions
– real-time inference
– selective active probing
Acknowledgements
– MSR: Dimitris Achlioptas, Christian Borgs, Jennifer Chayes, David Heckerman, Chris Meek, David Wilson
– Infrastructure: Rob Emanuel, Scott Hogan
http://www.research.microsoft.com/~padmanab

