1 Interesting Links: http://statistik.wu-wien.ac.at/anuran/

2 On the Self-Similar Nature of Ethernet Traffic
Will E. Leland, Walter Willinger and Daniel V. Wilson (BELLCORE); Murad S. Taqqu (BU)
Presented by: Ashish Gupta, ashish@cs.northwestern.edu, April 23rd, 2003
Analysis and Prediction of the Dynamic Behavior of Applications, Hosts, and Networks, Prof. Peter Dinda, http://www.cs.northwestern.edu/~pdinda/predclass-s03

3 Overview
• What is Self-Similarity?
• Ethernet Traffic is Self-Similar
• Source of Self-Similarity
• Implications of Self-Similarity

4 Section 1: What is Self-Similarity?

5 Intuition of Self-Similarity
Something “feels the same” regardless of scale.

9 Stochastic Objects
For stochastic objects such as time series, self-similarity is meant in the distributional sense: statistical properties, not exact shapes, are preserved across scales.

10 Pictorial View of Self-Similarity

11 Why is Self-Similarity Important?
Recently, network packet traffic has been identified as being self-similar. Current network traffic modeling using Poisson processes (and similar models) does not take the self-similar nature of traffic into account, which leads to inaccurate models of network traffic.

12 Problems with Current Models
A Poisson process:
• when observed on a fine time scale, appears bursty;
• when aggregated on a coarse time scale, flattens (smooths) to white noise.
A self-similar (fractal) process:
• when aggregated over a wide range of time scales, maintains its bursty character.
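
A minimal sketch of the Poisson smoothing effect, using only the standard library. It generates a Poisson arrival stream from exponential gaps, sums it over windows of m slots, and prints the coefficient of variation (std/mean). For rate λ per slot, the m-slot total is Poisson with mean and variance λm, so the coefficient of variation falls as $1/\sqrt{\lambda m}$. The function names and the rate are illustrative assumptions.

```python
import math
import random

def poisson_counts(rate, n_slots, seed=0):
    """Arrivals per unit-length slot of a Poisson process (exponential gaps)."""
    rng = random.Random(seed)
    counts = [0] * n_slots
    t = rng.expovariate(rate)
    while t < n_slots:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts

def block_totals(x, m):
    """Traffic summed over non-overlapping windows of m slots."""
    return [sum(x[i:i + m]) for i in range(0, len(x) - m + 1, m)]

def coefficient_of_variation(x):
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x) / len(x)) / mean

counts = poisson_counts(rate=5.0, n_slots=100_000)
for m in (1, 10, 100):
    # Burstiness (relative spread) shrinks roughly like 1/sqrt(m)
    print(m, round(coefficient_of_variation(block_totals(counts, m)), 3))
```

A self-similar trace fed through the same loop would keep a much larger spread at coarse m, which is the contrast this slide draws.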

13 Pictorial View of Current Modeling

14 Consequences of Self-Similarity
• Traffic has similar statistical properties at a range of timescales: ms, secs, mins, hrs, days.
• Merging of traffic (as in a statistical multiplexer) does not result in smoothing of traffic.
(Figure: bursty data streams → aggregation → bursty aggregate streams)

15 Side-by-side View

16 Section 1.5: Self-Similarity Definitions

17 Definitions and Properties
• Long-range dependence: the autocorrelation decays slowly.
• Hurst parameter:
  • developed by Harold Hurst (1965);
  • H is a measure of “burstiness” and is also considered a measure of self-similarity;
  • 0 < H < 1;
  • H increases as traffic increases.

18 Definitions and Properties (cont’d)
Low-, medium-, and high-traffic hours: as traffic increases, the Hurst parameter increases, i.e., traffic becomes more self-similar.

19 Self-Similarity in Traffic Measurement (Ⅱ): Network Traffic

20 Properties of Self-Similarity
• $X = (X_t : t = 0, 1, 2, \dots)$ is a covariance-stationary random process (i.e., $\mathrm{Cov}(X_t, X_{t+k})$ does not depend on $t$ for all $k$) with mean $\mu$ and variance $\sigma^2$.
• Let $X^{(m)} = \{X_k^{(m)}\}$ denote the new process obtained by averaging the original series $X$ over non-overlapping sub-blocks of size $m$.
• Suppose that the autocorrelation function satisfies $r(k) \sim k^{-\beta}$ as $k \to \infty$, $0 < \beta < 1$.
E.g., $X^{(1)} = 4, 12, 34, 2, -6, 18, 21, 35$; then $X^{(2)} = 8, 18, 6, 28$ and $X^{(4)} = 13, 17$.

21 Autocorrelation Definition
• $X$ is exactly second-order self-similar if the aggregated processes have the same autocorrelation structure as $X$, i.e., $r^{(m)}(k) = r(k)$, $k \ge 0$, for all $m = 1, 2, \dots$
• $X$ is asymptotically second-order self-similar if the above holds in the limit: $r^{(m)}(k) \to r(k)$ as $m \to \infty$.
Most striking feature of self-similarity: the correlation structures of the aggregated processes do not degenerate as $m \to \infty$.

22 Traditional Models
This is in contrast to traditional models, whose aggregated processes have correlation structures that degenerate as $m \to \infty$, i.e., $r^{(m)}(k) \to 0$ as $m \to \infty$ for $k = 1, 2, 3, \dots$
Example (figure): Poisson distribution vs. self-similar distribution
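
The contrast between slides 21 and 22 can be checked numerically without simulation. For a covariance-stationary process with autocorrelation $r$, the lag-$k$ autocorrelation of the $m$-aggregated process is
$r^{(m)}(k) = \sum_{|d|<m} (m-|d|)\, r(km+d) \big/ \sum_{|d|<m} (m-|d|)\, r(|d|)$.
The sketch below compares two cases. Fractional Gaussian noise (FGN) is a known exactly second-order self-similar process. A short-range $r(k) = p^k$ is the other case. The helper names and the parameter values are illustrative assumptions.

```python
def aggregated_acf(r, m, k):
    """Exact lag-k autocorrelation of the m-aggregated process, given r(.)."""
    offsets = range(-(m - 1), m)
    cov = sum((m - abs(d)) * r(abs(k * m + d)) for d in offsets)
    var = sum((m - abs(d)) * r(abs(d)) for d in offsets)
    return cov / var

def fgn_acf(h):
    """Autocorrelation of fractional Gaussian noise with Hurst parameter h."""
    def r(k):
        return 0.5 * ((k + 1) ** (2 * h) - 2 * k ** (2 * h) + abs(k - 1) ** (2 * h))
    return r

def short_range_acf(p):
    """Exponentially decaying autocorrelation r(k) = p**k (short-range dependent)."""
    def r(k):
        return p ** k
    return r

fgn, srd = fgn_acf(0.8), short_range_acf(0.5)
for m in (1, 10, 100):
    # FGN keeps r^(m)(1) = r(1); the short-range process degenerates toward 0
    print(m, round(aggregated_acf(fgn, m, 1), 4), round(aggregated_acf(srd, m, 1), 4))
```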

24 Long-Range Dependence
Processes with long-range dependence are characterized by an autocorrelation function that decays hyperbolically as $k$ increases.
Important property: $\sum_{k} r(k) = \infty$. This is also called non-summability of correlation.

25 Intuition
Short-range processes:
• exponential decay of autocorrelations, i.e., $r(k) \sim p^k$ as $k \to \infty$, $0 < p < 1$;
• the summation is finite.
The intuition behind long-range dependence:
• while high-lag correlations are all individually small, their cumulative effect is important;
• this gives rise to features drastically different from those of conventional short-range dependent processes.
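
The summability contrast can be seen in a few lines. Partial sums of $p^k$ converge to $p/(1-p)$. Partial sums of $k^{-\beta}$ with $0 < \beta < 1$ grow roughly like $n^{1-\beta}/(1-\beta)$ without bound. The values $p = 0.5$ and $\beta = 0.4$ are illustrative assumptions.

```python
def partial_sum(r, n):
    """Sum of r(k) for k = 1..n."""
    return sum(r(k) for k in range(1, n + 1))

def short_range(k):
    return 0.5 ** k      # r(k) = p^k with p = 0.5

def long_range(k):
    return k ** -0.4     # r(k) = k^(-beta) with beta = 0.4

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    # The first column settles near 1.0; the second keeps growing
    print(n, partial_sum(short_range, n), partial_sum(long_range, n))
```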

26 The Measure of Self-Similarity
Hurst parameter $H$, with $0.5 < H < 1$ for self-similar (long-range dependent) traffic.
Three approaches to estimate $H$ (based on properties of self-similar processes):
• variance analysis of aggregated processes;
• analysis of the rescaled range (R/S) statistic for different block sizes;
• a Whittle estimator.

27 Variance Analysis
• The variance of the aggregated processes decays as $\mathrm{Var}(X^{(m)}) \sim a\, m^{-b}$ as $m \to \infty$.
• For short-range dependent processes (e.g., a Poisson process), $\mathrm{Var}(X^{(m)}) \sim a\, m^{-1}$ as $m \to \infty$.
• Plot $\mathrm{Var}(X^{(m)})$ against $m$ on a log-log plot: a slope greater than $-1$ is indicative of self-similarity.
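
Here is a minimal sketch of the variance-time plot as a least-squares fit. It uses the relation $b = 2 - 2H$ (i.e., $H = 1 - b/2$) from Leland et al. It fits $\log_{10}\mathrm{Var}(X^{(m)})$ against $\log_{10} m$ and converts the slope $-b$ into $H$. The block sizes, the test signal, and the function names are assumptions. A real analysis would inspect the plot rather than trust one fitted number.

```python
import math
import random

def aggregate(x, m):
    """Average x over non-overlapping blocks of size m (trailing partial block dropped)."""
    return [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]

def sample_variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / (len(x) - 1)

def variance_time_hurst(x, block_sizes):
    """Least-squares slope of log10 Var(X^(m)) vs log10 m; returns (slope, H = 1 + slope/2)."""
    pts = [(math.log10(m), math.log10(sample_variance(aggregate(x, m))))
           for m in block_sizes]
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    slope = (sum((p - mx) * (q - my) for p, q in pts)
             / sum((p - mx) ** 2 for p, _ in pts))
    return slope, 1 + slope / 2

# i.i.d. Gaussian noise is short-range dependent: expect slope near -1, H near 0.5
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
slope, h = variance_time_hurst(noise, [2 ** j for j in range(9)])
print(f"slope = {slope:.2f}, H = {h:.2f}")
```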

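29 The R/S Statistic
For a given set of observations $X_1, \dots, X_n$ with sample mean $\bar{X}(n)$ and sample variance $S^2(n)$, the rescaled adjusted range, or R/S statistic, is given by
$R(n)/S(n) = \frac{1}{S(n)}\left[\max(0, W_1, \dots, W_n) - \min(0, W_1, \dots, W_n)\right]$
where $W_k = (X_1 + X_2 + \dots + X_k) - k\bar{X}(n)$, $k = 1, \dots, n$.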

30 Example
$X_k = 14, 1, 3, 5, 10, 3$; mean $= 36/6 = 6$
$W_1 = 14 - (1 \cdot 6) = 8$
$W_2 = 15 - (2 \cdot 6) = 3$
$W_3 = 18 - (3 \cdot 6) = 0$
$W_4 = 23 - (4 \cdot 6) = -1$
$W_5 = 33 - (5 \cdot 6) = 3$
$W_6 = 36 - (6 \cdot 6) = 0$
$R/S = \frac{1}{S}[8 - (-1)] = 9/S$
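
The worked example can be reproduced in code. $W_k$ is the running sum of deviations from the mean, and the adjusted range is $\max(0, W_1, \dots, W_n) - \min(0, W_1, \dots, W_n)$. This sketch uses the $1/n$ form of the sample standard deviation for $S$, which is an assumption. The function names are hypothetical.

```python
import math

def adjusted_range(x):
    """max(0, W_1..W_n) - min(0, W_1..W_n), where W_k = X_1 + ... + X_k - k * mean."""
    mean = sum(x) / len(x)
    w = w_max = w_min = 0.0
    for v in x:
        w += v - mean
        w_max = max(w_max, w)
        w_min = min(w_min, w)
    return w_max - w_min

def rs_statistic(x):
    """R/S with S taken as the 1/n sample standard deviation (assumption)."""
    mean = sum(x) / len(x)
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return adjusted_range(x) / s

x = [14, 1, 3, 5, 10, 3]
print(adjusted_range(x))  # 9.0, matching 8 - (-1)
print(rs_statistic(x))
```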

31 The Hurst Effect
• For self-similar data, the rescaled range or R/S statistic grows according to $c\, n^H$, where $H$ is the Hurst parameter, $H > 0.5$.
• For short-range processes, the R/S statistic grows as $d\, n^{0.5}$.
History: the Nile River
• In the 1940s-50s, Harold Edwin Hurst studied the 800-year record of flooding along the Nile River (yearly minimum water level).
• He found long-range dependence.
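
The growth law $E[R(n)/S(n)] \sim c\, n^H$ suggests a simple estimator. Average R/S over disjoint blocks of length $n$, then take the slope of $\log_{10}$ average R/S against $\log_{10} n$. This is a rough sketch, not the paper's exact procedure. The block sizes, the $1/n$ standard deviation, and the names are assumptions. On short blocks, this naive estimate often lands somewhat above 0.5 even for i.i.d. data.

```python
import math
import random

def rs_statistic(x):
    """R(n)/S(n) for one block, with S(n) the 1/n sample standard deviation."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    w = w_max = w_min = 0.0
    for v in x:
        w += v - mean                  # W_k = X_1 + ... + X_k - k * mean
        w_max, w_min = max(w_max, w), min(w_min, w)
    return (w_max - w_min) / s

def rs_hurst(x, block_sizes):
    """Slope of log10(mean R/S over disjoint blocks) versus log10(block size)."""
    pts = []
    for n in block_sizes:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        mean_rs = sum(rs_statistic(b) for b in blocks) / len(blocks)
        pts.append((math.log10(n), math.log10(mean_rs)))
    mx = sum(p for p, _ in pts) / len(pts)
    my = sum(q for _, q in pts) / len(pts)
    return (sum((p - mx) * (q - my) for p, q in pts)
            / sum((p - mx) ** 2 for p, _ in pts))

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
h = rs_hurst(noise, [2 ** j for j in range(4, 12)])
print(f"R/S estimate of H for i.i.d. noise: {h:.2f}")
```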

33 Whittle Estimator
• Provides a confidence interval.
• Property: any long-range dependent process approaches fractional Gaussian noise (FGN) when aggregated to a certain level.
• Test the aggregated observations to ensure that they have converged to the normal distribution.

34 Recap
Self-similarity manifests itself in several equivalent fashions:
• non-degenerate autocorrelations;
• slowly decaying variance;
• long-range dependence;
• the Hurst effect.

35 Section 2: Ethernet Traffic is Self-Similar

36 The Famous Data
• Leland and Wilson collected hundreds of millions of Ethernet packets without loss and with recorded time-stamps accurate to within 100 µs.
• The data were collected from several Ethernet LANs at the Bellcore Morristown Research and Engineering Center at different times over the course of approximately 4 years.

38 Plots Showing Self-Similarity (Ⅰ)
(Figure: H = 0.5 and H = 1 marked; estimate $H \approx 0.8$)

39 Plots Showing Self-Similarity (Ⅱ): Higher Traffic, Higher H
(Figure panels: low traffic, 1.3%-10.4%; mid traffic, 3.4%-18.4%; high traffic, 5.0%-30.7%)

40 H: A Function of Network Utilization
Observation shows behavior “contrary to Poisson”: as network utilization increases, H increases.
• As we shall see shortly, H measures traffic burstiness.
• As the number of Ethernet users increases, the resulting aggregate traffic becomes burstier instead of smoother.

41 Difference in Low-Traffic H Values
• Pre-1990: host-to-host workgroup traffic.
• Post-1990: router-to-router traffic.
• Low-period router-to-router traffic consists mostly of machine-generated packets, which tend to form a smoother arrival stream than low-period host-to-host traffic.

42 Summary
• Ethernet LAN traffic is statistically self-similar.
• H: the degree of self-similarity.
• H: a function of utilization.
• H: a measure of “burstiness”.
• Models like Poisson are not able to capture self-similarity.

43 Discussions
How can self-similarity be explained?
• Heavy-tailed file sizes
How would this impact the performance of existing systems?
• Limited effectiveness of buffering
• Effectiveness of FEC

44 Section 3: Explaining Self-Similarity

45 Introduction
The superposition of many ON/OFF sources whose ON-periods and OFF-periods exhibit the Noah Effect produces aggregate network traffic that features the Joseph Effect. This is also known as the packet-train model.
• Noah Effect: high variability or infinite variance.
• Joseph Effect: self-similar or long-range dependent traffic.

46 The Noah Effect
• The Noah Effect is the essential point of departure from traditional to self-similar traffic modeling.
• It results in highly variable ON/OFF periods: train lengths and inter-train distances can be very large with non-negligible probabilities.
• Infinite variance syndrome: many naturally occurring phenomena can be well described with infinite-variance distributions.
• Heavy-tailed distributions, parameter $\alpha$.

47 Existing Models
• Traditional traffic models: finite-variance ON/OFF source models.
• The superposition of such sources behaves like white noise, with only short-range correlations.

48 Idealized ON/OFF Model
• Lengths of ON- and OFF-periods are i.i.d. positive random variables $U_k$.
• Suppose that $U$ has a hyperbolic tail distribution:
  $P(U > u) \sim c\, u^{-\alpha}$ as $u \to \infty$, $1 < \alpha < 2$. (1)
• Property (1) is the infinite variance syndrome, or the Noah Effect:
  • $\alpha < 2$ implies $E(U^2) = \infty$;
  • $\alpha > 1$ ensures that $E(U) < \infty$, and that $S_0$ is not infinite.
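
A standard concrete case of a hyperbolic tail is the Pareto distribution with minimum 1, where $P(U > u) = u^{-\alpha}$ for $u \ge 1$ (i.e., $c = 1$). For $1 < \alpha < 2$ it has a finite mean $\alpha/(\alpha - 1)$ but infinite variance. This sketch draws such periods with Python's `random.paretovariate` and compares empirical tail frequencies with $u^{-\alpha}$. The choice $\alpha = 1.5$ and the helper names are assumptions.

```python
import random

def pareto_periods(alpha, n, seed=0):
    """n i.i.d. period lengths with P(U > u) = u**(-alpha) for u >= 1."""
    rng = random.Random(seed)
    return [rng.paretovariate(alpha) for _ in range(n)]

def empirical_tail(samples, u):
    """Fraction of samples exceeding u, an estimate of P(U > u)."""
    return sum(1 for s in samples if s > u) / len(samples)

alpha = 1.5  # 1 < alpha < 2: mean alpha / (alpha - 1) = 3, infinite variance
periods = pareto_periods(alpha, 200_000)
print("sample mean:", sum(periods) / len(periods))
for u in (10, 100, 1000):
    print(u, empirical_tail(periods, u), u ** -alpha)
```

Rerunning with different seeds shows the sample mean settling slowly while the sample variance jumps around, which is the infinite variance syndrome in miniature.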

49 http://statistik.wu-wien.ac.at/cgi-bin/anuran.pl

51 Explaining Self-Similarity
• Consider a set of processes that are either ON or OFF.
• The distributions of ON and OFF times are heavy-tailed, with tail indices $\alpha_1$ and $\alpha_2$.
• The aggregation of these processes leads to a self-similar process with $H = (3 - \min(\alpha_1, \alpha_2))/2$.
So, how do we get heavy-tailed ON or OFF times?
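
Here is a minimal sketch of the superposition model in discrete time. Each hypothetical source alternates ON (1) and OFF (0) periods whose lengths are Pareto with minimum 1, truncated to whole slots, which is an assumption. The aggregate is the number of active sources per slot. The formula for $H$ describes the limit of many sources and long time scales, so this short trace only illustrates the construction and does not verify the formula. With the rough estimates from slides 53 and 54 ($\alpha \approx 1.2$ for ON, $\alpha \approx 1.5$ for OFF), the formula gives $H = (3 - 1.2)/2 = 0.9$.

```python
import random

def on_off_source(n_slots, alpha_on, alpha_off, rng):
    """0/1 activity trace for one source with Pareto ON/OFF period lengths."""
    trace = []
    on = rng.random() < 0.5
    while len(trace) < n_slots:
        alpha = alpha_on if on else alpha_off
        # Pareto draws are >= 1; cap at the remaining slots to bound memory
        length = min(int(rng.paretovariate(alpha)), n_slots - len(trace))
        trace.extend([1 if on else 0] * length)
        on = not on
    return trace

def aggregate_sources(n_sources, n_slots, alpha_on, alpha_off, seed=0):
    """Number of active sources in each slot."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, v in enumerate(on_off_source(n_slots, alpha_on, alpha_off, rng)):
            total[t] += v
    return total

def hurst_from_tail_indices(alpha_1, alpha_2):
    return (3 - min(alpha_1, alpha_2)) / 2

load = aggregate_sources(n_sources=50, n_slots=10_000, alpha_on=1.2, alpha_off=1.5)
print("mean active sources:", sum(load) / len(load))
print("predicted H:", round(hurst_from_tail_indices(1.2, 1.5), 2))
```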

52 H: Measuring “Burstiness”
Intuitive explanation using the M/G/∞ model:
• as $\alpha \to 1$, the service time becomes more variable, making bursts easier to generate;
• hence H increases.

53 Heavy-Tailed ON Times and File Sizes
• Analysis of client logs showed that ON times were, in fact, heavy-tailed: $\alpha \approx 1.2$, over about 3 orders of magnitude.
• This led to the analysis of the underlying file sizes: $\alpha \approx 1.1$, over about 4 orders of magnitude, similar to FTP traffic.
• Files available from UNIX file systems are typically heavy-tailed.

54 Heavy-Tailed OFF Times
Analysis of OFF times showed that they are also heavy-tailed: $\alpha \approx 1.5$.

55 Ethernet LAN Traffic Measurements at the Source Level
Location: Bellcore Morristown Research and Engineering Center
The first set:
• the busy hour of the August 1989 Ethernet LAN measurements;
• about 105 sources, 748 active source-destination pairs;
• 95% of the traffic was internal.
The second set:
• a 9-day-long measurement period in December 1994;
• about 3,500 sources, 10,000 active pairs;
• measurements are made up entirely of remote traffic.

56 Textured Plots of Packet Arrival Times

57 Textured Plots of Packet Arrival Times

58 Checking for the Noah Effect
• Complementary distribution plots
• Hill’s estimate: let $U_1, U_2, \dots, U_n$ denote the observed ON- (or OFF-) periods and write $U_{(1)} \le U_{(2)} \le \dots \le U_{(n)}$ for the corresponding order statistics.
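
A common form of Hill's estimator for the tail index uses the $k$ largest order statistics:
$\hat{\alpha}_{n,k} = \left(\frac{1}{k}\sum_{i=0}^{k-1}\left(\log U_{(n-i)} - \log U_{(n-k)}\right)\right)^{-1}$.
In practice it is plotted against $k$, and one looks for a stable region. This sketch checks the estimator on synthetic Pareto data with a known $\alpha = 1.5$. The sample size, the values of $k$, and the function name are assumptions.

```python
import math
import random

def hill_estimate(samples, k):
    """Hill estimate of the tail index alpha from the k largest observations."""
    u = sorted(samples)                     # ascending: U_(1) <= ... <= U_(n)
    n = len(u)
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    log_threshold = math.log(u[n - k - 1])  # log U_(n-k), 0-based index
    mean_log_excess = sum(math.log(u[n - 1 - i]) - log_threshold
                          for i in range(k)) / k   # U_(n-i), i = 0..k-1
    return 1.0 / mean_log_excess

rng = random.Random(42)
sample = [rng.paretovariate(1.5) for _ in range(100_000)]
for k in (500, 2000, 10_000):
    print(k, round(hill_estimate(sample, k), 3))
```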

61 Important Findings
Most surprising result: the Noah Effect is extremely widespread, regardless of the source machine (file server or client machine).
Explanations:
• hyperbolic tail behavior for the sizes of files residing in file systems;
• Pareto-like tail behavior for UNIX process run times;
• human-computer interactions occur over a wide range of timescales.
Although network traffic is intrinsically complex, parsimonious modeling is still possible: estimating a single parameter $\alpha$ (the intensity of the Noah Effect) is enough.

62 An Example: File Size Distribution on a Win2000 Machine

63 Conclusion
• The presence of the Noah Effect in measured Ethernet LAN traffic is confirmed.
• The superposition of many ON/OFF sources with the Noah Effect results in aggregate packet streams that are consistent with measured network traffic and exhibit self-similar or fractal properties.
• The work spawned research across the networking community.

64 Self-Similarity and Long-Range Dependence in Networks
• Vern Paxson and Sally Floyd, Wide-Area Traffic: The Failure of Poisson Modeling
• Mark E. Crovella and Azer Bestavros, Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes
  It shows that self-similarity in Web traffic can be explained based on the underlying distribution of transferred document sizes, the effects of caching and user preference in file transfer, the effect of user “think time”, and the superimposition of many such transfers in a local area network.
• A. Feldmann, A. C. Gilbert, W. Willinger, and T. G. Kurtz, The Changing Nature of Network Traffic: Scaling Phenomena
• Mark Garrett and Walter Willinger, Analysis, Modeling and Generation of Self-Similar VBR Video Traffic
  The paper shows that the marginal bandwidth distribution can be described as heavy-tailed and that the video sequence itself is long-range dependent and can be modeled using a self-similar process.
  The paper presents a new source model for VBR video traffic and describes how it may be used to generate VBR traffic synthetically.

65 Heavy-Tailed Distributions in Network Traffic
• Gordon Irlam, Unix File Size Survey
• Will Leland and Teun Ott, Load-balancing Heuristics and Process Behavior
• Mor Harchol-Balter and Allen Downey, Exploiting Process Lifetime Distributions for Dynamic Load Balancing
• Carlos Cunha, Azer Bestavros, Mark Crovella, Characteristics of WWW Client-based Traces
  This paper presents some of the first Web client measurements ever made. It characterizes traces taken using an instrumented version of Mosaic from a university computer lab and shows that a number of Web properties can be modeled using heavy-tailed distributions. These properties include document size, user requests for a document, and document popularity.

66 Section 4: Impact of Self-Similarity

67 Comparison

68 Easy Modeling: Noah Effect
Questions related to self-similarity can be reduced to practical implications of the Noah Effect:
• queuing and network performance;
• protocol analysis;
• network congestion controls.

69 Queuing Performance
The queue length distribution:
• traditional (Markovian) traffic: decreases exponentially fast;
• self-similar traffic: decreases much more slowly.
Not accounting for the Joseph Effect can lead to overly optimistic performance predictions.
(Figure: effect of H (burstiness))
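
One simple way to study queue-length tails empirically is a slotted queue driven by Lindley's recursion, $Q_t = \max(0, Q_{t-1} + A_t - C)$. Here $A_t$ is the work arriving in slot $t$ and $C$ is the per-slot service capacity. This sketch is a hypothetical helper, not the model used in the paper. You could feed it, for example, Poisson counts and an aggregated ON/OFF trace at the same mean load, then compare the empirical $P(Q > b)$ as $b$ grows.

```python
def queue_lengths(arrivals, capacity):
    """Slotted queue via Lindley's recursion: Q_t = max(0, Q_{t-1} + A_t - C)."""
    q, out = 0, []
    for a in arrivals:
        q = max(0, q + a - capacity)
        out.append(q)
    return out

def tail_probability(queue, b):
    """Empirical P(Q > b) over the simulated slots."""
    return sum(1 for q in queue if q > b) / len(queue)

trace = [3, 0, 0, 5, 1]
q = queue_lengths(trace, capacity=2)
print(q)                        # [1, 0, 0, 3, 2]
print(tail_probability(q, 1))   # 0.4
```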

70 Queuing Performance
• Gives rise to infinite mean waiting time: queue length distributions themselves exhibit the Noah Effect.
• Buffer requirements can be overwhelming, leading to large delays.
• Traffic shaping may be infeasible. Why?

71 Protocol Design
Protocol design should take into account knowledge about network traffic, such as the presence or absence of the Noah Effect.

72 Thanks!

