Hailuoto Workshop: A Statistician’s Adventures in Internetland. J. S. Marron, Department of Statistics and Operations Research, University of North Carolina.


1 Hailuoto Workshop
  A Statistician’s Adventures in Internetland
  J. S. Marron
  Department of Statistics and Operations Research, University of North Carolina
  February 9, 2016

2 A Menu of Interesting Issues
  Bin Count Time Series
  Long Range Dependence?
  Point Process of Flow Start Times
  Duration Distributions (heavy tails)
  Heavy tail Durations => LRD
  Relationship between Size and Duration?
  Time series of packets within flows?

3 Long Range Dependence Controversy
  Initial Models: Queuing Theory, Short Range Dependence
  Aggregation of Mice and Elephants: Heavy tail Durations => LRD
    Mandelbrot, Taqqu, Paxson, Willinger, …
  More recently: Aggregation of Point Processes => Poisson
    Cleveland, et al.

4 Explanation of Controversy – Zooming
  Autocorrelation depends on “Scale” (i.e. binwidth, m)
  Fine Scales (< 1 ms): ~ White Noise – Poisson
  Medium Scales (~ 10 ms): Dependence “lifts up”
  Coarse Scales (> 1 sec): Consistent with LRD
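
The dependence of autocorrelation on binwidth m can be explored with a small experiment. The sketch below is an illustration, not the analysis behind the slides: the helper names `aggregate` and `autocorr` are hypothetical, and the toy input is independent noise rather than real packet counts, so its lag-1 autocorrelation should stay near zero at every scale. On long range dependent traffic the coarse-scale values would be expected to sit clearly above zero.

```python
import random


def aggregate(counts, m):
    """Sum consecutive groups of m fine-scale bin counts into coarser bins."""
    n = len(counts) // m
    return [sum(counts[i * m:(i + 1) * m]) for i in range(n)]


def autocorr(x, lag=1):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Hypothetical toy data: independent counts per fine-scale bin (assumption,
# not real traffic), so no dependence should appear at any binwidth.
random.seed(0)
fine = [random.randint(0, 5) for _ in range(20000)]

for m in (1, 10, 100):
    print(m, round(autocorr(aggregate(fine, m)), 3))
```

Replacing `fine` with real bin counts and scanning m over several orders of magnitude gives a rough version of the "zooming" view described on this slide.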

5 Long Range Dependence Theory
  Self-Similar: [formula missing in transcript]
  H: Hurst parameter
  Increments: [formula missing in transcript]
  LRD: If [condition missing in transcript]
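
The formulas on this slide did not survive in the transcript. For orientation, the textbook definitions are sketched below; they are standard results and may differ in notation from what the slide showed. The symbols Y, X, a, and \gamma are assumptions introduced here.

```latex
% Self-similarity with Hurst parameter H:
Y(at) \overset{d}{=} a^{H}\, Y(t), \qquad a > 0

% Increment process (fractional Gaussian noise when Y is fractional Brownian motion):
X_k = Y(k) - Y(k-1)

% Autocovariance of the increments and its decay for large lag k:
\gamma(k) = \frac{\sigma^{2}}{2}\left( |k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H} \right)
\sim \sigma^{2} H (2H-1)\, k^{2H-2}

% Long range dependence: the autocovariances are not summable
\tfrac{1}{2} < H < 1 \quad\Longrightarrow\quad \sum_{k=1}^{\infty} \gamma(k) = \infty
```

With H = 1/2 the increments are uncorrelated, which matches the white noise behavior seen at fine scales.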

6 Drawbacks of Conventional Time Series Methods
  Clumsy at Modeling LRD
    E.g. ARMA etc. are all SRD
  Assumption of Stationarity
    Really need “local stationarity”
  Assumption of Linear Processes
    Doesn’t make physical sense
    Instead have “aggregation of flows”
  “Correlation” for heavy tailed distributions?

7 An H estimation approach: Wavelets
  [formula missing in transcript]: wavelet coefficients

8 Estimation of H, Based on Wavelet Spectrum
  Properties
  Weighted linear regression on [formula missing in transcript]
  Estimation of H: Abry & Veitch (1998)
  Robust to nonstationarities (linear trend)
  [symbol missing in transcript]: uncorrelated
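
The idea of regressing the log wavelet spectrum on scale can be sketched in a few lines. This is a simplified illustration in the spirit of Abry & Veitch, not their estimator: it uses the Haar wavelet for brevity (which has a single vanishing moment and so lacks the linear-trend robustness of higher-order wavelets), weights each octave by its number of coefficients as a stand-in for the proper variance weights, and assumes the spectrum slope equals 2H - 1. The function names are hypothetical.

```python
import math
import random


def haar_log_spectrum(x, max_octave):
    """Return (octave j, log2 mean squared Haar detail, coefficient count)."""
    approx = list(x)
    spectrum = []
    for j in range(1, max_octave + 1):
        n = len(approx) // 2
        if n < 2:
            break
        detail = [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(n)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2)
                  for k in range(n)]
        energy = sum(d * d for d in detail) / n
        if energy > 0:
            spectrum.append((j, math.log2(energy), n))
    return spectrum


def estimate_hurst(spectrum):
    """Weighted least squares slope of log2 energy on octave, mapped to H."""
    w = sum(n for _, _, n in spectrum)
    jbar = sum(n * j for j, _, n in spectrum) / w
    ybar = sum(n * y for _, y, n in spectrum) / w
    sxx = sum(n * (j - jbar) ** 2 for j, _, n in spectrum)
    sxy = sum(n * (j - jbar) * (y - ybar) for j, y, n in spectrum)
    slope = sxy / sxx
    return (slope + 1) / 2  # assumes slope = 2H - 1


# Toy check on white noise (assumed data): the spectrum should be roughly
# flat, so the estimate should land near H = 0.5.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
h_hat = estimate_hurst(haar_log_spectrum(noise, max_octave=8))
print(round(h_hat, 2))
```

In practice the range of octaves used in the regression matters a great deal, since real traffic spectra are often linear only over the coarser scales.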

9 Example Wavelet Spectrum (FGN, H=0.9)

10 Experience with Hurst parameter estimation
  Toy Data: Excellent (Poisson data is flat, FGN linear)
  Real Data: More challenging
  Studied ~30 two-hour time blocks, 2002
  H estimation makes sense (~ 0.8 – 0.9) for many cases
    i.e. FGN is a reasonable model
  But there were some very strange cases (H >> 1)

11 Real Data (“nice”): 2002 Apr 13 Sat 19:30 – 21:30

12 Real Data (“ugly”): 2002 Apr 13 Sat 1 pm – 3 pm

13 Explanatory Tool: SiZer
  SIgnificance of ZERo crossings of the derivative of the smooths in scale space
  Chaudhuri and Marron (1999)
  Exploratory smoothing method
  Are bumps really there?
  Consider all smoothing levels
  Study (simultaneous) C.I.s for slope (derivative) of smooth
  Combine with statistical inference and visualization
  Blue: slope significantly upwards
  Red: slope significantly downwards
  Purple: insignificant slope
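
The color coding can be illustrated with a heavily simplified sketch of one SiZer pixel: fit a local linear regression at location x0 with bandwidth h, form a confidence interval for the slope, and color by its sign. This is not the Chaudhuri and Marron procedure: it assumes independent, constant-variance noise, uses a pointwise normal quantile (1.96) instead of a simultaneous one, and the name `sizer_color` is hypothetical.

```python
import math


def sizer_color(x, y, x0, h, q=1.96):
    """Color for one (location, bandwidth) pixel of a simplified SiZer map."""
    # Gaussian kernel weights centered at x0 with bandwidth h.
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    # Local linear (weighted least squares) slope estimate.
    slope = sum(wi * (xi - xbar) * (yi - ybar)
                for wi, xi, yi in zip(w, x, y)) / sxx
    # Weighted residual variance as a rough local noise estimate (assumption).
    resid = [yi - ybar - slope * (xi - xbar) for xi, yi in zip(x, y)]
    sigma2 = sum(wi * ri ** 2 for wi, ri in zip(w, resid)) / sw
    var_slope = sigma2 * sum((wi * (xi - xbar)) ** 2
                             for wi, xi in zip(w, x)) / sxx ** 2
    half = q * math.sqrt(var_slope)
    if slope - half > 0:
        return "blue"    # significantly increasing
    if slope + half < 0:
        return "red"     # significantly decreasing
    return "purple"      # slope not significantly different from zero


# Toy usage on an exactly increasing line (assumed data).
xs = [i / 100 for i in range(101)]
print(sizer_color(xs, [2 * v for v in xs], x0=0.5, h=0.1))
```

A full map repeats this over a grid of locations and bandwidths, which is what lets SiZer "consider all smoothing levels" at once.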

14 SiZer Example
  British Incomes Data
    Kernel Density Estimation
    Two modes “really there”!
  Bralower’s Fossil Data
    Local Linear Regression
    Smaller valley “not there”

15 Dependent SiZer
  Park, Marron, and Rondonotti (2004)
  SiZer compares data with white noise
    Inappropriate in time series
  Dependent SiZer compares data with an assumed model
    Goodness of fit test

16 Dependent SiZer: 2002 Apr 13 Sat 1 pm – 3 pm

17 Zoomed view (to red region, i.e. “flat top”)

18 Further Zoom: finds very periodic behavior!

19 Revisit: 2002 Apr 13 Sat 1 pm – 3 pm

20 Quick Check: “Delete” periodic time block

21 Possible Physical Explanation
  IP “Port Scan”
    Common device of hackers
    Searching for “break in points”
    Send query to every possible (within UNC domain):
      IP address
      Port Number
    Replies can indicate system weaknesses
  Internet Traffic is hard to model

22 Experience with Hurst parameter estimation
  Studied ~30 two-hour time blocks, 2002
    H estimation makes sense (~ 0.8 – 0.9) for many cases
    i.e. FGN is a reasonable model
    But there were some very strange cases (H >> 1)
  Studied ~30 two-hour time blocks, 2003
    Traffic appears “similar”, using e.g. Dependent SiZer
    But H estimates much smaller (~ 0.7), across all time blocks
    Why???

23 Wavelet Spectrum: 2003 Sat 9:30 – 11:30 pm

24 Explanation of Shoulder: different protocols
  Major Components of Traffic:
  Transmission Control Protocol (TCP), often ~80%
    “Acknowledges packets” for “sure transfer”
    Web browsing (HTTP), FTP, email, …
  User Datagram Protocol (UDP), often ~15%
    Unacknowledged for “data streaming”
    Video, music, …

25 Wavelet Spectra: all 2003 packet, TCP vs. UDP
  Overlay all time blocks, and sub-spectra for TCP and UDP
  In 2002 TCP dominated
  Now UDP creates major hump at medium scales
  Scale ~ 1 sec

26 Explanation of UDP Bump
  “Blubster” - File Sharing Application
  A replacement for Napster
  Transfers big files by TCP
  Does “handshaking” by UDP
  Workaround for server (could be shut down)
  Huge fraction of traffic (just to “stay in touch”)?!?

27 Blubster sub-spectrum: 2003 Sat 9:30

28 Zoomed (conventional) SiZer View of Blubster

29 Final Blubster Oddity
  Effect shows up for “packet counts”
  Not for “byte counts”
  Reason: Blubster handshake packets are small
    Thus not significant fraction of total bytes
  Violation of “conventional wisdom”
    Usually “byte behavior” ~ “packet behavior”
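
The gap between packet counts and byte counts is easy to see once both series are built from the same trace. The sketch below is illustrative only: the function name `bin_counts`, the (timestamp, size) record layout, and the packet sizes are assumptions, not details taken from the UNC data.

```python
def bin_counts(packets, bin_width, n_bins):
    """Bin (timestamp_seconds, size_bytes) records into two count series."""
    packet_counts = [0] * n_bins
    byte_counts = [0] * n_bins
    for t, size in packets:
        idx = int(t // bin_width)
        if 0 <= idx < n_bins:
            packet_counts[idx] += 1
            byte_counts[idx] += size
    return packet_counts, byte_counts


# Hypothetical trace: one large transfer packet per second, plus a burst of
# small handshake-style packets during the first second only.
trace = [(0.5, 1500), (1.5, 1500)]
trace += [(0.1 * k, 60) for k in range(1, 9)]

pkts, byts = bin_counts(trace, bin_width=1.0, n_bins=2)
print(pkts)
print(byts)
```

In this toy trace the burst multiplies the packet count in the first bin several times over while adding comparatively little to its byte count, which is the pattern the slide attributes to small handshake packets.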

30 Wavelet Spectrum: packet vs. byte

31 A deeper look at sampling
  Revisit Mice-Elephant Sampling
  Over wide range of scales: Random Sampling…
  But “not representative”…
  Artifact of:
    Huge Sample Size
    Very Heavy Tails

