1
CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #25: Time series mining and forecasting Christos Faloutsos

2
CMU SCS 15-826 (c) C. Faloutsos, 2013
Must-Read Material
Byoung-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson, H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online Data Mining for Co-Evolving Time Sequences, ICDE, Feb 2000.
Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity Estimation Using Query Feedback, SIGMOD 1994.

3
CMU SCS 15-826(c) C. Faloutsos, 20133 Thanks Deepay Chakrabarti (CMU) Spiros Papadimitriou (CMU) Prof. Byoung-Kee Yi (Pohang U.)

4
CMU SCS 15-826(c) C. Faloutsos, 20134 Outline Motivation Similarity search – distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

5
Problem definition
Given: one or more sequences $x_1, x_2, \ldots, x_t, \ldots$ ($y_1, y_2, \ldots, y_t, \ldots$; …)
Find
–similar sequences; forecasts
–patterns; clusters; outliers

6
CMU SCS 15-826(c) C. Faloutsos, 20136 Motivation - Applications Financial, sales, economic series Medical –ECGs +; blood pressure etc monitoring –reactions to new drugs –elderly care

7
CMU SCS 15-826(c) C. Faloutsos, 20137 Motivation - Applications (cont’d) ‘Smart house’ –sensors monitor temperature, humidity, air quality video surveillance

8
CMU SCS 15-826(c) C. Faloutsos, 20138 Motivation - Applications (cont’d) civil/automobile infrastructure –bridge vibrations [Oppenheim+02] – road conditions / traffic monitoring

9
CMU SCS 15-826(c) C. Faloutsos, 20139 Motivation - Applications (cont’d) Weather, environment/anti-pollution –volcano monitoring –air/water pollutant monitoring

10
CMU SCS 15-826(c) C. Faloutsos, 201310 Motivation - Applications (cont’d) Computer systems –‘Active Disks’ (buffering, prefetching) –web servers (ditto) –network traffic monitoring –...

11
CMU SCS 15-826(c) C. Faloutsos, 201311 Stream Data: Disk accesses time #bytes

12
Problem #1:
Goal: given a signal (e.g., #packets over time)
Find: patterns, periodicities, and/or compress
[Figure: count of lynx caught per year vs. year (similarly: packets per day; temperature per day)]

13
Problem #2: Forecast
Given $x_t, x_{t-1}, \ldots$, forecast $x_{t+1}$
[Figure: number of packets sent vs. time tick; the next value is marked '??']

14
Problem #2': Similarity search
E.g., find a 3-tick pattern similar to the last one
[Figure: number of packets sent vs. time tick; the next value is marked '??']

15
CMU SCS 15-826(c) C. Faloutsos, 201315 Problem #3: Given: A set of correlated time sequences Forecast ‘Sent(t)’

16
CMU SCS 15-826(c) C. Faloutsos, 201316 Important observations Patterns, rules, forecasting and similarity indexing are closely related: To do forecasting, we need –to find patterns/rules –to find similar settings in the past to find outliers, we need to have forecasts –(outlier = too far away from our forecast)

17
CMU SCS 15-826(c) C. Faloutsos, 201317 Outline Motivation Similarity Search and Indexing Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

18
CMU SCS 15-826(c) C. Faloutsos, 201318 Outline Motivation Similarity search and distance functions –Euclidean –Time-warping...

19
CMU SCS 15-826(c) C. Faloutsos, 201319 Importance of distance functions Subtle, but absolutely necessary: A ‘must’ for similarity indexing (-> forecasting) A ‘must’ for clustering Two major families –Euclidean and Lp norms –Time warping and variations

20
Euclidean and Lp
...
$L_1$: city-block = Manhattan
$L_2$ = Euclidean
$L_\infty$
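
A minimal sketch of the Lp family, assuming the usual definition $L_p(x, y) = (\sum_i |x_i - y_i|^p)^{1/p}$ for two sequences of equal length. The function name `lp_distance` and the sample values are illustrative, not taken from the slides.

```python
def lp_distance(x, y, p):
    """L_p distance between two equal-length sequences.

    p = 1 is city-block (Manhattan), p = 2 is Euclidean,
    p = float('inf') is the maximum coordinate difference.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Illustrative data: two 3-tick sequences.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
l1 = lp_distance(x, y, 1)
l2 = lp_distance(x, y, 2)
linf = lp_distance(x, y, float("inf"))
```

All three distances treat the sequence as a point in n-d space, so they require sequences of the same length and no time shifts.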

21
CMU SCS 15-826(c) C. Faloutsos, 201321 Observation #1 Time sequence -> n-d vector... Day-1 Day-2 Day-n

22
CMU SCS 15-826(c) C. Faloutsos, 201322 Observation #2 Euclidean distance is closely related to –cosine similarity –dot product –‘cross-correlation’ function... Day-1 Day-2 Day-n
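
One way to see the relation, treating the two sequences as n-d vectors $x$ and $y$ (a standard identity, stated here as a reminder rather than as the slide's own derivation):

```latex
\[
\|x - y\|^2 \;=\; \|x\|^2 + \|y\|^2 - 2\, x \cdot y
\]
% If both vectors are normalized to unit length, the dot product is the cosine:
\[
\|x\| = \|y\| = 1
\quad\Longrightarrow\quad
\|x - y\|^2 \;=\; 2\,\bigl(1 - \cos(x, y)\bigr)
\]
```

So, for normalized sequences, ranking by smallest Euclidean distance is the same as ranking by largest dot product (or cosine similarity).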

23
CMU SCS 15-826(c) C. Faloutsos, 201323 Time Warping allow accelerations - decelerations –(with or w/o penalty) THEN compute the (Euclidean) distance (+ penalty) related to the string-editing distance

24
CMU SCS 15-826(c) C. Faloutsos, 201324 Time Warping ‘stutters’:

25
CMU SCS 15-826(c) C. Faloutsos, 201325 Time warping Q: how to compute it? A: dynamic programming D( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y

26
Time warping
Thus, with no penalty for stutter, for sequences $x_1, x_2, \ldots, x_i$; $y_1, y_2, \ldots, y_j$:
[Equation: recurrence for $D(i, j)$ with three cases, labeled x-stutter, y-stutter, and no stutter]

27
Time warping
VERY SIMILAR to the string-editing distance
[Equation: the same recurrence for $D(i, j)$, with cases x-stutter, y-stutter, and no stutter]

28
CMU SCS 15-826(c) C. Faloutsos, 201328 Time warping Complexity: O(M*N) - quadratic on the length of the strings Many variations (penalty for stutters; limit on the number/percentage of stutters; …) popular in voice processing [Rabiner + Juang]
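
A minimal sketch of the dynamic-programming computation, assuming the commonly used recurrence $D(i, j) = |x_i - y_j| + \min\{D(i-1, j-1), D(i, j-1), D(i-1, j)\}$ with no penalty for stutters. The absolute difference as per-element cost and the name `dtw_distance` are assumptions made for illustration.

```python
def dtw_distance(x, y):
    """Time-warping distance with no stutter penalty, in O(M*N) time.

    D[i][j] is the cost to match the prefix of length i of x
    with the prefix of length j of y.
    """
    inf = float("inf")
    m, n = len(x), len(y)
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j - 1],  # no stutter: advance in both sequences
                D[i][j - 1],      # x_i is repeated to match another y element
                D[i - 1][j],      # y_j is repeated to match another x element
            )
    return D[m][n]


# A sequence and a 'stuttered' copy of it are at distance 0.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The table has (M+1) x (N+1) cells and each is filled in constant time, which is where the quadratic cost comes from.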

29
CMU SCS 15-826(c) C. Faloutsos, 201329 Other Distance functions piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97] ‘cepstrum’ (for voice [Rabiner+Juang]) –do DFT; take log of amplitude; do DFT again! Allow for small gaps [Agrawal+95] See tutorial by [Gunopulos + Das, SIGMOD01]

30
CMU SCS 15-826(c) C. Faloutsos, 201330 Other Distance functions In [Keogh+, KDD’04]: parameter-free, MDL based

31
CMU SCS 15-826(c) C. Faloutsos, 201331 Conclusions Prevailing distances: –Euclidean and –time-warping

32
CMU SCS 15-826(c) C. Faloutsos, 201332 Outline Motivation Similarity search and distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

33
CMU SCS 15-826(c) C. Faloutsos, 201333 Linear Forecasting

34
Forecasting
"Prediction is very difficult, especially about the future." - Niels Bohr
http://www.hfac.uh.edu/MediaFutures/thoughts.html

35
CMU SCS 15-826(c) C. Faloutsos, 201335 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

36
CMU SCS 15-826(c) C. Faloutsos, 201336 Reference [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

37
Problem #2: Forecast
Example: given $x_{t-1}, x_{t-2}, \ldots$, forecast $x_t$
[Figure: number of packets sent vs. time tick; the next value is marked '??']

38
CMU SCS 15-826(c) C. Faloutsos, 201338 Forecasting: Preprocessing MANUALLY: remove trends spot periodicities time 7 days

39
Problem #2: Forecast
Solution: try to express $x_t$ as a linear function of the past: $x_{t-1}, x_{t-2}, \ldots$ (up to a window of $w$)
Formally: $x_t \approx a_1 x_{t-1} + a_2 x_{t-2} + \ldots + a_w x_{t-w} + \text{noise}$
[Figure: number of packets sent vs. time tick; the next value is marked '??']

40
(Problem: Back-cast; interpolate)
Solution - interpolate: try to express $x_t$ as a linear function of the past AND the future: $x_{t+1}, x_{t+2}, \ldots, x_{t+w_{future}}$; $x_{t-1}, \ldots, x_{t-w_{past}}$ (up to windows of $w_{past}$, $w_{future}$)
EXACTLY the same algorithms
[Figure: number of packets sent vs. time tick; a value in the middle is marked '??']

41
Linear Regression: idea
Express what we don't know (= 'dependent variable') as a linear function of what we know (= 'indep. variable(s)')
[Figure: scatter plot with axes body weight and body height, with a fitted line]

42
CMU SCS 15-826(c) C. Faloutsos, 201342 Linear Auto Regression:

43
Linear Auto Regression:
lag w=1
Dependent variable = # of packets sent (S[t])
Independent variable = # of packets sent (S[t-1])
[Figure: 'lag-plot' of number of packets sent (t) vs. number of packets sent (t-1)]
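
A minimal sketch of the lag w=1 case: fit a line through the origin on the lag-plot, so that S[t] is approximated by a * S[t-1]. Fitting without an intercept, the name `fit_lag1`, and the sample series are assumptions made for illustration.

```python
def fit_lag1(series):
    """Least-squares coefficient a for the model S[t] ~ a * S[t-1]."""
    prev = series[:-1]   # independent variable: S[t-1]
    curr = series[1:]    # dependent variable:   S[t]
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


# Illustrative series that doubles at every tick.
packets = [1.0, 2.0, 4.0, 8.0, 16.0]
a = fit_lag1(packets)
forecast = a * packets[-1]   # one-step-ahead forecast
```

Each (S[t-1], S[t]) pair is one point of the lag-plot; the fitted coefficient is the slope of the line through those points.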

44
CMU SCS 15-826(c) C. Faloutsos, 201344 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

45
More details:
Q1: Can it work with window w>1?
A1: YES!
[Figure: 3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

46
More details:
Q1: Can it work with window w>1?
A1: YES! (we'll fit a hyper-plane, then!)
[Figure: 3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

47
More details:
Q1: Can it work with window w>1?
A1: YES! (we'll fit a hyper-plane, then!)
[Figure: 3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

48
More details:
Q1: Can it work with window w>1?
A1: YES! The problem becomes: $X_{[N \times w]} \; a_{[w \times 1]} = y_{[N \times 1]}$
OVER-CONSTRAINED
–$a$ is the vector of the regression coefficients
–$X$ has the N values of the w indep. variables
–$y$ has the N values of the dependent variable

49
More details:
$X_{[N \times w]} \; a_{[w \times 1]} = y_{[N \times 1]}$
[Figure: matrix X with columns Ind-var1 … Ind-var-w and rows ordered by time]

50
More details:
$X_{[N \times w]} \; a_{[w \times 1]} = y_{[N \times 1]}$
[Figure: matrix X with columns Ind-var1 … Ind-var-w and rows ordered by time]

51
More details
Q2: How to estimate $a_1, a_2, \ldots, a_w = a$?
A2: with Least Squares fit (Moore-Penrose pseudo-inverse)
$a$ is the vector that minimizes the RMSE from $y$:
$a = (X^T X)^{-1} (X^T y)$

52
More details
Q2: How to estimate $a_1, a_2, \ldots, a_w = a$?
A2: with Least Squares fit:
$a = V \Lambda^{-1} U^T y$, where $X = U \Lambda V^T$
Identical to earlier formula $a = (X^T X)^{-1} (X^T y)$ (proof?)
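
A short sketch of the proof, assuming $X = U \Lambda V^T$ is the SVD of $X$ with $U^T U = I$, $V$ a square orthogonal matrix, and $\Lambda$ diagonal and invertible (that is, $X$ has full column rank):

```latex
\begin{align*}
X^T X &= V \Lambda U^T U \Lambda V^T = V \Lambda^2 V^T \\
(X^T X)^{-1} &= V \Lambda^{-2} V^T \\
(X^T X)^{-1} X^T y
  &= V \Lambda^{-2} V^T \, V \Lambda U^T y
   = V \Lambda^{-1} U^T y
\end{align*}
```

The two expressions therefore give the same coefficient vector $a$ whenever $X^T X$ is invertible.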

53
More details
Straightforward solution: $a = (X^T X)^{-1} (X^T y)$
$a$: Regression Coeff. Vector
$X$: Sample Matrix ($X_N$: N rows, w columns)
Observations:
–Sample matrix X grows over time
–needs matrix inversion
–$O(N \times w^2)$ computation
–$O(N \times w)$ storage
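
A minimal sketch of the straightforward solution for auto-regression with window w, using plain Python lists. It solves the normal equations $(X^T X) a = X^T y$ by Gaussian elimination instead of forming the inverse explicitly; the helper names (`lag_matrix`, `solve`, `fit_ar`) and the sample series are assumptions made for illustration.

```python
def lag_matrix(series, w):
    """Build X (rows [x_{t-1}, ..., x_{t-w}]) and y (targets x_t)."""
    X = [[series[t - k] for k in range(1, w + 1)] for t in range(w, len(series))]
    y = [series[t] for t in range(w, len(series))]
    return X, y


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular.
    """
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_ar(series, w):
    """Regression coefficients a_1..a_w from the normal equations."""
    X, y = lag_matrix(series, w)
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(w)] for i in range(w)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(w)]
    return solve(XtX, Xty)


# Illustrative series obeying x_t = x_{t-1} + x_{t-2} exactly.
series = [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
coeffs = fit_ar(series, 2)
```

Building $X^T X$ touches all N rows and costs $O(N \times w^2)$, and it has to be redone from scratch when a new sample arrives.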

54
CMU SCS 15-826(c) C. Faloutsos, 201354 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)

55
Even more details
Q3: Can we estimate a incrementally?
A3: Yes, with the brilliant, classic method of 'Recursive Least Squares' (RLS) (see, e.g., [Yi+00], for details).
We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)
A: our matrix has special form: $(X^T X)$

56
More details
At the N+1 time tick: a new row $x_{N+1}$ is appended to $X_N$ (N rows, w columns), giving $X_{N+1}$

57
More details
Let $G_N = (X_N^T X_N)^{-1}$ ('gain matrix'), of size $w \times w$
$G_{N+1}$ can be computed recursively from $G_N$

58
CMU SCS 15-826(c) C. Faloutsos, 201358 EVEN more details: 1 x w row vector Let’s elaborate (VERY IMPORTANT, VERY VALUABLE!)

59
CMU SCS 15-826(c) C. Faloutsos, 201359 EVEN more details:

60
EVEN more details:
[Equation with dimension annotations: $[w \times 1]$, $[w \times (N+1)]$, $[(N+1) \times w]$, $[w \times (N+1)]$, $[(N+1) \times 1]$]

61
EVEN more details:
[Equation with dimension annotations: $[w \times (N+1)]$, $[(N+1) \times w]$]

62
CMU SCS 15-826(c) C. Faloutsos, 201362 EVEN more details: 1 x w row vector ‘gain matrix’

63
CMU SCS 15-826(c) C. Faloutsos, 201363 EVEN more details:

64
EVEN more details:
[Equation with dimension annotations: $w \times w$, $w \times 1$, $1 \times w$, $w \times w$, and $1 \times 1$ (SCALAR!)]

65
CMU SCS 15-826(c) C. Faloutsos, 201365 Altogether:

66
Altogether:
where
I: $w \times w$ identity matrix
$\delta$: a large positive number (say, $10^4$)
IMPORTANT!
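
A minimal sketch of one standard form of the RLS update, assuming $G_0 = \delta I$, $a_0 = 0$, and, for each new row vector $x$ (1 x w) with target $y$: $G_{new} = G - (1 + x G x^T)^{-1} (G x^T)(x G)$ and $a_{new} = a - G_{new} x^T (x a - y)$. The function names and the sample series are illustrative; see [Yi+00] for the authoritative statement.

```python
def rls_init(w, delta=1e4):
    """Initial gain matrix G_0 = delta * I and coefficients a_0 = 0."""
    G = [[delta if i == j else 0.0 for j in range(w)] for i in range(w)]
    return G, [0.0] * w


def rls_update(G, a, x, y):
    """Fold one sample (row vector x, target y) into G and a in O(w^2)."""
    w = len(a)
    Gx = [sum(G[i][j] * x[j] for j in range(w)) for i in range(w)]  # G x^T
    xG = [sum(x[i] * G[i][j] for i in range(w)) for j in range(w)]  # x G
    c = 1.0 + sum(x[i] * Gx[i] for i in range(w))                   # 1x1 scalar
    # The only 'inversion' needed is a division by the scalar c.
    G_new = [[G[i][j] - Gx[i] * xG[j] / c for j in range(w)] for i in range(w)]
    err = sum(x[i] * a[i] for i in range(w)) - y                    # x a - y
    Gx_new = [sum(G_new[i][j] * x[j] for j in range(w)) for i in range(w)]
    a_new = [a[i] - Gx_new[i] * err for i in range(w)]
    return G_new, a_new


# Illustrative series obeying x_t = x_{t-1} + x_{t-2}; window w = 2.
series = [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
G, a = rls_init(2)
for t in range(2, len(series)):
    G, a = rls_update(G, a, [series[t - 1], series[t - 2]], series[t])
```

Starting from a large $\delta$ makes the initial guess nearly irrelevant, so after a few samples the coefficients are very close to (though not bit-identical with) the batch least-squares solution.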

67
Comparison:
Straightforward Least Squares
–Needs huge matrix (growing in size): $O(N \times w)$
–Costly matrix operation: $O(N \times w^2)$
Recursive LS
–Needs much smaller, fixed size matrix: $O(w \times w)$
–Fast, incremental computation: $O(1 \times w^2)$
–no matrix inversion
($N = 10^6$, $w$ = 1-100)

68
CMU SCS 15-826(c) C. Faloutsos, 201368 Pictorially: Given: Independent Variable Dependent Variable

69
CMU SCS 15-826(c) C. Faloutsos, 201369 Pictorially: Independent Variable Dependent Variable. new point

70
CMU SCS 15-826(c) C. Faloutsos, 201370 Pictorially: Independent Variable Dependent Variable RLS: quickly compute new best fit new point

71
CMU SCS 15-826(c) C. Faloutsos, 201371 Even more details Q4: can we ‘forget’ the older samples? A4: Yes - RLS can easily handle that [Yi+00]:

72
CMU SCS 15-826(c) C. Faloutsos, 201372 Adaptability - ‘forgetting’ Independent Variable eg., #packets sent Dependent Variable eg., #bytes sent

73
CMU SCS 15-826(c) C. Faloutsos, 201373 Adaptability - ‘forgetting’ Independent Variable eg. #packets sent Dependent Variable eg., #bytes sent Trend change (R)LS with no forgetting

74
CMU SCS 15-826(c) C. Faloutsos, 201374 Adaptability - ‘forgetting’ Independent Variable Dependent Variable Trend change (R)LS with no forgetting (R)LS with forgetting RLS: can *trivially* handle ‘forgetting’
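
A minimal sketch of forgetting for the simplest case of one independent variable (model $y \approx a x$), assuming an exponential forgetting factor `lam` in (0, 1] that down-weights a sample by `lam` for every tick of age. The name `rls_scalar`, the value 0.9, and the synthetic data with a trend change are assumptions made for illustration.

```python
def rls_scalar(xs, ys, lam=1.0, delta=1e4):
    """Scalar RLS for y ~ a * x; lam = 1.0 means no forgetting."""
    g, a = delta, 0.0                 # gain (1x1) and coefficient
    for x, y in zip(xs, ys):
        g = g / (lam + x * x * g)     # 1/g_new = lam * (1/g) + x^2
        a = a - g * x * (x * a - y)   # correct a by the prediction error
    return a


# Synthetic data: the trend changes from y = 2x to y = 5x halfway through.
xs = [1.0 + (i % 5) for i in range(100)]
ys = [(2.0 if i < 50 else 5.0) * x for i, x in enumerate(xs)]

a_plain = rls_scalar(xs, ys, lam=1.0)    # averages both regimes
a_forget = rls_scalar(xs, ys, lam=0.9)   # tracks the recent regime
```

Without forgetting the fit settles between the two slopes; with forgetting the old regime fades and the estimate follows the new trend, at the price of a noisier estimate when the data are noisy.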

75
CMU SCS 15-826(c) C. Faloutsos, 201375 How to choose ‘w’? goal: capture arbitrary periodicities with NO human intervention on a semi-infinite stream Details

76
CMU SCS 15-826(c) C. Faloutsos, 201376 Reference [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003 Details

77
CMU SCS 15-826(c) C. Faloutsos, 201377 Answer: ‘AWSOM’ (Arbitrary Window Stream fOrecasting Method) [Papadimitriou+, vldb2003] idea: do AR on each wavelet level in detail: Details

78
AWSOM
[Figure: the signal $x_t$ over time and its wavelet decomposition in the time-frequency plane, with coefficients $W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4}$; $W_{2,1}, W_{2,2}$; $W_{3,1}$; $V_{4,1}$]
Details

79
AWSOM
[Figure: the signal $x_t$ over time and its wavelet decomposition in the time-frequency plane, with coefficients $W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4}$; $W_{2,1}, W_{2,2}$; $W_{3,1}$; $V_{4,1}$]
Details

80
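AWSOM - idea
[Figure: wavelet coefficients $W_{l,t}, W_{l,t-1}, W_{l,t-2}$ on one level and $W_{l',t'}, W_{l',t'-1}, W_{l',t'-2}$ on another]
$W_{l,t} \approx \beta_{l,1} W_{l,t-1} + \beta_{l,2} W_{l,t-2} + \ldots$
$W_{l',t'} \approx \beta_{l',1} W_{l',t'-1} + \beta_{l',2} W_{l',t'-2} + \ldots$
Details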

81
More details…
Update of wavelet coefficients (incremental)
Update of linear models (incremental; RLS)
Feature selection (single-pass)
–Not all correlations are significant
–Throw away the insignificant ones ("noise")
Details

82
Results - Synthetic data
Triangle pulse
Mix (sine + square)
AR captures wrong trend (or none)
Seasonal AR estimation fails
[Figure panels: AWSOM, AR, Seasonal AR]
Details

83
CMU SCS 15-826(c) C. Faloutsos, 201383 Results - Real data Automobile traffic –Daily periodicity –Bursty “noise” at smaller scales AR fails to capture any trend Seasonal AR estimation fails Details

84
CMU SCS 15-826(c) C. Faloutsos, 201384 Results - real data Sunspot intensity –Slightly time-varying “period” AR captures wrong trend Seasonal ARIMA –wrong downward trend, despite help by human! Details

85
Complexity
Model update
Space: $O(\lg N + m k^2) \approx O(\lg N)$
Time: $O(k^2) \approx O(1)$
Where
–N: number of points (so far)
–k: number of regression coefficients; fixed
–m: number of linear models; $O(\lg N)$
Details

86
CMU SCS 15-826(c) C. Faloutsos, 201386 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

87
CMU SCS 15-826(c) C. Faloutsos, 201387 Co-Evolving Time Sequences Given: A set of correlated time sequences Forecast ‘Repeated(t)’ ??

88
CMU SCS 15-826(c) C. Faloutsos, 201388 Solution: Q: what should we do?

89
CMU SCS 15-826(c) C. Faloutsos, 201389 Solution: Least Squares, with Dep. Variable: Repeated(t) Indep. Variables: Sent(t-1) … Sent(t-w); Lost(t-1) …Lost(t-w); Repeated(t-1),... (named: ‘MUSCLES’ [Yi+00])
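
A minimal sketch of how the independent variables could be laid out for this setup: one row per time tick, holding the last w values of every sequence, with the target sequence's current value as the dependent variable. The helper name `muscles_rows`, the alphabetical column order, and the sample numbers are assumptions made for illustration; the resulting X and y can then be handed to any least-squares or RLS routine.

```python
def muscles_rows(sequences, target, w):
    """Build (X, y) for forecasting sequences[target] at time t.

    Each row holds, for every sequence (in sorted name order),
    its values at t-1, ..., t-w.
    """
    names = sorted(sequences)
    n = len(sequences[target])
    X, y = [], []
    for t in range(w, n):
        row = [sequences[name][t - k] for name in names for k in range(1, w + 1)]
        X.append(row)
        y.append(sequences[target][t])
    return X, y


# Illustrative co-evolving sequences (made-up numbers).
data = {
    "sent":     [10, 12, 11, 15, 14],
    "lost":     [1, 2, 1, 3, 2],
    "repeated": [1, 1, 2, 1, 3],
}
X, y = muscles_rows(data, "repeated", w=2)
```

With three sequences and w = 2 there are 6 independent variables per row, so the regression has 6 coefficients to estimate.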

90
CMU SCS 15-826(c) C. Faloutsos, 201390 Forecasting - Outline Auto-regression Least Squares; recursive least squares Co-evolving time sequences Examples Conclusions

91
Examples - Experiments
Datasets
–Modem pool traffic (14 modems, 1500 time-ticks; #packets per time unit)
–AT&T WorldNet internet usage (several data streams; 980 time-ticks)
Measures of success
–Accuracy: Root Mean Square Error (RMSE)

92
CMU SCS 15-826(c) C. Faloutsos, 201392 Accuracy - “Modem” MUSCLES outperforms AR & “yesterday”

93
CMU SCS 15-826(c) C. Faloutsos, 201393 Accuracy - “Internet” MUSCLES consistently outperforms AR & “yesterday”

94
CMU SCS 15-826(c) C. Faloutsos, 201394 Linear forecasting - Outline Auto-regression Least Squares; recursive least squares Co-evolving time sequences Examples Conclusions

95
CMU SCS 15-826(c) C. Faloutsos, 201395 Conclusions - Practitioner’s guide AR(IMA) methodology: prevailing method for linear forecasting Brilliant method of Recursive Least Squares for fast, incremental estimation. See [Box-Jenkins] (AWSOM: no human intervention)

96
CMU SCS 15-826(c) C. Faloutsos, 201396 Resources: software and urls free-ware: ‘R’ for stat. analysis (clone of Splus) http://cran.r-project.org/ python script for RLS http://www.cs.cmu.edu/~christos/SRC/rls-all.tar

97
CMU SCS 15-826(c) C. Faloutsos, 201397 Books George E.P. Box and Gwilym M. Jenkins and Gregory C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.) Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York, Springer Verlag.

98
CMU SCS 15-826(c) C. Faloutsos, 201398 Additional Reading [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003 [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

99
CMU SCS 15-826(c) C. Faloutsos, 201399 Outline Motivation Similarity search and distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

100
CMU SCS 15-826(c) C. Faloutsos, 2013100 Bursty Traffic & Multifractals

101
CMU SCS 15-826(c) C. Faloutsos, 2013101 SKIP, if you have done HW2, Q3: Foils use D1 (‘information fractal dimension’), While HW2-Q3 uses D2 (‘correlation’ f.d.)

102
CMU SCS 15-826(c) C. Faloutsos, 2013102 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Results HW2-Q3

103
Reference:
[Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chan, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA, 2/26/2002 - 3/1/2002.
Full thesis: CMU-CS-05-185, Performance Modeling of Storage Devices using Machine Learning, Mengzhi Wang, Ph.D. Thesis.
HW2-Q3

104
CMU SCS 15-826(c) C. Faloutsos, 2013104 Recall: Problem #1: Goal: given a signal (eg., #bytes over time) Find: patterns, periodicities, and/or compress time #bytes Bytes per 30’ (packets per day; earthquakes per year) HW2-Q3

105
CMU SCS 15-826(c) C. Faloutsos, 2013105 Problem #1 model bursty traffic generate realistic traces (Poisson does not work) time # bytes Poisson HW2-Q3

106
CMU SCS 15-826(c) C. Faloutsos, 2013106 Motivation predict queue length distributions (e.g., to give probabilistic guarantees) “learn” traffic, for buffering, prefetching, ‘active disks’, web servers HW2-Q3

107
CMU SCS 15-826(c) C. Faloutsos, 2013107 Q: any ‘pattern’? time # bytes Not Poisson spike; silence; more spikes; more silence… any rules? HW2-Q3

108
CMU SCS 15-826(c) C. Faloutsos, 2013108 Solution: self-similarity # bytes time # bytes HW2-Q3

109
CMU SCS 15-826(c) C. Faloutsos, 2013109 But: Q1: How to generate realistic traces; extrapolate; give guarantees? Q2: How to estimate the model parameters? HW2-Q3

110
CMU SCS 15-826(c) C. Faloutsos, 2013110 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Results HW2-Q3

111
CMU SCS 15-826(c) C. Faloutsos, 2013111 Approach Q1: How to generate a sequence, that is –bursty –self-similar –and has similar queue length distributions HW2-Q3

112
CMU SCS 15-826(c) C. Faloutsos, 2013112 Approach A: ‘binomial multifractal’ [Wang+02] ~ 80-20 ‘law’: –80% of bytes/queries etc on first half –repeat recursively b: bias factor (eg., 80%) HW2-Q3

113
CMU SCS 15-826(c) C. Faloutsos, 2013113 Binary multifractals 20 80 HW2-Q3

114
CMU SCS 15-826(c) C. Faloutsos, 2013114 Binary multifractals 20 80 HW2-Q3
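
A minimal sketch of generating a b-model (binomial multifractal) trace: start with the whole volume in one interval, then repeatedly split every interval in half, giving a fraction b to one half and 1-b to the other. Picking the heavy half at random, the function name `b_model`, and the parameter values are assumptions made for illustration.

```python
import random


def b_model(total, levels, b=0.8, seed=0):
    """Return 2**levels values that sum to `total`, split recursively b / (1-b)."""
    rng = random.Random(seed)
    trace = [float(total)]
    for _ in range(levels):
        finer = []
        for volume in trace:
            heavy, light = b * volume, (1.0 - b) * volume
            # Assumption: the heavy half is chosen at random at each split.
            if rng.random() < 0.5:
                finer.extend([heavy, light])
            else:
                finer.extend([light, heavy])
        trace = finer
    return trace


# 1000 units of traffic spread over 2**10 time slots with an 80-20 split.
trace = b_model(1000.0, 10, b=0.8, seed=0)
```

With b = 0.5 every split is even and the trace is uniform; as b moves toward 1 the same total volume concentrates into fewer, taller bursts.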

115
CMU SCS 15-826(c) C. Faloutsos, 2013115 Parameter estimation Q2: How to estimate the bias factor b? HW2-Q3

116
CMU SCS 15-826(c) C. Faloutsos, 2013116 Parameter estimation Q2: How to estimate the bias factor b? A: MANY ways [Crovella+96] –Hurst exponent –variance plot –even DFT amplitude spectrum! (‘periodogram’) –More robust: ‘entropy plot’ [Wang+02] HW2-Q3

117
CMU SCS 15-826(c) C. Faloutsos, 2013117 Entropy plot Rationale: – burstiness: inverse of uniformity –entropy measures uniformity of a distribution –find entropy at several granularities, to see whether/how our distribution is close to uniform. HW2-Q3

118
Entropy plot
Entropy E(n) after n levels of splits
n=1: $E(1) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)$
[Figure: interval split into two halves; $p_1$, $p_2$ are the % of bytes in each half]
HW2-Q3

119
Entropy plot
Entropy E(n) after n levels of splits
n=1: $E(1) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)$
n=2: $E(2) = -\sum_i p_{2,i} \log_2(p_{2,i})$
[Figure: interval split into four quarters with fractions $p_{2,1}, p_{2,2}, p_{2,3}, p_{2,4}$]
HW2-Q3
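
A minimal sketch of computing the entropy plot of a trace: at level n, aggregate the trace into $2^n$ equal-width intervals, turn the volumes into fractions $p_{n,i}$, and take $E(n) = -\sum_i p_{n,i} \log_2 p_{n,i}$. Requiring a power-of-two length and the name `entropy_plot` are assumptions made to keep the example short.

```python
import math


def entropy_plot(trace):
    """Return [E(0), E(1), ..., E(n_max)] for a trace of length 2**n_max."""
    size = len(trace)
    if size == 0 or size & (size - 1):
        raise ValueError("trace length must be a power of two")
    total = float(sum(trace))
    max_level = size.bit_length() - 1
    entropies = []
    for n in range(max_level + 1):
        width = size >> n   # number of original slots per interval at level n
        probs = [sum(trace[i:i + width]) / total for i in range(0, size, width)]
        # Empty intervals contribute nothing (0 * log 0 is taken as 0).
        entropies.append(-sum(p * math.log2(p) for p in probs if p > 0))
    return entropies


uniform = entropy_plot([1.0] * 8)                    # evenly spread traffic
spike = entropy_plot([0, 0, 0, 5.0, 0, 0, 0, 0])     # everything in one slot
```

Plotting E(n) against n gives the entropy plot; a straight line suggests self-similarity, and its slope is what the following slides interpret.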

120
Real traffic
Has linear entropy plot (-> self-similar)
[Figure: entropy E(n) vs. # of levels (n); the line is annotated 0.73]
HW2-Q3

121
CMU SCS 15-826(c) C. Faloutsos, 2013121 Observation - intuition: intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit –unif. Dataset: slope =? –multi-point: slope = ? # of levels (n) Entropy E(n) HW2-Q3

122
CMU SCS 15-826(c) C. Faloutsos, 2013122 Observation - intuition: intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit –unif. Dataset: slope =1 –multi-point: slope = 0 # of levels (n) Entropy E(n) HW2-Q3

123
CMU SCS 15-826(c) C. Faloutsos, 2013123 Entropy plot - Intuition Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Pick a point; reveal its coordinate bit-by-bit - how much info is each bit worth to me? HW2-Q3

124
CMU SCS 15-826(c) C. Faloutsos, 2013124 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? ‘info’ value = E(1): 1 bit HW2-Q3

125
CMU SCS 15-826(c) C. Faloutsos, 2013125 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? Is next MSB =0? HW2-Q3

126
CMU SCS 15-826(c) C. Faloutsos, 2013126 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? Is next MSB =0? Info value =1 bit = E(2) - E(1) = slope! HW2-Q3

127
CMU SCS 15-826(c) C. Faloutsos, 2013127 Entropy plot Repeat, for all points at same position: Dim=0 HW2-Q3

128
CMU SCS 15-826(c) C. Faloutsos, 2013128 Entropy plot Repeat, for all points at same position: we need 0 bits of info, to determine position -> slope = 0 = intrinsic dimensionality Dim=0 HW2-Q3

129
CMU SCS 15-826(c) C. Faloutsos, 2013129 Entropy plot Real (and 80-20) datasets can be in- between: bursts, gaps, smaller bursts, smaller gaps, at every scale Dim = 1 Dim=0 0

130
CMU SCS 15-826(c) C. Faloutsos, 2013130 (Fractals, again) What set of points could have behavior between point and line? HW2-Q3

131
CMU SCS 15-826(c) C. Faloutsos, 2013131 Cantor dust Eliminate the middle third Recursively! HW2-Q3

132
CMU SCS 15-826(c) C. Faloutsos, 2013132 Cantor dust HW2-Q3

133
CMU SCS 15-826(c) C. Faloutsos, 2013133 Cantor dust HW2-Q3

134
CMU SCS 15-826(c) C. Faloutsos, 2013134 Cantor dust HW2-Q3

135
CMU SCS 15-826(c) C. Faloutsos, 2013135 Cantor dust HW2-Q3

136
Cantor dust
Dimensionality? (no length; infinite # points!)
Answer: $\log 2 / \log 3 \approx 0.63$
HW2-Q3

137
Some more entropy plots:
Poisson vs real
Poisson: slope ≈ 1 -> uniformly distributed
[Figure: two entropy plots, annotated 1 and 0.73]
HW2-Q3

138
b-model
b-model traffic gives perfectly linear plot
Lemma: its slope is
$\text{slope} = -b \log_2 b - (1-b) \log_2 (1-b)$
Fitting: do entropy plot; get slope; solve for b
[Figure: E(n) vs. n]
HW2-Q3
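
A minimal sketch of the last fitting step, "solve for b": the slope formula has no closed-form inverse, but it decreases steadily from 1 at b = 0.5 toward 0 as b approaches 1, so bisection on that range works. The function names and the number of bisection steps are assumptions made for illustration.

```python
import math


def b_model_slope(b):
    """Entropy-plot slope of a b-model with bias factor b (0 < b < 1)."""
    return -b * math.log2(b) - (1.0 - b) * math.log2(1.0 - b)


def bias_from_slope(slope, steps=100):
    """Invert b_model_slope on [0.5, 1) by bisection."""
    if not 0.0 < slope <= 1.0:
        raise ValueError("slope must be in (0, 1]")
    lo, hi = 0.5, 1.0 - 1e-15
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if b_model_slope(mid) > slope:
            lo = mid    # slope too high means b is still too close to 0.5
        else:
            hi = mid
    return (lo + hi) / 2.0


# Round trip: the slope produced by b = 0.8 should map back to 0.8.
recovered = bias_from_slope(b_model_slope(0.8))
```

Restricting the search to b of at least 0.5 is a convention: b and 1-b give the same slope, so only the larger of the two can be recovered.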

139
CMU SCS 15-826(c) C. Faloutsos, 2013139 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Experiments - Results HW2-Q3

140
CMU SCS 15-826(c) C. Faloutsos, 2013140 Experimental setup Disk traces (from HP [Wilkes 93]) web traces from LBL http://repository.cs.vt.edu/ lbl-conn-7.tar.Z HW2-Q3

141
CMU SCS 15-826(c) C. Faloutsos, 2013141 Model validation Linear entropy plots Bias factors b: 0.6-0.8 smallest b / smoothest: nntp traffic HW2-Q3

142
CMU SCS 15-826(c) C. Faloutsos, 2013142 Web traffic - results LBL, NCDF of queue lengths (log-log scales) (queue length l) Prob( >l) How to give guarantees? HW2-Q3

143
CMU SCS 15-826(c) C. Faloutsos, 2013143 Web traffic - results LBL, NCDF of queue lengths (log-log scales) (queue length l) Prob( >l) 20% of the requests will see queue lengths <100 HW2-Q3

144
CMU SCS 15-826(c) C. Faloutsos, 2013144 Conclusions Multifractals (80/20, ‘b-model’, Multiplicative Wavelet Model (MWM)) for analysis and synthesis of bursty traffic

145
CMU SCS 15-826(c) C. Faloutsos, 2013145 Books Fractals: Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

146
Further reading:
Crovella, M. and A. Bestavros (1996). Self-Similarity in World Wide Web Traffic, Evidence and Possible Causes. Sigmetrics.
[ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE/ACM Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

147
Further reading
[Riedi+99] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, A Multifractal Wavelet Model with Application to Network Traffic, IEEE Special Issue on Information Theory, 45. (April 1999), 992-1018.
[Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chan, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA, 2/26/2002 - 3/1/2002. (Entropy plots)

148
CMU SCS 15-826(c) C. Faloutsos, 2013148 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

149
CMU SCS 15-826(c) C. Faloutsos, 2013149 Chaos and non-linear forecasting

150
CMU SCS 15-826(c) C. Faloutsos, 2013150 Reference: [ Deepay Chakrabarti and Christos Faloutsos F4: Large-Scale Automated Forecasting using Fractals CIKM 2002, Washington DC, Nov. 2002.]

151
CMU SCS 15-826(c) C. Faloutsos, 2013151 Detailed Outline Non-linear forecasting –Problem –Idea –How-to –Experiments –Conclusions

152
CMU SCS 15-826(c) C. Faloutsos, 2013152 Recall: Problem #1 Given a time series {x t }, predict its future course, that is, x t+1, x t+2,... Time Value

153
Datasets
Logistic Parabola: $x_t = a x_{t-1} (1 - x_{t-1}) + \text{noise}$
Models population of flies [R. May/1976]
[Figure: x(t) vs. time, and the corresponding lag-plot]
ARIMA: fails

154
CMU SCS 15-826(c) C. Faloutsos, 2013154 How to forecast? ARIMA - but: linearity assumption Lag-plot ARIMA: fails

155
CMU SCS 15-826(c) C. Faloutsos, 2013155 How to forecast? ARIMA - but: linearity assumption ANSWER: ‘Delayed Coordinate Embedding’ = Lag Plots [Sauer92] ~ nearest-neighbor search, for past incidents

156
General Intuition (Lag Plot)
Lag = 1, k = 4 NN
[Figure: lag plot of $x_t$ vs. $x_{t-1}$; for a new point, find its 4 nearest neighbors (4-NN) and interpolate these to get the final prediction]

157
CMU SCS 15-826(c) C. Faloutsos, 2013157 Questions: Q1: How to choose lag L? Q2: How to choose k (the # of NN)? Q3: How to interpolate? Q4: why should this work at all?

158
CMU SCS 15-826(c) C. Faloutsos, 2013158 Q1: Choosing lag L Manually (16, in award winning system by [Sauer94])

159
CMU SCS 15-826(c) C. Faloutsos, 2013159 Q2: Choosing number of neighbors k Manually (typically ~ 1-10)

160
CMU SCS 15-826(c) C. Faloutsos, 2013160 Q3: How to interpolate? How do we interpolate between the k nearest neighbors? A3.1: Average A3.2: Weighted average (weights drop with distance - how?)
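
A minimal sketch of lag-plot forecasting with the simplest interpolation (A3.1, plain average): find the k past delay vectors closest to the most recent one, and average the values that followed them. The noise-free logistic parabola with a = 3.8, the series length, and the function names are assumptions made for illustration; the SVD-based interpolation is not shown.

```python
def knn_forecast(series, lag=1, k=4):
    """Forecast the next value from the k nearest delay vectors of length `lag`."""
    query = series[-lag:]                       # most recent delay vector
    candidates = []
    for t in range(lag, len(series)):
        past = series[t - lag:t]                # (x_{t-lag}, ..., x_{t-1})
        dist = sum((p - q) ** 2 for p, q in zip(past, query)) ** 0.5
        candidates.append((dist, series[t]))    # remember what came next
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(value for _, value in nearest) / len(nearest)


def logistic_series(n, a=3.8, x0=0.3):
    """Noise-free logistic parabola x_t = a * x_{t-1} * (1 - x_{t-1})."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs


series = logistic_series(3001)
history, truth = series[:-1], series[-1]       # hold out the last value
prediction = knn_forecast(history, lag=1, k=4)
```

This linear scan costs O(N) per forecast; a spatial index over the delay vectors would be the natural replacement for long streams.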

161
Q3: How to interpolate?
A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)
[Figure: lag plot of $x_t$ vs. $x_{t-1}$]

162
CMU SCS 15-826(c) C. Faloutsos, 2013162 Q4: Any theory behind it? A4: YES!

163
CMU SCS 15-826(c) C. Faloutsos, 2013163 Theoretical foundation Based on the ‘Takens theorem’ [Takens81] which says that long enough delay vectors can do prediction, even if there are unobserved variables in the dynamical system (= diff. equations)

164
Theoretical foundation
Example: Lotka-Volterra equations
$dH/dt = r H - a H P$
$dP/dt = b H P - m P$
H is count of prey (e.g., hare)
P is count of predators (e.g., lynx)
Suppose only P(t) is observed (t=1, 2, …).
[Figure: trajectory in the (H, P) state space]
Skip

165
Theoretical foundation
But the delay vector space is a faithful reconstruction of the internal system state
So prediction in delay vector space is as good as prediction in state space
[Figure: trajectory in the (H, P) state space, next to its reconstruction in the delay space P(t) vs. P(t-1)]
Skip

166
CMU SCS 15-826(c) C. Faloutsos, 2013166 Detailed Outline Non-linear forecasting –Problem –Idea –How-to –Experiments –Conclusions

167
Datasets
Logistic Parabola: $x_t = a x_{t-1} (1 - x_{t-1}) + \text{noise}$
Models population of flies [R. May/1976]
[Figure: x(t) vs. time, and the corresponding lag-plot]

168
Datasets
Logistic Parabola: $x_t = a x_{t-1} (1 - x_{t-1}) + \text{noise}$
Models population of flies [R. May/1976]
[Figure: x(t) vs. time, and the corresponding lag-plot]
ARIMA: fails

169
CMU SCS 15-826(c) C. Faloutsos, 2013169 Logistic Parabola Timesteps Value Our Prediction from here

170
CMU SCS 15-826(c) C. Faloutsos, 2013170 Logistic Parabola Timesteps Value Comparison of prediction to correct values

171
Datasets
LORENZ: Models convection currents in the air
$dx/dt = a (y - x)$
$dy/dt = x (b - z) - y$
$dz/dt = x y - c z$
[Figure: value over time]

172
CMU SCS 15-826(c) C. Faloutsos, 2013172 LORENZ Timesteps Value Comparison of prediction to correct values

173
CMU SCS 15-826(c) C. Faloutsos, 2013173 Datasets Time Value LASER: fluctuations in a Laser over time (used in Santa Fe competition)

174
CMU SCS 15-826(c) C. Faloutsos, 2013174 Laser Timesteps Value Comparison of prediction to correct values

175
CMU SCS 15-826(c) C. Faloutsos, 2013175 Conclusions Lag plots for non-linear forecasting (Takens’ theorem) suitable for ‘chaotic’ signals

176
References
Deepay Chakrabarti and Christos Faloutsos, F4: Large-Scale Automated Forecasting using Fractals, CIKM 2002, Washington DC, Nov. 2002.
Sauer, T. (1994). Time series prediction using delay coordinate embedding. (in book by Weigend and Gershenfeld, below) Addison-Wesley.
Takens, F. (1981). Detecting strange attractors in fluid turbulence. Dynamical Systems and Turbulence. Berlin: Springer-Verlag.

177
References
Weigend, A. S. and N. A. Gershenfeld (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison Wesley. (Excellent collection of papers on chaotic/non-linear forecasting, describing the algorithms behind the winners of the Santa Fe competition.)

178
CMU SCS 15-826(c) C. Faloutsos, 2013178 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs

179
CMU SCS 15-826(c) C. Faloutsos, 2013179 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool

180
CMU SCS 15-826(c) C. Faloutsos, 2013180 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM

181
CMU SCS 15-826(c) C. Faloutsos, 2013181 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM Bursty traffic: multifractals (80-20 ‘law’)

182
CMU SCS 15-826(c) C. Faloutsos, 2013182 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM Bursty traffic: multifractals (80-20 ‘law’) Non-linear forecasting: lag-plots (Takens)
