CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #25: Time series mining and forecasting Christos Faloutsos.


1 CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #25: Time series mining and forecasting Christos Faloutsos

2 CMU SCS 15-826(c) C. Faloutsos, 20132 Must-Read Material Byoung-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson, H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online Data Mining for Co-Evolving Time Sequences, ICDE, Feb 2000. Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity Estimation Using Query Feedbacks, SIGMOD 1994

3 CMU SCS 15-826(c) C. Faloutsos, 20133 Thanks Deepay Chakrabarti (CMU) Spiros Papadimitriou (CMU) Prof. Byoung-Kee Yi (Pohang U.)

4 CMU SCS 15-826(c) C. Faloutsos, 20134 Outline Motivation Similarity search – distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

5 CMU SCS 15-826(c) C. Faloutsos, 20135 Problem definition Given: one or more sequences $x_1, x_2, \ldots, x_t, \ldots$ ($y_1, y_2, \ldots, y_t, \ldots$; …) Find –similar sequences; forecasts –patterns; clusters; outliers

6 CMU SCS 15-826(c) C. Faloutsos, 20136 Motivation - Applications Financial, sales, economic series Medical –ECGs +; blood pressure etc monitoring –reactions to new drugs –elderly care

7 CMU SCS 15-826(c) C. Faloutsos, 20137 Motivation - Applications (cont’d) ‘Smart house’ –sensors monitor temperature, humidity, air quality video surveillance

8 CMU SCS 15-826(c) C. Faloutsos, 20138 Motivation - Applications (cont’d) civil/automobile infrastructure –bridge vibrations [Oppenheim+02] – road conditions / traffic monitoring

9 CMU SCS 15-826(c) C. Faloutsos, 20139 Motivation - Applications (cont’d) Weather, environment/anti-pollution –volcano monitoring –air/water pollutant monitoring

10 CMU SCS 15-826(c) C. Faloutsos, 201310 Motivation - Applications (cont’d) Computer systems –‘Active Disks’ (buffering, prefetching) –web servers (ditto) –network traffic monitoring –...

11 CMU SCS 15-826(c) C. Faloutsos, 201311 Stream Data: Disk accesses time #bytes

12 CMU SCS 15-826(c) C. Faloutsos, 201312 Problem #1: Goal: given a signal (e.g., #packets over time) Find: patterns, periodicities, and/or compress [Plot: count vs. year, lynx caught per year] (other examples: packets per day; temperature per day)

13 CMU SCS 15-826(c) C. Faloutsos, 201313 Problem#2: Forecast Given $x_t, x_{t-1}, \ldots$, forecast $x_{t+1}$ [Plot: number of packets sent vs. time tick; the next value is marked ??]

14 CMU SCS 15-826(c) C. Faloutsos, 201314 Problem#2’: Similarity search E.g., find a 3-tick pattern similar to the last one [Plot: number of packets sent vs. time tick; the next value is marked ??]

15 CMU SCS 15-826(c) C. Faloutsos, 201315 Problem #3: Given: A set of correlated time sequences Forecast ‘Sent(t)’

16 CMU SCS 15-826(c) C. Faloutsos, 201316 Important observations Patterns, rules, forecasting and similarity indexing are closely related: To do forecasting, we need –to find patterns/rules –to find similar settings in the past to find outliers, we need to have forecasts –(outlier = too far away from our forecast)

17 CMU SCS 15-826(c) C. Faloutsos, 201317 Outline Motivation Similarity Search and Indexing Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

18 CMU SCS 15-826(c) C. Faloutsos, 201318 Outline Motivation Similarity search and distance functions –Euclidean –Time-warping...

19 CMU SCS 15-826(c) C. Faloutsos, 201319 Importance of distance functions Subtle, but absolutely necessary: A ‘must’ for similarity indexing (-> forecasting) A ‘must’ for clustering Two major families –Euclidean and Lp norms –Time warping and variations

20 CMU SCS 15-826(c) C. Faloutsos, 201320 Euclidean and Lp... $L_1$: city-block = Manhattan; $L_2$ = Euclidean; $L_\infty$
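
The L_p family named on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the lecture: the function name `lp_distance` is hypothetical, and the two sequences are assumed to have equal length.

```python
def lp_distance(x, y, p=2.0):
    """L_p distance between two equal-length sequences.

    p=1 is city-block (Manhattan), p=2 is Euclidean,
    p=float('inf') is the maximum coordinate difference.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(lp_distance(x, y, 1))             # 7.0
print(lp_distance(x, y, 2))             # 5.0
print(lp_distance(x, y, float("inf")))  # 4.0
```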

21 CMU SCS 15-826(c) C. Faloutsos, 201321 Observation #1 Time sequence -> n-d vector... Day-1 Day-2 Day-n

22 CMU SCS 15-826(c) C. Faloutsos, 201322 Observation #2 Euclidean distance is closely related to –cosine similarity –dot product –‘cross-correlation’ function... Day-1 Day-2 Day-n
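
The relation in Observation #2 can be made explicit (a short derivation added here, not part of the slide). Expanding the squared Euclidean distance between two sequences, viewed as n-d vectors, gives

$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\, x \cdot y$$

so for sequences normalized to unit length, $\|x - y\|^2 = 2\,(1 - \cos\theta)$, where $\cos\theta = x \cdot y$ is the cosine similarity. Under that normalization, ranking by smallest Euclidean distance and ranking by largest dot product agree.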

23 CMU SCS 15-826(c) C. Faloutsos, 201323 Time Warping allow accelerations - decelerations –(with or w/o penalty) THEN compute the (Euclidean) distance (+ penalty) related to the string-editing distance

24 CMU SCS 15-826(c) C. Faloutsos, 201324 Time Warping ‘stutters’:

25 CMU SCS 15-826(c) C. Faloutsos, 201325 Time warping Q: how to compute it? A: dynamic programming D( i, j ) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y

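26 CMU SCS 15-826(c) C. Faloutsos, 201326 Time warping Thus, with no penalty for stutter, for sequences $x_1, x_2, \ldots, x_i$; $y_1, y_2, \ldots, y_j$: $D(i, j) = \|x_i - y_j\| + \min\{\, D(i-1, j-1) \text{ (no stutter)},\ D(i, j-1) \text{ (x-stutter)},\ D(i-1, j) \text{ (y-stutter)} \,\}$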

27 CMU SCS 15-826(c) C. Faloutsos, 201327 VERY SIMILAR to the string-editing distance x-stutter y-stutter no stutter Time warping

28 CMU SCS 15-826(c) C. Faloutsos, 201328 Time warping Complexity: O(M*N) - quadratic on the length of the strings Many variations (penalty for stutters; limit on the number/percentage of stutters; …) popular in voice processing [Rabiner + Juang]
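
The dynamic-programming computation described on the preceding slides can be sketched as follows. This is a minimal sketch, not the lecture's code: the name `dtw_distance` is hypothetical, the local cost is assumed to be the absolute difference |x_i - y_j|, and no stutter penalty is applied. The two nested loops make the O(M*N) cost visible.

```python
def dtw_distance(x, y):
    """Time-warping distance with no stutter penalty (assumed local cost: |x_i - y_j|)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = cost to match the length-i prefix of x with the length-j prefix of y
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j - 1],  # no stutter: advance in both sequences
                D[i][j - 1],      # x_i is reused against another y value
                D[i - 1][j],      # y_j is reused against another x value
            )
    return D[n][m]


# The second sequence repeats the value 2; warping absorbs the repetition.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```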

29 CMU SCS 15-826(c) C. Faloutsos, 201329 Other Distance functions piece-wise linear/flat approx.; compare pieces [Keogh+01] [Faloutsos+97] ‘cepstrum’ (for voice [Rabiner+Juang]) –do DFT; take log of amplitude; do DFT again! Allow for small gaps [Agrawal+95] See tutorial by [Gunopulos + Das, SIGMOD01]
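
The cepstrum recipe on this slide (DFT, log of amplitude, DFT again) can be sketched with the standard library alone. The names `dft` and `cepstrum` and the small `eps` guard against log(0) are assumptions made for this illustration; the naive DFT is quadratic and only meant for short signals. Many texts use an inverse DFT for the second step; the sketch follows the slide's wording.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def cepstrum(signal, eps=1e-12):
    """DFT -> log of amplitude -> DFT again; returns the amplitudes of the result."""
    spectrum = dft(signal)
    log_amplitude = [math.log(abs(c) + eps) for c in spectrum]
    return [abs(c) for c in dft(log_amplitude)]


# A unit impulse has a flat amplitude spectrum, so its log-spectrum is ~0 everywhere.
print(cepstrum([1.0, 0.0, 0.0, 0.0]))
```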

30 CMU SCS 15-826(c) C. Faloutsos, 201330 Other Distance functions In [Keogh+, KDD’04]: parameter-free, MDL based

31 CMU SCS 15-826(c) C. Faloutsos, 201331 Conclusions Prevailing distances: –Euclidean and –time-warping

32 CMU SCS 15-826(c) C. Faloutsos, 201332 Outline Motivation Similarity search and distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

33 CMU SCS 15-826(c) C. Faloutsos, 201333 Linear Forecasting

34 CMU SCS 15-826(c) C. Faloutsos, 201334 Forecasting "Prediction is very difficult, especially about the future." - Niels Bohr http://www.hfac.uh.edu/MediaFutures/thoughts.html

35 CMU SCS 15-826(c) C. Faloutsos, 201335 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

36 CMU SCS 15-826(c) C. Faloutsos, 201336 Reference [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

37 CMU SCS 15-826(c) C. Faloutsos, 201337 Problem#2: Forecast Example: given $x_{t-1}, x_{t-2}, \ldots$, forecast $x_t$ [Plot: number of packets sent vs. time tick; the next value is marked ??]

38 CMU SCS 15-826(c) C. Faloutsos, 201338 Forecasting: Preprocessing MANUALLY: remove trends spot periodicities time 7 days

39 CMU SCS 15-826(c) C. Faloutsos, 201339 Problem#2: Forecast Solution: try to express $x_t$ as a linear function of the past: $x_{t-1}, x_{t-2}, \ldots$ (up to a window of $w$) Formally: $x_t \approx a_1 x_{t-1} + \ldots + a_w x_{t-w}$ + noise [Plot: value vs. time tick; the next value is marked ??]

40 CMU SCS 15-826(c) C. Faloutsos, 201340 (Problem: Back-cast; interpolate) Solution - interpolate: try to express $x_t$ as a linear function of the past AND the future: $x_{t+1}, x_{t+2}, \ldots, x_{t+w_{future}}$; $x_{t-1}, \ldots, x_{t-w_{past}}$ (up to windows of $w_{past}$, $w_{future}$) EXACTLY the same algo’s [Plot: value vs. time tick; a missing value is marked ??]

41 CMU SCS 15-826(c) C. Faloutsos, 201341 Linear Regression: idea express what we don’t know (= ‘dependent variable’) as a linear function of what we know (= ‘indep. variable(s)’) [Scatter plot with axes: body weight, body height]

42 CMU SCS 15-826(c) C. Faloutsos, 201342 Linear Auto Regression:

43 CMU SCS 15-826(c) C. Faloutsos, 201343 Linear Auto Regression: lag w=1 Dependent variable = # of packets sent (S[t]) Independent variable = # of packets sent (S[t-1]) [‘lag-plot’: number of packets sent (t) vs. number of packets sent (t-1)]

44 CMU SCS 15-826(c) C. Faloutsos, 201344 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

45 CMU SCS 15-826(c) C. Faloutsos, 201345 More details: Q1: Can it work with window w>1? A1: YES! [3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

46 CMU SCS 15-826(c) C. Faloutsos, 201346 More details: Q1: Can it work with window w>1? A1: YES! (we’ll fit a hyper-plane, then!) [3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

47 CMU SCS 15-826(c) C. Faloutsos, 201347 More details: Q1: Can it work with window w>1? A1: YES! (we’ll fit a hyper-plane, then!) [3-d lag plot with axes $x_{t-2}$, $x_{t-1}$, $x_t$]

48 CMU SCS 15-826(c) C. Faloutsos, 201348 More details: Q1: Can it work with window w>1? A1: YES! The problem becomes: $X_{[N \times w]} \times a_{[w \times 1]} = y_{[N \times 1]}$ OVER-CONSTRAINED –a is the vector of the regression coefficients –X has the N values of the w indep. variables –y has the N values of the dependent variable

49 CMU SCS 15-826(c) C. Faloutsos, 201349 More details: $X_{[N \times w]} \times a_{[w \times 1]} = y_{[N \times 1]}$ [Matrix layout: columns of X are Ind-var1 … Ind-var-w; rows are time ticks]

50 CMU SCS 15-826(c) C. Faloutsos, 201350 More details: $X_{[N \times w]} \times a_{[w \times 1]} = y_{[N \times 1]}$ [Matrix layout: columns of X are Ind-var1 … Ind-var-w; rows are time ticks]

51 CMU SCS 15-826(c) C. Faloutsos, 201351 More details Q2: How to estimate $a_1, a_2, \ldots, a_w$ (= $a$)? A2: with Least Squares fit (Moore-Penrose pseudo-inverse) $a$ is the vector that minimizes the RMSE from $y$: $a = (X^T X)^{-1} (X^T y)$
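
The least-squares estimate above can be tried on a toy auto-regression. This is a minimal sketch under stated assumptions: the names `solve_linear` and `fit_ar` are hypothetical, the normal equations are solved with plain Gaussian elimination instead of an explicit inverse, and the toy series is constructed to follow a known AR(2) rule exactly so the recovered coefficients can be checked by eye.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_ar(series, w):
    """Least-squares AR(w) fit: x_t ~ a_1 x_{t-1} + ... + a_w x_{t-w}."""
    rows = [[series[t - k] for k in range(1, w + 1)] for t in range(w, len(series))]
    targets = [series[t] for t in range(w, len(series))]
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(w)] for i in range(w)]
    Xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(w)]
    return solve_linear(XtX, Xty)  # solves (X^T X) a = X^T y


# Toy series that follows x_t = 0.5 x_{t-1} + 0.25 x_{t-2} exactly.
series = [1.0, 2.0]
for _ in range(20):
    series.append(0.5 * series[-1] + 0.25 * series[-2])
print(fit_ar(series, w=2))  # close to [0.5, 0.25]
```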

52 CMU SCS 15-826(c) C. Faloutsos, 201352 More details Q2: How to estimate $a_1, a_2, \ldots, a_w$ (= $a$)? A2: with Least Squares fit $a = (X^T X)^{-1} (X^T y)$ Identical to earlier formula (proof?): $a = V \Lambda^{-1} U^T y$, where $X = U \Lambda V^T$

53 CMU SCS 15-826(c) C. Faloutsos, 201353 More details Straightforward solution: $a = (X^T X)^{-1} (X^T y)$, where $a$: regression coefficient vector; $X$: sample matrix ($X_N$, of size $N \times w$) Observations: –Sample matrix X grows over time –needs matrix inversion –$O(N \times w^2)$ computation –$O(N \times w)$ storage

54 CMU SCS 15-826(c) C. Faloutsos, 201354 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)

55 CMU SCS 15-826(c) C. Faloutsos, 201355 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion! (How is that possible?!) A: our matrix has special form: (X T X)

56 CMU SCS 15-826(c) C. Faloutsos, 201356 More details At the N+1 time tick: a new row $x_{N+1}$ is appended to $X_N$ ($N \times w$), giving $X_{N+1}$

57 CMU SCS 15-826(c) C. Faloutsos, 201357 More details Let $G_N = (X_N^T \times X_N)^{-1}$ (‘gain matrix’, of size $w \times w$) $G_{N+1}$ can be computed recursively from $G_N$

58 CMU SCS 15-826(c) C. Faloutsos, 201358 EVEN more details: 1 x w row vector Let’s elaborate (VERY IMPORTANT, VERY VALUABLE!)

59 CMU SCS 15-826(c) C. Faloutsos, 201359 EVEN more details:

60 CMU SCS 15-826(c) C. Faloutsos, 201360 EVEN more details: [w x 1] [w x (N+1)] [(N+1) x w] [w x (N+1)] [(N+1) x 1]

61 CMU SCS 15-826(c) C. Faloutsos, 201361 EVEN more details: [w x (N+1)] [(N+1) x w]

62 CMU SCS 15-826(c) C. Faloutsos, 201362 EVEN more details: 1 x w row vector ‘gain matrix’

63 CMU SCS 15-826(c) C. Faloutsos, 201363 EVEN more details:

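64 CMU SCS 15-826(c) C. Faloutsos, 201364 EVEN more details: [dimension annotations on the update equation: $w \times w$, $w \times 1$, $1 \times w$, $w \times w$; the $1 \times 1$ term is a SCALAR!]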

65 CMU SCS 15-826(c) C. Faloutsos, 201365 Altogether:

66 CMU SCS 15-826(c) C. Faloutsos, 201366 Altogether: where $I$: $w \times w$ identity matrix; $\delta$: a large positive number (say, $10^4$) IMPORTANT!
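
The update equations on these slides were images and did not survive in the transcript. The sketch below uses the textbook Recursive Least Squares update, which is assumed (not confirmed) to match them: with $x$ the new $1 \times w$ row and $y$ the new target, $G \leftarrow G - (1 + x G x^T)^{-1} (G x^T)(x G)$ and $a \leftarrow a - G x^T (x a - y)$, starting from $G_0 = \delta I$ and $a_0 = 0$. The class name `RLS`, its methods, and the toy data are hypothetical.

```python
class RLS:
    """Recursive Least Squares: keeps a w x w gain matrix G and the coefficients a."""

    def __init__(self, w, delta=1e4):
        self.w = w
        # G_0 = delta * I, with delta a large positive number
        self.G = [[delta if i == j else 0.0 for j in range(w)] for i in range(w)]
        self.a = [0.0] * w

    def update(self, x, y):
        w = self.w
        Gx = [sum(self.G[i][j] * x[j] for j in range(w)) for i in range(w)]  # G x^T
        xG = [sum(x[i] * self.G[i][j] for i in range(w)) for j in range(w)]  # x G
        denom = 1.0 + sum(x[i] * Gx[i] for i in range(w))  # 1 x 1: a scalar, no inversion
        for i in range(w):
            for j in range(w):
                self.G[i][j] -= Gx[i] * xG[j] / denom
        error = sum(x[i] * self.a[i] for i in range(w)) - y  # x a - y
        for i in range(w):
            gain_i = sum(self.G[i][j] * x[j] for j in range(w))  # row i of (new G) x^T
            self.a[i] -= gain_i * error

    def predict(self, x):
        return sum(xi * ai for xi, ai in zip(x, self.a))


# Toy stream generated by y = 2*x0 - 3*x1; the estimate should approach [2, -3].
model = RLS(w=2)
for t in range(50):
    x = [float(t % 7 + 1), float((3 * t) % 5 + 1)]
    model.update(x, 2.0 * x[0] - 3.0 * x[1])
print(model.a)
```

Each update touches only the w x w matrix, which is the O(w^2) per-tick cost that the deck contrasts with refitting from scratch.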

67 CMU SCS 15-826(c) C. Faloutsos, 201367 Comparison: Straightforward Least Squares –Needs huge matrix (growing in size) $O(N \times w)$ –Costly matrix operation $O(N \times w^2)$ Recursive LS –Need much smaller, fixed size matrix $O(w \times w)$ –Fast, incremental computation $O(1 \times w^2)$ –no matrix inversion ($N = 10^6$, $w$ = 1-100)

68 CMU SCS 15-826(c) C. Faloutsos, 201368 Pictorially: Given: Independent Variable Dependent Variable

69 CMU SCS 15-826(c) C. Faloutsos, 201369 Pictorially: Independent Variable Dependent Variable. new point

70 CMU SCS 15-826(c) C. Faloutsos, 201370 Pictorially: Independent Variable Dependent Variable RLS: quickly compute new best fit new point

71 CMU SCS 15-826(c) C. Faloutsos, 201371 Even more details Q4: can we ‘forget’ the older samples? A4: Yes - RLS can easily handle that [Yi+00]:

72 CMU SCS 15-826(c) C. Faloutsos, 201372 Adaptability - ‘forgetting’ Independent Variable eg., #packets sent Dependent Variable eg., #bytes sent

73 CMU SCS 15-826(c) C. Faloutsos, 201373 Adaptability - ‘forgetting’ Independent Variable eg. #packets sent Dependent Variable eg., #bytes sent Trend change (R)LS with no forgetting

74 CMU SCS 15-826(c) C. Faloutsos, 201374 Adaptability - ‘forgetting’ Independent Variable Dependent Variable Trend change (R)LS with no forgetting (R)LS with forgetting RLS: can *trivially* handle ‘forgetting’
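
The effect of ‘forgetting’ can be illustrated with the standard exponentially weighted RLS update, which is assumed here and not taken from the slides: a forgetting factor `lam` in (0, 1] down-weights a sample that is k ticks old by `lam**k`, and `lam = 1` recovers plain RLS. The names `rls_step` and `final_slope` and the toy trend change (slope 1 for the first 200 ticks, slope 3 afterwards) are hypothetical.

```python
def rls_step(G, a, x, y, lam=1.0):
    """One RLS update with forgetting factor lam (lam=1.0 means no forgetting)."""
    w = len(a)
    Gx = [sum(G[i][j] * x[j] for j in range(w)) for i in range(w)]  # G x^T
    xG = [sum(x[i] * G[i][j] for i in range(w)) for j in range(w)]  # x G
    denom = lam + sum(x[i] * Gx[i] for i in range(w))               # scalar
    G_new = [[(G[i][j] - Gx[i] * xG[j] / denom) / lam for j in range(w)]
             for i in range(w)]
    error = sum(x[i] * a[i] for i in range(w)) - y
    a_new = [a[i] - sum(G_new[i][j] * x[j] for j in range(w)) * error
             for i in range(w)]
    return G_new, a_new


def final_slope(lam):
    """Fit y = a*x on a stream whose true slope changes from 1 to 3 halfway through."""
    G, a = [[1e4]], [0.0]
    for t in range(400):
        x = [1.0 + t % 5]
        true_slope = 1.0 if t < 200 else 3.0
        G, a = rls_step(G, a, x, true_slope * x[0], lam)
    return a[0]


print(final_slope(1.0))  # no forgetting: stuck near 2, between the two trends
print(final_slope(0.9))  # with forgetting: tracks the new trend, near 3
```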

75 CMU SCS 15-826(c) C. Faloutsos, 201375 How to choose ‘w’? goal: capture arbitrary periodicities with NO human intervention on a semi-infinite stream Details

76 CMU SCS 15-826(c) C. Faloutsos, 201376 Reference [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003 Details

77 CMU SCS 15-826(c) C. Faloutsos, 201377 Answer: ‘AWSOM’ (Arbitrary Window Stream fOrecasting Method) [Papadimitriou+, vldb2003] idea: do AR on each wavelet level in detail: Details

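78 CMU SCS 15-826(c) C. Faloutsos, 201378 AWSOM [Figure: signal $x_t$ over time and its wavelet coefficients $W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4}$; $W_{2,1}, W_{2,2}$; $W_{3,1}$; $V_{4,1}$, laid out on a time-frequency grid] Details

79 CMU SCS 15-826(c) C. Faloutsos, 201379 AWSOM [Figure: signal $x_t$ over time and its wavelet coefficients $W_{1,1}, W_{1,2}, W_{1,3}, W_{1,4}$; $W_{2,1}, W_{2,2}$; $W_{3,1}$; $V_{4,1}$, laid out on a time-frequency grid] Details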

80 CMU SCS 15-826(c) C. Faloutsos, 201380 AWSOM - idea $W_{l,t} = \beta_{l,1} W_{l,t-1} + \beta_{l,2} W_{l,t-2} + \ldots$ and, at another level, $W_{l',t'} = \beta_{l',1} W_{l',t'-1} + \beta_{l',2} W_{l',t'-2} + \ldots$ Details

81 CMU SCS 15-826(c) C. Faloutsos, 201381 More details… Update of wavelet coefficients (incremental) Update of linear models (incremental; RLS) Feature selection (single-pass) –Not all correlations are significant –Throw away the insignificant ones (“noise”) Details

82 CMU SCS 15-826(c) C. Faloutsos, 201382 Results - Synthetic data Triangle pulse Mix (sine + square) AR captures wrong trend (or none) Seasonal AR estimation fails [Plot panels: AWSOM, AR, Seasonal AR] Details

83 CMU SCS 15-826(c) C. Faloutsos, 201383 Results - Real data Automobile traffic –Daily periodicity –Bursty “noise” at smaller scales AR fails to capture any trend Seasonal AR estimation fails Details

84 CMU SCS 15-826(c) C. Faloutsos, 201384 Results - real data Sunspot intensity –Slightly time-varying “period” AR captures wrong trend Seasonal ARIMA –wrong downward trend, despite help by human! Details

85 CMU SCS 15-826(c) C. Faloutsos, 201385 Complexity Model update Space: $O(\lg N + m k^2) \approx O(\lg N)$ Time: $O(k^2) \approx O(1)$ Where –N: number of points (so far) –k: number of regression coefficients; fixed –m: number of linear models; $O(\lg N)$ Details

86 CMU SCS 15-826(c) C. Faloutsos, 201386 Outline Motivation... Linear Forecasting –Auto-regression: Least Squares; RLS –Co-evolving time sequences –Examples –Conclusions

87 CMU SCS 15-826(c) C. Faloutsos, 201387 Co-Evolving Time Sequences Given: A set of correlated time sequences Forecast ‘Repeated(t)’ ??

88 CMU SCS 15-826(c) C. Faloutsos, 201388 Solution: Q: what should we do?

89 CMU SCS 15-826(c) C. Faloutsos, 201389 Solution: Least Squares, with Dep. Variable: Repeated(t) Indep. Variables: Sent(t-1) … Sent(t-w); Lost(t-1) …Lost(t-w); Repeated(t-1),... (named: ‘MUSCLES’ [Yi+00])
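
The setup on this slide amounts to building one regression row per time tick from the lagged values of every sequence. The sketch below shows only that bookkeeping and follows the variable list as the slide gives it (lags 1 to w of each sequence); the function name `muscles_design` and the toy numbers are hypothetical, and the resulting rows would be handed to a least-squares or RLS fit.

```python
def muscles_design(sequences, target, w):
    """Build (names, X, y): rows of lags 1..w of every sequence, targets from `target`."""
    names = sorted(sequences)
    n = len(sequences[target])
    X, y = [], []
    for t in range(w, n):
        row = []
        for name in names:
            # lags t-1, t-2, ..., t-w of this sequence
            row.extend(sequences[name][t - k] for k in range(1, w + 1))
        X.append(row)
        y.append(sequences[target][t])
    return names, X, y


data = {
    "sent":     [10, 12, 11, 15, 14, 18],
    "lost":     [1, 0, 2, 1, 3, 2],
    "repeated": [0, 1, 0, 2, 1, 3],
}
names, X, y = muscles_design(data, "repeated", w=2)
print(names)       # ['lost', 'repeated', 'sent']
print(X[0], y[0])  # [0, 1, 1, 0, 12, 10] 0
```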

90 CMU SCS 15-826(c) C. Faloutsos, 201390 Forecasting - Outline Auto-regression Least Squares; recursive least squares Co-evolving time sequences Examples Conclusions

91 CMU SCS 15-826(c) C. Faloutsos, 201391 Examples - Experiments Datasets –Modem pool traffic (14 modems, 1500 time- ticks; #packets per time unit) –AT&T WorldNet internet usage (several data streams; 980 time-ticks) Measures of success –Accuracy : Root Mean Square Error (RMSE)
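
The accuracy measure named here is straightforward to compute. In this small sketch the name `rmse` is hypothetical, and the baseline shown (predict each value as the previous one) is an assumed reading of a simple "previous value" forecast, included only to give the function something to score.

```python
import math


def rmse(actual, forecast):
    """Root Mean Square Error between two equal-length sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


series = [1.0, 2.0, 4.0, 7.0]
actual = series[1:]       # values to be forecast
baseline = series[:-1]    # assumed baseline: forecast x_t as x_{t-1}
print(rmse(actual, baseline))  # sqrt((1 + 4 + 9) / 3)
```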

92 CMU SCS 15-826(c) C. Faloutsos, 201392 Accuracy - “Modem” MUSCLES outperforms AR & “yesterday”

93 CMU SCS 15-826(c) C. Faloutsos, 201393 Accuracy - “Internet” MUSCLES consistently outperforms AR & “yesterday”

94 CMU SCS 15-826(c) C. Faloutsos, 201394 Linear forecasting - Outline Auto-regression Least Squares; recursive least squares Co-evolving time sequences Examples Conclusions

95 CMU SCS 15-826(c) C. Faloutsos, 201395 Conclusions - Practitioner’s guide AR(IMA) methodology: prevailing method for linear forecasting Brilliant method of Recursive Least Squares for fast, incremental estimation. See [Box-Jenkins] (AWSOM: no human intervention)

96 CMU SCS 15-826(c) C. Faloutsos, 201396 Resources: software and urls free-ware: ‘R’ for stat. analysis (clone of Splus) http://cran.r-project.org/ python script for RLS http://www.cs.cmu.edu/~christos/SRC/rls-all.tar

97 CMU SCS 15-826(c) C. Faloutsos, 201397 Books George E.P. Box and Gwilym M. Jenkins and Gregory C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.) Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York, Springer Verlag.

98 CMU SCS 15-826(c) C. Faloutsos, 201398 Additional Reading [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos Adaptive, Hands-Off Stream Mining VLDB 2003, Berlin, Germany, Sept. 2003 [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

99 CMU SCS 15-826(c) C. Faloutsos, 201399 Outline Motivation Similarity search and distance functions Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

100 CMU SCS 15-826(c) C. Faloutsos, 2013100 Bursty Traffic & Multifractals

101 CMU SCS 15-826(c) C. Faloutsos, 2013101 SKIP, if you have done HW2, Q3: Foils use D1 (‘information fractal dimension’), While HW2-Q3 uses D2 (‘correlation’ f.d.)

102 CMU SCS 15-826(c) C. Faloutsos, 2013102 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Results HW2-Q3

103 CMU SCS 15-826(c) C. Faloutsos, 2013103 Reference: [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chan, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA, 2/26/2002 - 3/1/2002. Full thesis: CMU-CS-05-185 Performance Modeling of Storage Devices using Machine Learning, Mengzhi Wang, Ph.D. Thesis HW2-Q3

104 CMU SCS 15-826(c) C. Faloutsos, 2013104 Recall: Problem #1: Goal: given a signal (eg., #bytes over time) Find: patterns, periodicities, and/or compress time #bytes Bytes per 30’ (packets per day; earthquakes per year) HW2-Q3

105 CMU SCS 15-826(c) C. Faloutsos, 2013105 Problem #1 model bursty traffic generate realistic traces (Poisson does not work) time # bytes Poisson HW2-Q3

106 CMU SCS 15-826(c) C. Faloutsos, 2013106 Motivation predict queue length distributions (e.g., to give probabilistic guarantees) “learn” traffic, for buffering, prefetching, ‘active disks’, web servers HW2-Q3

107 CMU SCS 15-826(c) C. Faloutsos, 2013107 Q: any ‘pattern’? time # bytes Not Poisson spike; silence; more spikes; more silence… any rules? HW2-Q3

108 CMU SCS 15-826(c) C. Faloutsos, 2013108 Solution: self-similarity # bytes time # bytes HW2-Q3

109 CMU SCS 15-826(c) C. Faloutsos, 2013109 But: Q1: How to generate realistic traces; extrapolate; give guarantees? Q2: How to estimate the model parameters? HW2-Q3

110 CMU SCS 15-826(c) C. Faloutsos, 2013110 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Results HW2-Q3

111 CMU SCS 15-826(c) C. Faloutsos, 2013111 Approach Q1: How to generate a sequence, that is –bursty –self-similar –and has similar queue length distributions HW2-Q3

112 CMU SCS 15-826(c) C. Faloutsos, 2013112 Approach A: ‘binomial multifractal’ [Wang+02] ~ 80-20 ‘law’: –80% of bytes/queries etc on first half –repeat recursively b: bias factor (eg., 80%) HW2-Q3
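
The recursive 80-20 split can be sketched as a short generator. This is an illustration under stated assumptions, not the authors' generator: the name `b_model` and its parameters are hypothetical, and the optional `shuffle` flag (randomly choosing which half receives the larger share) is an added variation; with `shuffle=False` the larger share always goes to the first half, as the slide describes.

```python
import random


def b_model(total, levels, b=0.8, shuffle=False, seed=0):
    """Split `total` recursively: a fraction b to one half, 1-b to the other."""
    rng = random.Random(seed)
    trace = [float(total)]
    for _ in range(levels):
        finer = []
        for volume in trace:
            first, second = b * volume, (1.0 - b) * volume
            if shuffle and rng.random() < 0.5:
                first, second = second, first  # assumed variation: random side
            finer.extend((first, second))
        trace = finer
    return trace  # 2**levels time ticks


trace = b_model(1000, levels=3)
print(len(trace), trace[0], trace[-1])  # 8 ticks; largest is about 512, smallest about 8
```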

113 CMU SCS 15-826(c) C. Faloutsos, 2013113 Binary multifractals 20 80 HW2-Q3

114 CMU SCS 15-826(c) C. Faloutsos, 2013114 Binary multifractals 20 80 HW2-Q3

115 CMU SCS 15-826(c) C. Faloutsos, 2013115 Parameter estimation Q2: How to estimate the bias factor b? HW2-Q3

116 CMU SCS 15-826(c) C. Faloutsos, 2013116 Parameter estimation Q2: How to estimate the bias factor b? A: MANY ways [Crovella+96] –Hurst exponent –variance plot –even DFT amplitude spectrum! (‘periodogram’) –More robust: ‘entropy plot’ [Wang+02] HW2-Q3

117 CMU SCS 15-826(c) C. Faloutsos, 2013117 Entropy plot Rationale: – burstiness: inverse of uniformity –entropy measures uniformity of a distribution –find entropy at several granularities, to see whether/how our distribution is close to uniform. HW2-Q3

118 CMU SCS 15-826(c) C. Faloutsos, 2013118 Entropy plot Entropy E(n) after n levels of splits n=1: $E(1) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)$ ($p_1$, $p_2$: % of bytes in each half) HW2-Q3

119 CMU SCS 15-826(c) C. Faloutsos, 2013119 Entropy plot Entropy E(n) after n levels of splits n=1: $E(1) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)$ n=2: $E(2) = -\sum_i p_{2,i} \log_2(p_{2,i})$ (over the four quarters $p_{2,1}, p_{2,2}, p_{2,3}, p_{2,4}$) HW2-Q3
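
The entropy plot can be computed directly from a trace by aggregating it at coarser and coarser granularities. In this minimal sketch the name `entropy_plot` is hypothetical and the trace length is assumed to be a power of two, so that level n splits it into 2^n equal intervals.

```python
import math


def entropy_plot(trace):
    """Return [E(0), E(1), ..., E(n)] for a trace of length 2**n."""
    if not trace:
        raise ValueError("trace must be non-empty")
    n_levels = len(trace).bit_length() - 1
    if len(trace) != 1 << n_levels:
        raise ValueError("trace length must be a power of two")
    total = float(sum(trace))
    entropies = []
    for level in range(n_levels + 1):
        width = len(trace) >> level  # ticks per interval at this level
        e = 0.0
        for start in range(0, len(trace), width):
            p = sum(trace[start:start + width]) / total
            if p > 0:
                e -= p * math.log2(p)
        entropies.append(e)
    return entropies


print(entropy_plot([1] * 8))                 # uniform: [0.0, 1.0, 2.0, 3.0], slope 1
print(entropy_plot([8, 0, 0, 0]))            # all volume in one tick: slope 0
print(entropy_plot([64.0, 16.0, 16.0, 4.0])) # 80-20 split twice: slope about 0.72
```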

120 CMU SCS 15-826(c) C. Faloutsos, 2013120 Real traffic Has linear entropy plot (-> self-similar) # of levels (n) Entropy E(n) 0.73 HW2-Q3

121 CMU SCS 15-826(c) C. Faloutsos, 2013121 Observation - intuition: intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit –unif. Dataset: slope =? –multi-point: slope = ? # of levels (n) Entropy E(n) HW2-Q3

122 CMU SCS 15-826(c) C. Faloutsos, 2013122 Observation - intuition: intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit –unif. Dataset: slope =1 –multi-point: slope = 0 # of levels (n) Entropy E(n) HW2-Q3

123 CMU SCS 15-826(c) C. Faloutsos, 2013123 Entropy plot - Intuition Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Pick a point; reveal its coordinate bit-by-bit - how much info is each bit worth to me? HW2-Q3

124 CMU SCS 15-826(c) C. Faloutsos, 2013124 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? ‘info’ value = E(1): 1 bit HW2-Q3

125 CMU SCS 15-826(c) C. Faloutsos, 2013125 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? Is next MSB =0? HW2-Q3

126 CMU SCS 15-826(c) C. Faloutsos, 2013126 Entropy plot Slope ~ intrinsic dimensionality (in fact, ‘Information fractal dimension’) = info bit per coordinate bit - eg Dim = 1 Is MSB 0? Is next MSB =0? Info value =1 bit = E(2) - E(1) = slope! HW2-Q3

127 CMU SCS 15-826(c) C. Faloutsos, 2013127 Entropy plot Repeat, for all points at same position: Dim=0 HW2-Q3

128 CMU SCS 15-826(c) C. Faloutsos, 2013128 Entropy plot Repeat, for all points at same position: we need 0 bits of info, to determine position -> slope = 0 = intrinsic dimensionality Dim=0 HW2-Q3

129 CMU SCS 15-826(c) C. Faloutsos, 2013129 Entropy plot Real (and 80-20) datasets can be in-between: bursts, gaps, smaller bursts, smaller gaps, at every scale Dim = 1 Dim=0

130 CMU SCS 15-826(c) C. Faloutsos, 2013130 (Fractals, again) What set of points could have behavior between point and line? HW2-Q3

131 CMU SCS 15-826(c) C. Faloutsos, 2013131 Cantor dust Eliminate the middle third Recursively! HW2-Q3

132 CMU SCS 15-826(c) C. Faloutsos, 2013132 Cantor dust HW2-Q3

133 CMU SCS 15-826(c) C. Faloutsos, 2013133 Cantor dust HW2-Q3

134 CMU SCS 15-826(c) C. Faloutsos, 2013134 Cantor dust HW2-Q3

135 CMU SCS 15-826(c) C. Faloutsos, 2013135 Cantor dust HW2-Q3

136 CMU SCS 15-826(c) C. Faloutsos, 2013136 Cantor dust Dimensionality? (no length; infinite # points!) Answer: $\log 2 / \log 3 \approx 0.63$ HW2-Q3
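
A brief justification of that answer (added here; the slide states only the result): each step replaces a segment with $N = 2$ copies of itself, each scaled by $r = 1/3$, so the self-similarity dimension $D$ satisfies $N r^D = 1$:

$$D = \frac{\log N}{\log (1/r)} = \frac{\log 2}{\log 3} \approx 0.63$$

This lies strictly between 0 (a point) and 1 (a line).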

137 CMU SCS 15-826(c) C. Faloutsos, 2013137 Some more entropy plots: Poisson vs real Poisson: slope = ~1 -> uniformly distributed 1 0.73 HW2-Q3

138 CMU SCS 15-826(c) C. Faloutsos, 2013138 b-model b-model traffic gives perfectly linear plot Lemma: its slope is slope $= -b \log_2 b - (1-b) \log_2 (1-b)$ Fitting: do entropy plot; get slope; solve for b [Plot: E(n) vs. n] HW2-Q3
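
The "solve for b" step has no closed form, but the slope is monotonically decreasing in b on [0.5, 1), so a simple bisection suffices. This is a sketch under that observation; the names `binary_entropy` and `estimate_bias` and the tolerance are hypothetical.

```python
import math


def binary_entropy(b):
    """Slope of the b-model entropy plot: -b log2 b - (1-b) log2 (1-b)."""
    if b <= 0.0 or b >= 1.0:
        return 0.0
    return -b * math.log2(b) - (1.0 - b) * math.log2(1.0 - b)


def estimate_bias(slope, tol=1e-12):
    """Find b in [0.5, 1) whose entropy-plot slope equals `slope`, by bisection."""
    if not 0.0 < slope <= 1.0:
        raise ValueError("slope must lie in (0, 1]")
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) > slope:
            lo = mid  # slope too high: the traffic is burstier, so b is larger
        else:
            hi = mid
    return (lo + hi) / 2.0


print(binary_entropy(0.8))  # about 0.72
print(estimate_bias(0.73))  # bias factor for the measured slope of 0.73, a little under 0.8
```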

139 CMU SCS 15-826(c) C. Faloutsos, 2013139 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals –Problem –Main idea (80/20, Hurst exponent) –Experiments - Results HW2-Q3

140 CMU SCS 15-826(c) C. Faloutsos, 2013140 Experimental setup Disk traces (from HP [Wilkes 93]) web traces from LBL http://repository.cs.vt.edu/ lbl-conn-7.tar.Z HW2-Q3

141 CMU SCS 15-826(c) C. Faloutsos, 2013141 Model validation Linear entropy plots Bias factors b: 0.6-0.8 smallest b / smoothest: nntp traffic HW2-Q3

142 CMU SCS 15-826(c) C. Faloutsos, 2013142 Web traffic - results LBL, NCDF of queue lengths (log-log scales) (queue length l) Prob( >l) How to give guarantees? HW2-Q3

143 CMU SCS 15-826(c) C. Faloutsos, 2013143 Web traffic - results LBL, NCDF of queue lengths (log-log scales) (queue length l) Prob( >l) 20% of the requests will see queue lengths <100 HW2-Q3

144 CMU SCS 15-826(c) C. Faloutsos, 2013144 Conclusions Multifractals (80/20, ‘b-model’, Multiplicative Wavelet Model (MWM)) for analysis and synthesis of bursty traffic

145 CMU SCS 15-826(c) C. Faloutsos, 2013145 Books Fractals: Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

146 CMU SCS 15-826(c) C. Faloutsos, 2013146 Further reading: Crovella, M. and A. Bestavros (1996). Self- Similarity in World Wide Web Traffic, Evidence and Possible Causes. Sigmetrics. [ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

147 CMU SCS 15-826(c) C. Faloutsos, 2013147 Further reading [Riedi+99] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, A Multifractal Wavelet Model with Application to Network Traffic, IEEE Special Issue on Information Theory, 45. (April 1999), 992-1018. [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chan, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA, 2/26/2002 - 3/1/2002. Entropy plots

148 CMU SCS 15-826(c) C. Faloutsos, 2013148 Outline Motivation... Linear Forecasting Bursty traffic - fractals and multifractals Non-linear forecasting Conclusions

149 CMU SCS 15-826(c) C. Faloutsos, 2013149 Chaos and non-linear forecasting

150 CMU SCS 15-826(c) C. Faloutsos, 2013150 Reference: [ Deepay Chakrabarti and Christos Faloutsos F4: Large-Scale Automated Forecasting using Fractals CIKM 2002, Washington DC, Nov. 2002.]

151 CMU SCS 15-826(c) C. Faloutsos, 2013151 Detailed Outline Non-linear forecasting –Problem –Idea –How-to –Experiments –Conclusions

152 CMU SCS 15-826(c) C. Faloutsos, 2013152 Recall: Problem #1 Given a time series {x t }, predict its future course, that is, x t+1, x t+2,... Time Value

153 CMU SCS 15-826(c) C. Faloutsos, 2013153 Datasets Logistic Parabola: $x_t = a\,x_{t-1}(1 - x_{t-1})$ + noise Models population of flies [R. May/1976] [Plots: x(t) vs. time; lag-plot] ARIMA: fails

154 CMU SCS 15-826(c) C. Faloutsos, 2013154 How to forecast? ARIMA - but: linearity assumption Lag-plot ARIMA: fails

155 CMU SCS 15-826(c) C. Faloutsos, 2013155 How to forecast? ARIMA - but: linearity assumption ANSWER: ‘Delayed Coordinate Embedding’ = Lag Plots [Sauer92] ~ nearest-neighbor search, for past incidents

156 CMU SCS 15-826(c) C. Faloutsos, 2013156 General Intuition (Lag Plot) Lag = 1, k = 4 NN [Lag plot of $x_t$ vs. $x_{t-1}$: take the New Point, find its 4-NN, and interpolate these to get the final prediction]

157 CMU SCS 15-826(c) C. Faloutsos, 2013157 Questions: Q1: How to choose lag L? Q2: How to choose k (the # of NN)? Q3: How to interpolate? Q4: why should this work at all?

158 CMU SCS 15-826(c) C. Faloutsos, 2013158 Q1: Choosing lag L Manually (16, in award winning system by [Sauer94])

159 CMU SCS 15-826(c) C. Faloutsos, 2013159 Q2: Choosing number of neighbors k Manually (typically ~ 1-10)

160 CMU SCS 15-826(c) C. Faloutsos, 2013160 Q3: How to interpolate? How do we interpolate between the k nearest neighbors? A3.1: Average A3.2: Weighted average (weights drop with distance - how?)
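
The lag-plot idea with weighted-average interpolation (A3.2) can be sketched as follows. The slide leaves open how the weights should drop with distance; inverse-distance weighting is assumed here purely for illustration. The names `knn_forecast` and `logistic_series`, the parameter a = 3.8, and the noise-free series are likewise assumptions.

```python
def knn_forecast(series, lag=1, k=4):
    """Forecast the next value from the k past delay vectors nearest to the latest one."""
    query = series[-lag:]
    candidates = []
    for t in range(lag, len(series)):
        past = series[t - lag:t]  # delay vector whose successor series[t] is known
        dist = sum((u - v) ** 2 for u, v in zip(past, query)) ** 0.5
        candidates.append((dist, series[t]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    # Assumed weighting: inverse distance (small constant avoids division by zero).
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(wt * nxt for wt, (_, nxt) in zip(weights, nearest)) / sum(weights)


def logistic_series(n, a=3.8, x0=0.3):
    """Noise-free logistic parabola x_t = a * x_{t-1} * (1 - x_{t-1})."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs


xs = logistic_series(2001)
history, truth = xs[:-1], xs[-1]
print(knn_forecast(history, lag=1, k=4), truth)  # forecast vs. actual next value
```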

161 CMU SCS 15-826(c) C. Faloutsos, 2013161 Q3: How to interpolate? A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition) X t-1 xtxt

162 CMU SCS 15-826(c) C. Faloutsos, 2013162 Q4: Any theory behind it? A4: YES!

163 CMU SCS 15-826(c) C. Faloutsos, 2013163 Theoretical foundation Based on the ‘Takens theorem’ [Takens81] which says that long enough delay vectors can do prediction, even if there are unobserved variables in the dynamical system (= diff. equations)

164 CMU SCS 15-826(c) C. Faloutsos, 2013164 Theoretical foundation Example: Lotka-Volterra equations $dH/dt = rH - aHP$; $dP/dt = bHP - mP$ H is count of prey (e.g., hare) P is count of predators (e.g., lynx) Suppose only P(t) is observed (t=1, 2, …). [Plot: P vs. H] Skip

165 CMU SCS 15-826(c) C. Faloutsos, 2013165 Theoretical foundation But the delay vector space is a faithful reconstruction of the internal system state So prediction in delay vector space is as good as prediction in state space [Plots: P vs. H; P(t) vs. P(t-1)] Skip

166 CMU SCS 15-826(c) C. Faloutsos, 2013166 Detailed Outline Non-linear forecasting –Problem –Idea –How-to –Experiments –Conclusions

167 CMU SCS 15-826(c) C. Faloutsos, 2013167 Datasets Logistic Parabola: $x_t = a\,x_{t-1}(1 - x_{t-1})$ + noise Models population of flies [R. May/1976] [Plots: x(t) vs. time; lag-plot]

168 CMU SCS 15-826(c) C. Faloutsos, 2013168 Datasets Logistic Parabola: $x_t = a\,x_{t-1}(1 - x_{t-1})$ + noise Models population of flies [R. May/1976] [Plots: x(t) vs. time; lag-plot] ARIMA: fails

169 CMU SCS 15-826(c) C. Faloutsos, 2013169 Logistic Parabola Timesteps Value Our Prediction from here

170 CMU SCS 15-826(c) C. Faloutsos, 2013170 Logistic Parabola Timesteps Value Comparison of prediction to correct values

171 CMU SCS 15-826(c) C. Faloutsos, 2013171 Datasets LORENZ: Models convection currents in the air $dx/dt = a(y - x)$; $dy/dt = x(b - z) - y$; $dz/dt = xy - cz$ [Plot axis: Value]
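
A trace like the one used in this experiment can be generated by integrating the three equations numerically. The sketch below uses the forward Euler method with a small step, a simple choice assumed here for brevity. The parameter values a = 10, b = 28, c = 8/3 are the commonly used chaotic setting and are an assumption, since the slide does not give them; the name `lorenz_series` and the starting point are hypothetical too.

```python
def lorenz_series(steps, dt=0.01, a=10.0, b=28.0, c=8.0 / 3.0, start=(1.0, 1.0, 1.0)):
    """Integrate the Lorenz equations with forward Euler; returns a list of (x, y, z)."""
    x, y, z = start
    trajectory = [(x, y, z)]
    for _ in range(steps):
        dx = a * (y - x)
        dy = x * (b - z) - y
        dz = x * y - c * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        trajectory.append((x, y, z))
    return trajectory


trajectory = lorenz_series(1000)
x_values = [point[0] for point in trajectory]  # a one-dimensional series to forecast
print(len(x_values), x_values[:3])
```

Observing only one coordinate, as done here with `x_values`, mirrors the setting of the delay-embedding slides, where part of the system state is unobserved.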

172 CMU SCS 15-826(c) C. Faloutsos, 2013172 LORENZ Timesteps Value Comparison of prediction to correct values

173 CMU SCS 15-826(c) C. Faloutsos, 2013173 Datasets Time Value LASER: fluctuations in a Laser over time (used in Santa Fe competition)

174 CMU SCS 15-826(c) C. Faloutsos, 2013174 Laser Timesteps Value Comparison of prediction to correct values

175 CMU SCS 15-826(c) C. Faloutsos, 2013175 Conclusions Lag plots for non-linear forecasting (Takens’ theorem) suitable for ‘chaotic’ signals

176 CMU SCS 15-826(c) C. Faloutsos, 2013176 References Deepay Chakrabarti and Christos Faloutsos F4: Large- Scale Automated Forecasting using Fractals CIKM 2002, Washington DC, Nov. 2002. Sauer, T. (1994). Time series prediction using delay coordinate embedding. (in book by Weigend and Gershenfeld, below) Addison-Wesley. Takens, F. (1981). Detecting strange attractors in fluid turbulence. Dynamical Systems and Turbulence. Berlin: Springer-Verlag.

177 CMU SCS 15-826(c) C. Faloutsos, 2013177 References Weigend, A. S. and N. A. Gershenfeld (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison Wesley. (Excellent collection of papers on chaotic/non-linear forecasting, describing the algorithms behind the winners of the Santa Fe competition.)

178 CMU SCS 15-826(c) C. Faloutsos, 2013178 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs

179 CMU SCS 15-826(c) C. Faloutsos, 2013179 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool

180 CMU SCS 15-826(c) C. Faloutsos, 2013180 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM

181 CMU SCS 15-826(c) C. Faloutsos, 2013181 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM Bursty traffic: multifractals (80-20 ‘law’)

182 CMU SCS 15-826(c) C. Faloutsos, 2013182 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs Signal processing: DWT is a powerful tool Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM Bursty traffic: multifractals (80-20 ‘law’) Non-linear forecasting: lag-plots (Takens)

